Client Drift Norm measures the divergence between local client models and the global model in parameter space. It quantifies how far clients deviate during local training.
The drift for client $k$ at round $t$ is the L2 norm of the parameter difference:

\[\text{Drift}_k^t = \|w_k^t - w^t\|_2\]

where $w_k^t$ is client $k$'s model after local training in round $t$ and $w^t$ is the global model at the start of round $t$.
The relative drift normalizes by the global model's parameter norm:

\[\text{RelDrift}_k^t = \frac{\|w_k^t - w^t\|_2}{\|w^t\|_2}\]

The implementation is located at `src/unbitrium/metrics/optimization.py`.
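As a minimal, library-independent sketch (pure NumPy; the helper names `drift_norm` and `relative_drift` are illustrative and not part of unbitrium), both quantities reduce to flattening the parameter tensors and taking L2 norms:

```python
import numpy as np

def drift_norm(client_params, global_params):
    """L2 norm of the parameter difference ||w_k - w||_2."""
    diff = np.concatenate([(c - g).ravel()
                           for c, g in zip(client_params, global_params)])
    return float(np.linalg.norm(diff))

def relative_drift(client_params, global_params):
    """Drift normalized by the global model's parameter norm."""
    flat_global = np.concatenate([p.ravel() for p in global_params])
    return drift_norm(client_params, global_params) / float(np.linalg.norm(flat_global))

# Toy two-layer model: the client moved each first-layer weight by 0.1
w_global = [np.ones((2, 2)), np.zeros(3)]
w_client = [np.ones((2, 2)) * 1.1, np.zeros(3)]
print(round(drift_norm(w_client, w_global), 4))      # 0.2
print(round(relative_drift(w_client, w_global), 4))  # 0.1
```

In practice the parameter lists would come from aligning the two models' state dicts in a fixed key order.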
Verification: the norm is always non-negative.
Verification: identical client and global models yield zero drift.
Verification: the L2 norm is a proper metric (non-negativity, symmetry, triangle inequality).
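These properties can be spot-checked with a small property-style test (a self-contained sketch; `drift` here is simply the L2 distance on flattened parameter vectors):

```python
import numpy as np

def drift(a, b):
    """L2 distance between two flattened parameter vectors."""
    return float(np.linalg.norm(a - b))

rng = np.random.default_rng(0)
for _ in range(100):
    w, wk, wj = (rng.normal(size=50) for _ in range(3))
    assert drift(wk, w) >= 0.0                                   # non-negative
    assert drift(w, w) == 0.0                                    # zero for identical models
    assert np.isclose(drift(wk, w), drift(w, wk))                # symmetry
    assert drift(wk, wj) <= drift(wk, w) + drift(w, wj) + 1e-12  # triangle inequality
print("all metric properties hold")
```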
Drift typically increases with the amount of local training:

\[E_1 < E_2 \implies \mathbb{E}[\text{Drift} \mid E_1] \leq \mathbb{E}[\text{Drift} \mid E_2]\]

Verification: more local epochs lead to higher drift on average.
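The trend can be illustrated with a toy simulation (hypothetical, not part of unbitrium): a client running gradient descent on a quadratic local objective drifts monotonically toward its local optimum as the number of local epochs grows.

```python
import numpy as np

def local_drift(epochs, lr=0.1):
    """Drift after `epochs` gradient steps on F_k(w) = 0.5 * ||w - w_star||^2."""
    w_global = np.zeros(10)
    w_star = np.ones(10)          # client's local optimum (non-IID pull)
    w = w_global.copy()
    for _ in range(epochs):
        w -= lr * (w - w_star)    # gradient step on the local quadratic
    return float(np.linalg.norm(w - w_global))

drifts = [local_drift(e) for e in (1, 2, 5, 10)]
print([round(d, 3) for d in drifts])  # strictly increasing with local epochs
assert drifts == sorted(drifts)
```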
Configuration:
Expected Output: Drift = 0 for all clients
Configuration:
Expected Behavior:
Configuration:
Expected Behavior:
Configuration:
Expected Behavior:
| Relative Drift | Severity | Recommended Action |
|---|---|---|
| 0.0 - 0.01 | Minimal | FedAvg works |
| 0.01 - 0.1 | Low | Monitor |
| 0.1 - 0.5 | Moderate | Consider FedProx |
| 0.5 - 1.0 | High | Use regularization |
| 1.0+ | Severe | Reduce local epochs or use SCAFFOLD |
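The severity table maps directly onto a threshold lookup; a sketch (thresholds copied from the table above, helper name illustrative):

```python
def recommend(rel_drift):
    """Map a relative drift value to the severity table's recommended action."""
    if rel_drift < 0.01:
        return "Minimal: FedAvg works"
    if rel_drift < 0.1:
        return "Low: monitor"
    if rel_drift < 0.5:
        return "Moderate: consider FedProx"
    if rel_drift < 1.0:
        return "High: use regularization"
    return "Severe: reduce local epochs or use SCAFFOLD"

print(recommend(0.3))  # Moderate: consider FedProx
```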
| Factor | Effect on Drift |
|---|---|
| Local epochs $E$ | Increases with $E$ |
| Learning rate $\eta$ | Increases with $\eta$ |
| Data heterogeneity | Increases with non-IID |
| Model complexity | Generally increases |
| Batch size | Inversely related |
Input: $w^t = 0$ (e.g., at initialization)
Expected Behavior: the absolute drift equals $\|w_k^t\|_2$; the relative drift is undefined because the denominator $\|w^t\|_2$ is zero.
Input: Near convergence
Expected Behavior: drift approaches zero as client models agree with the global model.
Input: Learning rate too high
Expected Behavior: drift becomes large or unstable as local updates overshoot.
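A defensive relative-drift computation covering the zero-norm edge case might look like this (a sketch; how `ClientDriftNorm` itself guards this case is not specified in this report):

```python
import numpy as np

def safe_relative_drift(w_client, w_global, eps=1e-12):
    """Relative drift that returns NaN when the global model norm is (near) zero."""
    denom = float(np.linalg.norm(w_global))
    if denom < eps:
        return float("nan")  # undefined: ||w^t|| = 0, avoid division by zero
    return float(np.linalg.norm(w_client - w_global)) / denom

print(safe_relative_drift(np.ones(4), np.zeros(4)))  # nan (zero global model)
print(safe_relative_drift(np.ones(4), np.ones(4)))   # 0.0 (identical models)
```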
```python
import numpy as np

from unbitrium.metrics import ClientDriftNorm

metric = ClientDriftNorm(normalize=True)

# After local training: drift of a single client from the global model
global_params = global_model.state_dict()
client_params = client_model.state_dict()

drift = metric.compute(client_params, global_params)
print(f"Client Drift: {drift:.4f}")

# Aggregate drift statistics across all clients
drifts = []
for client in clients:
    drifts.append(metric.compute(client.model.state_dict(), global_params))

print(f"Mean Drift: {np.mean(drifts):.4f}")
print(f"Max Drift: {np.max(drifts):.4f}")
print(f"Std Drift: {np.std(drifts):.4f}")
```
Drift reveals how strongly heterogeneous local objectives pull clients away from the global model, and therefore whether drift-control methods such as FedProx or SCAFFOLD are warranted.
Computational cost: a single pass over the parameters, i.e., $O(d)$ for $d$ parameters. Memory: both the client and global model copies must be stored.
FedProx regularization explicitly penalizes drift:

\[\min_w F_k(w) + \frac{\mu}{2}\|w - w^t\|^2\]

The drift norm is exactly the quantity being penalized.
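The effect of the proximal term can be seen in a toy comparison (hypothetical quadratic local objective; `mu`, `lr`, and the helper names are illustrative): with $\mu > 0$, each local step is pulled back toward $w^t$, so the final drift is smaller than with plain local SGD.

```python
import numpy as np

def fedprox_step(w, w_global, grad_fk, lr=0.1, mu=0.5):
    """One gradient step on F_k(w) + (mu / 2) * ||w - w_global||^2."""
    return w - lr * (grad_fk(w) + mu * (w - w_global))

w_global = np.zeros(5)
w_star = np.ones(5)                  # local optimum of F_k
grad_fk = lambda w: w - w_star       # gradient of 0.5 * ||w - w_star||^2

w_plain = w_global.copy()
w_prox = w_global.copy()
for _ in range(50):
    w_plain = fedprox_step(w_plain, w_global, grad_fk, mu=0.0)  # plain local SGD
    w_prox = fedprox_step(w_prox, w_global, grad_fk, mu=0.5)    # FedProx

# The proximal term keeps the client closer to the global model.
print(np.linalg.norm(w_prox - w_global) < np.linalg.norm(w_plain - w_global))  # True
```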
An empirical heuristic for choosing $\mu$:

\[\mu_{opt} \approx \frac{1}{\mathbb{E}[\text{Drift}_k^2]}\]

That is, scale $\mu$ inversely with the typical squared drift.
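A direct reading of this heuristic (a sketch; the helper name and sample drift values are illustrative):

```python
import numpy as np

def suggest_mu(drifts):
    """mu ~ 1 / E[Drift^2]: scale regularization inversely with squared drift."""
    drifts = np.asarray(drifts, dtype=float)
    return float(1.0 / np.mean(drifts ** 2))

print(round(suggest_mu([0.5, 1.0, 1.5]), 3))  # 0.857
```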
Li, T., et al. (2020). Federated optimization in heterogeneous networks. In MLSys.
Karimireddy, S. P., et al. (2020). SCAFFOLD: Stochastic controlled averaging for federated learning. In ICML.
Wang, S., et al. (2020). Tackling the objective inconsistency problem in heterogeneous federated optimization. In NeurIPS.
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-01-04 | Initial validation report |
Copyright 2026 Olaf Yunus Laitinen Imanov and Contributors. Released under EUPL 1.2.