Federated Client Momentum (FedCM) introduces client-level momentum to dampen update oscillations and improve convergence stability under heterogeneous data distributions.
Each client maintains a momentum buffer $v_k$ updated as:
\[v_k^{t+1} = \beta v_k^t + \nabla F_k(w_k^t)\]Local model update:
\[w_k^{t+1} = w_k^t - \eta v_k^{t+1}\]The global aggregation follows standard weighted averaging:
\[w^{t+1} = \sum_{k=1}^K \frac{n_k}{N} w_k^{t+1}\]where:
The implementation is located at src/unbitrium/aggregators/fedcm.py.
Momentum buffers decay when gradients vanish:
\[\nabla F_k = 0 \implies v_k^{t+1} = \beta v_k^t\]Verification: Zero gradient input produces decayed momentum.
When $\beta = 0$, reduces to vanilla SGD:
\[\beta = 0 \implies v_k^{t+1} = \nabla F_k(w_k^t)\]Verification: $\beta = 0$ produces identical results to non-momentum training.
Momentum norm is bounded for bounded gradients:
\[\|v_k^t\| \leq \frac{G}{1 - \beta}\]where $G$ is gradient bound.
Verification: Momentum norms remain bounded under gradient clipping.
Momentum buffers persist across rounds:
\[v_k^t = \beta v_k^{t-1} + g_k^t\]Verification: Buffer state correctly maintained in aggregator state.
Configuration:
Expected Behavior:
| $\beta$ | Convergence Speed | Stability |
|---|---|---|
| 0 | Baseline | Low |
| 0.5 | Faster | Moderate |
| 0.9 | Fastest | High |
| 0.99 | Slower | Very stable |
Configuration:
Expected Behavior:
Configuration:
Expected Behavior:
| $\beta$ Range | Effect | Use Case |
|---|---|---|
| $[0, 0.5)$ | Minimal smoothing | Stable settings |
| $[0.5, 0.9)$ | Moderate smoothing | Default |
| $[0.9, 0.99)$ | Strong smoothing | High variance |
| $[0.99, 1.0)$ | Very slow adaptation | Extreme noise |
| Metric | Range | Notes |
|---|---|---|
momentum_beta |
$[0, 1)$ | Momentum coefficient |
avg_momentum_norm |
$[0, \infty)$ | Mean momentum buffer norm |
momentum_variance |
$[0, \infty)$ | Variance in client momenta |
effective_lr |
$(0, \eta/(1-\beta))$ | Effective learning rate |
Input: $\beta = 0$
Expected Behavior:
Input: $\beta = 0.999$
Expected Behavior:
Input: Round $t = 0$
Expected Behavior:
Input: Client $k$ drops out for rounds
Expected Behavior:
def set_seed(seed: int = 42) -> None:
import random, numpy as np, torch
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
# State preserved across rounds
aggregator_state = {
"momentum_buffers": {
client_id: torch.zeros_like(model_params)
for client_id in client_ids
}
}
Momentum buffers encode gradient history:
| Attack | Description | Impact |
|---|---|---|
| Momentum Extraction | Infer historical gradients | Data leakage |
| State Poisoning | Corrupt momentum buffers | Destabilize training |
Note: Requires storing momentum buffer per client, doubling memory per client.
| Method | Rounds to 70% |
|---|---|
| FedAvg | 200 |
| FedCM ($\beta=0.9$) | 150 |
| FedCM ($\beta=0.99$) | 180 |
| Method | Loss Variance |
|---|---|
| FedAvg | 0.045 |
| FedCM | 0.012 |
Xu, J., et al. (2021). Federated learning with client-level momentum. In ICLR.
Hsu, T. M. H., et al. (2019). Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint.
Reddi, S., et al. (2021). Adaptive federated optimization. In ICLR.
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-01-04 | Initial validation report |
Copyright 2026 Olaf Yunus Laitinen Imanov and Contributors. Released under EUPL 1.2.