FedAdam applies the Adam optimizer at the server level to aggregated client updates, providing adaptive learning rates across parameters. It is part of the FedOpt family of algorithms.
After aggregating client updates to obtain $\Delta_t$, the server applies Adam:
First moment estimate: \(m_t = \beta_1 m_{t-1} + (1 - \beta_1) \Delta_t\)
Second moment estimate: \(v_t = \beta_2 v_{t-1} + (1 - \beta_2) \Delta_t^2\)
Bias-corrected estimates: \(\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}\)
Model update: \(w^{t+1} = w^t - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}\)
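The four update equations above can be sketched in a few lines of NumPy (a minimal illustration; the function name `fedadam_server_step` is an assumption for this sketch, not the library's API):

```python
import numpy as np

def fedadam_server_step(w, delta, m, v, t,
                        eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One server-side Adam step applied to the aggregated update delta."""
    t += 1
    m = beta1 * m + (1 - beta1) * delta            # first moment estimate
    v = beta2 * v + (1 - beta2) * delta**2         # second moment estimate
    m_hat = m / (1 - beta1**t)                     # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)   # adaptive model update
    return w, m, v, t
```

The server keeps `m`, `v`, and `t` across rounds; only `delta` changes each round.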
The implementation is located at src/unbitrium/aggregators/fedadam.py.
When $\beta_1 = 0$, $\beta_2 = 0$, and $\epsilon \to 0$, the moments collapse to $m_t = \Delta_t$ and $v_t = \Delta_t^2$, so the per-coordinate update becomes
\[w^{t+1} = w^t - \eta\,\operatorname{sign}(\Delta_t),\]
which is sign-SGD on the aggregated update. Exact FedAvg corresponds instead to replacing the server-side Adam step with plain SGD at learning rate $\eta = 1$.
Verification: Limiting cases connect FedAdam to simpler aggregation rules.
Moment estimates are bounded:
\[\|m_t\| \leq (1 - \beta_1^t)\,\max_\tau \|\Delta_\tau\| \leq \max_\tau \|\Delta_\tau\|\]
Verification: Moment norms remain bounded.
High-variance dimensions receive smaller updates:
\[\frac{\partial w_i}{\partial t} \propto \frac{1}{\sqrt{v_{t,i}}}\]
Verification: Update magnitude inversely related to variance.
Early updates are properly scaled:
\[\lim_{t \to \infty} \frac{1}{1 - \beta^t} = 1\]
Verification: Bias correction diminishes over rounds.
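A quick numerical check (with the default $\beta_1 = 0.9$) that the bias-correction factor $\frac{1}{1 - \beta_1^t}$ starts large and decays toward 1:

```python
beta1 = 0.9
# Factor is 10 at t=1, then shrinks monotonically toward 1
factors = [1 / (1 - beta1**t) for t in (1, 10, 100)]
```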
Configuration and expected behavior for representative momentum settings:
| $\beta_1$ | $\beta_2$ | Convergence | Stability |
|---|---|---|---|
| 0.9 | 0.999 | Fast | High |
| 0.5 | 0.99 | Moderate | Moderate |
| 0.99 | 0.9999 | Slow | Very high |
| Parameter | Default | Range |
|---|---|---|
| $\beta_1$ | 0.9 | $[0, 1)$ |
| $\beta_2$ | 0.999 | $[0, 1)$ |
| $\epsilon$ | $10^{-8}$ | $(0, 10^{-4}]$ |
| $\eta$ | 0.001 | $(0, 1]$ |
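For illustration, the defaults above could be collected into a configuration dictionary (key names here are hypothetical, not the library's actual schema):

```python
# Hypothetical FedAdam configuration using the documented defaults
fedadam_config = {
    "server_lr": 0.001,   # eta, in (0, 1]
    "beta1": 0.9,         # first-moment decay, in [0, 1)
    "beta2": 0.999,       # second-moment decay, in [0, 1)
    "epsilon": 1e-8,      # denominator stabilizer, in (0, 1e-4]
}
```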
| Metric | Range | Notes |
|---|---|---|
| first_moment_norm | $[0, \infty)$ | Norm of first moment |
| second_moment_norm | $[0, \infty)$ | Norm of second moment |
| effective_lr | $(0, \eta)$ | Per-dimension effective LR |
| bias_correction_factor | $(1, \infty)$ | Current bias correction |
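These diagnostics can be computed directly from the optimizer state; the sketch below assumes a helper name (`adam_metrics`) and NumPy state, neither of which comes from the source:

```python
import numpy as np

def adam_metrics(m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Illustrative computation of the diagnostics listed above."""
    v_hat = v / (1 - beta2**t)  # bias-corrected second moment
    return {
        "first_moment_norm": float(np.linalg.norm(m)),
        "second_moment_norm": float(np.linalg.norm(v)),
        "effective_lr": eta / (np.sqrt(v_hat) + eps),   # per-dimension
        "bias_correction_factor": 1 / (1 - beta1**t),
    }
```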
Input: $t = 1$
Expected Behavior: Bias-correction factors take their largest values ($\frac{1}{1-\beta_1}$ and $\frac{1}{1-\beta_2}$); the update remains finite.
Input: $\Delta_t = 0$
Expected Behavior: Both moments decay geometrically and the update magnitude shrinks toward zero.
Input: $|\Delta_t| \gg 1$
Expected Behavior: The ratio $\hat{m}_t / \sqrt{\hat{v}_t}$ self-normalizes, so the per-coordinate step stays on the order of $\eta$.
Input: $v_t \to 0$ for some dimension
Expected Behavior: $\epsilon$ in the denominator prevents division by zero and keeps the update finite.
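The edge cases above can be exercised against a minimal self-contained Adam step (a sketch under the equations in this document, not the library's test suite):

```python
import numpy as np

def adam_update(delta, m, v, t, eta=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Return the step the server would subtract, plus updated state."""
    t += 1
    m = b1 * m + (1 - b1) * delta
    v = b2 * v + (1 - b2) * delta**2
    step = eta * (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps)
    return step, m, v, t

z = np.zeros(2)
# t = 1: update is finite despite maximal bias correction
step1, m, v, t = adam_update(np.array([0.5, -0.5]), z, z, 0)
assert np.all(np.isfinite(step1))
# delta = 0 with zero state: update is exactly zero (eps guards the division)
step0, *_ = adam_update(np.zeros(2), z, z, 0)
assert np.allclose(step0, 0.0)
# |delta| >> 1: per-coordinate step magnitude stays near eta (self-normalizing)
big, *_ = adam_update(np.array([1e6, -1e6]), z, z, 0)
assert np.all(np.abs(big) <= 0.001 + 1e-9)
```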
```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # Seed all RNGs (Python, NumPy, torch) for reproducibility
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
```
```python
# State persisted across rounds
optimizer_state = {
    "m": torch.zeros_like(global_params),
    "v": torch.zeros_like(global_params),
    "t": 0,
}
```
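A sketch of how this state is threaded through successive rounds (NumPy stands in for the torch tensors above, and the constant `delta` is a placeholder for real aggregated client updates):

```python
import numpy as np

state = {"m": np.zeros(3), "v": np.zeros(3), "t": 0}
w = np.zeros(3)
eta, b1, b2, eps = 0.001, 0.9, 0.999, 1e-8

for round_idx in range(5):
    delta = np.ones(3)  # placeholder for the aggregated client update
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * delta
    state["v"] = b2 * state["v"] + (1 - b2) * delta**2
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    w -= eta * m_hat / (np.sqrt(v_hat) + eps)

# The step counter now matches the number of aggregation rounds
```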
Server optimizer state may reveal:
- the running mean of aggregated client updates (via $m_t$);
- per-dimension magnitudes of aggregated updates (via $v_t$).
Adam operations are element-wise, so the server step costs $O(d)$ time for a $d$-dimensional model.
Breakdown:
- Moment updates ($m_t$, $v_t$): $O(d)$ each
- Bias correction and parameter update: $O(d)$
- Additional server memory for $m$ and $v$: $2d$ values
| Algorithm | Server Optimizer | Moment Updates |
|---|---|---|
| FedAvg | SGD (implicit) | None |
| FedAdam | Adam | Both moments |
| FedYogi | Yogi | Adaptive v |
| FedAdagrad | Adagrad | Cumulative v |
| Method | Final Accuracy | Rounds to 70% Accuracy |
|---|---|---|
| FedAvg | 75.3% | 200 |
| FedAdam | 79.1% | 140 |
| FedYogi | 78.5% | 150 |
| Method | Perplexity |
|---|---|
| FedAvg | 1.42 |
| FedAdam | 1.31 |
Reddi, S., Charles, Z., Zaheer, M., Garrett, Z., Rush, K., Konecny, J., Kumar, S., & McMahan, H. B. (2021). Adaptive federated optimization. In ICLR.
Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In ICLR.
Zaheer, M., et al. (2018). Adaptive methods for nonconvex optimization. In NeurIPS.
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-01-04 | Initial validation report |
Copyright 2026 Olaf Yunus Laitinen Imanov and Contributors. Released under EUPL 1.2.