Federated Proximal (FedProx) extends FedAvg by introducing a proximal term that constrains local updates to remain close to the global model. This addresses client drift under heterogeneous data distributions.
Each client minimizes a modified local objective:
\[\min_{w} F_k(w) + \frac{\mu}{2}\|w - w^t\|^2\]

where:

- $F_k(w)$ is the local objective of client $k$,
- $w^t$ is the global model at round $t$,
- $\mu \geq 0$ is the proximal coefficient.
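The modified objective can be sketched in a few lines of NumPy (the function name `fedprox_local_loss` is illustrative, not the library's API):

```python
import numpy as np

def fedprox_local_loss(local_loss: float, w: np.ndarray,
                       w_global: np.ndarray, mu: float) -> float:
    """Local FedProx objective: F_k(w) + (mu/2) * ||w - w_global||^2."""
    proximal = 0.5 * mu * float(np.sum((w - w_global) ** 2))
    return local_loss + proximal
```

With $\mu = 0$ the penalty vanishes and the local objective is exactly the FedAvg one.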
The aggregation step remains identical to FedAvg:
\[w^{t+1} = \sum_{k=1}^{K} \frac{n_k}{N} w_k^{t+1}, \qquad N = \sum_{k=1}^{K} n_k\]

The implementation is located at `src/unbitrium/aggregators/fedprox.py`.
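The sample-weighted average above can be sketched directly (a minimal stand-in, not the library's aggregator):

```python
import numpy as np

def aggregate(client_weights: list[np.ndarray], client_sizes: list[int]) -> np.ndarray:
    """w^{t+1} = sum_k (n_k / N) * w_k^{t+1}, with N = sum_k n_k."""
    N = sum(client_sizes)
    return sum((n / N) * w for w, n in zip(client_weights, client_sizes))
```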
When $\mu = 0$, FedProx reduces to FedAvg:
\[\mu = 0 \implies \text{FedProx} \equiv \text{FedAvg}\]

Verification: Property-based testing confirms identical outputs when $\mu = 0$.
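The reduction follows because the proximal term's gradient, $\mu (w - w^t)$, is identically zero when $\mu = 0$, so each local step coincides with FedAvg's. A one-line sketch (hypothetical helper name):

```python
import numpy as np

def proximal_grad(w: np.ndarray, w_global: np.ndarray, mu: float) -> np.ndarray:
    # Gradient of (mu/2)||w - w_global||^2 with respect to w.
    return mu * (w - w_global)
```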
The proximal term bounds the drift of local models:
\[\|w_k^{t+1} - w^t\|^2 \leq \frac{2(F_k(w^t) - F_k^*)}{\mu}\]

Verification: Measured drift decreases monotonically as $\mu$ increases.
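The drift-damping effect can be reproduced on a toy quadratic local objective (all names and constants here are illustrative, not from the library): gradient descent on $F_k(w) = \tfrac{1}{2}\|w - \text{target}\|^2$ plus the proximal term lands at $\text{target}/(1+\mu)$, so larger $\mu$ yields smaller drift.

```python
import numpy as np

def local_update(w_global: np.ndarray, target: np.ndarray,
                 mu: float, lr: float = 0.1, steps: int = 50) -> np.ndarray:
    """Gradient descent on 0.5*||w - target||^2 plus the FedProx proximal term."""
    w = w_global.copy()
    for _ in range(steps):
        grad = (w - target) + mu * (w - w_global)  # local grad + proximal grad
        w = w - lr * grad
    return w

w_global = np.zeros(2)
target = np.array([4.0, -2.0])  # toy local optimum, far from the global model
drifts = {mu: float(np.linalg.norm(local_update(w_global, target, mu) - w_global))
          for mu in (0.0, 1.0, 10.0)}
```

Here `drifts` decreases as $\mu$ grows, matching the bound qualitatively.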
Under standard convexity assumptions, FedProx converges at rate:
\[\mathbb{E}[F(w^T)] - F^* \leq O\left(\frac{1}{\mu T}\right)\]

Verification: Empirical convergence matches theoretical bounds on convex benchmarks.
Given identical inputs and random seeds, the output is reproducible.
Verification: Repeated executions with fixed seeds produce bit-identical results.
Configuration:
Expected Behavior:
| $\mu$ | Convergence Speed | Final Accuracy | Client Drift |
|---|---|---|---|
| 0 | Baseline | Baseline | High |
| 0.01 | Similar | Similar | Moderate |
| 0.1 | Slightly slower | Similar | Low |
| 1.0 | Slower | Potentially lower | Very low |
| 10.0 | Much slower | Lower | Minimal |
Configuration:
Expected Behavior:
Configuration:
Expected Behavior:
Configuration:
Expected Behavior:
| $\mu$ Range | Effect | Recommendation |
|---|---|---|
| $[0, 0.001)$ | Negligible regularization | Use FedAvg instead |
| $[0.001, 0.1)$ | Mild drift reduction | Good default range |
| $[0.1, 1.0)$ | Strong regularization | For extreme non-IID |
| $[1.0, \infty)$ | Over-regularization | May hurt convergence |
| Metric | Range | Notes |
|---|---|---|
| `proximal_loss` | $[0, \infty)$ | Additional loss from the proximal term |
| `avg_drift_norm` | $[0, \infty)$ | Mean $\|w_k - w^t\|$ across clients |
| `mu_effective` | $\mu$ | Regularization strength used |
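A per-round computation of these metrics can be sketched as follows (the function and dictionary keys mirror the table above but are an assumption, not the library's exact API):

```python
import numpy as np

def compute_round_metrics(client_weights: list[np.ndarray],
                          w_global: np.ndarray, mu: float) -> dict:
    # Per-client drift ||w_k - w^t||, then the aggregate metrics from the table.
    drifts = [float(np.linalg.norm(w - w_global)) for w in client_weights]
    avg_drift = float(np.mean(drifts))
    proximal_loss = float(np.mean([0.5 * mu * d ** 2 for d in drifts]))
    return {"proximal_loss": proximal_loss,
            "avg_drift_norm": avg_drift,
            "mu_effective": mu}
```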
Input: $\mu = 0$
Expected Behavior:
Input: $\mu > 100$
Expected Behavior:
Input: Local epochs $E = 1$
Expected Behavior:
```python
def set_seed(seed: int = 42) -> None:
    """Seed every RNG the training loop touches, for reproducible runs."""
    import random

    import numpy as np
    import torch

    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op when CUDA is unavailable
```
| Component | Version |
|---|---|
| Python | 3.12.0 |
| PyTorch | 2.4.0 |
| Unbitrium | 1.0.0 |
```bash
git clone https://github.com/olaflaitinen/unbitrium
cd unbitrium
pip install -e ".[dev]"
pytest tests/validation/test_fedprox.py -v
```
FedProx shares the same privacy profile as FedAvg:
The proximal term introduces a dependency on the global model:
- Local training per client: \(T_{local}(E, B, P) = O(E \cdot \lceil n_k/B \rceil \cdot P)\)
- Proximal gradient computation: \(T_{prox}(P) = O(P)\) per step, since $\mu(w - w^t)$ is elementwise
- Total per-round complexity: \(T_{round}(K, E, B, P) = O(K \cdot E \cdot \lceil n_k/B \rceil \cdot P)\)
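Treating local training as the dominant term, a back-of-the-envelope round cost in gradient-evaluation units can be sketched (`round_cost` is a hypothetical helper, not part of Unbitrium):

```python
import math

def round_cost(K: int, E: int, n_k: int, B: int, P: int) -> int:
    # K clients, E local epochs, ceil(n_k/B) minibatches per epoch, P parameters.
    return K * E * math.ceil(n_k / B) * P
```

For example, 10 clients running 5 epochs over 100 samples with batch size 32 on a 1,000-parameter model cost 10 * 5 * 4 * 1000 = 200,000 units per round.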
Breakdown:
The LEAF FedProx implementation produces equivalent results within tolerance $\epsilon < 10^{-4}$.
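An equivalence check at that tolerance amounts to bounding the maximum elementwise difference between the two implementations' weights (a minimal sketch; the helper name is illustrative):

```python
import numpy as np

def equivalent(w_a: np.ndarray, w_b: np.ndarray, eps: float = 1e-4) -> bool:
    # True when the largest absolute elementwise difference is below eps.
    return float(np.max(np.abs(w_a - w_b))) < eps
```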
TFF’s FedProx matches Unbitrium when using identical:
Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., & Smith, V. (2020). Federated optimization in heterogeneous networks. In Proceedings of Machine Learning and Systems (Vol. 2, pp. 429-450).
McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & Agüera y Arcas, B. (2017). Communication-efficient learning of deep networks from decentralized data. In AISTATS.
Karimireddy, S. P., Kale, S., Mohri, M., Reddi, S., Stich, S., & Suresh, A. T. (2020). SCAFFOLD: Stochastic controlled averaging for federated learning. In ICML.
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-01-04 | Initial validation report |
Copyright 2026 Olaf Yunus Laitinen Imanov and Contributors. Released under EUPL 1.2.