Federated Averaging (FedAvg) is the foundational aggregation algorithm in federated learning, introduced by McMahan et al. (2017). This document validates the Unbitrium implementation against the formal specification and establishes correctness guarantees.
The FedAvg algorithm computes a weighted average of client model updates:

\[w^{t+1} = \sum_{k=1}^{K} \frac{n_k}{N} w_k^t\]

where $w_k^t$ is client $k$'s locally updated model in round $t$, $n_k$ is the number of training samples held by client $k$, $N = \sum_{k=1}^{K} n_k$ is the total sample count, and $K$ is the number of participating clients.
The implementation is located at `src/unbitrium/aggregators/fedavg.py`.
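As a minimal illustration of the update rule above, the weighted average can be sketched directly over state dicts. This is a toy sketch, not the Unbitrium implementation; `fedavg` and the example state dicts are hypothetical names:

```python
import torch

def fedavg(updates):
    """Toy FedAvg: updates is a list of (state_dict, num_samples) pairs."""
    total = sum(n for _, n in updates)  # N = sum of n_k
    keys = updates[0][0].keys()
    # w^{t+1} = sum_k (n_k / N) * w_k for every parameter tensor
    return {k: sum((n / total) * sd[k] for sd, n in updates) for k in keys}

a = {"w": torch.tensor([1.0, 2.0])}
b = {"w": torch.tensor([3.0, 4.0])}
# Client a holds 3x the samples of b, so the weights are 0.75 and 0.25.
out = fedavg([(a, 300), (b, 100)])
```

Here `out["w"]` equals $0.75 \cdot [1, 2] + 0.25 \cdot [3, 4] = [1.5, 2.5]$.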
The FedAvg implementation must satisfy the following invariants:
The sum of aggregation weights must equal unity:
\[\sum_{k=1}^{K} \frac{n_k}{N} = 1\]

Verification: For any valid input where $n_k > 0$ for at least one client, the implementation computes normalized weights that sum to 1.0 within floating-point tolerance ($\epsilon < 10^{-6}$).
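A quick numeric check of the normalization invariant, with arbitrary illustrative sample counts:

```python
import torch

# Normalized weights n_k / N sum to 1 for any positive sample counts.
counts = torch.tensor([100.0, 250.0, 50.0, 600.0])
weights = counts / counts.sum()
assert abs(weights.sum().item() - 1.0) < 1e-6
```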
When all client models are identical, the aggregated result equals the common model:
\[\forall k: w_k = w \implies w^{t+1} = w\]

Verification: Property-based testing confirms this invariant holds for arbitrary model architectures.
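The idempotence property can be checked with a toy weighted average (illustrative values, not the library code):

```python
import torch

# Identical client models aggregate to the common model,
# regardless of the per-client sample counts.
w = torch.tensor([0.5, -1.0, 3.0])
counts = [100, 50, 25]
total = sum(counts)
agg = sum((n / total) * w for n in counts)
assert torch.allclose(agg, w)
```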
The aggregation is invariant to uniform positive scaling of the sample counts:

\[\text{FedAvg}(\{(w_k, \alpha n_k)\}_{k=1}^{K}) = \text{FedAvg}(\{(w_k, n_k)\}_{k=1}^{K}), \quad \alpha > 0\]

Verification: Scaling all sample counts by a constant factor produces identical results.
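An illustrative check of scale invariance: scaling every sample count by the same factor leaves the normalized weights, and hence the aggregate, unchanged.

```python
import torch

counts = torch.tensor([100.0, 250.0, 50.0])
scaled = 7.0 * counts  # alpha = 7
w1 = counts / counts.sum()
w2 = scaled / scaled.sum()
# Normalization cancels the common factor alpha.
assert torch.allclose(w1, w2)
```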
The order of client updates does not affect the result:
\[\text{FedAvg}([u_1, u_2, \ldots, u_K]) = \text{FedAvg}(\pi([u_1, u_2, \ldots, u_K]))\]

where $\pi$ is any permutation.
Verification: Randomized permutation tests confirm order-independence.
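A randomized permutation test along these lines can be sketched with a toy weighted average (not the Unbitrium implementation):

```python
import random
import torch

def weighted_avg(updates):
    # Toy weighted average over (tensor, num_samples) pairs.
    total = sum(n for _, n in updates)
    return sum((n / total) * t for t, n in updates)

torch.manual_seed(0)
updates = [(torch.randn(4), n) for n in (100, 50, 25, 125)]
base = weighted_avg(updates)

for _ in range(10):
    shuffled = updates[:]
    random.shuffle(shuffled)
    # The order of client updates must not change the aggregate.
    assert torch.allclose(weighted_avg(shuffled), base, atol=1e-6)
```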
Given identical inputs and random seeds, the output is reproducible:
Verification: Repeated executions with fixed seeds produce bit-identical results.
Configuration: 10 clients, each reporting 100 samples.

Expected Behavior: Each client receives aggregation weight $0.1$; `total_samples` is 1000 and `num_participants` is 10.
Validation Code:
```python
import torch
import torch.nn as nn
import unbitrium as ub

def create_mlp() -> nn.Module:
    # Illustrative stand-in model; create_mlp is not defined elsewhere.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

# Deterministic setup
torch.manual_seed(42)

# Identical-contribution scenario: 10 clients, 100 samples each
updates = [
    {"state_dict": model.state_dict(), "num_samples": 100}
    for model in [create_mlp() for _ in range(10)]
]

global_model = create_mlp()
aggregator = ub.aggregators.FedAvg()
result, metrics = aggregator.aggregate(updates, global_model)

assert abs(metrics["total_samples"] - 1000.0) < 1e-6
assert metrics["num_participants"] == 10.0
```
Configuration: 5 clients with sample counts 1000, 500, 250, 125, and 125 (total 2000).

Expected Weights: 0.5, 0.25, 0.125, 0.0625, and 0.0625.
Validation:
```python
samples = [1000, 500, 250, 125, 125]
total = sum(samples)
expected_weights = [s / total for s in samples]

# Verify weights sum to 1
assert abs(sum(expected_weights) - 1.0) < 1e-10
```
Configuration:
Expected Behavior:
Configuration:
Expected Behavior:
Configuration:
Expected Behavior:
| Metric | Range / Complexity | Notes |
|---|---|---|
| `num_participants` | $[0, K]$ | Number of clients with valid updates |
| `total_samples` | $[0, \infty)$ | Sum of sample counts across clients |
| Aggregation time | $O(K \cdot P)$ | Linear in clients and parameters |
| Memory overhead | $O(P)$ | Single model copy for accumulation |
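The $O(K \cdot P)$ time / $O(P)$ memory pattern can be sketched as a single in-place accumulator per parameter (illustrative; `accumulate` is a hypothetical helper, not the library code):

```python
import torch

def accumulate(updates):
    """updates: list of (state_dict, num_samples) pairs."""
    total = sum(n for _, n in updates)
    # One accumulator tensor per parameter: O(P) memory overall.
    acc = {k: torch.zeros_like(v) for k, v in updates[0][0].items()}
    for sd, n in updates:              # K clients
        for k, t in sd.items():        # P parameters in total
            acc[k] += (n / total) * t  # in-place weighted accumulation
    return acc

agg = accumulate([({"w": torch.tensor([2.0, 2.0])}, 100),
                  ({"w": torch.tensor([4.0, 4.0])}, 100)])
```

With equal sample counts, `agg["w"]` is the plain mean $[3.0, 3.0]$.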
Under standard assumptions (bounded gradients, smooth loss, learning rate decay):
The implementation maintains stability under:
Input: `updates = []`

Expected Output: `{"aggregated_clients": 0.0}`

Validation:

```python
result, metrics = aggregator.aggregate([], global_model)
assert metrics["aggregated_clients"] == 0.0
```
Input: Client model contains NaN values
Expected Behavior:
`ValueError`

Current Implementation: Propagates NaN (to be addressed in a future version).
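A NaN guard along the expected lines could look like this sketch (`contains_nan` is a hypothetical helper, not part of Unbitrium):

```python
import torch

def contains_nan(state_dict):
    # True if any parameter tensor in the state dict holds a NaN value.
    return any(torch.isnan(t).any().item()
               for t in state_dict.values() if torch.is_tensor(t))

bad = {"w": torch.tensor([1.0, float("nan")])}
good = {"w": torch.tensor([1.0, 2.0])}
assert contains_nan(bad)
assert not contains_nan(good)
```

An aggregator could call such a check on each incoming update and raise `ValueError` before accumulation.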
Input: Updates contain tensors with different dtypes (float16, float32)
Expected Behavior:
Input: State dict contains non-tensor values (running mean/var buffers)
Expected Behavior:
For deterministic validation:
```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Seed all RNG sources used during validation."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```
| Component | Version |
|---|---|
| Python | 3.12.0 |
| PyTorch | 2.4.0 |
| NumPy | 2.0.0 |
| Unbitrium | 1.0.0 |
Branch: `main`

```shell
pip install -e ".[dev]"
pytest tests/validation/test_fedavg.py -v
```

FedAvg requires access to:
Privacy Implications:
- `unbitrium.privacy.SecureAggregation`
- `unbitrium.privacy.DifferentialPrivacy`

| Attack | Description | Mitigation |
|---|---|---|
| Model Inversion | Reconstruct training data from weights | Differential privacy |
| Membership Inference | Determine if sample was in training set | Gradient clipping |
| Byzantine Clients | Malicious updates corrupt global model | Use robust aggregators (Krum, TrimmedMean) |
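For intuition on the Byzantine mitigation, a coordinate-wise trimmed mean (one of the robust aggregators mentioned) can be sketched as follows; this is illustrative, not the Unbitrium `TrimmedMean` implementation:

```python
import torch

def trimmed_mean(stacked, trim=1):
    # stacked: (K, ...) tensor of K client updates. Per coordinate,
    # drop the `trim` smallest and `trim` largest values, then average.
    sorted_vals, _ = torch.sort(stacked, dim=0)
    return sorted_vals[trim: stacked.shape[0] - trim].mean(dim=0)

# Three honest clients near 1.0 and two extreme outliers.
clients = torch.tensor([[1.0], [1.1], [0.9], [100.0], [-50.0]])
robust = trimmed_mean(clients, trim=1)
```

Trimming discards both outliers here, so the result stays near the honest value 1.0, whereas a plain mean would be pulled far off.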
The TFF implementation (`tff.learning.algorithms.build_weighted_fed_avg`) produces identical results under:
The PySyft implementation matches Unbitrium within floating-point tolerance ($\epsilon < 10^{-5}$).
The Flower FedAvg strategy produces equivalent results when configured with the same aggregation function.
McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017). Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics (pp. 1273-1282). PMLR.
Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., & Smith, V. (2020). Federated optimization in heterogeneous networks. In Proceedings of Machine Learning and Systems (Vol. 2, pp. 429-450).
Kairouz, P., et al. (2021). Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1-2), 1-210.
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-01-04 | Initial validation report |
Copyright 2026 Olaf Yunus Laitinen Imanov and Contributors. Released under EUPL 1.2.