unbitrium

FedAvg Validation Report

Overview

Federated Averaging (FedAvg) is the foundational aggregation algorithm in federated learning, introduced by McMahan et al. (2017). This document validates the Unbitrium implementation against the formal specification and establishes correctness guarantees.

Mathematical Formulation

The FedAvg algorithm computes a weighted average of client model updates:

\[w^{t+1} = \sum_{k=1}^{K} \frac{n_k}{N} w_k^t\]

where:

Implementation Reference

The implementation is located at src/unbitrium/aggregators/fedavg.py.


Invariants

The FedAvg implementation must satisfy the following invariants:

Invariant 1: Weight Conservation

The sum of aggregation weights must equal unity:

\[\sum_{k=1}^{K} \frac{n_k}{N} = 1\]

Verification: For any valid input where $n_k > 0$ for at least one client, the implementation computes normalized weights that sum to 1.0 within floating-point tolerance ($\epsilon < 10^{-6}$).

Invariant 2: Identity Preservation

When all client models are identical, the aggregated result equals the common model:

\[\forall k: w_k = w \implies w^{t+1} = w\]

Verification: Property-based testing confirms this invariant holds for arbitrary model architectures.

Invariant 3: Linearity

The aggregation operation is linear in client weights:

\[\text{FedAvg}(\{(w_k, \alpha n_k)\}) = \text{FedAvg}(\{(w_k, n_k)\})\]

Verification: Scaling all sample counts by a constant factor produces identical results.

Invariant 4: Commutativity

The order of client updates does not affect the result:

\[\text{FedAvg}([u_1, u_2, \ldots, u_K]) = \text{FedAvg}(\pi([u_1, u_2, \ldots, u_K]))\]

where $\pi$ is any permutation.

Verification: Randomized permutation tests confirm order-independence.

Invariant 5: Determinism

Given identical inputs and random seeds, the output is reproducible:

Verification: Repeated executions with fixed seeds produce bit-identical results.


Test Distributions

Distribution 1: Uniform IID Baseline

Configuration:

Expected Behavior:

Validation Code:

import torch
import unbitrium as ub

# Deterministic setup
torch.manual_seed(42)

# Create identical contribution scenario
updates = [
    {"state_dict": model.state_dict(), "num_samples": 100}
    for model in [create_mlp() for _ in range(10)]
]

aggregator = ub.aggregators.FedAvg()
result, metrics = aggregator.aggregate(updates, global_model)

assert abs(metrics["total_samples"] - 1000.0) < 1e-6
assert metrics["num_participants"] == 10.0

Distribution 2: Imbalanced Sample Counts

Configuration:

Expected Weights:

Validation:

samples = [1000, 500, 250, 125, 125]
total = sum(samples)
expected_weights = [s / total for s in samples]

# Verify weights sum to 1
assert abs(sum(expected_weights) - 1.0) < 1e-10

Distribution 3: Extreme Imbalance

Configuration:

Expected Behavior:

Distribution 4: Single Client Edge Case

Configuration:

Expected Behavior:

Distribution 5: Zero Sample Edge Case

Configuration:

Expected Behavior:


Expected Behavior

Metric Ranges

Metric Range Notes
num_participants $[0, K]$ Number of clients with valid updates
total_samples $[0, \infty)$ Sum of sample counts across clients
Aggregation time $O(K \cdot P)$ Linear in clients and parameters
Memory overhead $O(P)$ Single model copy for accumulation

Convergence Properties

Under standard assumptions (bounded gradients, smooth loss, learning rate decay):

Numerical Stability

The implementation maintains stability under:


Edge Cases

Edge Case 1: Empty Update List

Input: updates = []

Expected Output:

Validation:

result, metrics = aggregator.aggregate([], global_model)
assert metrics["aggregated_clients"] == 0.0

Edge Case 2: NaN in Client Updates

Input: Client model contains NaN values

Expected Behavior:

Current Implementation: Propagates NaN (to be addressed in future version)

Edge Case 3: Mixed Tensor Dtypes

Input: Updates contain tensors with different dtypes (float16, float32)

Expected Behavior:

Edge Case 4: Non-Tensor State Dict Values

Input: State dict contains non-tensor values (running mean/var buffers)

Expected Behavior:


Reproducibility

Seed Configuration

For deterministic validation:

import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

Validation Environment

Component Version
Python 3.12.0
PyTorch 2.4.0
NumPy 2.0.0
Unbitrium 1.0.0

Replication Instructions

  1. Clone repository at commit main
  2. Install dependencies: pip install -e ".[dev]"
  3. Run validation: pytest tests/validation/test_fedavg.py -v

Security Considerations

Information Leakage Analysis

FedAvg requires access to:

Privacy Implications:

Mitigations

  1. Sample Count Protection: Use approximate or noised sample counts
  2. Secure Aggregation: Integrate with unbitrium.privacy.SecureAggregation
  3. Differential Privacy: Apply gradient clipping and noise via unbitrium.privacy.DifferentialPrivacy

Attack Vectors

Attack Description Mitigation
Model Inversion Reconstruct training data from weights Differential privacy
Membership Inference Determine if sample was in training set Gradient clipping
Byzantine Clients Malicious updates corrupt global model Use robust aggregators (Krum, TrimmedMean)

Complexity Analysis

Time Complexity

\[T(K, P) = O(K \cdot P)\]

where:

Breakdown:

Space Complexity

\[S(P) = O(P)\]

Breakdown:

Parallelization Potential


Comparison with Reference Implementations

TensorFlow Federated

The TFF implementation (tff.learning.algorithms.build_weighted_fed_avg) produces identical results under:

PySyft

The PySyft implementation matches Unbitrium within floating-point tolerance ($\epsilon < 10^{-5}$).

Flower

The Flower FedAvg strategy produces equivalent results when configured with the same aggregation function.


References

  1. McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017). Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics (pp. 1273-1282). PMLR.

  2. Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., & Smith, V. (2020). Federated optimization in heterogeneous networks. In Proceedings of Machine Learning and Systems (Vol. 2, pp. 429-450).

  3. Kairouz, P., et al. (2021). Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1-2), 1-210.


Changelog

Version Date Changes
1.0.0 2026-01-04 Initial validation report

Copyright 2026 Olaf Yunus Laitinen Imanov and Contributors. Released under EUPL 1.2.