
GradientVariance Validation Report

Overview

Gradient Variance measures the dispersion of client gradients around the mean gradient. High variance indicates significant client drift, a key challenge in heterogeneous federated learning.

Mathematical Formulation

\[\sigma^2 = \frac{1}{K} \sum_{k=1}^{K} \|\nabla F_k(w) - \bar{\nabla}F(w)\|^2\]

where $K$ is the number of participating clients, $\nabla F_k(w)$ is client $k$'s local gradient at the global model $w$, and $\bar{\nabla}F(w) = \frac{1}{K}\sum_{k=1}^{K} \nabla F_k(w)$ is the mean gradient.

Weighted variant:

\[\sigma^2_w = \sum_{k=1}^{K} \frac{n_k}{N} \|\nabla F_k(w) - \bar{\nabla}F(w)\|^2\]

where $n_k$ is the number of samples held by client $k$ and $N = \sum_{k=1}^{K} n_k$.
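
As a concrete reading of these formulas, the following is a minimal NumPy sketch, not the library implementation; it assumes each client gradient has been flattened into a 1-D array, and the function name is illustrative.

import numpy as np

def gradient_variance(gradients, weights=None, normalize=False):
    # Sketch of sigma^2 / sigma^2_w from the formulas above.
    G = np.stack([np.ravel(g) for g in gradients])    # (K, P) matrix of flattened client gradients
    mean_grad = G.mean(axis=0)                        # \bar{\nabla}F(w) = (1/K) sum_k \nabla F_k(w)
    sq_dev = np.sum((G - mean_grad) ** 2, axis=1)     # ||\nabla F_k(w) - \bar{\nabla}F(w)||^2 per client
    if weights is None:
        variance = float(sq_dev.mean())               # sigma^2: uniform weights 1/K
    else:
        w = np.asarray(weights, dtype=float)
        variance = float((w / w.sum()) @ sq_dev)      # sigma^2_w: weights n_k / N
    if normalize:
        variance /= float(np.sum(mean_grad ** 2))     # sigma^2_norm = sigma^2 / ||\bar{\nabla}F||^2
    return variance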

Implementation Reference

The implementation is located at src/unbitrium/metrics/optimization.py.


Invariants

Invariant 1: Non-negativity

\[\sigma^2 \geq 0\]

Verification: Each term in the sum is a squared norm, so the variance is non-negative by construction.

Invariant 2: Zero for Identical Gradients

\[\nabla F_k = \nabla F_j \text{ for all } k, j \implies \sigma^2 = 0\]

Verification: If all client gradients are identical, the mean gradient equals the common gradient, so every deviation term vanishes and $\sigma^2 = 0$.

Invariant 3: Scale Invariance (Normalized)

For normalized variant:

\[\sigma^2_{norm} = \frac{\sigma^2}{\|\bar{\nabla}F\|^2}\]

Verification: Scaling every client gradient by a constant $c$ scales both $\sigma^2$ and $\|\bar{\nabla}F\|^2$ by $c^2$, so the normalized variance is independent of gradient magnitude.

Invariant 4: Additivity Over Dimensions

\[\sigma^2 = \sum_{i=1}^{P} \sigma^2_i\]

where $\sigma^2_i$ is the variance in dimension $i$; the identity holds because the squared Euclidean norm decomposes coordinate-wise.
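
The invariants translate directly into quick property checks. The sketch below reuses the hypothetical gradient_variance helper from the Mathematical Formulation section and is illustrative, not part of the project's test suite.

import numpy as np

rng = np.random.default_rng(0)
grads = [rng.normal(size=1000) for _ in range(8)]

# Invariant 1: non-negativity
assert gradient_variance(grads) >= 0.0

# Invariant 2: identical gradients give zero variance
assert np.isclose(gradient_variance([grads[0]] * 8), 0.0)

# Invariant 3: the normalized variant is unchanged when all gradients are rescaled
assert np.isclose(gradient_variance(grads, normalize=True),
                  gradient_variance([3.7 * g for g in grads], normalize=True))

# Invariant 4: per-dimension variances sum to the total variance
G = np.stack(grads)
per_dim = ((G - G.mean(axis=0)) ** 2).mean(axis=0)   # sigma^2_i for each dimension i
assert np.isclose(per_dim.sum(), gradient_variance(grads))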


Test Distributions

Distribution 1: IID Data

Configuration:

Expected Behavior: Gradient variance should be low, since similarly distributed local data keeps every client gradient close to the mean gradient.

Distribution 2: High Non-IID (Dirichlet 0.1)

Configuration:

Expected Behavior: Gradient variance should be substantially higher than in the IID case, reflecting strong client drift under a highly skewed label distribution.

Distribution 3: Single Epoch vs Multiple

Configuration:

Expected Behavior: When the metric is applied to local model updates (pseudo-gradients), variance should increase with the number of local epochs as clients drift toward their local optima.

Distribution 4: Byzantine Client

Configuration:

Expected Behavior: A single corrupted client should inflate the variance sharply, since its deviation from the mean gradient dominates the sum; see the synthetic sketch below.
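
A quick synthetic illustration of the Byzantine case, again reusing the hypothetical gradient_variance helper; the client count and corruption scale are arbitrary and are not the test suite's configuration.

import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=10_000)
honest = [base + 0.05 * rng.normal(size=10_000) for _ in range(9)]  # well-aligned client gradients
byzantine = -50.0 * base                                            # one corrupted update

print(gradient_variance(honest))                # small: honest clients agree closely
print(gradient_variance(honest + [byzantine]))  # orders of magnitude larger: the outlier dominates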


Expected Behavior

Variance Interpretation

$\sigma^2 / \|\bar{\nabla}F\|^2$   Heterogeneity   Recommended Action
0.0 - 0.1                          Minimal         Standard FedAvg
0.1 - 0.5                          Low             FedAvg works
0.5 - 1.0                          Moderate        Consider FedProx
1.0 - 3.0                          High            Use SCAFFOLD/FedDyn
3.0+                               Severe          Advanced methods needed
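
The table can be read as a simple lookup. The helper below mirrors it directly; the function name and return strings are illustrative rather than a library API.

def recommend_algorithm(normalized_variance: float) -> str:
    # Map sigma^2 / ||mean gradient||^2 to the recommendation in the table above.
    if normalized_variance < 0.1:
        return "Standard FedAvg"
    if normalized_variance < 0.5:
        return "FedAvg works"
    if normalized_variance < 1.0:
        return "Consider FedProx"
    if normalized_variance < 3.0:
        return "Use SCAFFOLD/FedDyn"
    return "Advanced methods needed"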

Correlation with Convergence

Convergence bounds for FedAvg-style methods degrade as gradient variance grows (Li et al., 2020; Karimireddy et al., 2020), so a high measured variance predicts slower convergence.

FedProx regularization threshold:


Edge Cases

Edge Case 1: Single Client

Input: $K = 1$

Expected Behavior: With a single client, $\nabla F_1 = \bar{\nabla}F$, so the variance is exactly zero.

Edge Case 2: Zero Mean Gradient

Input: $\bar{\nabla}F = 0$

Expected Behavior: The unnormalized variance remains well-defined, but the normalized variant divides by $\|\bar{\nabla}F\|^2 = 0$ and must be guarded against division by zero.

Edge Case 3: High-Dimensional Gradients

Input: $P = 10^9$ parameters

Expected Behavior: Storing all $K$ gradients requires $O(K \cdot P)$ memory, which can be prohibitive at this scale; a one-pass formulation needing only $O(P)$ memory is sketched under Space Complexity.
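
One plausible way to guard the first two edge cases is sketched below; it is illustrative and not necessarily how optimization.py behaves.

import numpy as np

def gradient_variance_safe(gradients, normalize=False, eps=1e-12):
    # gradient_variance with guards for K = 1 and a (near-)zero mean gradient.
    G = np.stack([np.ravel(g) for g in gradients])
    if G.shape[0] == 1:
        return 0.0                                        # single client: variance is zero by definition
    mean_grad = G.mean(axis=0)
    variance = float(np.sum((G - mean_grad) ** 2, axis=1).mean())
    if normalize:
        denom = float(np.sum(mean_grad ** 2))
        if denom < eps:
            # One possible convention when normalization is undefined.
            return float("inf") if variance > 0.0 else 0.0
        variance /= denom
    return variance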


Reproducibility

Usage Example

from unbitrium.metrics import GradientVariance

metric = GradientVariance(normalize=True)

# Collect gradients from all clients
gradients = [client.compute_gradient(global_model) for client in clients]

variance = metric.compute(gradients)
print(f"Gradient Variance: {variance:.4f}")

Per-Layer Analysis

# Analyze variance per layer
layer_variances = metric.compute_per_layer(gradients, model)
for layer, var in layer_variances.items():
    print(f"{layer}: {var:.4f}")

Security Considerations

Information Content

Gradient variance reveals:

  1. The degree of statistical heterogeneity across participating clients
  2. The presence of outlier or Byzantine client updates

Mitigations

  1. Compute variance on aggregated data only
  2. Differential privacy on gradient statistics
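
As a sketch of the second mitigation, calibrated Gaussian noise can be added to the released statistic. The sensitivity bound below is a placeholder assumption that would have to be derived for the actual clipping scheme; the helper is illustrative only.

import numpy as np

def noisy_variance(variance, sensitivity, epsilon, delta, rng=None):
    # Gaussian-mechanism release of the variance statistic; `sensitivity` is assumed given.
    rng = rng or np.random.default_rng()
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return variance + rng.normal(0.0, sigma)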

Complexity Analysis

Time Complexity

\[T = O(K \cdot P)\]

Two passes: compute mean, then compute variance.

Space Complexity

\[S = O(K \cdot P)\]

Must store all client gradients in the straightforward formulation; a one-pass alternative needing only $O(P)$ memory is sketched below.
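
The $O(K \cdot P)$ storage is not fundamental. Using the identity $\sigma^2 = \frac{1}{K}\sum_k \|\nabla F_k\|^2 - \|\bar{\nabla}F\|^2$, the variance can be accumulated in a single pass with $O(P)$ memory, at the cost of possible precision loss when deviations are tiny relative to the gradients themselves. The sketch below is illustrative.

import numpy as np

def streaming_gradient_variance(gradient_iter):
    # One-pass sigma^2 via E||g||^2 - ||E g||^2; keeps only O(P) state.
    grad_sum = None      # running sum of gradients (length P)
    sq_norm_sum = 0.0    # running sum of ||g_k||^2
    count = 0
    for g in gradient_iter:
        g = np.ravel(np.asarray(g, dtype=np.float64))
        grad_sum = g.copy() if grad_sum is None else grad_sum + g
        sq_norm_sum += float(g @ g)
        count += 1
    mean_grad = grad_sum / count
    return sq_norm_sum / count - float(mean_grad @ mean_grad)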


Related Metrics

Gradient Dissimilarity

\[\delta = \max_k \|\nabla F_k - \bar{\nabla}F\|\]

Maximum rather than average deviation.

Cosine Disagreement

\[\text{CD} = \frac{1}{K}\sum_k \left(1 - \frac{\langle \nabla F_k, \bar{\nabla}F \rangle}{\|\nabla F_k\| \|\bar{\nabla}F\|}\right)\]

Directional disagreement.
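
Both related quantities are cheap to compute alongside the variance. The helpers below follow the formulas above; their names are illustrative, not part of the library API.

import numpy as np

def gradient_dissimilarity(gradients):
    # delta: maximum deviation of any client gradient from the mean gradient.
    G = np.stack([np.ravel(g) for g in gradients])
    mean_grad = G.mean(axis=0)
    return float(np.max(np.linalg.norm(G - mean_grad, axis=1)))

def cosine_disagreement(gradients):
    # CD: average (1 - cosine similarity) between client gradients and the mean gradient.
    G = np.stack([np.ravel(g) for g in gradients])
    mean_grad = G.mean(axis=0)
    cos = (G @ mean_grad) / (np.linalg.norm(G, axis=1) * np.linalg.norm(mean_grad))
    return float(np.mean(1.0 - cos))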


References

  1. Li, T., et al. (2020). Federated optimization in heterogeneous networks. In MLSys.

  2. Karimireddy, S. P., et al. (2020). SCAFFOLD: Stochastic controlled averaging for federated learning. In ICML.

  3. Woodworth, B., et al. (2020). Is local SGD better than minibatch SGD? In ICML.


Changelog

Version   Date         Changes
1.0.0     2026-01-04   Initial validation report

Copyright 2026 Olaf Yunus Laitinen Imanov and Contributors. Released under EUPL 1.2.