unbitrium

EMDLabelDistance Validation Report

Overview

Earth Mover’s Distance (EMD), also known as Wasserstein distance, quantifies the minimum “work” required to transform one label distribution into another. It provides a meaningful measure of heterogeneity between client and global distributions.

Mathematical Formulation

For discrete label distributions $p_k$ (client) and $p_g$ (global):

\[\text{EMD}(p_k, p_g) = \inf_{\gamma \in \Gamma(p_k, p_g)} \sum_{i,j} \gamma_{ij} \cdot d(i, j)\]

where:

For ordinal classes with unit ground distance:

\[\text{EMD}(p, q) = \sum_{i=1}^{C-1} |F_p(i) - F_q(i)|\]

where $F_p$ is the cumulative distribution function of $p$.

Implementation Reference

The implementation is located at src/unbitrium/metrics/distribution.py.


Invariants

Invariant 1: Non-negativity

\[\text{EMD}(p, q) \geq 0\]

Verification: All computed EMD values non-negative.

Invariant 2: Identity of Indiscernibles

\[\text{EMD}(p, q) = 0 \iff p = q\]

Verification: Identical distributions yield zero EMD.

Invariant 3: Symmetry

\[\text{EMD}(p, q) = \text{EMD}(q, p)\]

Verification: Order-independent computation.

Invariant 4: Triangle Inequality

\[\text{EMD}(p, r) \leq \text{EMD}(p, q) + \text{EMD}(q, r)\]

Verification: Metric space property holds.

Invariant 5: Bounded Range

\[\text{EMD}(p, q) \leq D_{max}\]

where $D_{max}$ depends on ground metric diameter.


Test Distributions

Distribution 1: Identical Distributions

Input: $p = q = \text{Uniform}(C)$

Expected Output: EMD = 0

Distribution 2: Dirac Delta Shift

Input:

Expected Output: EMD = 1 (with unit ground distance)

Distribution 3: Uniform vs Concentrated

Input:

Expected Output: EMD = 0.45 (approximately)

Distribution 4: Gradual Shift

Input: Sequence of distributions with increasing divergence

Expected Behavior: EMD increases monotonically


Expected Behavior

EMD Interpretation Guide

EMD Range Heterogeneity FL Impact
0.0 - 0.1 Minimal None
0.1 - 0.3 Low Minor slowdown
0.3 - 0.5 Moderate Noticeable degradation
0.5 - 0.7 High Significant issues
0.7 - 1.0 Severe Major convergence problems

Correlation with Accuracy

Empirical findings (CIFAR-10, CNN):


Edge Cases

Edge Case 1: Single Class

Input: $C = 1$

Expected Behavior:

Edge Case 2: Zero Probability Classes

Input: Classes with zero probability in one distribution

Expected Behavior:

Edge Case 3: Large Class Count

Input: $C = 1000$

Expected Behavior:


Reproducibility

Usage Example

from unbitrium.metrics import EMDLabelDistance

metric = EMDLabelDistance()

# Compute for single client
client_labels = [0, 0, 1, 1, 2]
global_labels = [0, 1, 2, 3, 4]  # For reference distribution

emd = metric.compute(client_labels, global_labels)
print(f"EMD: {emd:.4f}")

Batch Computation

# Compute for all clients
emd_values = []
for client_data in partitions:
    client_labels = [y for x, y in client_data]
    emd = metric.compute(client_labels, global_labels)
    emd_values.append(emd)

avg_emd = np.mean(emd_values)

Security Considerations

Information Content

EMD reveals:

Mitigations

  1. Report aggregate EMD only
  2. Add noise for differential privacy

Complexity Analysis

Time Complexity

Using sorting-based algorithm for 1D:

\[T = O(C \log C)\]

Using network flow for general ground metric:

\[T = O(C^3 \log C)\]

Space Complexity

\[S = O(C)\]

Alternative Formulations

Sliced Wasserstein

For high-dimensional features, use sliced approximation:

\[\text{SW}(p, q) = \mathbb{E}_\theta[\text{EMD}(P_\theta p, P_\theta q)]\]

Sinkhorn Distance

Entropy-regularized approximation for faster computation.


References

  1. Rubner, Y., Tomasi, C., & Guibas, L. J. (2000). The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2), 99-121.

  2. Villani, C. (2008). Optimal Transport: Old and New. Springer.

  3. Yurochkin, M., et al. (2019). Bayesian nonparametric federated learning of neural networks. In ICML.


Changelog

Version Date Changes
1.0.0 2026-01-04 Initial validation report

Copyright 2026 Olaf Yunus Laitinen Imanov and Contributors. Released under EUPL 1.2.