Earth Mover’s Distance (EMD), also known as Wasserstein distance, quantifies the minimum “work” required to transform one label distribution into another. It provides a meaningful measure of heterogeneity between client and global distributions.
For discrete label distributions $p_k$ (client) and $p_g$ (global):
\[\text{EMD}(p_k, p_g) = \inf_{\gamma \in \Gamma(p_k, p_g)} \sum_{i,j} \gamma_{ij} \cdot d(i, j)\]where:
For ordinal classes with unit ground distance:
\[\text{EMD}(p, q) = \sum_{i=1}^{C-1} |F_p(i) - F_q(i)|\]where $F_p$ is the cumulative distribution function of $p$.
The implementation is located at src/unbitrium/metrics/distribution.py.
Verification: All computed EMD values non-negative.
Verification: Identical distributions yield zero EMD.
Verification: Order-independent computation.
Verification: Metric space property holds.
where $D_{max}$ depends on ground metric diameter.
Input: $p = q = \text{Uniform}(C)$
Expected Output: EMD = 0
Input:
Expected Output: EMD = 1 (with unit ground distance)
Input:
Expected Output: EMD = 0.45 (approximately)
Input: Sequence of distributions with increasing divergence
Expected Behavior: EMD increases monotonically
| EMD Range | Heterogeneity | FL Impact |
|---|---|---|
| 0.0 - 0.1 | Minimal | None |
| 0.1 - 0.3 | Low | Minor slowdown |
| 0.3 - 0.5 | Moderate | Noticeable degradation |
| 0.5 - 0.7 | High | Significant issues |
| 0.7 - 1.0 | Severe | Major convergence problems |
Empirical findings (CIFAR-10, CNN):
Input: $C = 1$
Expected Behavior:
Input: Classes with zero probability in one distribution
Expected Behavior:
Input: $C = 1000$
Expected Behavior:
from unbitrium.metrics import EMDLabelDistance
metric = EMDLabelDistance()
# Compute for single client
client_labels = [0, 0, 1, 1, 2]
global_labels = [0, 1, 2, 3, 4] # For reference distribution
emd = metric.compute(client_labels, global_labels)
print(f"EMD: {emd:.4f}")
# Compute for all clients
emd_values = []
for client_data in partitions:
client_labels = [y for x, y in client_data]
emd = metric.compute(client_labels, global_labels)
emd_values.append(emd)
avg_emd = np.mean(emd_values)
EMD reveals:
Using sorting-based algorithm for 1D:
\[T = O(C \log C)\]Using network flow for general ground metric:
\[T = O(C^3 \log C)\]For high-dimensional features, use sliced approximation:
\[\text{SW}(p, q) = \mathbb{E}_\theta[\text{EMD}(P_\theta p, P_\theta q)]\]Entropy-regularized approximation for faster computation.
Rubner, Y., Tomasi, C., & Guibas, L. J. (2000). The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2), 99-121.
Villani, C. (2008). Optimal Transport: Old and New. Springer.
Yurochkin, M., et al. (2019). Bayesian nonparametric federated learning of neural networks. In ICML.
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-01-04 | Initial validation report |
Copyright 2026 Olaf Yunus Laitinen Imanov and Contributors. Released under EUPL 1.2.