Normalized Mutual Information (NMI) between representation clusters quantifies the alignment of learned features across clients. High NMI indicates consistent representations; low NMI suggests feature-space drift.
\[\text{NMI}(U, V) = \frac{I(U; V)}{\max(H(U), H(V))}\]

where:

Mutual information:

\[I(U; V) = \sum_{u, v} p(u, v) \log \frac{p(u, v)}{p(u)\, p(v)}\]

and $H(\cdot)$ denotes the Shannon entropy of a cluster assignment.

The implementation is located at src/unbitrium/metrics/representation.py.
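As a concrete illustration of these definitions, both quantities can be computed directly from a contingency table. A minimal NumPy sketch (illustrative only, not the implementation at src/unbitrium/metrics/representation.py); it normalizes by $\max(H(U), H(V))$, one of several common conventions:

```python
import numpy as np

def nmi(u: np.ndarray, v: np.ndarray) -> float:
    """NMI between two hard cluster assignments, normalized by max entropy."""
    # Build the joint distribution p(u, v) from a contingency table.
    cu = np.unique(u, return_inverse=True)[1]
    cv = np.unique(v, return_inverse=True)[1]
    table = np.zeros((cu.max() + 1, cv.max() + 1))
    np.add.at(table, (cu, cv), 1)
    p_uv = table / table.sum()
    p_u, p_v = p_uv.sum(axis=1), p_uv.sum(axis=0)
    # I(U; V) = sum_{u,v} p(u,v) log(p(u,v) / (p(u) p(v)))
    nz = p_uv > 0
    i_uv = float((p_uv[nz] * np.log(p_uv[nz] / np.outer(p_u, p_v)[nz])).sum())
    # Normalize by the larger marginal entropy.
    h_u = float(-(p_u[p_u > 0] * np.log(p_u[p_u > 0])).sum())
    h_v = float(-(p_v[p_v > 0] * np.log(p_v[p_v > 0])).sum())
    denom = max(h_u, h_v)
    return i_uv / denom if denom > 0 else 1.0  # both trivial: treat as identical

u = np.array([0, 0, 1, 1, 2, 2])
print(nmi(u, u))  # self-comparison: 1.0 up to float error
```

Because both $I(U; V)$ and the entropies are sums over the same contingency table, the whole computation is a single pass over the assignments plus a sum over cluster pairs.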
Verification: All computed NMI values in $[0, 1]$.
Verification: Self-comparison yields 1.
Verification: Order-independent.
Verification: Independent clusterings yield zero (in expectation).
Input: Same cluster assignments
Expected Output: NMI = 1.0
Input: Independent random assignments
Expected Output: NMI $\approx 0$
Input: One clustering is refinement of another
Expected Behavior: NMI < 1 but positive
Input: Same clusters but relabeled
Expected Output: NMI = 1.0 (label-invariant)
Compare client representations to global model:
| Heterogeneity | Expected NMI |
|---|---|
| IID | 0.8 - 1.0 |
| Low non-IID | 0.6 - 0.8 |
| Moderate | 0.4 - 0.6 |
| High | 0.2 - 0.4 |
| Extreme | < 0.2 |
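The bands above can be illustrated with a synthetic sanity check (not a federated experiment): re-randomizing a growing fraction of a reference assignment drives NMI down through roughly these ranges.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score as nmi_score

rng = np.random.default_rng(42)
global_assign = rng.integers(0, 10, size=10_000)   # reference clustering

for drift in (0.0, 0.2, 0.4, 0.6, 0.8):
    client = global_assign.copy()
    mask = rng.random(client.size) < drift         # fraction of drifted samples
    client[mask] = rng.integers(0, 10, size=int(mask.sum()))
    print(f"drift={drift:.1f}  NMI={nmi_score(global_assign, client):.3f}")
```

NMI falls monotonically as the drifted fraction grows, from 1.0 at zero drift to well below 0.2 at 80% drift.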
Input: All samples in one cluster
Expected Behavior: $H(U) = 0$, so the normalization denominator vanishes and NMI is undefined; the implementation must handle this degenerate case explicitly rather than divide by zero.
Input: One-to-one cluster mapping
Expected Output: NMI = 1.0
Input: Differing cluster counts, $|U| \neq |V|$
Expected Behavior: NMI remains well-defined and bounded in $[0, 1]$; values below 1 are expected since no one-to-one cluster mapping exists.
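Conventions for the degenerate cases differ between implementations; scikit-learn's reference behavior is shown below (the library's own metric should document whichever convention it adopts):

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score as nmi_score

varied = np.array([0, 1, 2, 0, 1, 2])
constant = np.zeros(6, dtype=int)     # all samples in one cluster: H(U) = 0

# One trivial clustering: I(U; V) = 0, so scikit-learn returns 0.0.
print(nmi_score(constant, varied))
# Both trivial: scikit-learn defines this as a perfect match, 1.0.
print(nmi_score(constant, constant))

# Differing cluster counts (3 vs 2): still well-defined in [0, 1].
u = np.array([0, 0, 1, 1, 2, 2])
v = np.array([0, 0, 0, 1, 1, 1])
print(nmi_score(u, v))
```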
```python
from unbitrium.metrics import NMIRepresentations

metric = NMIRepresentations(n_clusters=10)

# Get representations from the global and client models
global_reps = global_model.encode(data)
client_reps = client_model.encode(data)

# Cluster both representation sets and compute NMI between the assignments
nmi = metric.compute(global_reps, client_reps)
print(f"NMI: {nmi:.4f}")
```
```python
metric = NMIRepresentations(
    n_clusters=10,
    clustering_method="kmeans",
    random_state=42,
)
```
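For orientation, the compute path amounts to clustering each representation matrix and scoring the agreement of the two assignments. A standalone sketch with scikit-learn (the function name and internals here are hypothetical, not the `NMIRepresentations` implementation):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def nmi_between_representations(reps_a, reps_b, n_clusters=10, random_state=42):
    """Cluster each (samples x features) matrix, then compare assignments."""
    km = dict(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels_a = KMeans(**km).fit_predict(reps_a)
    labels_b = KMeans(**km).fit_predict(reps_b)
    return normalized_mutual_info_score(labels_a, labels_b)
```

Fixing `random_state` keeps the clustering, and hence the metric, reproducible across runs, mirroring the configuration above.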
NMI reveals how closely client feature spaces align with the global model: high values indicate consistent representations across clients, while low values indicate feature-space drift.
Clustering (k-means): $O(N \cdot K \cdot d \cdot I)$ for $N$ samples, $K$ clusters, representation dimension $d$, and $I$ iterations.

NMI computation: $O(N + C_U \cdot C_V)$, i.e., $O(N)$ to build the contingency table plus a sum over all $C_U \times C_V$ cluster pairs.

Total: dominated by clustering.
Corrected for chance:
\[\text{AMI}(U, V) = \frac{I(U; V) - \mathbb{E}[I]}{\max(H(U), H(V)) - \mathbb{E}[I]}\]

Kernel-based similarity without clustering:

\[\text{CKA}(X, Y) = \frac{\text{HSIC}(X, Y)}{\sqrt{\text{HSIC}(X, X) \cdot \text{HSIC}(Y, Y)}}\]

Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. JMLR, 11, 2837-2854.
Kornblith, S., et al. (2019). Similarity of neural network representations revisited. In ICML.
Lee, J., et al. (2019). Wide neural networks of any depth evolve as linear models under gradient descent. In NeurIPS.
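Both alternatives are a few lines to exercise: scikit-learn ships `adjusted_mutual_info_score` for AMI (note its default denominator is the arithmetic mean of the entropies rather than the max shown above), and linear-kernel CKA reduces to Frobenius norms of centered cross-products. A sketch, where the `linear_cka` helper is illustrative:

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score, normalized_mutual_info_score

# AMI vs NMI on independent labelings: AMI is corrected to sit near 0.
rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=200)
b = rng.integers(0, 10, size=200)
print(f"NMI={normalized_mutual_info_score(a, b):.3f}  "
      f"AMI={adjusted_mutual_info_score(a, b):.3f}")

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear-kernel CKA between two (samples x features) matrices."""
    x = x - x.mean(axis=0)                        # center each feature
    y = y - y.mean(axis=0)
    hsic_xy = np.linalg.norm(y.T @ x, "fro") ** 2
    return float(hsic_xy / (np.linalg.norm(x.T @ x, "fro")
                            * np.linalg.norm(y.T @ y, "fro")))
```

Unlike NMI, CKA needs no clustering step and is invariant to orthogonal transformations of either representation.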
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-01-04 | Initial validation report |
Copyright 2026 Olaf Yunus Laitinen Imanov and Contributors. Released under EUPL 1.2.