Quantity Skew with Power-Law Distribution partitions data such that client dataset sizes follow a power-law (Zipfian) distribution. This simulates realistic scenarios where a few clients have large datasets while many have small ones.
Client $k$ receives $n_k$ samples where:
\[n_k \propto k^{-\gamma}\]
Normalized so that all samples are assigned:
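\[n_k = \left\lfloor N \cdot \frac{k^{-\gamma}}{\sum_{j=1}^K j^{-\gamma}} \right\rfloor\]
where $N$ is the total number of samples, $K$ is the number of clients, and $\gamma$ is the power-law exponent.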
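The normalized formula can be computed directly. The sketch below is a minimal illustration, not the unbitrium implementation. `power_law_sizes` is a hypothetical helper name. It applies only the floor step, so the counts may sum to slightly less than $N$.

```python
import math


def power_law_sizes(n_total: int, num_clients: int, gamma: float) -> list[int]:
    """Floor-normalized sizes n_k = floor(N * k^-gamma / sum_j j^-gamma).

    Hypothetical helper for illustration; not the unbitrium API.
    """
    # Unnormalized power-law weights for ranks k = 1..K.
    weights = [k ** -gamma for k in range(1, num_clients + 1)]
    total_weight = sum(weights)
    # Flooring each share means the counts can fall short of N by fewer than K.
    return [math.floor(n_total * w / total_weight) for w in weights]


sizes = power_law_sizes(n_total=1000, num_clients=4, gamma=1.0)
print(sizes, sum(sizes))
```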
The implementation is located at `src/unbitrium/partitioning/quantity_skew.py`.
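All samples are assigned (modulo rounding). Flooring alone guarantees:
\[\sum_{k=1}^K n_k \leq N\]
Verification: Flooring leaves fewer than $K$ residual samples. These are assigned to the largest clients, so the final counts sum to exactly $N$.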
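The residual step can be sketched as follows. This is an assumption about how "assigned to largest clients" works: here each leftover sample goes to one client, starting from rank 1. `power_law_sizes_exact` is a hypothetical name, and the library may distribute residuals differently.

```python
import math


def power_law_sizes_exact(n_total: int, num_clients: int, gamma: float) -> list[int]:
    """Floor-normalize, then hand leftover samples to the largest clients.

    Hypothetical sketch: one extra sample per client, starting at rank 1.
    """
    weights = [k ** -gamma for k in range(1, num_clients + 1)]
    total_weight = sum(weights)
    sizes = [math.floor(n_total * w / total_weight) for w in weights]
    residual = n_total - sum(sizes)  # non-negative and at most num_clients
    for k in range(residual):
        # Rank order is largest-first, so +1 on the top ranks keeps sizes sorted.
        sizes[k % num_clients] += 1
    return sizes


print(power_law_sizes_exact(n_total=1000, num_clients=7, gamma=1.0))
```

Adding one sample to each of the top-ranked clients preserves the non-increasing order, so the monotonicity invariant still holds after redistribution.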
Sample counts are non-increasing in client rank:
\[k_1 < k_2 \implies n_{k_1} \geq n_{k_2}\]
Verification: Sorted order is maintained.
When $\gamma = 0$, every weight $k^{-\gamma}$ equals 1, so the partition is uniform up to rounding:
\[\gamma = 0 \implies n_k = \lfloor N/K \rfloor\]
Verification: A zero exponent produces equal sizes.
As $\gamma \to \infty$, $k^{-\gamma} \to 0$ for every $k \geq 2$, so the first client dominates:
\[\lim_{\gamma \to \infty} n_1 / N = 1\]
Verification: A large $\gamma$ concentrates samples on the first client.
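The three shape properties above can be checked numerically on the normalized shares $k^{-\gamma} / \sum_j j^{-\gamma}$ before flooring. This is a minimal sketch, and `power_law_shares` is a hypothetical helper.

```python
def power_law_shares(num_clients: int, gamma: float) -> list[float]:
    """Fraction of the N samples assigned to each rank (hypothetical helper)."""
    weights = [k ** -gamma for k in range(1, num_clients + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Monotonicity: shares are non-increasing in rank for gamma >= 0.
shares = power_law_shares(num_clients=50, gamma=1.5)
assert all(a >= b for a, b in zip(shares, shares[1:]))

# gamma = 0: every client receives exactly 1/K of the data (before flooring).
uniform = power_law_shares(num_clients=50, gamma=0.0)
assert all(abs(s - 1 / 50) < 1e-12 for s in uniform)

# Large gamma: the top-ranked client's share n_1 / N approaches 1.
for gamma in (1.0, 5.0, 20.0):
    print(gamma, power_law_shares(num_clients=50, gamma=gamma)[0])
```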
Configuration:
Expected Behavior:
| Client Rank | Expected Samples |
|---|---|
| 1 | ~12000 |
| 10 | ~1200 |
| 50 | ~240 |
| 100 | ~120 |
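With $\gamma = 1$, the formula gives $n_k / n_1 = k^{-1}$. The table therefore follows $n_k \approx n_1 / k$ (12000, 1200, 240, 120). The sketch below assumes $K = 100$ and $\gamma = 1.0$, matching the usage example. It also assumes $N = 60{,}000$, which is a guess because the configuration is not stated. Absolute counts depend on $N$, but the ratios do not.

```python
import math

N = 60_000  # assumed dataset size; the configuration above does not state N
K, GAMMA = 100, 1.0

weights = [k ** -GAMMA for k in range(1, K + 1)]
total = sum(weights)
# Floor-normalized counts for the ranks listed in the table.
expected = {k: math.floor(N * weights[k - 1] / total) for k in (1, 10, 50, 100)}

for rank, count in expected.items():
    # With gamma = 1 the ratio n_1 / n_k should be close to k itself.
    print(f"rank {rank:>3}: {count:>6} samples, n_1/n_k = {expected[1] / count:.2f}")
```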
| $\gamma$ | Distribution Shape | Real-World Analogy |
|---|---|---|
| 0 | Uniform | Controlled experiment |
| 0.5 | Mild skew | Enterprise devices |
| 1.0 | Zipf’s law | Natural language, web |
| 1.5 | Strong skew | Social networks |
| 2.0+ | Extreme | Winner-take-all markets |
| Metric | Range | Notes |
|---|---|---|
| `gini_coefficient` | $[0, 1]$ | Inequality measure |
| `max_to_min_ratio` | $[1, \infty)$ | Size ratio |
| `median_samples` | $(0, N/K)$ | Median client size |
| `effective_clients` | $(0, K]$ | Clients with >1% of data |
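The four metrics can be computed from a list of client sizes. This is a hypothetical re-implementation, and `quantity_skew_metrics` is not the library's function. The Gini coefficient uses the standard sorted-sample formula. `effective_clients` reads the table note literally as clients holding strictly more than 1% of $N$, which is an assumption. The usage loop sweeps the $\gamma$ values from the interpretation table under an assumed $N = 60{,}000$ and $K = 100$.

```python
import statistics


def quantity_skew_metrics(sizes: list[int]) -> dict[str, float]:
    """Hypothetical re-implementation of the reported quantity-skew metrics."""
    n = len(sizes)
    total = sum(sizes)
    ordered = sorted(sizes)  # ascending order for the Gini formula
    # Gini: G = 2 * sum_i(i * x_i) / (n * sum x) - (n + 1) / n, with i = 1..n
    weighted = sum(i * x for i, x in enumerate(ordered, start=1))
    gini = 2 * weighted / (n * total) - (n + 1) / n
    smallest = ordered[0]
    return {
        "gini_coefficient": gini,
        "max_to_min_ratio": ordered[-1] / smallest if smallest > 0 else float("inf"),
        "median_samples": statistics.median(sizes),
        # Assumed reading of the note: clients holding strictly more than 1% of N.
        "effective_clients": sum(1 for x in sizes if x > 0.01 * total),
    }


# Usage: sweep the gamma values from the interpretation table (N, K assumed).
for gamma in (0.0, 0.5, 1.0, 1.5, 2.0):
    weights = [k ** -gamma for k in range(1, 101)]
    total_w = sum(weights)
    sizes = [round(60_000 * w / total_w) for w in weights]
    print(gamma, {k: round(v, 3) for k, v in quantity_skew_metrics(sizes).items()})
```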
Input: $K = 1$
Expected Behavior: The single client receives all $N$ samples.
Input: $K > N$
Expected Behavior:
Input: $\gamma = 0$
Expected Behavior: Uniform partition with $n_k \approx N/K$ for every client.
Input: $\gamma < 0$
Expected Behavior:
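The edge cases can be examined by evaluating the floor-normalized formula alone. This sketch does not show how the library handles $K > N$ or $\gamma < 0$; for example, it may raise an error, and that is not stated here. It only shows what the formula produces. `floor_sizes` is a hypothetical helper.

```python
import math


def floor_sizes(n_total: int, num_clients: int, gamma: float) -> list[int]:
    """Floor-normalized power-law sizes, before any residual redistribution."""
    weights = [k ** -gamma for k in range(1, num_clients + 1)]
    total = sum(weights)
    return [math.floor(n_total * w / total) for w in weights]


print(floor_sizes(500, 1, 1.0))   # K = 1: the single client receives all N samples
print(floor_sizes(10, 40, 1.0))   # K > N: many low-ranked clients get 0 samples
print(floor_sizes(500, 5, 0.0))   # gamma = 0: equal sizes
print(floor_sizes(500, 5, -1.0))  # gamma < 0: sizes increase with rank
```

A negative exponent reverses the ordering, which violates the monotonicity invariant. That is one reason an implementation might reject it.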
```python
from unbitrium.partitioning import QuantitySkewPowerLaw

partitioner = QuantitySkewPowerLaw(
    gamma=1.0,
    num_clients=100,
    seed=42,
)
```
Important: The power law assigns sizes by rank, so client 1 always receives the largest share. To randomize which client receives which size:
```python
partitioner = QuantitySkewPowerLaw(
    gamma=1.0,
    num_clients=100,
    shuffle=True,  # Randomize client-size mapping
    seed=42,
)
```
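Conceptually, shuffling permutes the rank-ordered sizes across client IDs and leaves the set of sizes unchanged. The sketch below is a hypothetical stand-in for `shuffle=True`. `shuffled_sizes` is not part of unbitrium, and using Python's `random.Random` is an assumption about how the seed might be consumed.

```python
import random


def shuffled_sizes(sizes: list[int], seed: int) -> list[int]:
    """Randomly reassign rank-ordered sizes to client IDs (hypothetical helper).

    The multiset of sizes is unchanged; only the client-size mapping moves.
    """
    rng = random.Random(seed)  # seeded so the mapping is reproducible
    permuted = list(sizes)     # copy so the rank-ordered input is not mutated
    rng.shuffle(permuted)
    return permuted


ranked = [500, 250, 166, 125, 100]
print(shuffled_sizes(ranked, seed=42))
```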
Quantity skew reveals client data sizes:
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-01-04 | Initial validation report |
Copyright 2026 Olaf Yunus Laitinen Imanov and Contributors. Released under EUPL 1.2.