Hellinger Distance

Overview
  • The Hellinger distance measures the difference between two probability distributions by comparing the square roots of their probabilities, and it ranges from 0 to 1.
  • We show the calculation for discrete distributions and compare it to KL divergence.
  • Practical notes on handling zeros and normalization are discussed.

1. Definition and Properties #

For discrete distributions \(P = (p_1, \dots, p_n)\) and \(Q = (q_1, \dots, q_n)\), the Hellinger distance is defined as:

$$ H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{ \sum_{i=1}^n \left(\sqrt{p_i} - \sqrt{q_i}\right)^2 } $$

  • Values near 0 indicate similar distributions; values near 1 indicate dissimilar ones.
  • It satisfies symmetry and the triangle inequality, making it a true metric.
  • Unlike KL divergence, it remains finite even when supports do not overlap.
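
To make the formula concrete, take the two distributions used in the Python example below, \(P = (0.4, 0.4, 0.2)\) and \(Q = (0.2, 0.5, 0.3)\):

$$ H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{ \left(\sqrt{0.4} - \sqrt{0.2}\right)^2 + \left(\sqrt{0.4} - \sqrt{0.5}\right)^2 + \left(\sqrt{0.2} - \sqrt{0.3}\right)^2 } \approx 0.158 $$

At the other extreme, fully disjoint supports such as \(P = (1, 0)\) and \(Q = (0, 1)\) give the maximum value \(H(P, Q) = 1\).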

2. Computing in Python #

import numpy as np

def hellinger(p: np.ndarray, q: np.ndarray) -> float:
    """Compute the Hellinger distance between two probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)

    # Normalize so that each distribution sums to 1
    p = p / p.sum()
    q = q / q.sum()

    return float(np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2))

p = np.array([0.4, 0.4, 0.2])
q = np.array([0.2, 0.5, 0.3])

print(f"Hellinger distance: {hellinger(p, q):.4f}")

When working with histograms, build both with the same bin edges and normalize so that the counts sum to 1 before computing the square-root difference.
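
The snippet below is a minimal sketch of that workflow, reusing the hellinger function defined above; the sample data and bin count are illustrative assumptions, not recommendations.

import numpy as np

rng = np.random.default_rng(0)
sample_a = rng.normal(0.0, 1.0, size=10_000)   # illustrative data
sample_b = rng.normal(0.5, 1.2, size=10_000)

# Shared bin edges so both histograms discretize the same support
edges = np.histogram_bin_edges(np.concatenate([sample_a, sample_b]), bins=30)
hist_a, _ = np.histogram(sample_a, bins=edges)
hist_b, _ = np.histogram(sample_b, bins=edges)

# hellinger() normalizes internally, so raw counts can be passed directly
print(f"Hellinger distance between samples: {hellinger(hist_a, hist_b):.4f}")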


3. Key Characteristics #

  • Symmetric and metric: Like the Jensen–Shannon distance, it satisfies symmetry and the triangle inequality.
  • Finite with zeros: Always yields a finite value even when the distributions contain zeros (see the sketch after this list).
  • Square-root scaling: Working on square roots keeps each bin's contribution bounded, so extreme probability ratios and outlier bins do not dominate the distance.
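
The following sketch illustrates the finiteness point by comparing against a hand-rolled KL divergence; the kl_divergence helper is illustrative, and the hellinger function defined above is reused.

import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(P || Q) = sum p_i * log(p_i / q_i); diverges when q_i = 0 but p_i > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([1.0, 0.0])
q = np.array([0.0, 1.0])

print(hellinger(p, q))      # 1.0 (the maximum, but still finite)
print(kl_divergence(p, q))  # inf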

4. Practical Applications #

  • Bayesian inference: Quantify the difference between prior and posterior distributions.
  • Clustering: Measure distances between probabilistic vectors (e.g., topic distributions).
  • Concept drift detection: Monitor time-evolving distributions and raise an alert when the distance exceeds a threshold (a minimal sketch follows this list).
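
The sketch below shows the drift-detection pattern in its simplest form; the reference data, window sizes, bin count, and the 0.1 threshold are all illustrative assumptions, and the hellinger function defined above is reused.

import numpy as np

rng = np.random.default_rng(42)
reference = rng.normal(0.0, 1.0, size=5_000)  # e.g. feature values at training time
incoming = rng.normal(0.3, 1.0, size=5_000)   # e.g. the latest production window

# Fix bin edges on the reference data and reuse them for every new window
edges = np.histogram_bin_edges(reference, bins=20)
ref_hist, _ = np.histogram(reference, bins=edges)
new_hist, _ = np.histogram(incoming, bins=edges)

distance = hellinger(ref_hist, new_hist)
THRESHOLD = 0.1  # illustrative value; tune on historical windows
if distance > THRESHOLD:
    print(f"Possible drift: Hellinger distance {distance:.4f} exceeds {THRESHOLD}")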

5. Practical Considerations #

  • The distance depends on histogram binning; design the discretization carefully.
  • Apply Laplace smoothing when sample sizes are small to stabilize the estimated distributions (see the sketch after this list).
  • Combine with other metrics (KL, Wasserstein, etc.) for a multidimensional understanding of distributional differences.
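
As a minimal sketch of the smoothing point above, the helper below adds a pseudo-count to every bin before normalizing; alpha = 1.0 is an illustrative choice, and the hellinger function defined above is reused.

import numpy as np

def laplace_smooth(counts: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add a pseudo-count of alpha to every bin, then renormalize to a probability vector."""
    counts = np.asarray(counts, dtype=float)
    smoothed = counts + alpha
    return smoothed / smoothed.sum()

# Small-sample histograms with empty bins
counts_a = np.array([5, 0, 1, 0])
counts_b = np.array([3, 2, 0, 1])

print(f"Smoothed Hellinger distance: {hellinger(laplace_smooth(counts_a), laplace_smooth(counts_b)):.4f}")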

Summary #

The Hellinger distance measures distributional differences using square-root scaling, producing intuitive values between 0 and 1.
It is robust to zeros and widely used in Bayesian analysis, clustering, and distribution monitoring tasks.