Summary
- Jensen–Shannon divergence symmetrises KL divergence and keeps the value finite.
- Compute JSD and its square root (the Jensen–Shannon distance) in Python.
- Apply it to clustering, generative-model evaluation, and drift analysis.
1. Definition and properties #
Given two distributions $P$ and $Q$, let $M = \frac{1}{2}(P + Q)$. The Jensen–Shannon divergence is:
$$ \mathrm{JSD}(P \parallel Q) = \frac{1}{2} \mathrm{KL}(P \parallel M) + \frac{1}{2} \mathrm{KL}(Q \parallel M) $$
- Symmetric: $\mathrm{JSD}(P \parallel Q) = \mathrm{JSD}(Q \parallel P)$.
- Bounded between 0 and 1 (using log base 2).
- The square root of JSD is a proper metric (satisfies the triangle inequality).
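The definition above can be implemented directly with NumPy; this is a minimal sketch (the function names `kl` and `jsd` are illustrative, not a library API) that you can use to check symmetry and boundedness on small discrete distributions:

```python
import numpy as np

def kl(p, q):
    """KL divergence in bits; sums only over the support of p (q > 0 there since q = M)."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def jsd(p, q):
    """Jensen-Shannon divergence, computed from the definition via the mixture M."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.4, 0.4, 0.2])
q = np.array([0.1, 0.3, 0.6])
print(jsd(p, q))          # same value as jsd(q, p), and always within [0, 1]
```

Because $M$ is an average of $P$ and $Q$, it is strictly positive wherever either input is, so both KL terms stay finite even when the supports differ.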
2. Python example #
```python
import numpy as np
from scipy.spatial.distance import jensenshannon

p = np.array([0.4, 0.4, 0.2])
q = np.array([0.1, 0.3, 0.6])

js_distance = jensenshannon(p, q, base=2)
js_divergence = js_distance ** 2  # square the distance to obtain the divergence

print(f"Jensen-Shannon distance : {js_distance:.4f}")
print(f"Jensen-Shannon divergence: {js_divergence:.4f}")
```
`jensenshannon` returns the distance (the square root of JSD); square it if you need the divergence itself.
3. Characteristics and use cases #
- Symmetry and stability: avoids KL’s dependency on direction and finiteness issues when supports differ.
- Bounded: values stay within a predictable range, making thresholding easier.
- Metric: the distance can be used with clustering algorithms that require a metric.
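Because the Jensen–Shannon distance satisfies the triangle inequality, it plugs directly into distance-based clustering. A small sketch with hypothetical toy distributions, using SciPy's `pdist` (which accepts `"jensenshannon"` as a metric name in recent versions) and hierarchical clustering:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: each row is a probability distribution.
distributions = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
])

# Pairwise Jensen-Shannon distances, then average-linkage clustering.
D = pdist(distributions, metric="jensenshannon")
Z = linkage(D, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # the first two rows should group together, as should the last two
```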
4. Practical examples #
- Generative models: measure divergence between generated and real distributions.
- Language/topic models: compare probability distributions of words or topics.
- Anomaly detection: monitor distribution shifts in time series or streaming data.
- Model selection: pick the candidate whose output distribution best matches ground truth.
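The anomaly-detection use case can be sketched as drift monitoring: histogram a reference window and a current window on shared bins, then alert when the Jensen–Shannon distance exceeds a threshold. The data, bin count, and the 0.1 threshold below are all illustrative assumptions, not recommended defaults:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)  # baseline window
current = rng.normal(0.5, 1.0, 5000)    # shifted stream window (simulated drift)

# Shared bin edges so both histograms live on the same support.
edges = np.histogram_bin_edges(np.concatenate([reference, current]), bins=30)
p, _ = np.histogram(reference, bins=edges)
q, _ = np.histogram(current, bins=edges)

# jensenshannon normalises the count vectors to probabilities internally.
distance = jensenshannon(p, q, base=2)
if distance > 0.1:  # hypothetical alert threshold
    print(f"drift detected: JS distance = {distance:.3f}")
```

Using shared bin edges is essential: histograms built on different edges describe different supports and their comparison is meaningless.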
5. Caveats #
- For continuous data, discretise via binning or use density estimation before computing JSD.
- Apply smoothing when many zero probabilities are present.
- Small divergence does not guarantee superior model performance; interpret alongside other metrics.
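The smoothing caveat can be handled with additive (Laplace) smoothing before normalising. JSD itself stays finite with zero entries, but sampling zeros in sparse counts make the estimate noisy; the counts and the `alpha` value here are illustrative:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def smoothed(counts, alpha=1.0):
    """Additive (Laplace) smoothing: add alpha to every bin, then normalise."""
    counts = np.asarray(counts, dtype=float) + alpha
    return counts / counts.sum()

# Hypothetical sparse word counts with zero entries.
p_counts = [10, 5, 0, 1]
q_counts = [0, 8, 7, 1]

d = jensenshannon(smoothed(p_counts), smoothed(q_counts), base=2)
print(f"smoothed JS distance: {d:.4f}")
```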
Jensen–Shannon divergence provides a stable, symmetric alternative to KL with convenient metric properties. SciPy makes it easy to compute, enabling broad use in monitoring, evaluation, and clustering tasks.