Hamming Loss

Intermediate

4.3.13


Last updated 2020-07-15 Read time 2 min
Summary
  • Understand the fundamentals of this metric, what it evaluates, and how to interpret the results.
  • Compute and visualise the metric with Python 3.13 code examples, covering key steps and practical checkpoints.
  • Combine charts and complementary metrics for effective model comparison and threshold tuning.
  • Prerequisite: Accuracy — understanding this concept first will make learning smoother.

1. Definition

Given true label sets \(Y_i\), predicted label sets \(\hat{Y}_i\), and \(L\) labels in total,

$$ \mathrm{Hamming\ Loss} = \frac{1}{nL} \sum_{i=1}^n \lvert Y_i \triangle \hat{Y}_i \rvert $$

where \(Y \triangle \hat{Y}\) denotes the symmetric difference (labels that appear in only one of the two sets). Because the sum is divided by \(nL\), for multi-label tasks this equals the average fraction of labels predicted incorrectly per sample.
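The formula above can be computed directly on 0/1 indicator arrays, since the size of the symmetric difference is just the number of positions where the two indicator rows disagree. A minimal sketch with made-up data:

```python
import numpy as np

# Hypothetical example: n = 3 samples, L = 4 labels (0/1 indicator rows).
Y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
Y_pred = np.array([[1, 0, 0, 0],   # one label missed
                   [0, 1, 0, 1],   # one extra label
                   [1, 1, 0, 1]])  # exact match

# |Y_i △ Ŷ_i| counts the positions where the indicators disagree,
# so the formula reduces to the mean of the element-wise mismatches.
n, L = Y_true.shape
loss = np.sum(Y_true != Y_pred) / (n * L)
print(loss)  # 2 mismatches / 12 entries ≈ 0.1667
```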


2. Computing in Python 3.13

python --version  # e.g. Python 3.13.0
pip install scikit-learn

from sklearn.metrics import hamming_loss

# 0/1 multi-label indicator arrays: rows are samples, columns are labels.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 0]]

print("Hamming Loss:", hamming_loss(y_true, y_pred))

Pass y_true and y_pred as 0/1 multi-label indicator arrays (the output of MultiLabelBinarizer works nicely).
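If your labels start out as tag sets rather than indicator arrays, MultiLabelBinarizer handles the conversion. A small sketch with hypothetical tag sets (fitting on the union of both lists so every label gets a column):

```python
from sklearn.metrics import hamming_loss
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical tag sets for three items.
true_tags = [{"python", "ml"}, {"web"}, {"python"}]
pred_tags = [{"python"}, {"web", "ml"}, {"python"}]

# Fit on the union so both arrays share the same label columns.
mlb = MultiLabelBinarizer()
mlb.fit(true_tags + pred_tags)
y_true = mlb.transform(true_tags)
y_pred = mlb.transform(pred_tags)

print("Hamming Loss:", hamming_loss(y_true, y_pred))
```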


3. Reading the score

  • The closer to 0, the better. A perfect classifier yields 0.
  • A value of 0.05 means “on average 5% of the labels were wrong”.
  • If labels have different business impact, consider a weighted Hamming Loss.
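scikit-learn does not ship a weighted variant, but one is easy to sketch: weight each label column by its business impact and normalise by the total weight, so that uniform weights recover the standard Hamming Loss. The function name and weights below are illustrative assumptions:

```python
import numpy as np

# Sketch of a label-weighted Hamming Loss (not a scikit-learn function).
def weighted_hamming_loss(y_true, y_pred, weights):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    weights = np.asarray(weights, dtype=float)
    mismatches = (y_true != y_pred).astype(float)  # shape (n, L)
    # Per-sample weighted error, normalised by the total label weight.
    return np.mean(mismatches @ weights) / weights.sum()

y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 0, 0]]
# Assumed costs: label 0 is twice as costly to get wrong as the others.
print(weighted_hamming_loss(y_true, y_pred, [2.0, 1.0, 1.0]))
```

With uniform weights the result matches hamming_loss exactly, which is a handy sanity check.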

4. Relation to other metrics

Metric         What it captures              When to use it
Exact Match    Sample-wise perfect accuracy  Strict: requires the entire label set to match
Hamming Loss   Label-wise error rate         Track the average number of mistakes
Micro F1       Precision & recall balance    Account for positive/negative imbalance
Jaccard Index  Set overlap                   Evaluate the similarity of label sets

Hamming Loss is less strict than Exact Match and provides a smoother signal when iterating on the model.
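The contrast is easiest to see by computing all four metrics on the same predictions; note that on multi-label input, accuracy_score is the exact-match ratio. A sketch with made-up data:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, hamming_loss,
                             jaccard_score)

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 1]])  # two single-label slips

# Exact Match drops to 1/3, while Hamming Loss only rises to 2/9:
# the smoother signal the text describes.
print("Exact Match: ", accuracy_score(y_true, y_pred))
print("Hamming Loss:", hamming_loss(y_true, y_pred))
print("Micro F1:    ", f1_score(y_true, y_pred, average="micro"))
print("Jaccard:     ", jaccard_score(y_true, y_pred, average="samples"))
```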


5. Practical tips

  • Tag recommendation: quantify how many tags per item are wrong on average.
  • Alert systems: monitor how often multi-label alarms fire incorrectly.
  • Weighted evaluation: apply per-label weights when the cost of mistakes varies across labels.
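For all three tips, breaking the overall score down by label shows where the mistakes concentrate. Averaging mismatches down each column gives a per-label error rate, and the mean of those rates is exactly the overall Hamming Loss. The label names below are hypothetical:

```python
import numpy as np

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [1, 0, 1]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]])
label_names = ["urgent", "billing", "outage"]  # assumed label names

# Error rate per label column: which labels drive the overall loss?
per_label_error = np.mean(y_true != y_pred, axis=0)
for name, err in zip(label_names, per_label_error):
    print(f"{name}: {err:.2f}")

# The overall Hamming Loss is the mean of the per-label rates.
print("Overall:", per_label_error.mean())
```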

Takeaways

  • Hamming Loss captures label-wise error rates, making it ideal for monitoring multi-label improvements.
  • scikit-learn’s hamming_loss is easy to use and complements Exact Match and F1 for a fuller picture.
  • Combine the metric with per-label diagnostics to prioritise remediation where mistakes hurt the most.