Top-k Accuracy

Eval

Top-k Accuracy and Recall@k

Created: Last updated: Read time: 2 min
まとめ
  • Top-k Accuracy and Recall@kの概要を押さえ、評価対象と読み取り方を整理します。
  • Python 3.13 のコード例で算出・可視化し、手順と実務での確認ポイントを確認します。
  • 図表や補助指標を組み合わせ、モデル比較や閾値調整に活かすヒントをまとめます。

1. Definition #

If the model produces scores per class and \(S_k(x)\) denotes the top-k candidates for input \(x\), then $$ \mathrm{Top\text{-}k\ Accuracy} = \frac{1}{n} \sum_{i=1}^n \mathbf{1}{ y_i \in S_k(x_i) } $$ This is equivalent to Recall@k: a hit is recorded when the ground-truth label is among the top k predictions.


2. Computing in Python 3.13 #

python --version  # e.g. Python 3.13.0
pip install scikit-learn
import numpy as np
from sklearn.metrics import top_k_accuracy_score

proba = model.predict_proba(X_test)  # shape: (n_samples, n_classes)
top3 = top_k_accuracy_score(y_test, proba, k=3, labels=model.classes_)
top5 = top_k_accuracy_score(y_test, proba, k=5, labels=model.classes_)

print("Top-3 Accuracy:", round(top3, 3))
print("Top-5 Accuracy:", round(top5, 3))

top_k_accuracy_score consumes class probabilities and returns the metric for any value of k. Passing the labels array makes the computation robust to class-order differences.


3. Design considerations #

  • Choosing k: Match the shortlists your UI or product can display.
  • Ties: Equal scores can make rankings unstable; use stable sorting or add tie-breaker decimals when needed.
  • Multiple ground truths: For genuine multi-label tasks, complement Top-k Accuracy with Precision@k and Recall@k.

4. Practical applications #

  • Recommendation systems: Check whether the items users eventually choose appear in the suggested slate.
  • Image classification: Large-class datasets (e.g., ImageNet) traditionally report Top-5 Accuracy.
  • Search & retrieval: Evaluate if the relevant document shows up within the top 10 results.

5. Comparison with other ranking metrics #

MetricWhat it capturesTypical use case
Top-k AccuracyWhether the correct label is in kShortlists where any hit counts
NDCGRank-weighted relevanceWhen placement near the top carries more value
MAPMean precision over ranksRankings with multiple relevant items
Hit RateSynonymous with Top-k AccuracyCommon in recommender-system literature

Takeaways #

  • Top-k Accuracy checks if the correct answer appears among the candidates—a simple yet powerful signal.
  • scikit-learn’s top_k_accuracy_score lets you evaluate multiple k values with minimal effort.
  • Combine it with NDCG or MAP to account for rank quality and cases with several relevant options.