Summary
- Covers the basics of Top-k Accuracy and Recall@k, organizing what these metrics evaluate and how to interpret them.
- Computes and visualizes them with Python 3.13 code examples, walking through the procedure and practical checkpoints.
- Summarizes tips for combining charts and auxiliary metrics to support model comparison and threshold tuning.
1. Definition #
If the model produces scores per class and \(S_k(x)\) denotes the top-k candidates for input \(x\), then
$$
\mathrm{Top\text{-}k\ Accuracy} = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{\, y_i \in S_k(x_i) \,\}
$$
This is equivalent to Recall@k: a hit is recorded when the ground-truth label is among the top k predictions.
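The indicator in the definition translates almost directly into NumPy. As a minimal sketch, here is the metric computed by hand on a toy score matrix (4 samples, 5 classes; the numbers are invented for illustration):

```python
import numpy as np

# Toy per-class scores for 4 samples over 5 classes (values are made up).
proba = np.array([
    [0.10, 0.50, 0.20, 0.10, 0.10],
    [0.30, 0.10, 0.40, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.60, 0.20],
    [0.25, 0.25, 0.20, 0.15, 0.15],
])
y_true = np.array([1, 3, 3, 4])

k = 2
# Column indices of the k highest scores per row; order within the
# top-k set does not matter for the metric.
topk = np.argsort(proba, axis=1)[:, -k:]
hits = np.any(topk == y_true[:, None], axis=1)
top_k_acc = hits.mean()
print(top_k_acc)  # 0.5 (2 of 4 labels land in the top-2 set)
```

Samples 0 and 2 are hits, samples 1 and 3 are misses, giving 2/4 = 0.5.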
2. Computing in Python 3.13 #
```shell
python --version        # e.g. Python 3.13.0
pip install scikit-learn
```

```python
from sklearn.metrics import top_k_accuracy_score

# `model`, `X_test`, and `y_test` come from an earlier training step.
proba = model.predict_proba(X_test)  # shape: (n_samples, n_classes)

top3 = top_k_accuracy_score(y_test, proba, k=3, labels=model.classes_)
top5 = top_k_accuracy_score(y_test, proba, k=5, labels=model.classes_)
print("Top-3 Accuracy:", round(top3, 3))
print("Top-5 Accuracy:", round(top5, 3))
```
`top_k_accuracy_score` consumes class probabilities and returns the metric for any value of k. Passing the `labels` array makes the computation robust to class-order differences.
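To see why `labels` matters, here is a self-contained toy example with string class labels; the class names and score values are invented for illustration:

```python
import numpy as np
from sklearn.metrics import top_k_accuracy_score

# `labels` tells scikit-learn which class each probability column maps to,
# so the metric stays correct even for non-integer or reordered classes.
classes = np.array(["cat", "dog", "fox"])
proba = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
])
y_true = np.array(["fox", "fox", "fox"])

top2 = top_k_accuracy_score(y_true, proba, k=2, labels=classes)
print(round(top2, 3))  # 0.667: "fox" is in the top-2 for samples 1 and 2 only
```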
3. Design considerations #
- Choosing k: Match the shortlists your UI or product can display.
- Ties: Equal scores can make rankings unstable; use stable sorting or add tie-breaker decimals when needed.
- Multiple ground truths: For genuine multi-label tasks, complement Top-k Accuracy with Precision@k and Recall@k.
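For the multi-label case in the last bullet, Precision@k and Recall@k can be sketched per sample as follows. This is a minimal illustration, not a library API; the helper name and toy values are made up:

```python
import numpy as np

def precision_recall_at_k(scores, relevant, k):
    """Precision@k and Recall@k for one sample with multiple true labels.

    scores:   1-D array of per-class scores
    relevant: set of ground-truth class indices
    """
    topk = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    hits = len(set(topk.tolist()) & relevant)
    return hits / k, hits / len(relevant)

scores = np.array([0.10, 0.40, 0.30, 0.15, 0.05])
relevant = {1, 3}                          # two ground-truth labels
p, r = precision_recall_at_k(scores, relevant, k=3)
print(p, r)  # 0.667 and 1.0: 2 of the top-3 are relevant, all 2 truths found
```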
4. Practical applications #
- Recommendation systems: Check whether the items users eventually choose appear in the suggested slate.
- Image classification: Large-class datasets (e.g., ImageNet) traditionally report Top-5 Accuracy.
- Search & retrieval: Evaluate if the relevant document shows up within the top 10 results.
5. Comparison with other ranking metrics #
| Metric | What it captures | Typical use case |
|---|---|---|
| Top-k Accuracy | Whether the correct label is in the top k | Shortlists where any hit counts |
| NDCG | Rank-weighted relevance | When placement near the top carries more value |
| MAP | Mean precision over ranks | Rankings with multiple relevant items |
| Hit Rate | Synonymous with Top-k Accuracy | Common in recommender-system literature |
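To illustrate the contrast with NDCG from the table, the following sketch uses scikit-learn's `ndcg_score` on two invented rankings. Both would count as Top-3 hits, but NDCG rewards the one that places the relevant item higher:

```python
import numpy as np
from sklearn.metrics import ndcg_score

relevance = np.array([[0, 0, 1, 0, 0]])            # item 2 is the only relevant one
scores_a = np.array([[0.1, 0.2, 0.9, 0.3, 0.4]])   # relevant item ranked 1st
scores_b = np.array([[0.9, 0.4, 0.3, 0.2, 0.1]])   # relevant item ranked 3rd

ndcg_a = ndcg_score(relevance, scores_a, k=3)
ndcg_b = ndcg_score(relevance, scores_b, k=3)
print(ndcg_a, ndcg_b)  # 1.0 and 0.5: Top-3 Accuracy would score both as 1.0
```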
Takeaways #
- Top-k Accuracy checks whether the correct answer appears among the candidates: a simple yet powerful signal.
- scikit-learn's `top_k_accuracy_score` lets you evaluate multiple k values with minimal effort.
- Combine it with NDCG or MAP to account for rank quality and cases with several relevant options.