Overview
- Cover the essentials of Average Precision (AP) | Evaluating Precision–Recall curves, clarifying what it evaluates and how to read it.
- Compute and visualize it with Python 3.13 code examples, walking through the procedure and practical checkpoints.
- Combine plots and complementary metrics, with tips for applying AP to model comparison and threshold tuning.
1. Definition #
If the PR curve consists of points \((R_n, P_n)\), Average Precision is defined as
\[
\mathrm{AP} = \sum_{n}(R_n - R_{n-1}) P_n
\]
The change in recall acts as the weight, so AP reflects the average precision as the threshold slides from high to low.
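The step-wise sum above can be computed directly from ranked scores. A minimal sketch with hypothetical labels and scores (the function name average_precision and the toy data are ours, not from the original example):

```python
import numpy as np

def average_precision(y_true, scores):
    """Step-wise AP: sum of (R_n - R_{n-1}) * P_n over descending score ranks."""
    order = np.argsort(scores)[::-1]            # rank by score, high to low
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)                           # true positives at each cutoff
    precision = tp / np.arange(1, len(y) + 1)   # P_n at rank n
    recall = tp / y.sum()                       # R_n at rank n
    # recall increments (R_n - R_{n-1}) weight the precision values
    return float(np.sum(np.diff(recall, prepend=0.0) * precision))

y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
print(average_precision(y_true, scores))  # ≈ 0.806
```

This matches what scikit-learn's average_precision_score returns for the same inputs, since both implement the same step-wise definition.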
2. Computing AP in Python 3.13 #
python --version # e.g. Python 3.13.0
pip install scikit-learn matplotlib
Reusing the probabilities proba from the Precision–Recall example, we can obtain AP with a couple of scikit-learn calls:
from sklearn.metrics import precision_recall_curve, average_precision_score
precision, recall, thresholds = precision_recall_curve(y_test, proba)
ap = average_precision_score(y_test, proba)
print(f"Average Precision: {ap:.3f}")
The corresponding PR curve is the same pr_curve.png we generated earlier. AP integrates the PR curve by weighting precision with recall increments.
3. AP versus PR-AUC #
- average_precision_score implements the step-wise integration commonly used in information retrieval.
- sklearn.metrics.auc(recall, precision) applies the trapezoidal rule and yields the classical PR-AUC.
- AP tends to be more robust on imbalanced datasets because it emphasises changes where recall actually increases.
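The two quantities can be computed side by side on the same scores; the toy labels and scores below are hypothetical, chosen only to make the comparison concrete:

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.95, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30])

precision, recall, _ = precision_recall_curve(y_true, scores)
ap = average_precision_score(y_true, scores)  # step-wise sum over recall increments
pr_auc = auc(recall, precision)               # trapezoidal rule over the same curve
print(f"AP = {ap:.3f}, PR-AUC = {pr_auc:.3f}")
```

The two numbers are close but generally not identical; the trapezoidal rule interpolates linearly between points, which can be optimistic on sawtooth-shaped PR curves.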
4. Practical takeaways #
- Threshold tuning – Higher AP implies that the model keeps precision high across a wider span of recalls.
- Ranking tasks – In recommendation and search, Mean Average Precision (MAP) averages AP across queries.
- Complement to F1 – F1 reflects a single operating point; AP reflects the whole threshold spectrum.
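For the ranking case, MAP is just the per-query AP averaged over queries. A minimal sketch with hypothetical relevance labels and ranking scores for two made-up queries:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical per-query (relevance labels, ranking scores)
queries = {
    "q1": ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    "q2": ([0, 1, 1, 0], [0.6, 0.5, 0.4, 0.3]),
}

ap_per_query = {
    q: average_precision_score(rels, scores)
    for q, (rels, scores) in queries.items()
}
map_score = float(np.mean(list(ap_per_query.values())))
print(ap_per_query, f"MAP = {map_score:.3f}")
```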
Summary #
- Average Precision evaluates the entire Precision–Recall landscape, making it ideal for imbalanced problems.
- With Python 3.13 and scikit-learn, average_precision_score computes it in a few lines.
- Pair AP with F1, ROC-AUC, and PR curves when presenting model comparisons or choosing operating thresholds.