Average Precision

Eval

Average Precision (AP) | Evaluating Precision–Recall curves

Read time: 2 min
Overview
  • Get an overview of Average Precision (AP) and evaluating Precision–Recall curves: what is being measured and how to read the result.
  • Compute and visualise AP with Python 3.13 code examples, covering the procedure and the points to check in practice.
  • Combine AP with plots and complementary metrics for hints on model comparison and threshold tuning.

1. Definition #

If the PR curve consists of points \((R_n, P_n)\), Average Precision is defined as

$$\mathrm{AP} = \sum_{n} (R_n - R_{n-1})\, P_n$$

The change in recall acts as the weight, so AP reflects the average precision as the threshold slides from high to low.
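As a sanity check, the sum can be evaluated directly from the output of `precision_recall_curve`; the toy labels and scores below are made up purely to exercise the formula:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# Hypothetical toy labels and scores, just for illustration
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.20, 0.90, 0.50])

precision, recall, _ = precision_recall_curve(y_true, scores)
# recall is returned in decreasing order, so negate the differences
# to recover the (R_n - R_{n-1}) weights from the definition above
ap_manual = -np.sum(np.diff(recall) * precision[:-1])
print(ap_manual, average_precision_score(y_true, scores))  # the two values agree
```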


2. Computing AP in Python 3.13 #

python --version        # e.g. Python 3.13.0
pip install scikit-learn matplotlib

Reusing the probabilities proba from the Precision–Recall example, we can obtain AP with a couple of scikit-learn calls:

from sklearn.metrics import precision_recall_curve, average_precision_score

# proba holds the predicted positive-class probabilities for y_test
precision, recall, thresholds = precision_recall_curve(y_test, proba)
ap = average_precision_score(y_test, proba)
print(f"Average Precision: {ap:.3f}")

The corresponding PR curve is the same pr_curve.png we generated earlier.
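If you need to regenerate that figure from scratch, a minimal self-contained sketch looks like the following; the synthetic dataset and logistic-regression model stand in for the earlier example's `y_test` and `proba`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve, average_precision_score

# Imbalanced synthetic data standing in for the earlier example
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

precision, recall, _ = precision_recall_curve(y_test, proba)
ap = average_precision_score(y_test, proba)

plt.step(recall, precision, where="post", label=f"AP = {ap:.3f}")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall curve")
plt.legend()
plt.savefig("pr_curve.png", dpi=150)
```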

Precision–Recall curve

AP integrates the PR curve by weighting precision with recall increments.


3. AP versus PR-AUC #

  • average_precision_score implements the step-wise integration commonly used in information retrieval.
  • sklearn.metrics.auc(recall, precision) applies the trapezoidal rule and yields the classical PR-AUC.
  • AP tends to be more robust on imbalanced datasets because it emphasises changes where recall actually increases.
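The two quantities can be computed side by side on the same curve; the imbalanced toy data below is invented for the comparison:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score, auc

# Hypothetical imbalanced labels with scores that separate the classes imperfectly
rng = np.random.default_rng(0)
y_true = np.array([0] * 90 + [1] * 10)
scores = np.where(y_true == 1,
                  rng.uniform(0.4, 1.0, y_true.size),
                  rng.uniform(0.0, 0.7, y_true.size))

precision, recall, _ = precision_recall_curve(y_true, scores)
ap = average_precision_score(y_true, scores)  # step-wise integration
pr_auc = auc(recall, precision)               # trapezoidal rule on the same points
print(f"AP = {ap:.3f}, trapezoidal PR-AUC = {pr_auc:.3f}")
```

The two numbers are usually close but not identical; the trapezoidal rule linearly interpolates between PR points, which can be optimistic on sparse curves.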

4. Practical takeaways #

  • Threshold tuning – Higher AP implies that the model keeps precision high across a wider span of recalls.
  • Ranking tasks – In recommendation and search, Mean Average Precision (MAP) averages AP across queries.
  • Complement to F1 – F1 reflects a single operating point; AP reflects the whole threshold spectrum.
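For the ranking case, MAP is just the per-query AP values averaged; the queries, relevance labels, and scores below are hypothetical:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical per-query relevance labels and ranking scores
queries = {
    "q1": (np.array([1, 0, 1, 0]), np.array([0.9, 0.8, 0.7, 0.1])),
    "q2": (np.array([0, 1, 0, 0]), np.array([0.6, 0.5, 0.4, 0.3])),
}

# Mean Average Precision: AP per query, then the mean across queries
aps = [average_precision_score(y, s) for y, s in queries.values()]
map_score = float(np.mean(aps))
print(f"MAP = {map_score:.3f}")  # MAP = 0.667
```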

Summary #

  • Average Precision evaluates the entire Precision–Recall landscape, making it ideal for imbalanced problems.
  • Python 3.13 + scikit-learn compute it with average_precision_score in a few lines.
  • Pair AP with F1, ROC-AUC, and PR curves when presenting model comparisons or choosing operating thresholds.