Overview #
- MAP (Mean Average Precision) is a ranking metric obtained by averaging the Average Precision (AP) across multiple queries.
- Using a search example, we compute AP and MAP to observe how ranking positions affect the score.
- We also cover practical caveats such as queries with no relevant items and weighting schemes.
1. Definition of AP and MAP #
For a single query, the Average Precision (AP) averages the precision at each rank where a correct item is found.
$$ \mathrm{AP} = \frac{1}{|G|} \sum_{k \in G} P(k) $$
Here, \(G\) is the set of ranks at which relevant items appear, \(|G|\) is the number of relevant items, and \(P(k)\) is the precision over the top \(k\) results.
MAP is then the mean of AP values across all queries.
$$ \mathrm{MAP} = \frac{1}{Q} \sum_{q=1}^Q \mathrm{AP}_q $$
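As a small worked example with made-up numbers: suppose a query returns five results and the relevant items appear at ranks 1, 3, and 5, so \(G = \{1, 3, 5\}\).
$$ \mathrm{AP} = \frac{1}{3}\left( \frac{1}{1} + \frac{2}{3} + \frac{3}{5} \right) \approx 0.756 $$
If a second query scored \(\mathrm{AP} = 0.5\), the MAP over the two queries would be \((0.756 + 0.5)/2 \approx 0.63\).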
2. Computing in Python #
Although Scikit-learn doesn’t provide a direct MAP function, we can compute average_precision_score per query and then take the mean.
from sklearn.metrics import average_precision_score
import numpy as np

# Compute AP per query, then average the APs across queries to get MAP
aps = []
for q in queries:
    # y_true[q]: binary relevance labels, y_score[q]: predicted scores
    aps.append(average_precision_score(y_true[q], y_score[q]))

map_score = np.mean(aps)
print("MAP:", round(map_score, 4))
Here, y_true[q] holds the binary relevance labels (0/1) for the candidates returned for query q, and y_score[q] holds the model's predicted scores for those same candidates.
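To make the snippet above runnable end to end, here is a minimal sketch with made-up data for two queries; the dictionaries queries, y_true, and y_score are illustrative stand-ins for whatever per-query structure your pipeline actually produces.

from sklearn.metrics import average_precision_score
import numpy as np

# Made-up example: two queries, five candidate items each
queries = ["q1", "q2"]
y_true = {
    "q1": np.array([1, 0, 1, 0, 1]),  # relevance labels for q1's candidates
    "q2": np.array([0, 1, 0, 0, 1]),  # relevance labels for q2's candidates
}
y_score = {
    "q1": np.array([0.9, 0.8, 0.7, 0.4, 0.6]),  # model scores for q1's candidates
    "q2": np.array([0.2, 0.9, 0.3, 0.1, 0.8]),  # model scores for q2's candidates
}

aps = [average_precision_score(y_true[q], y_score[q]) for q in queries]
print("AP per query:", [round(a, 3) for a in aps])
print("MAP:", round(float(np.mean(aps)), 4))

Running this prints the per-query AP values and their mean; for a real evaluation, only the labels and scores change.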
3. Characteristics and Advantages #
- Works well for rankings with multiple relevant items.
- Rewards systems that rank correct items earlier in the list (see the sketch after this list).
- Reflects both precision and recall, offering a more comprehensive evaluation than simple Precision@k.
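The second point is easy to see with a minimal sketch using made-up labels and scores: the same five candidates and the same two relevant items, but one score assignment pushes the relevant items to the top of the ranking while the other pushes them to the bottom.

from sklearn.metrics import average_precision_score

y_true = [1, 0, 1, 0, 0]                   # same relevance labels in both cases
scores_top = [0.9, 0.8, 0.7, 0.4, 0.2]     # relevant items ranked near the top
scores_bottom = [0.3, 0.9, 0.2, 0.8, 0.7]  # relevant items pushed to the bottom

print("AP, relevant ranked high:", round(average_precision_score(y_true, scores_top), 3))
print("AP, relevant ranked low: ", round(average_precision_score(y_true, scores_bottom), 3))

On these made-up numbers the first ranking scores about 0.83 while the second drops to roughly 0.33, even though both rankings contain exactly the same relevant items.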
4. Practical Applications #
- Search systems: Evaluate how well the results cover all relevant items.
- Recommendation systems: Ideal when multiple relevant outputs exist (e.g., products viewed or purchased).
- Learning to Rank (LTR): Commonly used as an offline evaluation metric for ranking models such as LambdaMART or gradient-boosted rankers built with XGBoost.
5. Points to Note #
- If the number of relevant items per query varies greatly, plain MAP still weights every query equally, so easy queries with only one or two relevant items count as much as hard queries with many; a weighted variant (Weighted MAP) is one way to compensate (see the sketch after this list).
- Queries with no relevant items (labels that are all 0) have an undefined AP; decide in advance whether to exclude them or assign them a score of zero.
- Use together with NDCG, Recall@k, and other metrics for a holistic ranking evaluation.
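As a rough sketch of how the first two caveats can be handled in code: the helper below skips queries with no relevant items (or counts them as zero when zero_fill=True), and the optional weighting by relevant-item count is just one illustrative Weighted MAP scheme rather than a fixed standard; the function name and parameters are assumptions for this example.

from sklearn.metrics import average_precision_score
import numpy as np

def mean_average_precision(y_true, y_score, queries, zero_fill=False, weight_by_relevant=False):
    """MAP over queries, handling all-negative queries and optional per-query weighting."""
    aps, weights = [], []
    for q in queries:
        labels = np.asarray(y_true[q])
        n_relevant = int(labels.sum())
        if n_relevant == 0:
            # AP is undefined without relevant items: skip, or count as 0 if zero_fill=True
            if zero_fill:
                aps.append(0.0)
                weights.append(1.0)
            continue
        aps.append(average_precision_score(labels, y_score[q]))
        # Illustrative weighting choice: queries with more relevant items get more weight
        weights.append(n_relevant if weight_by_relevant else 1.0)
    return float(np.average(aps, weights=weights))

Whether skipping or zero-filling is appropriate, and whether any weighting should be applied at all, depends on the evaluation protocol you need to match.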
Summary #
- MAP is the average of average precisions — ideal for rankings with multiple correct answers.
- It’s simple to compute by averaging AP values per query.
- Combine with NDCG or Recall@k to improve ranking system performance from multiple perspectives.