Summary
- Recall@k measures the proportion of relevant items included within the top-k results.
- We compute Recall@k and Precision@k from a recommendation list to evaluate its coverage and purity.
- Interpretation varies with the number of ground-truth items and the size of the candidate pool; we review the key design considerations.
1. Definition #
For a query \(q\) with a set of relevant items \(G_q\) and a top-k candidate set \(S_{q,k}\):
$$ \mathrm{Recall@k} = \frac{|G_q \cap S_{q,k}|}{|G_q|} $$ $$ \mathrm{Precision@k} = \frac{|G_q \cap S_{q,k}|}{k} $$
- Recall@k: How many of the relevant items were retrieved.
- Precision@k: How many of the retrieved items were actually relevant.
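As a quick worked example with a hypothetical query: let \(G_q = \{a, b, c\}\) and the top-3 result list be \(S_{q,3} = \{a, d, b\}\). Then \(|G_q \cap S_{q,3}| = 2\), giving

$$ \mathrm{Recall@3} = \frac{2}{3}, \qquad \mathrm{Precision@3} = \frac{2}{3} $$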
2. Python Implementation #
```python
import numpy as np

def recall_at_k(y_true: np.ndarray, y_score: np.ndarray, k: int) -> float:
    """Compute Recall@k: the proportion of relevant items found in the top-k."""
    idx = np.argsort(-y_score)[:k]  # indices of the k highest-scoring items
    if y_true.sum() == 0:
        return 0.0  # no relevant items: recall is undefined, so return 0 by convention
    return float(y_true[idx].sum() / y_true.sum())

def precision_at_k(y_true: np.ndarray, y_score: np.ndarray, k: int) -> float:
    """Compute Precision@k: the fraction of top-k items that are relevant."""
    idx = np.argsort(-y_score)[:k]
    return float(y_true[idx].sum() / k)
```
`y_true` contains binary relevance labels (0/1), and `y_score` contains model-predicted scores.
For multiple queries, compute the mean over all samples.
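A minimal sketch of the per-query averaging described above, using the functions from section 2 on two hypothetical queries (the labels and scores below are made up for illustration):

```python
import numpy as np

def recall_at_k(y_true: np.ndarray, y_score: np.ndarray, k: int) -> float:
    idx = np.argsort(-y_score)[:k]
    return float(y_true[idx].sum() / y_true.sum())

def precision_at_k(y_true: np.ndarray, y_score: np.ndarray, k: int) -> float:
    idx = np.argsort(-y_score)[:k]
    return float(y_true[idx].sum() / k)

# Two hypothetical queries: (relevance labels, model scores) over 5 candidates each.
queries = [
    (np.array([1, 0, 1, 0, 0]), np.array([0.9, 0.8, 0.3, 0.2, 0.1])),
    (np.array([0, 1, 0, 0, 1]), np.array([0.7, 0.6, 0.5, 0.4, 0.2])),
]

k = 2
mean_recall = float(np.mean([recall_at_k(t, s, k) for t, s in queries]))
mean_precision = float(np.mean([precision_at_k(t, s, k) for t, s in queries]))
```

Each query contributes one Recall@k and one Precision@k value; the reported metric is the unweighted mean across queries.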
3. Choosing k #
- Set k according to UI or serving constraints (e.g., top-5 recommendations → Recall@5).
- Evaluate multiple cutoffs (Recall@5, Recall@10, etc.) to analyze coverage trends.
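Evaluating several cutoffs at once can be sketched as below; the labels and scores are a toy example, not real data:

```python
import numpy as np

def recall_at_k(y_true: np.ndarray, y_score: np.ndarray, k: int) -> float:
    idx = np.argsort(-y_score)[:k]
    return float(y_true[idx].sum() / y_true.sum())

# Toy query: 3 relevant items among 8 candidates, scores already in descending order.
y_true = np.array([1, 0, 1, 0, 1, 0, 0, 0])
y_score = np.array([0.9, 0.85, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])

# Recall rises monotonically with k, revealing how quickly coverage saturates.
recalls = {k: recall_at_k(y_true, y_score, k) for k in (1, 3, 5)}
```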
4. Practical Applications #
- Recommendation systems: Check whether the items a user actually selects appear in the recommendation list.
- Advertising: Measure how many clicked or converted ads were included in the top-k impressions.
- A/B testing: Track Recall@k alongside online metrics to confirm whether offline improvements translate to user behavior.
5. Trade-off with Precision@k #
- Increasing k typically raises Recall@k but lowers Precision@k, since more items are retrieved and a larger share of them tend to be irrelevant.
- The ideal model maximizes both under a fixed k.
- Use F1@k or MAP to balance recall and precision when evaluating ranking models.
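F1@k, the harmonic mean of Precision@k and Recall@k, can be sketched as follows; the `f1_at_k` helper and the toy data are illustrative assumptions, not a standard library API:

```python
import numpy as np

def precision_recall_at_k(y_true: np.ndarray, y_score: np.ndarray, k: int):
    idx = np.argsort(-y_score)[:k]  # top-k items by score
    hits = y_true[idx].sum()
    return float(hits / k), float(hits / y_true.sum())

def f1_at_k(y_true: np.ndarray, y_score: np.ndarray, k: int) -> float:
    """Harmonic mean of Precision@k and Recall@k (0 when both are 0)."""
    p, r = precision_recall_at_k(y_true, y_score, k)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

# Toy query: 2 relevant items among 5 candidates.
y_true = np.array([1, 0, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.2, 0.1])

f1 = f1_at_k(y_true, y_score, 3)  # Precision@3 = 2/3, Recall@3 = 1
```

Because the harmonic mean is dominated by the smaller of the two values, F1@k penalizes models that trade one metric away entirely for the other.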
Summary #
- Recall@k measures coverage, while Precision@k measures purity.
- Define k clearly and evaluate multiple metrics to capture ranking quality comprehensively.
- Relating Recall@k and Precision@k to online KPIs (e.g., CTR, CVR) helps quantify business impact.