NDCG (Normalized Discounted Cumulative Gain)

Created: 2019-02-02. Last updated: 2020-01-29.

Key points

NDCG is a ranking metric that scores a result list by its discounted cumulative gain (DCG), normalized by the DCG of the ideal ordering.
This page computes DCG and NDCG from relevance scores and shows how logarithmic discounting works.
We also review considerations for graded (multi-level) relevance and multi-stage ranking systems.

1. Definition

Given the relevance score $rel_i$ of the item at rank $i$, the Discounted Cumulative Gain (DCG) over the top $k$ results is defined as:

$$ \mathrm{DCG@k} = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)} $$

To normalize this, we compute the same quantity for the ideal ranking, in which items are sorted by decreasing relevance (IDCG), and obtain:

$$ \mathrm{NDCG@k} = \frac{\mathrm{DCG@k}}{\mathrm{IDCG@k}} $$
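The two formulas can be sketched directly in plain Python. This is a minimal illustration rather than a reference implementation: the function names, the toy relevance grades, and the convention of returning 0 when no item is relevant are all assumptions made for this example.

```python
import math

def dcg_at_k(relevances, k):
    """DCG@k with exponential gain (2^rel - 1) and a log2(i + 1) discount."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 1)
        for i, rel in enumerate(relevances[:k], start=1)
    )

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the given order divided by DCG of the ideal order."""
    ideal = sorted(relevances, reverse=True)
    idcg = dcg_at_k(ideal, k)
    # Assumed convention: a query with no relevant items scores 0.
    return dcg_at_k(relevances, k) / idcg if idcg > 0 else 0.0

# Hypothetical relevance grades, listed in the order the system ranked the items.
ranked_relevances = [3, 2, 3, 0, 1, 2]
print("DCG@6: ", round(dcg_at_k(ranked_relevances, 6), 4))
print("NDCG@6:", round(ndcg_at_k(ranked_relevances, 6), 4))
```

Because IDCG is the largest DCG any ordering of the same items can reach, NDCG falls between 0 and 1, and a list that is already sorted by decreasing relevance scores exactly 1.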

2. Computing in Python

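import numpy as np
from sklearn.metrics import ndcg_score

# y_true: true relevance scores, shape (n_samples, n_labels); one row per query
# y_score: model output scores, same shape
# (toy values for illustration only)
y_true = np.array([[3, 2, 3, 0, 1, 2]])
y_score = np.array([[0.9, 0.8, 0.1, 0.4, 0.3, 0.7]])

# Items are ranked by descending y_score, then scored against y_true
score = ndcg_score(y_true, y_score, k=10)

print("NDCG@10:", round(score, 4))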

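ndcg_score accepts not only binary relevance (0/1) but also graded integer scores. Note that it uses the relevance value itself as the gain rather than $2^{rel_i} - 1$, so its output matches the definition in Section 1 only for binary labels. The key is to properly define your ground-truth relevance matrix.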
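To see how much the choice of gain matters once labels are graded, the sketch below compares a linear gain ($rel_i$, which to our knowledge is what scikit-learn's ndcg_score applies) with the exponential gain $2^{rel_i} - 1$ from Section 1. It is written in plain Python so that both variants are visible; the helper names and label lists are hypothetical and not part of scikit-learn.

```python
import math

def linear_gain(rel):
    return rel

def exponential_gain(rel):
    return 2 ** rel - 1

def ndcg(relevances, k, gain):
    """NDCG@k for relevances listed in ranked order, with a pluggable gain."""
    def dcg(rels):
        return sum(
            gain(r) / math.log2(i + 1)
            for i, r in enumerate(rels[:k], start=1)
        )
    idcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / idcg if idcg > 0 else 0.0

binary = [0, 1, 1, 0, 1]   # hypothetical 0/1 labels in ranked order
graded = [1, 3, 2, 0, 3]   # hypothetical graded labels in ranked order

for name, rels in [("binary", binary), ("graded", graded)]:
    print(name,
          "linear:", round(ndcg(rels, 5, linear_gain), 4),
          "exponential:", round(ndcg(rels, 5, exponential_gain), 4))
```

For 0/1 labels the two gains are identical, so the scores match. For graded labels the exponential gain weights the highest grades more heavily, so misplacing a top-grade item costs more and the two scores diverge.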

3. Hyperparameters

k (cutoff): Choose @5, @10, etc., based on how many results are shown to users.
Relevance scale: Binary scores (0/1) work, but graded relevance levels (e.g., highly relevant, somewhat relevant) yield more nuanced evaluation.
Log base: Base 2 is standard. Changing the base multiplies DCG and IDCG by the same constant, so NDCG itself is unchanged.
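The effect of the cutoff and of the log base can be checked with a small experiment. The following is a sketch under assumptions: `ndcg_at_k` and its `log_base` parameter are defined here for illustration, and the relevance grades are made up so that the best items sit at ranks 4 and 5.

```python
import math

def ndcg_at_k(relevances, k, log_base=2):
    """NDCG@k with exponential gain; relevances are listed in ranked order."""
    def dcg(rels):
        return sum(
            (2 ** r - 1) / math.log(i + 1, log_base)
            for i, r in enumerate(rels[:k], start=1)
        )
    idcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / idcg if idcg > 0 else 0.0

# Hypothetical ranked relevance grades: the best items sit at ranks 4 and 5.
ranked = [0, 1, 0, 3, 3, 2, 0, 1]

# The same ranking looks very different depending on the cutoff.
for k in (3, 5, 8):
    print(f"NDCG@{k}:", round(ndcg_at_k(ranked, k), 4))

# Changing the log base rescales DCG and IDCG by the same factor,
# so the ratio is the same up to floating-point error.
print(math.isclose(ndcg_at_k(ranked, 8, log_base=2),
                   ndcg_at_k(ranked, 8, log_base=10)))
```

With this data the score rises as k grows, because the highly relevant items only enter the window at ranks 4 and 5. A small k that matches what users actually see will therefore judge this ranking much more harshly than a large one.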

4. Practical Applications

Search evaluation: Commonly used to measure how well search results align with human-annotated relevance labels.
Recommendation systems: Treat implicit feedback (views, clicks, purchases) as relevance signals to track ranking improvements.
A/B testing: Combine NDCG with online metrics to understand how offline improvements translate to real-world performance.
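As one way to turn implicit feedback into relevance signals, the sketch below maps the strongest event observed for each shown item to a grade and averages NDCG over users. Everything specific here is an assumption for illustration: the event-to-grade mapping, the log format, and the user names. Note also that the ideal ordering is computed only over the items that were shown.

```python
import math

# Assumed mapping from the strongest observed event to a relevance grade.
# These grades are illustrative and should be tuned to the product.
EVENT_GRADE = {"none": 0, "view": 1, "click": 2, "purchase": 3}

def ndcg_at_k(relevances, k):
    """NDCG@k with exponential gain; relevances are listed in ranked order."""
    def dcg(rels):
        return sum(
            (2 ** r - 1) / math.log2(i + 1)
            for i, r in enumerate(rels[:k], start=1)
        )
    idcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / idcg if idcg > 0 else 0.0

# Hypothetical logs: for each user, the strongest event per recommended item,
# in the order the items were shown.
sessions = {
    "user_a": ["click", "none", "purchase", "view"],
    "user_b": ["none", "none", "view", "none"],
    "user_c": ["purchase", "click", "none", "none"],
}

per_user = {
    user: ndcg_at_k([EVENT_GRADE[event] for event in events], k=3)
    for user, events in sessions.items()
}
mean_ndcg = sum(per_user.values()) / len(per_user)

print(per_user)
print("mean NDCG@3:", round(mean_ndcg, 4))
```

Tracking the mean over users (or queries) across model versions gives a single offline number to set beside the online metrics of an A/B test.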

5. Key Considerations

Ground-truth labeling is expensive; implicit signals can be noisy.
In two-stage systems (candidate generation → ranking), use a metric suited to each phase, for example Recall@k for candidate generation and NDCG for the final ranking.
Combine NDCG with Recall@k, MAP, or other metrics to capture a holistic view of user experience.
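The sketch below shows why it helps to report several metrics together: on the same toy data they can rank two systems differently. The helper functions and the binary labels are assumptions written for this example. Average precision (AP) is computed per ranking, and MAP is the mean of AP over queries.

```python
import math

def recall_at_k(labels, k):
    """Fraction of all relevant items that appear in the top k."""
    total = sum(labels)
    return sum(labels[:k]) / total if total else 0.0

def average_precision(labels):
    """Mean of precision@i over the ranks i that hold a relevant item."""
    hits, precisions = 0, []
    for i, label in enumerate(labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

def ndcg_at_k(labels, k):
    """NDCG@k; for 0/1 labels the gain 2^rel - 1 equals the label itself."""
    def dcg(rels):
        return sum(
            (2 ** r - 1) / math.log2(i + 1)
            for i, r in enumerate(rels[:k], start=1)
        )
    idcg = dcg(sorted(labels, reverse=True))
    return dcg(labels) / idcg if idcg > 0 else 0.0

# Hypothetical binary labels (1 = relevant) for two rankings of the same items.
ranking_a = [1, 0, 0, 1, 1, 0]
ranking_b = [0, 1, 1, 1, 0, 0]

for name, labels in [("A", ranking_a), ("B", ranking_b)]:
    print(name,
          "Recall@3:", round(recall_at_k(labels, 3), 4),
          "AP:", round(average_precision(labels), 4),
          "NDCG@3:", round(ndcg_at_k(labels, 3), 4))
```

On this data ranking B scores higher on Recall@3 and NDCG@3, while ranking A scores higher on AP because it places a relevant item at rank 1. Neither number alone tells the whole story.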

Summary

NDCG measures how close to the top the relevant items are placed, using a logarithmic discount.
It’s easy to compute using scikit-learn's ndcg_score, but the choice of k and relevance scale greatly affects interpretation.
Use NDCG alongside other ranking metrics to comprehensively assess ranking quality.