Evaluation
Summary
- Understand what each metric evaluates and how to interpret its results.
- Compute and visualise the metrics with Python 3.13 code examples, covering key steps and practical checkpoints.
- Combine charts and complementary metrics for effective model comparison and threshold tuning.
Metrics
Quick Reference
Classification Metrics
| Metric | Imbalance-safe | Evaluates probability | Threshold-free | Multi-class | Primary use |
|---|---|---|---|---|---|
| Accuracy | | | | ✓ | Balanced-data overview |
| Balanced Accuracy | ✓ | | | ✓ | Imbalanced accuracy |
| Precision / Recall / F1 | ✓ | | | ✓ | Cost-asymmetric tasks |
| ROC-AUC | ✓ | ✓ | ✓ | | Threshold-free comparison |
| Average Precision | ✓ | ✓ | ✓ | | Rare-positive tasks |
| Log Loss | | ✓ | ✓ | ✓ | Probability calibration |
| Brier Score | | ✓ | ✓ | | Calibration (MSE-based) |
| MCC | ✓ | | | | Uses all confusion cells |
| Cohen’s Kappa | ✓ | | | | Annotator agreement |
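The "Imbalance-safe" column can be illustrated with a small sketch. The data below is made up for illustration: 90 negatives, 10 positives, and a hypothetical classifier that always predicts the negative class. The helper names (`confusion_counts`, `balanced_accuracy`, `mcc`) are assumptions of this example rather than functions from a particular library, and returning 0 when a denominator is zero is one common convention, not the only one.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def balanced_accuracy(y_true, y_pred):
    """Mean of recall on the positive class and recall on the negative class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative data: 90 negatives, 10 positives, classifier always says 0.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

print(f"accuracy          = {accuracy(y_true, y_pred):.2f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.2f}")
print(f"MCC               = {mcc(y_true, y_pred):.2f}")
```

On this toy data plain accuracy looks high even though no positive is ever found, while balanced accuracy and MCC drop to chance level, which is the behaviour the table's imbalance column is pointing at.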
Regression Metrics
| Metric | Scale-free | Outlier-robust | Directional | Primary use |
|---|---|---|---|---|
| MAE | | ✓ | | Intuitive average error |
| RMSE | | | | Penalises large errors |
| R² | ✓ | | | Explained variance |
| Adjusted R² | ✓ | | | Variance adjusted for features |
| MAPE | ✓ | | | Business-friendly % error |
| WAPE | ✓ | ✓ | | Weighted % error |
| MASE | ✓ | | | Time-series comparison |
| MBE | | | ✓ | Bias detection |
| Median AE | | ✓ | | Regression with outliers |
| Pinball Loss | | | ✓ | Quantile forecast evaluation |
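To make the outlier and direction columns concrete, here is a minimal sketch that computes a few of these metrics on made-up numbers containing one large miss. The function names are assumptions of this example, and the sign convention for MBE (prediction minus actual, so positive means over-prediction) is also an assumption; some references use the opposite sign.

```python
from math import sqrt
from statistics import median


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; squaring amplifies large misses."""
    return sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def median_ae(y_true, y_pred):
    """Median absolute error; a single outlier barely moves it."""
    return median(abs(p - t) for t, p in zip(y_true, y_pred))


def mbe(y_true, y_pred):
    """Mean bias error, assumed here as prediction minus actual."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative data: four small errors and one large over-prediction.
y_true = [10, 12, 14, 16, 18]
y_pred = [11, 11, 15, 15, 38]

print(f"MAE       = {mae(y_true, y_pred):.3f}")
print(f"RMSE      = {rmse(y_true, y_pred):.3f}")
print(f"Median AE = {median_ae(y_true, y_pred):.3f}")
print(f"MBE       = {mbe(y_true, y_pred):.3f}")
```

With a single large error in the data, RMSE rises well above MAE, the median absolute error stays at the size of the typical miss, and the signed MBE shows that the model over-predicts on average.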
Ranking & Distance Metrics
| Metric | Category | Rank-aware | Primary use |
|---|---|---|---|
| NDCG | Ranking | ✓ | Search & recommendation quality |
| MAP | Ranking | ✓ | Precision-based ranking |
| Recall@k | Ranking | | Top-k coverage |
| Hit Rate | Ranking | | Recommendation hit ratio |
| KL Divergence | Distance | — | Information-theoretic divergence |
| JS Divergence | Distance | — | Symmetric KLD |
| Wasserstein | Distance | — | Geometric distribution distance |
| Cosine Similarity | Distance | — | Vector direction similarity |
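The description of JS divergence as a symmetric variant of KL divergence can be checked with a short sketch. The two distributions are made up for illustration, the function names are assumptions of this example, and the KL implementation assumes `q` is non-zero wherever `p` is non-zero. Natural logarithms are used, so results are in nats.

```python
from math import log, sqrt


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q[i] > 0 where p[i] > 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; ignores their magnitudes."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Illustrative distributions over three outcomes.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

print(f"KL(p || q) = {kl_divergence(p, q):.4f}")
print(f"KL(q || p) = {kl_divergence(q, p):.4f}")
print(f"JS(p, q)   = {js_divergence(p, q):.4f}")
print(f"JS(q, p)   = {js_divergence(q, p):.4f}")
print(f"cosine     = {cosine_similarity(p, q):.4f}")
```

The two KL values differ depending on argument order, while JS gives the same value either way and, with natural logarithms, never exceeds log 2.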