Regression

Summary
  • Get a full picture of regression metrics and how to choose among them.
  • Compare error metrics, determination metrics, and interval/probabilistic metrics with code examples.
  • Summarise how to assemble a metric set that matches business needs and data characteristics.

Chapter 2 #

Regression metrics overview #

Regression tasks offer many ways to quantify “how close” predictions are to the truth. Each metric emphasises a different perspective (absolute error, squared error, relative error, probabilistic guarantees), so the right choice depends on the use case. This chapter maps out the representative metrics and highlights when to use each.


Metric categories #

Error-based metrics #

  • Mean Absolute Error (MAE): robust to outliers and expressed in the same unit as the target.
  • Root Mean Squared Error (RMSE): penalises large errors heavily; useful when big misses must be avoided.
  • Mean Absolute Percentage Error (MAPE) / Weighted Absolute Percentage Error (WAPE): helpful when stakeholders think in percentages (e.g., demand forecasting).
  • Mean Absolute Scaled Error (MASE): scales the error by that of a naïve (seasonal) forecast, so values below 1 beat the baseline.
  • Root Mean Squared Log Error (RMSLE): suited to targets that span wide ranges or where relative growth matters (a short sketch of MASE and RMSLE follows this list).
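
A minimal NumPy sketch of those last two metrics, assuming non-negative targets; the seasonal period m and the toy arrays are purely illustrative:

import numpy as np

def mase(y, y_hat, m=1):
    # Scale the model's MAE by the MAE of a naïve forecast that repeats the value m steps back.
    naive_error = np.mean(np.abs(y[m:] - y[:-m]))
    return np.mean(np.abs(y - y_hat)) / naive_error

def rmsle(y, y_hat):
    # log1p keeps zero targets valid, but both y and y_hat must be non-negative.
    return np.sqrt(np.mean((np.log1p(y) - np.log1p(y_hat)) ** 2))

y = np.array([3.0, 5.0, 8.0, 13.0, 21.0, 34.0])
y_hat = np.array([4.0, 5.5, 7.0, 14.0, 20.0, 36.0])
print(f"MASE: {mase(y, y_hat, m=1):.2f}, RMSLE: {rmsle(y, y_hat):.3f}")

A MASE below 1 means the model beats the naïve baseline on the evaluation window.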

Determination / variance explanation #

  • Coefficient of determination (R²): baseline measure of variance explained; can be negative when the model is worse than predicting the mean.
  • Adjusted R²: penalises the optimistic rise in R² that comes from adding features.
  • Explained variance: proportion of target variance captured by the predictions; unlike R², it ignores a constant bias (a combined sketch follows this list).
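
A small sketch of the three scores, assuming scikit-learn is installed; the adjusted R² is computed by hand from the sample size n and a hypothetical feature count p:

import numpy as np
from sklearn.metrics import r2_score, explained_variance_score

def adjusted_r2(y, y_hat, p):
    # Penalise R² for the number of features p used by the model.
    n = len(y)
    r2 = r2_score(y, y_hat)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
y = rng.normal(loc=100, scale=15, size=200)
y_hat = y + rng.normal(scale=10, size=200) + 2.0  # predictions with a constant bias

print(r2_score(y, y_hat))                  # penalised by the bias
print(explained_variance_score(y, y_hat))  # ignores the bias, so slightly higher
print(adjusted_r2(y, y_hat, p=5))          # assumes a model with 5 features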

Interval and probabilistic metrics #

  • Pinball loss: evaluates quantile regression / prediction intervals.
  • PICP (Prediction Interval Coverage Probability): measures how often prediction intervals capture the true value.
  • PINAW (Prediction Interval Normalised Average Width): evaluates how tight the intervals are (a small sketch of all three follows this list).
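
A NumPy sketch of all three, assuming the lower and upper bounds come from a pair of quantile forecasts; the arrays below are synthetic stand-ins for a nominal 80% interval:

import numpy as np

def pinball_loss(y, q_hat, alpha):
    # Quantile (pinball) loss for quantile level alpha.
    diff = y - q_hat
    return np.mean(np.maximum(alpha * diff, (alpha - 1) * diff))

def picp(y, lower, upper):
    # Fraction of observations that fall inside the interval.
    return np.mean((y >= lower) & (y <= upper))

def pinaw(y, lower, upper):
    # Average interval width, normalised by the observed target range.
    return np.mean(upper - lower) / (y.max() - y.min())

rng = np.random.default_rng(1)
y = rng.normal(loc=100, scale=15, size=500)
lower = y - 20 + rng.normal(scale=5, size=500)  # stand-in 10% quantile forecast
upper = y + 20 + rng.normal(scale=5, size=500)  # stand-in 90% quantile forecast

print(pinball_loss(y, lower, alpha=0.1), pinball_loss(y, upper, alpha=0.9))
print(f"PICP: {picp(y, lower, upper):.2f}, PINAW: {pinaw(y, lower, upper):.2f}")

Good intervals combine high PICP with low PINAW; either number alone can be gamed by simply widening or narrowing the intervals.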

Workflow for choosing metrics #

  1. Define business objectives
    Decide whether absolute error, percentage error, or asymmetric penalties matter more.
  2. Assess data characteristics
    Check for values crossing zero, heavy tails/outliers, or strong seasonality.
  3. Establish baselines
    Compare against naïve forecasts, simple regressions, or the mean to contextualise improvements (a short skill-score sketch follows this list).
  4. Evaluate with complementary metrics
    Pair MAE with RMSE, or R² with Adjusted R², to cover different viewpoints.
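
Steps 3 and 4 can be made concrete with a relative-improvement (skill) score; this is a minimal sketch that assumes a mean forecast as the naïve baseline and a hypothetical model:

import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def skill_score(y, y_model, y_baseline):
    # 1.0 = perfect, 0.0 = no better than the baseline, negative = worse.
    return 1 - mae(y, y_model) / mae(y, y_baseline)

rng = np.random.default_rng(7)
y = rng.normal(loc=100, scale=15, size=300)
y_baseline = np.full_like(y, y.mean())       # naïve mean forecast
y_model = y + rng.normal(scale=8, size=300)  # hypothetical model predictions

print(f"Skill vs. mean baseline: {skill_score(y, y_model, y_baseline):.2f}")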

Example: comparing metrics side by side #

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n = 200
y_true = rng.normal(loc=100, scale=15, size=n)
noise = rng.normal(scale=10, size=n)

baseline = y_true.mean() + rng.normal(scale=12, size=n)  # naïve mean forecast plus noise
model = y_true + noise                                   # model with Gaussian errors
robust_model = y_true + np.clip(noise, -8, 8)            # same model with errors clipped to ±8

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y)) * 100

scores = {
    "MAE": [mae(y_true, baseline), mae(y_true, model), mae(y_true, robust_model)],
    "RMSE": [rmse(y_true, baseline), rmse(y_true, model), rmse(y_true, robust_model)],
    "MAPE (%)": [mape(y_true, baseline), mape(y_true, model), mape(y_true, robust_model)],
}

labels = ["Baseline", "Model", "Robust"]
x = np.arange(len(labels))
width = 0.25

fig, ax = plt.subplots(figsize=(7, 4.5))
for idx, (metric, values) in enumerate(scores.items()):
    ax.bar(x + idx * width, values, width=width, label=metric)

ax.set_xticks(x + width)
ax.set_xticklabels(labels)
ax.set_ylabel("Score")
ax.set_title("Comparison of regression metrics across models")
ax.legend()
ax.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()
Figure: comparing regression metrics across the three models

Different metrics weight the same errors differently. MAE favours the robust (clipped-error) model, while RMSE punishes the mean baseline hardest because squaring amplifies its large misses. Use complementary metrics to confirm the chosen model matches the intended behaviour.


Metric quick reference #

| Category | Metric | Primary use | Caveats |
| --- | --- | --- | --- |
| Error | MAE | Outlier-resistant, interpretable units | Not suited when relative error is critical |
| Error | RMSE | Emphasises large errors | Sensitive to outliers |
| Error | RMSLE | Handles wide ranges / growth rates | Not defined for negative targets |
| Error | MAPE / WAPE | Percentage-based reporting | Breaks when targets hit zero; penalises overestimation more heavily |
| Error | MASE | Comparison against naïve seasonal forecasts | Requires the correct seasonal period |
| Error | Pinball loss | Quantile / interval evaluation | Needs per-quantile optimisation |
| Determination | R² | Variance explanation | Can be negative |
| Determination | Adjusted R² | Comparing models with different feature counts | Unstable with tiny sample sizes |
| Determination | Explained variance | Focus on residual variance | Ignores systematic bias |
| Interval | PICP | Coverage of prediction intervals | Inspect interval width simultaneously |
| Interval | PINAW | Tightness of intervals | Interpret alongside coverage |

Operational checklist #

  • Before deployment: verify metrics on the full history and inspect residual plots to catch anomalies.
  • Monitoring: track metric drift (e.g., MAE, RMSE) over rolling windows and define alert thresholds (a rolling-window sketch follows this list).
  • Visual diagnostics: combine prediction vs. actual charts, residual histograms, and quantile plots.
  • Stakeholder communication: translate metrics into business terms (e.g., “average error of ±¥X”) to drive decisions.
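
As a monitoring sketch, a rolling-window MAE with a fixed alert threshold is one simple way to surface drift; the window size, simulated degradation, and threshold below are purely illustrative:

import numpy as np

def rolling_mae(y, y_hat, window=50):
    # MAE over a sliding window; a sustained rise signals drift.
    errors = np.abs(y - y_hat)
    return np.array([errors[i:i + window].mean() for i in range(len(errors) - window + 1)])

rng = np.random.default_rng(3)
y = rng.normal(loc=100, scale=15, size=400)
y_hat = y + rng.normal(scale=8, size=400)
y_hat[300:] += 10  # simulate degradation in the most recent segment

mae_curve = rolling_mae(y, y_hat)
threshold = 9.0  # hypothetical alert threshold agreed with stakeholders
print("Drift alert:", bool((mae_curve > threshold).any()))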