RMSLE (Root Mean Squared Log Error)

Key points
  • RMSLE applies RMSE to log-transformed predictions, emphasising percentage-like differences.
  • Useful when underestimation is costlier than overestimation, such as demand or growth forecasting.
  • Understand the precautions around zeros and negative values before applying it.

1. Definition #

$$ \mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^n \left( \log(1 + \hat{y}_i) - \log(1 + y_i) \right)^2 } $$

  • Adding 1 allows zero observations; negative targets remain invalid.
  • Errors with the same ratio contribute equally (e.g. predicting 10 vs 20 or 100 vs 200).
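The ratio property in the last bullet can be checked with a direct implementation of the definition (a minimal numpy sketch; the numbers are illustrative):

```python
import numpy as np

def rmsle(y_true, y_pred):
    """RMSLE computed directly from the definition above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# Errors with (roughly) the same ratio contribute almost equally,
# regardless of absolute scale:
print(rmsle([20], [10]))    # predicting 10 when the truth is 20, ≈ 0.647
print(rmsle([200], [100]))  # predicting 100 when the truth is 200, ≈ 0.688
```

The two scores are not exactly equal because of the +1 shift inside the logarithm, but they converge as the values grow.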

2. Computing in Python #

from sklearn.metrics import root_mean_squared_log_error

rmsle = root_mean_squared_log_error(y_test, y_pred)
print(f"RMSLE = {rmsle:.3f}")

root_mean_squared_log_error requires scikit-learn >= 1.4; on older versions, take the square root of mean_squared_log_error(y_test, y_pred) instead (the squared=False keyword was removed in scikit-learn 1.6). Both functions raise an error if inputs contain negatives, so clean or shift the data beforehand.
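When negatives are only small numerical noise in otherwise non-negative predictions, one common workaround is to clip them to zero before scoring. A sketch with made-up numbers (np.sqrt over mean_squared_log_error is used here because it works on every scikit-learn version):

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# Hypothetical targets and predictions; y_pred has slight negative noise.
y_test = np.array([3.0, 0.0, 12.5, 7.0])
y_pred = np.array([2.8, -0.1, 13.0, 6.5])

# Clip negatives to zero so log1p stays defined for every element.
y_pred_clipped = np.clip(y_pred, 0.0, None)

rmsle = float(np.sqrt(mean_squared_log_error(y_test, y_pred_clipped)))
print(f"RMSLE = {rmsle:.3f}")
```

Clipping is only appropriate when negatives are genuinely noise; if the target itself can be negative, RMSLE is the wrong metric.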


3. When to use RMSLE #

  • Demand/sales forecasting: protects against underestimation that could cause stock-outs.
  • Growth metrics (traffic, population, installs): relative changes matter more than absolute differences.
  • Positive targets: only apply when all values are non-negative.

4. Intuition and caveats #

  • Behaves like a relative (percentage) error: an RMSLE of ε corresponds roughly to predictions being off by a multiplicative factor of e^ε.
  • Penalises under-forecasting slightly more than over-forecasting due to logarithmic asymmetry.
  • Small values still need care; consider adding a minimum threshold before training.
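The asymmetry can be checked numerically. The sketch below compares an under-forecast and an over-forecast of the same absolute size (illustrative values):

```python
import numpy as np

def rmsle(y_true, y_pred):
    """RMSLE from the definition in section 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

y_true = [100]
under = rmsle(y_true, [50])   # under-forecast by 50, ≈ 0.683
over = rmsle(y_true, [150])   # over-forecast by 50,  ≈ 0.402
print(under > over)           # the under-forecast is penalised more
```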

5. Comparison with other metrics #

Metric   Focus                    Pitfalls
RMSE     Absolute squared error   Dominated by large-scale values
MAE      Robust absolute error    Ignores relative perspective
RMSLE    Relative/ratio error     Undefined for negatives
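A small comparison makes the table concrete. In this made-up example every prediction is 20% high, so the relative error is identical across scales:

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_squared_log_error)

y_true = np.array([10.0, 100.0, 1000.0])
y_pred = np.array([12.0, 120.0, 1200.0])  # each prediction 20% too high

rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
mae = float(mean_absolute_error(y_true, y_pred))
rmsle = float(np.sqrt(mean_squared_log_error(y_true, y_pred)))

# RMSE and MAE are dominated by the largest-scale error (200),
# while RMSLE scores each 20% miss almost equally.
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  RMSLE={rmsle:.3f}")
```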

Summary #

  • RMSLE blends RMSE with log scaling to emphasise relative accuracy and under-forecast penalties.
  • Easy to compute, but requires strict non-negativity checks.
  • Pair it with MAE or RMSE to balance relative and absolute error reporting.