Key points #
- RMSLE applies RMSE to log-transformed predictions, emphasising percentage-like differences.
- Useful when underestimation is costlier than overestimation, such as demand or growth forecasting.
- Learn the precautions around zeros and negative values before applying it.
1. Definition #
$$ \mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^n \left( \log(1 + \hat{y}_i) - \log(1 + y_i) \right)^2 } $$
- Adding 1 allows zero observations; negative targets remain invalid.
- Errors with the same ratio contribute equally (e.g. predicting 10 vs 20 or 100 vs 200).
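The definition above translates directly into a few lines of NumPy. A minimal sketch (the helper name `rmsle` is ours), which also shows that errors with roughly the same ratio contribute almost equally:

```python
import numpy as np

def rmsle(y_true, y_pred):
    """RMSLE computed straight from the definition, using log1p = log(1 + x)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# Same-ratio errors land close together (the +1 makes them only approximately equal):
print(rmsle([10], [20]))    # log(21/11)   -> ~0.647
print(rmsle([100], [200]))  # log(201/101) -> ~0.688
```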
2. Computing in Python #
```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# Taking the square root explicitly works on all scikit-learn versions;
# the squared=False flag was deprecated in 1.4 and later removed.
rmsle = np.sqrt(mean_squared_log_error(y_test, y_pred))
print(f"RMSLE = {rmsle:.3f}")
```
`mean_squared_log_error` raises a `ValueError` if the inputs contain negative values. Clean or shift the data beforehand.
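As a sketch of that "clean beforehand" step, one common option is to clip negative predictions to zero before calling the metric (the sample arrays here are made up for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

y_test = np.array([10.0, 0.0, 50.0])
y_pred = np.array([12.0, -0.3, 45.0])  # a stray negative prediction

# Clip negatives to zero so log1p stays defined for every element.
y_pred_clipped = np.clip(y_pred, 0.0, None)
rmsle = np.sqrt(mean_squared_log_error(y_test, y_pred_clipped))
print(f"RMSLE = {rmsle:.3f}")
```

Clipping is only sensible when negatives are small numerical noise; if the model routinely predicts negative values, that is a modeling problem RMSLE cannot paper over.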
3. When to use RMSLE #
- Demand/sales forecasting: protects against underestimation that could cause stock-outs.
- Growth metrics (traffic, population, installs): relative changes matter more than absolute differences.
- Positive targets: only apply when all values are non-negative.
4. Intuition and caveats #
- Behaves like a percentage error: for small errors, the log difference approximates the relative error, so RMSLE roughly measures root mean squared relative error.
- Penalises under-forecasting slightly more than over-forecasting due to logarithmic asymmetry.
- Small values still need care; consider adding a minimum threshold before training.
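The asymmetry mentioned above is easy to verify numerically: under- and over-forecasting by the same absolute amount produce different log errors.

```python
import numpy as np

def log_error(y_true, y_pred):
    # Signed log error for a single point: log1p(pred) - log1p(true)
    return np.log1p(y_pred) - np.log1p(y_true)

# Miss y = 100 by 50 units in each direction:
under = log_error(100, 50)   # ~ -0.683
over  = log_error(100, 150)  # ~ +0.402
print(abs(under) > abs(over))  # True: under-forecasts are penalised more
```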
5. Comparison with other metrics #
| Metric | Focus | Pitfalls |
|---|---|---|
| RMSE | Absolute squared error | Dominated by large-scale values |
| MAE | Robust absolute error | Ignores relative perspective |
| RMSLE | Relative/ratio error | Undefined for negatives |
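The contrast in the table can be seen on a single toy dataset where every prediction is off by the same absolute amount (the values are made up for illustration):

```python
import numpy as np

y_true = np.array([10.0, 100.0, 1000.0])
y_pred = np.array([20.0, 110.0, 1010.0])  # +10 error at every scale

errors = y_pred - y_true
rmse  = np.sqrt(np.mean(errors ** 2))
mae   = np.mean(np.abs(errors))
rmsle = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# RMSE and MAE treat all three +10 errors alike; RMSLE is dominated by
# the small-scale point, where +10 doubles the prediction.
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  RMSLE={rmsle:.3f}")
```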
Summary #
- RMSLE blends RMSE with log scaling to emphasise relative accuracy and under-forecast penalties.
- Easy to compute, but requires strict non-negativity checks.
- Pair it with MAE or RMSE to balance relative and absolute error reporting.