Key points #
- RMSLE applies RMSE to log-transformed predictions, emphasising percentage-like differences.
- Useful when underestimation is costlier than overestimation, such as demand or growth forecasting.
- Learn the precautions around zeros and negative values before applying it.
1. Definition #
$$ \mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^n \left( \log(1 + \hat{y}_i) - \log(1 + y_i) \right)^2 } $$
- Adding 1 allows zero observations; negative targets remain invalid.
- Errors with the same ratio contribute equally (e.g. predicting 10 vs 20 or 100 vs 200).
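The definition above translates directly into a few lines of NumPy. A minimal sketch (the helper name `rmsle` is ours), which also shows that errors with roughly the same ratio contribute almost equally:

```python
import numpy as np

def rmsle(y_true, y_pred):
    """RMSLE computed straight from the definition, using log1p = log(1 + x)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# Same-ratio errors land close together (the +1 makes them only approximately equal):
print(rmsle([10], [20]))    # log(21/11)   -> ~0.647
print(rmsle([100], [200]))  # log(201/101) -> ~0.688
```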
2. Computing in Python #
```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# Taking the square root explicitly works on all scikit-learn versions;
# the squared=False flag was deprecated in 1.4 and later removed.
rmsle = np.sqrt(mean_squared_log_error(y_test, y_pred))
print(f"RMSLE = {rmsle:.3f}")
```
`mean_squared_log_error` raises a `ValueError` if the inputs contain negative values. Clean or shift the data beforehand.
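As a sketch of that "clean beforehand" step, one common option is to clip negative predictions to zero before calling the metric (the sample arrays here are made up for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

y_test = np.array([10.0, 0.0, 50.0])
y_pred = np.array([12.0, -0.3, 45.0])  # a stray negative prediction

# Clip negatives to zero so log1p stays defined for every element.
y_pred_clipped = np.clip(y_pred, 0.0, None)
rmsle = np.sqrt(mean_squared_log_error(y_test, y_pred_clipped))
print(f"RMSLE = {rmsle:.3f}")
```

Clipping is only sensible when negatives are small numerical noise; if the model routinely predicts negative values, that is a modeling problem RMSLE cannot paper over.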
3. When to use RMSLE #
- Demand/sales forecasting: protects against underestimation that could cause stock-outs.
- Growth metrics (traffic, population, installs): relative changes matter more than absolute differences.
- Positive targets: only apply when all values are non-negative.
4. Intuition and caveats #
- Behaves like a percentage error: for small errors, the log difference approximates the relative error, so RMSLE roughly measures root mean squared relative error.
- Penalises under-forecasting slightly more than over-forecasting due to logarithmic asymmetry.
- Small values still need care; consider adding a minimum threshold before training.
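The asymmetry mentioned above is easy to verify numerically: under- and over-forecasting by the same absolute amount produce different log errors.

```python
import numpy as np

def log_error(y_true, y_pred):
    # Signed log error for a single point: log1p(pred) - log1p(true)
    return np.log1p(y_pred) - np.log1p(y_true)

# Miss y = 100 by 50 units in each direction:
under = log_error(100, 50)   # ~ -0.683
over  = log_error(100, 150)  # ~ +0.402
print(abs(under) > abs(over))  # True: under-forecasts are penalised more
```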
5. Comparison with other metrics #
| Metric | Focus | Pitfalls |
|---|---|---|
| RMSE | Absolute squared error | Dominated by large-scale values |
| MAE | Robust absolute error | Ignores relative perspective |
| RMSLE | Relative/ratio error | Undefined for negatives |
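The contrast in the table can be seen on a single toy dataset where every prediction is off by the same absolute amount (the values are made up for illustration):

```python
import numpy as np

y_true = np.array([10.0, 100.0, 1000.0])
y_pred = np.array([20.0, 110.0, 1010.0])  # +10 error at every scale

errors = y_pred - y_true
rmse  = np.sqrt(np.mean(errors ** 2))
mae   = np.mean(np.abs(errors))
rmsle = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# RMSE and MAE treat all three +10 errors alike; RMSLE is dominated by
# the small-scale point, where +10 doubles the prediction.
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  RMSLE={rmsle:.3f}")
```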
Summary #
- RMSLE blends RMSE with log scaling to emphasise relative accuracy and under-forecast penalties.
- Easy to compute, but requires strict non-negativity checks.
- Pair it with MAE or RMSE to balance relative and absolute error reporting.