AIC and BIC

Overview
  • AIC/BIC combine likelihood and complexity penalties to assess generalisation.
  • Compute AIC/BIC in regression models and see how they react to extra features.
  • Learn when sample size and model family make one criterion preferable to the other.

1. Definitions #

For log-likelihood \(\ell\), number of parameters \(k\), and sample size \(n\):

$$ \mathrm{AIC} = -2\ell + 2k, \qquad \mathrm{BIC} = -2\ell + k \log n $$

  • AIC approximates out-of-sample error; the penalty is a constant \(2k\).
  • BIC grows the penalty with \(\log n\), favouring simpler models as the dataset gets larger.

Lower values indicate a better trade-off between fit and complexity.
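The two formulas translate directly into Python. As a minimal sketch (the `aic`/`bic` function names here are our own, not from any library):

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: -2*ell + 2k."""
    return -2 * log_likelihood + 2 * k

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: -2*ell + k*log(n)."""
    return -2 * log_likelihood + k * math.log(n)

print(aic(-250.0, 5))       # 510.0
print(bic(-250.0, 5, 100))  # larger, since log(100) ≈ 4.6 > 2 per parameter
```

Note that for any \(n > e^2 \approx 7.4\), BIC's per-parameter penalty exceeds AIC's, so BIC is almost always the stricter criterion in practice.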


2. Computing in Python #

scikit-learn's LinearRegression does not expose AIC/BIC, so we use statsmodels instead.

import statsmodels.api as sm
from sklearn.datasets import fetch_california_housing

# load_boston was removed in scikit-learn 1.2; use the California housing data
X, y = fetch_california_housing(return_X_y=True)
X = sm.add_constant(X)  # add intercept column

model = sm.OLS(y, X).fit()
print("AIC:", model.aic)
print("BIC:", model.bic)

The fitted results object exposes model.aic and model.bic for OLS and GLM; for other model families, make sure the criterion is computed from the appropriate likelihood.
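To see how the criteria react to an uninformative extra feature (one of the goals above), here is a NumPy-only sketch that computes both criteria from the Gaussian log-likelihood of an OLS fit; the `gaussian_ic` helper is our own, not a statsmodels API:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)
noise = rng.normal(size=n)              # feature unrelated to y

def gaussian_ic(design, y):
    """Return (AIC, BIC) of an OLS fit under a Gaussian likelihood."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    k = design.shape[1] + 1             # coefficients + error variance
    ll = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return -2 * ll + 2 * k, -2 * ll + k * np.log(n)

base = np.column_stack([np.ones(n), x])      # intercept + real feature
extra = np.column_stack([base, noise])       # plus a useless column
print("base  AIC/BIC:", gaussian_ic(base, y))
print("extra AIC/BIC:", gaussian_ic(extra, y))
```

The noise column barely improves the likelihood, but both penalties grow by one parameter; with \(n = 200\), BIC charges \(\log 200 \approx 5.3\) for it versus AIC's 2.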


3. Intuition #

  • AIC puts emphasis on predictive performance. Because the penalty is constant, it tolerates more complex models when ample data are available.
  • BIC arises under a Bayesian approximation that assumes the true model lies in the candidate set. The \(\log n\) penalty pushes toward simpler models as \(n\) increases.
  • Compare AIC/BIC only within the same dataset and likelihood family; cross-dataset comparisons are meaningless.
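The difference between the constant and the growing penalty is easy to tabulate; this small sketch just prints the per-parameter cost each criterion charges at several sample sizes:

```python
import math

# Per extra parameter: AIC always charges 2, BIC charges log(n).
for n in [5, 8, 100, 10_000]:
    print(f"n={n:>6}: AIC penalty/param = 2, BIC penalty/param = {math.log(n):.2f}")
```

BIC's penalty overtakes AIC's once \(n > e^2 \approx 7.39\), and the gap keeps widening, which is exactly why BIC favours simpler models on large datasets.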

4. Practical use cases #

  • Feature selection: rank candidate models by AIC/BIC and drop features that do not improve the criterion.
  • Time-series models: commonly used to pick ARIMA/SARIMAX orders (p, d, q) by minimising AIC/BIC.
  • Generalised linear models: compare link functions or distribution assumptions while balancing fit and simplicity.
  • Reporting: alongside RMSE or R², include AIC/BIC to show that complexity was controlled.
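The feature-selection use case can be sketched as an exhaustive subset search scored by BIC. This example uses synthetic data and our own `gaussian_bic` helper (not a library call), with feature 2 deliberately generated as pure noise:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))            # feature 2 is unrelated to y
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

def gaussian_bic(X_sub, y):
    """BIC of an OLS fit with intercept, under a Gaussian likelihood."""
    n = len(y)
    A = np.column_stack([np.ones(n), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    k = A.shape[1] + 1                 # coefficients + error variance
    ll = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return -2 * ll + k * np.log(n)

# Score every non-empty subset of the three features.
scores = {
    subset: gaussian_bic(X[:, list(subset)], y)
    for r in range(1, 4)
    for subset in itertools.combinations(range(3), r)
}
best = min(scores, key=scores.get)
print("best subset by BIC:", best)
```

With only a handful of candidate features an exhaustive search like this is feasible; for wider feature sets, stepwise (greedy) addition or removal by AIC/BIC is the usual compromise.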

5. Caveats #

  • Likelihood assumptions: if the model’s distributional assumptions are severely violated, AIC/BIC can mislead.
  • Huge datasets: with very large n, BIC may over-penalise complexity. Choose the criterion that aligns with the business objective.
  • Comparable scopes: only compare models that use the same response variable, likelihood, and dataset.

Takeaways #

  • AIC and BIC penalise complexity differently while leveraging likelihood to balance fit vs. parsimony.
  • Remember: AIC leans toward predictive accuracy; BIC leans toward simplicity.
  • Use them alongside metrics like RMSE or Adjusted R² to make persuasive, well-rounded model selection decisions.