Overview
- AIC/BIC combine likelihood and complexity penalties to assess generalisation.
- Compute AIC/BIC in regression models and see how they react to extra features.
- Learn when sample size and model family make one criterion preferable to the other.
1. Definitions #
For log-likelihood \(\ell\), number of parameters \(k\), and sample size \(n\):
$$ \mathrm{AIC} = -2\ell + 2k, \qquad \mathrm{BIC} = -2\ell + k \log n $$
- AIC approximates out-of-sample prediction error; its penalty \(2k\) does not depend on \(n\).
- BIC grows the penalty with \(\log n\), favouring simpler models as the dataset gets larger.
Lower values indicate a better trade-off between fit and complexity.
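As a sanity check, both formulas can be computed by hand. A minimal sketch for a one-predictor Gaussian model (the data here are simulated for illustration; note that conventions differ on whether the noise variance counts toward \(k\), so always compute \(k\) the same way across the models you compare):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

# Fit a line by least squares: the free parameters are the
# intercept, the slope, and the noise variance, so k = 3 here.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)
sigma2 = resid.var()  # MLE of the noise variance (divides by n)

# Gaussian log-likelihood of the fitted model
ll = stats.norm.logpdf(resid, scale=np.sqrt(sigma2)).sum()

k = 3
aic = -2 * ll + 2 * k
bic = -2 * ll + k * np.log(n)
print(f"AIC = {aic:.1f}, BIC = {bic:.1f}")
```

Because \(\log 200 \approx 5.3 > 2\), the BIC value comes out higher than the AIC value for the same fit.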
2. Computing in Python #
scikit-learn does not expose AIC/BIC for ordinary linear regression, so we rely on statsmodels instead.
import statsmodels.api as sm
from sklearn.datasets import fetch_california_housing  # load_boston was removed in scikit-learn 1.2
X, y = fetch_california_housing(return_X_y=True)
X = sm.add_constant(X)  # add intercept column
model = sm.OLS(y, X).fit()
print("AIC:", model.aic)
print("BIC:", model.bic)
model.aic and model.bic are available for OLS/GLM models; for other families choose the appropriate likelihood.
3. Intuition #
- AIC emphasises predictive performance. Because its penalty does not grow with \(n\), it tolerates more complex models when ample data are available.
- BIC arises under a Bayesian approximation that assumes the true model lies in the candidate set. The \(\log n\) penalty pushes toward simpler models as \(n\) increases.
- Compare AIC/BIC only within the same dataset and likelihood family; cross-dataset comparisons are meaningless.
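The difference in how the two penalties grow is easy to tabulate. For a model with \(k = 5\) parameters (an arbitrary choice for illustration):

```python
import math

k = 5
for n in [20, 100, 1_000, 100_000]:
    aic_pen = 2 * k            # constant in n
    bic_pen = k * math.log(n)  # grows with log n
    print(f"n={n:>6}: AIC penalty = {aic_pen}, BIC penalty = {bic_pen:.1f}")
# BIC's penalty exceeds AIC's once log(n) > 2, i.e. n > e^2 ≈ 7.4,
# so for any realistically sized dataset BIC is the stricter criterion.
```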
4. Practical use cases #
- Feature selection: rank candidate models by AIC/BIC and drop features that do not improve the criterion.
- Time-series models: commonly used to pick ARIMA/SARIMAX orders \((p, d, q)\) by minimising AIC/BIC.
- Generalised linear models: compare link functions or distribution assumptions while balancing fit and simplicity.
- Reporting: alongside RMSE or R², include AIC/BIC to show that complexity was controlled.
5. Caveats #
- Likelihood assumptions: if the model’s distributional assumptions are severely violated, AIC/BIC can mislead.
- Huge datasets: with very large \(n\), BIC may over-penalise complexity; choose the criterion that aligns with the business objective.
- Comparable scopes: only compare models that use the same response variable, likelihood, and dataset.
Takeaways #
- AIC and BIC penalise complexity differently while leveraging likelihood to balance fit vs. parsimony.
- Remember: AIC leans toward predictive accuracy; BIC leans toward simplicity.
- Use them alongside metrics like RMSE or Adjusted R² to make persuasive, well-rounded model selection decisions.