Overview
- AIC/BIC combine likelihood and complexity penalties to assess generalisation.
- Compute AIC/BIC in regression models and see how they react to extra features.
- Learn when sample size and model family make one criterion preferable to the other.
1. Definitions #
For log-likelihood \(\ell\), number of parameters \(k\), and sample size \(n\):
$$ \mathrm{AIC} = -2\ell + 2k, \qquad \mathrm{BIC} = -2\ell + k \log n $$
- AIC approximates out-of-sample prediction error; its penalty \(2k\) does not depend on \(n\).
- BIC grows the penalty with \(\log n\), favouring simpler models as the dataset gets larger.
Lower values indicate a better trade-off between fit and complexity.
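As a sanity check, both formulas can be computed by hand. A minimal sketch for a one-predictor Gaussian model (the data here are simulated for illustration; note that conventions differ on whether the noise variance counts toward \(k\), so always compute \(k\) the same way across the models you compare):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

# Fit a line by least squares: the free parameters are the
# intercept, the slope, and the noise variance, so k = 3 here.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)
sigma2 = resid.var()  # MLE of the noise variance (divides by n)

# Gaussian log-likelihood of the fitted model
ll = stats.norm.logpdf(resid, scale=np.sqrt(sigma2)).sum()

k = 3
aic = -2 * ll + 2 * k
bic = -2 * ll + k * np.log(n)
print(f"AIC = {aic:.1f}, BIC = {bic:.1f}")
```

Because \(\log 200 \approx 5.3 > 2\), the BIC value comes out higher than the AIC value for the same fit.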
2. Computing in Python #
scikit-learn does not expose AIC/BIC for ordinary linear regression, so we rely on statsmodels instead.
import statsmodels.api as sm
from sklearn.datasets import fetch_california_housing  # load_boston was removed in scikit-learn 1.2
X, y = fetch_california_housing(return_X_y=True)
X = sm.add_constant(X)  # add intercept column
model = sm.OLS(y, X).fit()
print("AIC:", model.aic)
print("BIC:", model.bic)
model.aic and model.bic are available for OLS/GLM models; for other families choose the appropriate likelihood.
3. Intuition #
- AIC emphasises predictive performance. Because its penalty does not grow with \(n\), it tolerates more complex models when ample data are available.
- BIC arises under a Bayesian approximation that assumes the true model lies in the candidate set. The \(\log n\) penalty pushes toward simpler models as \(n\) increases.
- Compare AIC/BIC only within the same dataset and likelihood family; cross-dataset comparisons are meaningless.
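The difference in how the two penalties grow is easy to tabulate. For a model with \(k = 5\) parameters (an arbitrary choice for illustration):

```python
import math

k = 5
for n in [20, 100, 1_000, 100_000]:
    aic_pen = 2 * k            # constant in n
    bic_pen = k * math.log(n)  # grows with log n
    print(f"n={n:>6}: AIC penalty = {aic_pen}, BIC penalty = {bic_pen:.1f}")
# BIC's penalty exceeds AIC's once log(n) > 2, i.e. n > e^2 ≈ 7.4,
# so for any realistically sized dataset BIC is the stricter criterion.
```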
4. Practical use cases #
- Feature selection: rank candidate models by AIC/BIC and drop features that do not improve the criterion.
- Time-series models: commonly used to pick ARIMA/SARIMAX orders \((p, d, q)\) by minimising AIC/BIC.
- Generalised linear models: compare link functions or distribution assumptions while balancing fit and simplicity.
- Reporting: alongside RMSE or R², include AIC/BIC to show that complexity was controlled.
5. Caveats #
- Likelihood assumptions: if the model’s distributional assumptions are severely violated, AIC/BIC can mislead.
- Huge datasets: with very large \(n\), BIC may over-penalise complexity; choose the criterion that aligns with the business objective.
- Comparable scopes: only compare models that use the same response variable, likelihood, and dataset.
Takeaways #
- AIC and BIC penalise complexity differently while leveraging likelihood to balance fit vs. parsimony.
- Remember: AIC leans toward predictive accuracy; BIC leans toward simplicity.
- Use them alongside metrics like RMSE or Adjusted R² to make persuasive, well-rounded model selection decisions.