2.1.5 Elastic Net Regression
Summary #
- Elastic Net mixes the L1 (lasso) and L2 (ridge) penalties to balance sparsity and stability.
- It can retain groups of strongly correlated features while adjusting their importance collectively.
- Tuning both \(\alpha\) and l1_ratio with cross-validation makes it easy to strike the bias–variance balance.
- Standardizing features and allowing enough iterations improves numerical stability for the optimizer.
Intuition #
Lasso's L1 penalty yields sparse models but behaves erratically when features are strongly correlated, often keeping one feature from a correlated group essentially at random; ridge's L2 penalty shrinks coefficients smoothly but never sets them exactly to zero. Elastic Net blends the two penalties, so it can both select features and spread weight across correlated groups, which makes it a practical default when predictors are correlated or when there are more features than samples.
Detailed Explanation #
Mathematical formulation #
Elastic Net minimizes
$$ \min_{\boldsymbol\beta, b} \sum_{i=1}^{n} \left( y_i - (\boldsymbol\beta^\top \mathbf{x}_i + b) \right)^2 + \alpha \left( \rho \lVert \boldsymbol\beta \rVert_1 + (1 - \rho) \lVert \boldsymbol\beta \rVert_2^2 \right), $$

where \(\alpha > 0\) is the regularization strength and \(\rho \in [0,1]\) (l1_ratio) controls the L1/L2 trade-off. Moving \(\rho\) between 0 and 1 lets you explore the spectrum between ridge and lasso.
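To make the penalty term concrete, here is a small sketch that evaluates \(\alpha \left( \rho \lVert \boldsymbol\beta \rVert_1 + (1 - \rho) \lVert \boldsymbol\beta \rVert_2^2 \right)\) exactly as written above (the coefficient vector and settings are illustrative; note that scikit-learn's internal objective scales these terms slightly differently):

```python
import numpy as np

def elastic_net_penalty(beta, alpha, rho):
    """Penalty alpha * (rho * ||beta||_1 + (1 - rho) * ||beta||_2^2) as in the formula above."""
    l1 = np.sum(np.abs(beta))       # ||beta||_1
    l2 = np.sum(beta ** 2)          # ||beta||_2^2
    return alpha * (rho * l1 + (1 - rho) * l2)

beta = np.array([2.0, -1.0, 0.0])
print(elastic_net_penalty(beta, alpha=1.0, rho=0.0))  # pure ridge: ||beta||_2^2 = 5.0
print(elastic_net_penalty(beta, alpha=1.0, rho=1.0))  # pure lasso: ||beta||_1 = 3.0
print(elastic_net_penalty(beta, alpha=1.0, rho=0.5))  # even mix: 0.5*3 + 0.5*5 = 4.0
```

Sliding `rho` from 0 to 1 moves the penalty continuously from the ridge value to the lasso value.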
Experiments with Python #
Below we use ElasticNetCV to choose \(\alpha\) and l1_ratio simultaneously, then examine coefficients and performance.
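A minimal sketch of that workflow on synthetic data (the dataset, the `l1_ratio` grid, and the `alphas` range are illustrative assumptions, not fixed recommendations):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem with more features than informative signals
X, y = make_regression(n_samples=200, n_features=20, n_informative=8,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validate over both the regularization strength and the L1/L2 mix
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 1.0],
                     alphas=np.logspace(-3, 1, 30),
                     cv=5, max_iter=10_000, random_state=0)
model.fit(X_train, y_train)

print("best alpha:", model.alpha_)
print("best l1_ratio:", model.l1_ratio_)
print("nonzero coefficients:", int(np.sum(model.coef_ != 0)))
print("test R^2:", r2_score(y_test, model.predict(X_test)))
```

The fitted `alpha_` and `l1_ratio_` attributes expose the selected hyperparameters, and inspecting `coef_` shows which features survived the L1 part of the penalty.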
Reading the results #
- ElasticNetCV evaluates multiple L1/L2 combinations automatically and picks a good balance.
- When correlated features survive together, their coefficients tend to align in magnitude, which simplifies interpretation.
- If convergence is slow, standardize the inputs or raise max_iter.
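The last point can be sketched with a pipeline that standardizes the inputs before fitting and raises the iteration budget (the `alpha`, `l1_ratio`, and `max_iter` values here are illustrative choices):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=150, n_features=10, noise=5.0, random_state=1)

# Standardizing features and allowing more iterations both help
# the coordinate-descent optimizer converge cleanly
pipe = make_pipeline(
    StandardScaler(),
    ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=50_000),
)
pipe.fit(X, y)
print("training R^2:", pipe.score(X, y))
```

Wrapping the scaler in the pipeline also ensures the same standardization is applied at prediction time, which matters once the model is used inside cross-validation.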
References #
- Zou, H., & Hastie, T. (2005). Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B, 67(2), 301–320.
- Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22.