Chapter 7 #

Feature Selection #

Feature selection reduces input dimensionality to improve generalization, speed up training, and make models easier to interpret. Use it to cut noise, prevent overfitting, and simplify pipelines.

Approaches #

  • Filter methods: score each feature with a univariate statistic (e.g., correlation, mutual information, chi‑squared) and keep the top‑scoring ones (first sketch below).
  • Wrapper methods: search over feature subsets using a predictive model (forward/backward stepwise selection, RFE, Boruta). Powerful, but computationally costly and prone to overfitting the selection itself (second sketch below).
  • Embedded methods: selection happens as part of model training, e.g., L1/Lasso regularization or tree‑based feature_importances_ (third sketch below).
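
A minimal filter sketch, assuming scikit‑learn is available; the synthetic dataset, the mutual‑information scorer, and k=5 are illustrative choices, not the only options:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data: 20 features, of which only 5 carry signal (illustrative setup).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Score every feature independently, then keep the 5 highest-scoring ones.
selector = SelectKBest(score_func=mutual_info_classif, k=5)
X_top = selector.fit_transform(X, y)

print(X_top.shape)                         # (500, 5)
print(selector.get_support(indices=True))  # column indices of the kept features
```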
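
A wrapper sketch using RFE; the logistic‑regression base model and the target of 5 features are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# RFE refits the model repeatedly, dropping the weakest feature each round
# (step=1) until only the requested number remains.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5, step=1)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of the selected features
print(rfe.ranking_)   # 1 = kept; larger ranks were eliminated earlier
```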
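
An embedded sketch pairing Lasso with SelectFromModel; the regression task and alpha=0.1 are illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

# L1 regularization drives the coefficients of uninformative features to
# exactly zero; SelectFromModel keeps the features with non-zero weights.
selector = SelectFromModel(Lasso(alpha=0.1))
X_kept = selector.fit_transform(X, y)

print(X_kept.shape)   # typically close to (500, 5) for this setup
```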

Practical tips #

  • Start simple with filters; reserve wrappers for small feature sets.
  • Prefer embedded methods when your model exposes importances or coefficients.
  • Validate with cross‑validation: fit the selector inside each training fold to avoid leakage, and check that the selected set is stable across folds (see the sketch after this list).
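
A leakage‑safe validation sketch: placing the selector inside a Pipeline means it is re‑fit on each training fold only, so the held‑out fold never influences which features are kept. The scorer, k=10, and the classifier are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)

# Selection is a pipeline step, so cross_val_score re-fits it per training
# fold; selecting on the full dataset first would leak test-fold information.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean(), scores.std())
```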

References and tools #

  • scikit‑learn: SelectKBest, SelectFromModel, RFE, RFECV, Lasso
  • Boruta: robust wrapper around tree‑based importances