2.5.4 Bagging

Summary #
- Bagging trains the same base learner on many bootstrap resamples and aggregates predictions by averaging or voting.
- The main gain is variance reduction, which stabilizes high-variance learners such as deep decision trees.
- The number of estimators and base-model complexity control the trade-off between robustness, accuracy, and compute cost.
Intuition #
Bagging works by deliberately creating many slightly different versions of the training set. Each model makes different errors; aggregation cancels part of that noise, so the final predictor is more stable than any individual model.
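The variance-cancelling effect can be seen with a toy experiment (not from the original; a numerical illustration where each "model" is simply the mean of a bootstrap resample):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each "model" predicts the mean of one bootstrap resample
# of a fixed dataset drawn from N(0, 1).
data = rng.normal(loc=0.0, scale=1.0, size=200)

def bootstrap_prediction(rng, data):
    sample = rng.choice(data, size=len(data), replace=True)
    return sample.mean()

# Spread of a single model vs. an average of 50 "bagged" models,
# measured over many repetitions.
single = [bootstrap_prediction(rng, data) for _ in range(500)]
bagged = [np.mean([bootstrap_prediction(rng, data) for _ in range(50)])
          for _ in range(500)]

print(np.std(single), np.std(bagged))
```

Averaging 50 resample-trained predictors shrinks the spread of the final prediction by roughly a factor of seven here, which is the stabilisation bagging relies on.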
Detailed Explanation #
1. Procedure #
- Create multiple bootstrap samples from the training data
- Train the same model on each sample
- Average predictions for regression or vote for classification
Bagging mainly reduces variance and makes the model more stable.
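The three steps above can be sketched from scratch (a minimal illustration only; the "decision stump" base learner and helper names are ours, not from the original):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(42)

# Toy 1-D classification data: label is 1 when x is positive, with noise.
X = rng.normal(size=(300, 1))
y = (X[:, 0] + rng.normal(scale=0.3, size=300) > 0).astype(int)

# Base learner: a threshold halfway between the two class means.
def fit_stump(xs, ys):
    t = (xs[ys == 1].mean() + xs[ys == 0].mean()) / 2
    return lambda x: int(x > t)

# Steps 1-2: bootstrap-resample, then fit the same model on each resample.
models = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
    models.append(fit_stump(X[idx, 0], y[idx]))

# Step 3: majority vote across the ensemble (classification).
def predict(x):
    votes = [m(x) for m in models]
    return Counter(votes).most_common(1)[0][0]

print(predict(1.5), predict(-1.5))
```

For regression, the final line of the procedure would average the models' numeric outputs instead of voting.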
2. Python example #
3. Hyperparameters #
- `n_estimators`: number of base learners. More learners are more stable but costlier.
- `max_samples`, `max_features`: fraction of samples/features drawn for each learner.
- `bootstrap`: whether to sample rows with replacement; `bootstrap_features` does the same for features.
- `oob_score`: estimate generalisation error from the out-of-bag samples.
4. Pros and cons #
| Pros | Cons |
|---|---|
| Easy to implement and parallelise | Must keep many models in memory |
| Greatly reduces variance | Does not reduce bias; weak learners must be decent |
| OOB estimate avoids an extra validation split | Less interpretable than a single tree |
5. Summary #
- Bagging stabilises models by resampling data and averaging predictions.
- Bagged decision trees plus random feature selection at each split give a Random Forest, so the relationship is worth remembering.
- Works well at scale when training can be parallelised.