2.5.4 Bagging
Summary
  • Bagging trains the same base learner on many bootstrap resamples and aggregates predictions by averaging or voting.
  • The main gain is variance reduction, which stabilizes high-variance learners such as deep decision trees.
  • The number of estimators and base-model complexity control the trade-off between robustness, accuracy, and compute cost.

Intuition #

Bagging works by deliberately creating many slightly different versions of the training set. Each model makes different errors; aggregation cancels part of that noise, so the final predictor is more stable than any individual model.
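To make the intuition concrete, here is a minimal sketch (an illustration, not part of the original example) that uses the sample mean as a stand-in for a model: each bootstrap resample yields a slightly different estimate, and the average over several resamples is noticeably more stable than any single one.

import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=200)  # one noisy training set

def one_estimate(y, rng):
    # A single "model": the mean of one bootstrap resample.
    resample = rng.choice(y, size=len(y), replace=True)
    return resample.mean()

# Compare the spread of a single estimate with the spread of an
# average over 25 estimates, repeated many times.
single = [one_estimate(y, rng) for _ in range(2000)]
bagged = [np.mean([one_estimate(y, rng) for _ in range(25)]) for _ in range(2000)]
print("variance of a single estimate:", np.var(single))
print("variance of the bagged average:", np.var(bagged))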

Detailed Explanation #

1. Procedure #

  1. Create multiple bootstrap samples from the training data
  2. Train the same model on each sample
  3. Average predictions for regression or vote for classification

Bagging mainly reduces variance and makes the model more stable.
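The three steps map directly onto code. Below is a minimal from-scratch sketch for a regression task (an illustration, not the library implementation; the synthetic data and variable names are assumptions for the example).

import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
preds = []
for _ in range(50):
    # 1. Bootstrap sample: draw row indices with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # 2. Train the same base learner on the resampled data.
    tree = DecisionTreeRegressor().fit(X_train[idx], y_train[idx])
    preds.append(tree.predict(X_test))

# 3. Average the predictions (use a majority vote for classification).
bagged = np.mean(preds, axis=0)
single = DecisionTreeRegressor(random_state=0).fit(X_train, y_train).predict(X_test)
print("single tree MSE:", mean_squared_error(y_test, single))
print("bagged MSE:     ", mean_squared_error(y_test, bagged))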


2. Python example #

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

base = DecisionTreeRegressor(max_depth=None, random_state=0)  # fully grown, high-variance trees
bagging = BaggingRegressor(
    estimator=base,
    n_estimators=100,   # number of bootstrap-trained trees
    max_samples=0.8,    # fraction of rows drawn for each tree
    max_features=0.8,   # fraction of columns drawn for each tree
    bootstrap=True,     # sample rows with replacement
    oob_score=True,     # required so oob_score_ is available below
    random_state=0,
)
bagging.fit(X_train, y_train)

pred = bagging.predict(X_test)
print("RMSE:", np.sqrt(mean_squared_error(y_test, pred)))
print("OOB score:", bagging.oob_score_)

3. Hyperparameters #

  • n_estimators: Number of base learners. More estimators are more stable but cost more memory and compute (see the sketch after this list).
  • max_samples, max_features: Fraction of samples/features per learner.
  • bootstrap: Whether to sample with replacement; bootstrap_features does the same for features.
  • oob_score: If True, estimate generalisation performance from the out-of-bag samples (exposed afterwards as oob_score_).
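As a rough sketch of how these settings interact, the loop below (the grid of values is arbitrary and only for illustration) refits the ensemble with oob_score=True for several n_estimators values and reports the out-of-bag R² that BaggingRegressor exposes as oob_score_.

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = fetch_california_housing(return_X_y=True)

for n in (25, 50, 100, 200):
    model = BaggingRegressor(
        estimator=DecisionTreeRegressor(),
        n_estimators=n,
        max_samples=0.8,
        bootstrap=True,   # out-of-bag estimates need sampling with replacement
        oob_score=True,
        random_state=0,
        n_jobs=-1,        # learners are independent, so fitting parallelises
    )
    model.fit(X, y)
    print(n, "estimators -> OOB R^2:", round(model.oob_score_, 3))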

4. Pros and cons #

Pros:
  • Easy to implement and parallelise
  • Greatly reduces variance
  • OOB estimate avoids an extra validation split

Cons:
  • Must keep many models in memory
  • Does not reduce bias; weak learners must be decent
  • Less interpretable than a single tree

5. Summary #

  • Bagging stabilises models by resampling data and averaging predictions.
  • Bagging deep decision trees, with an extra random feature subset considered at each split, gives a Random Forest, so the relationship is worth remembering.
  • Works well at scale when training can be parallelised.