Bagging | Ensemble Basics for Reducing Variance


Bagging (Bootstrap Aggregating) trains multiple base learners on bootstrap samples of the training data and combines their predictions by averaging (regression) or voting (classification). Applied to decision trees, it leads directly to Random Forest.


1. Procedure #

  1. Create multiple bootstrap samples from the training data
  2. Train the same model on each sample
  3. Average predictions for regression or vote for classification

Bagging mainly reduces variance and makes the model more stable.
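As a quick illustration of these three steps, here is a minimal from-scratch sketch (the function name simple_bagging_predict and the choice of DecisionTreeRegressor as the base model are assumptions for this example; the scikit-learn example in the next section is the practical route):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def simple_bagging_predict(X_train, y_train, X_test, n_estimators=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(n_estimators):
        # Step 1: bootstrap sample (draw n rows with replacement)
        idx = rng.integers(0, n, size=n)
        # Step 2: train the same model type on each bootstrap sample
        model = DecisionTreeRegressor(random_state=0).fit(X_train[idx], y_train[idx])
        # Step 3 (regression): collect each model's predictions
        all_preds.append(model.predict(X_test))
    # Average over all models to get the bagged prediction
    return np.mean(all_preds, axis=0)

For classification, step 3 would take a majority vote over the collected predictions instead of the mean.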


2. Python example #

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

base = DecisionTreeRegressor(max_depth=None, random_state=0)
bagging = BaggingRegressor(
    estimator=base,
    n_estimators=100,
    max_samples=0.8,
    max_features=0.8,
    bootstrap=True,
    oob_score=True,  # required so that oob_score_ is available after fitting
    random_state=0,
)
bagging.fit(X_train, y_train)

pred = bagging.predict(X_test)
print("RMSE:", mean_squared_error(y_test, pred, squared=False))
print("OOB score:", bagging.oob_score_)

3. Hyperparameters #

  • n_estimators: Number of learners. More trees are more stable but costlier (see the sketch after this list).
  • max_samples, max_features: Fraction of samples/features per learner.
  • bootstrap: Whether to sample with replacement; bootstrap_features does the same for features.
  • oob_score: Estimate generalisation from out-of-bag samples.
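As a small illustration of the n_estimators trade-off, one option is to compare the OOB score for a few ensemble sizes (a sketch that reuses X_train, y_train and the imports from the example above):

# Compare out-of-bag R^2 for increasing ensemble sizes
for n in (10, 50, 200):
    model = BaggingRegressor(
        estimator=DecisionTreeRegressor(random_state=0),
        n_estimators=n,
        bootstrap=True,
        oob_score=True,
        random_state=0,
    ).fit(X_train, y_train)
    print(f"n_estimators={n}: OOB R^2 = {model.oob_score_:.3f}")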

4. Pros and cons #

Pros:
  • Easy to implement and parallelise
  • Greatly reduces variance
  • The OOB estimate avoids an extra validation split

Cons:
  • Must keep many models in memory
  • Does not reduce bias; base learners must be decent
  • Less interpretable than a single tree

5. Summary #

  • Bagging stabilises models by resampling data and averaging predictions.
  • Decision trees + bagging, with an extra random feature subset at each split, gives Random Forest, so the relationship is worth remembering (see the sketch below).
  • Works well at scale when training can be parallelised.
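For comparison, a minimal Random Forest sketch on the same data (reusing X_train, y_train, X_test, y_test and the imports from the example above):

from sklearn.ensemble import RandomForestRegressor

# Random Forest = bagged decision trees plus a random feature subset at each split
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print("RF RMSE:", np.sqrt(mean_squared_error(y_test, rf.predict(X_test))))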