Bagging | Ensemble Basics for Reducing Variance


Bagging (Bootstrap Aggregating) trains multiple base learners on bootstrap samples of the training data and combines their predictions by averaging (regression) or voting (classification). Applied to decision trees, it leads directly to Random Forest.


1. Procedure #

  1. Create multiple bootstrap samples from the training data
  2. Train the same model on each sample
  3. Average predictions for regression or vote for classification

Bagging mainly reduces variance and makes the model more stable.
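As a quick illustration of these three steps, here is a minimal from-scratch sketch (the function name simple_bagging_predict and the choice of DecisionTreeRegressor as the base model are assumptions for this example; the scikit-learn example in the next section is the practical route):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def simple_bagging_predict(X_train, y_train, X_test, n_estimators=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(n_estimators):
        # Step 1: bootstrap sample (draw n rows with replacement)
        idx = rng.integers(0, n, size=n)
        # Step 2: train the same model type on each bootstrap sample
        model = DecisionTreeRegressor(random_state=0).fit(X_train[idx], y_train[idx])
        # Step 3 (regression): collect each model's predictions
        all_preds.append(model.predict(X_test))
    # Average over all models to get the bagged prediction
    return np.mean(all_preds, axis=0)

For classification, step 3 would take a majority vote over the collected predictions instead of the mean.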


2. Python example #

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

base = DecisionTreeRegressor(max_depth=None, random_state=0)
bagging = BaggingRegressor(
    estimator=base,
    n_estimators=100,
    max_samples=0.8,
    max_features=0.8,
    bootstrap=True,
    oob_score=True,  # required so that oob_score_ is available after fitting
    random_state=0,
)
bagging.fit(X_train, y_train)

pred = bagging.predict(X_test)
print("RMSE:", mean_squared_error(y_test, pred, squared=False))
print("OOB score:", bagging.oob_score_)

3. Hyperparameters #

  • n_estimators: Number of learners. More trees are more stable but costlier (see the sketch after this list).
  • max_samples, max_features: Fraction of samples/features per learner.
  • bootstrap: Whether to sample with replacement; bootstrap_features does the same for features.
  • oob_score: Estimate generalisation from out-of-bag samples.
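As a small illustration of the n_estimators trade-off, one option is to compare the OOB score for a few ensemble sizes (a sketch that reuses X_train, y_train and the imports from the example above):

# Compare out-of-bag R^2 for increasing ensemble sizes
for n in (10, 50, 200):
    model = BaggingRegressor(
        estimator=DecisionTreeRegressor(random_state=0),
        n_estimators=n,
        bootstrap=True,
        oob_score=True,
        random_state=0,
    ).fit(X_train, y_train)
    print(f"n_estimators={n}: OOB R^2 = {model.oob_score_:.3f}")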

4. Pros and cons #

Pros:
  • Easy to implement and parallelise
  • Greatly reduces variance
  • The OOB estimate avoids an extra validation split

Cons:
  • Must keep many models in memory
  • Does not reduce bias; base learners must be decent
  • Less interpretable than a single tree

5. Summary #

  • Bagging stabilises models by resampling data and averaging predictions.
  • Decision trees + bagging, with an extra random feature subset at each split, gives Random Forest, so the relationship is worth remembering (see the sketch below).
  • Works well at scale when training can be parallelised.
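For comparison, a minimal Random Forest sketch on the same data (reusing X_train, y_train, X_test, y_test and the imports from the example above):

from sklearn.ensemble import RandomForestRegressor

# Random Forest = bagged decision trees plus a random feature subset at each split
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print("RF RMSE:", np.sqrt(mean_squared_error(y_test, rf.predict(X_test))))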