XGBoost (eXtreme Gradient Boosting) is a gradient boosting implementation that focuses on regularisation and speed. It offers rich features such as missing-value handling, tree optimisations, and parallel training, making it a staple in competitions and production.
## 1. Key characteristics
- Regularised loss: L1/L2 regularisation reduces overfitting.
- Default direction for missing values: rows with missing values are routed along a learned default branch, so no imputation is required (see the sketch after this list).
- Parallelisation: tree construction is parallelised by blocks for fast training.
- Advanced parameters: fine-grained control over depth, leaves, and sampling.
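A minimal sketch of these behaviours using the scikit-learn wrapper; the toy data, the injected `np.nan` values, and the specific parameter settings are illustrative assumptions rather than recommendations:

```python
import numpy as np
import xgboost as xgb

# Toy regression data with deliberate gaps: XGBoost routes NaNs
# along a learned default direction, so no imputation is needed.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] * 2 + rng.normal(scale=0.1, size=200)
X[rng.random(X.shape) < 0.1] = np.nan  # ~10% missing values

model = xgb.XGBRegressor(
    n_estimators=200,
    max_depth=4,
    reg_lambda=1.0,   # L2 regularisation
    reg_alpha=0.0,    # L1 regularisation
    n_jobs=-1,        # parallel tree construction
)
model.fit(X, y)
print(model.predict(X[:5]))
```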
## 2. Training with the xgboost package
```python
import xgboost as xgb
from sklearn.metrics import mean_absolute_error

# Wrap the training and validation splits in XGBoost's DMatrix format.
dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

params = {
    "objective": "reg:squarederror",  # squared-error regression
    "eval_metric": "rmse",
    "max_depth": 6,
    "eta": 0.05,                      # learning rate
    "subsample": 0.8,                 # row sampling per tree
    "colsample_bytree": 0.8,          # column sampling per tree
    "lambda": 1.0,                    # L2 regularisation
}

# Monitor both splits; early stopping watches the last entry ("valid").
evals = [(dtrain, "train"), (dvalid, "valid")]
bst = xgb.train(
    params,
    dtrain,
    num_boost_round=1000,
    evals=evals,
    early_stopping_rounds=50,
)

# Predict using only the boosting rounds up to the best validation iteration.
pred = bst.predict(xgb.DMatrix(X_test), iteration_range=(0, bst.best_iteration + 1))
print("MAE:", mean_absolute_error(y_test, pred))
```
Setting `early_stopping_rounds` stops training once the validation metric has not improved for that many rounds, and the best iteration is recorded on the booster (used above via `iteration_range`).
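The selected round can be inspected directly; a minimal sketch, assuming the `bst` booster trained above and a recent xgboost release that sets these attributes:

```python
# Round with the best validation score, and that score itself.
print("best iteration:", bst.best_iteration)
print("best valid RMSE:", bst.best_score)
```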
## 3. Main hyperparameters
| Parameter | Role | Tuning tip |
|---|---|---|
| `eta` | Learning rate | Smaller values are more stable but need more rounds |
| `max_depth` | Tree depth | Deeper trees are more expressive but can overfit |
| `min_child_weight` | Minimum sum of instance weights in a child | Increase for noisier data |
| `subsample` / `colsample_bytree` | Row / column sampling ratios | 0.6–0.9 often improves generalisation |
| `lambda`, `alpha` | L2 / L1 regularisation | Larger values reduce overfitting; use `alpha` to encourage sparsity |
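These parameters interact, so they are usually tuned jointly. A minimal sketch of a randomised search over the scikit-learn wrapper; the search ranges, iteration budget, and the `X_train`/`y_train` names are illustrative assumptions:

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
import xgboost as xgb

param_distributions = {
    "learning_rate": uniform(0.01, 0.19),   # eta
    "max_depth": randint(3, 10),
    "min_child_weight": randint(1, 10),
    "subsample": uniform(0.6, 0.3),         # 0.6-0.9
    "colsample_bytree": uniform(0.6, 0.3),
    "reg_lambda": uniform(0.0, 5.0),        # lambda
    "reg_alpha": uniform(0.0, 1.0),         # alpha
}

search = RandomizedSearchCV(
    xgb.XGBRegressor(n_estimators=500),
    param_distributions,
    n_iter=30,
    scoring="neg_root_mean_squared_error",
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)
print(search.best_params_)
```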
## 4. Practical usage
- Structured data: strong performance on encoded tabular data.
- Missing values: handled internally, no imputation required.
- Feature importance: Gain/Weight/Cover metrics available.
- Interpretability: `shap.TreeExplainer` computes SHAP attributions, and individual trees can be visualised with `xgboost.to_graphviz` (see the sketch after this list).
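A minimal sketch of both, assuming the `bst` booster and `X_valid` split from section 2 and that the third-party `shap` package is installed:

```python
import shap
import xgboost as xgb

# Gain-based importance: total loss reduction attributed to each feature.
print(bst.get_score(importance_type="gain"))

# SHAP attributions: per-row, per-feature contributions to the prediction.
explainer = shap.TreeExplainer(bst)
shap_values = explainer.shap_values(X_valid)
print(shap_values.shape)  # (n_rows, n_features)

# A single tree can also be rendered (requires the graphviz package):
# xgb.to_graphviz(bst, num_trees=0)
```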
## 5. Extra tips
- Lower the learning rate (e.g., 0.1 → 0.02) while increasing rounds to boost accuracy.
- `tree_method`: choose `"hist"` for speed, `"gpu_hist"` for GPU training (newer releases use `device="cuda"` with `"hist"` instead), or `"approx"` for very large data.
- Cross-validation: use `xgb.cv` with `early_stopping_rounds` to estimate the optimal number of boosting rounds (see the sketch after this list).
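A minimal sketch of the cross-validation tip, reusing `params`, `dtrain`, and the `xgb` import from section 2; the fold count and round budget are illustrative assumptions:

```python
cv_results = xgb.cv(
    params,
    dtrain,
    num_boost_round=1000,
    nfold=5,
    early_stopping_rounds=50,
    seed=42,
)
# One row per surviving boosting round; the last row is the stopping point.
print("estimated optimal rounds:", len(cv_results))
print(cv_results.tail(1)[["train-rmse-mean", "test-rmse-mean"]])
```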
## Summary
- XGBoost combines regularisation, missing-value handling, and speed for strong results on tabular data.
- Tune `eta`, `max_depth`, `min_child_weight`, sampling, and regularisation together.
- Choose between XGBoost, LightGBM, and CatBoost based on data characteristics.