LightGBM | How it works and tuning tips

LightGBM is a fast gradient-boosting library developed by Microsoft. With histogram approximation and leaf-wise tree growth, it keeps high accuracy while speeding up training on large, high-dimensional data.

It offers GPU training, native categorical handling, and distributed learning, which makes it a popular choice on Kaggle and in production.


1. Key features #

  • Leaf-wise growth: Expands the leaf with the largest loss reduction. It can grow deep, so control it with max_depth or num_leaves (see the sketch after this list).
  • Histogram approximation: Bins continuous feature values (max_bin, default 255) to reduce compute and memory.
  • Gradient/Hessian reuse: Aggregates statistics per bin to improve cache efficiency.
  • Native categorical support: Finds optimal binary splits without one-hot encoding.
  • Production-ready features: Early stopping, sample weights, monotone constraints, GPU support.
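
To make the mapping from these features to the API concrete, here is a minimal parameter sketch; the values are illustrative rather than recommendations, and the GPU line assumes a GPU-enabled build:

# Illustrative values only; not tuned for any particular dataset.
params = {
    "objective": "binary",
    "max_bin": 255,          # histogram approximation: bins per feature
    "num_leaves": 31,        # leaf-wise growth: main capacity control
    "max_depth": -1,         # optional depth cap on top of num_leaves (-1 = no limit)
    # "device_type": "gpu",  # uncomment for GPU training (needs a GPU-enabled build)
}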

2. The objective in formulas #

Like standard gradient boosting, LightGBM builds an additive model, fitting each new weak learner to the negative gradients of the loss.

$$ \mathcal{L} = \sum_{i=1}^n \ell\big(y_i, F_{m-1}(x_i) + f_m(x_i)\big) + \Omega(f_m) $$

Here \(f_m\) is the tree added at iteration \(m\) and \(\Omega\) is its regularisation term. LightGBM aggregates the gradients \(g_i\) and Hessians \(h_i\) of the loss by histogram bin and computes the split gain as

$$ \text{Gain} = \frac{1}{2} \left( \frac{\left(\sum_{i \in L} g_i\right)^2}{\sum_{i \in L} h_i + \lambda} + \frac{\left(\sum_{i \in R} g_i\right)^2}{\sum_{i \in R} h_i + \lambda} - \frac{\left(\sum_{i \in (L \cup R)} g_i\right)^2}{\sum_{i \in (L \cup R)} h_i + \lambda} \right) - \gamma $$

where \(\lambda\) is L2 regularisation and \(\gamma\) is a split penalty.
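
To make the gain formula concrete, the short sketch below aggregates toy per-sample gradients and Hessians for one candidate split and evaluates the expression above; all numbers are made up purely for illustration:

import numpy as np

# Toy per-sample gradients/Hessians on each side of a candidate split (illustrative).
g_left, h_left = np.array([-0.4, -0.3, 0.1]), np.array([0.24, 0.21, 0.09])
g_right, h_right = np.array([0.5, 0.2]), np.array([0.25, 0.16])
lambda_l2, gamma = 1.0, 0.0  # L2 regularisation and split penalty

def leaf_term(g_sum, h_sum, lam):
    # The (sum g)^2 / (sum h + lambda) term from the gain formula.
    return g_sum ** 2 / (h_sum + lam)

gain = 0.5 * (
    leaf_term(g_left.sum(), h_left.sum(), lambda_l2)
    + leaf_term(g_right.sum(), h_right.sum(), lambda_l2)
    - leaf_term(g_left.sum() + g_right.sum(), h_left.sum() + h_right.sum(), lambda_l2)
) - gamma
print(f"split gain: {gain:.4f}")  # positive gain favours making this split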


3. Binary classification in Python #

import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score

# Synthetic, imbalanced binary classification data (about 15% positives).
X, y = make_classification(
    n_samples=30_000,
    n_features=40,
    n_informative=12,
    n_redundant=8,
    weights=[0.85, 0.15],
    random_state=42,
)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Core training parameters; section 4 explains each knob.
params = {
    "objective": "binary",
    "metric": ["binary_logloss", "auc"],
    "learning_rate": 0.05,
    "num_leaves": 31,
    "feature_fraction": 0.8,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "lambda_l2": 1.0,
    "min_data_in_leaf": 30,
}

train_data = lgb.Dataset(X_train, label=y_train)
valid_data = lgb.Dataset(X_valid, label=y_valid)

gbm = lgb.train(
    params,
    train_data,
    num_boost_round=1000,
    valid_sets=[train_data, valid_data],
    valid_names=["train", "valid"],
    callbacks=[lgb.early_stopping(stopping_rounds=50, verbose=True)],
)

y_pred = gbm.predict(X_valid)
print("ROC-AUC:", roc_auc_score(y_valid, y_pred))
print(classification_report(y_valid, (y_pred > 0.5).astype(int)))

With early stopping you can set a generous num_boost_round: training halts once the validation metric has not improved for stopping_rounds consecutive rounds, which keeps the booster from overfitting.
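
The fitted booster also records which round early stopping picked, so you can predict or save the model with exactly that many trees; the file name below is just an example:

print("best iteration:", gbm.best_iteration)

# predict() already uses the best iteration after early stopping; passing it
# explicitly just makes the intent visible.
y_pred_best = gbm.predict(X_valid, num_iteration=gbm.best_iteration)

# Persist only the trees up to the best iteration and reload later.
gbm.save_model("lgbm_binary.txt", num_iteration=gbm.best_iteration)
gbm_loaded = lgb.Booster(model_file="lgbm_binary.txt")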


4. Main hyperparameters #

Parameter | Role / tips
num_leaves | Number of leaves. Controls capacity; grow it gradually, keeping it below 2 ** max_depth.
max_depth | Depth limit. -1 means unlimited; smaller values reduce overfitting.
min_data_in_leaf / min_sum_hessian_in_leaf | Required samples / Hessian per leaf. Larger values smooth the model.
learning_rate | Smaller values are more stable but need more trees.
feature_fraction | Feature subsampling to improve generalisation.
bagging_fraction & bagging_freq | Row sampling; enabled when bagging_freq > 0.
lambda_l1, lambda_l2 | L1/L2 regularisation for complexity control.
min_gain_to_split | Minimum gain required to split; helps avoid noisy splits.
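
As a rough sketch of how these knobs combine, a more heavily regularised configuration might look like the dictionary below; the values are illustrative starting points, not tuned results:

regularised_params = {
    "objective": "binary",
    "learning_rate": 0.05,
    "num_leaves": 63,            # capacity: kept well below 2 ** max_depth
    "max_depth": 8,
    "min_data_in_leaf": 100,     # larger leaves smooth noisy data
    "feature_fraction": 0.8,     # column subsampling per tree
    "bagging_fraction": 0.8,     # row subsampling ...
    "bagging_freq": 1,           # ... re-drawn every iteration
    "lambda_l1": 0.1,
    "lambda_l2": 1.0,
    "min_gain_to_split": 0.01,   # skip splits with negligible gain
}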

5. Categorical features and missing values #

  • Pass column indices or names to categorical_feature. No target encoding needed.
  • Missing values are sent to whichever side of a split yields the better gain, so NaN inputs are acceptable. For features with many missing values, consider imputation as well.
  • Monotonic constraints can be applied with monotone_constraints (see the sketch after the snippet below).

categorical_cols = [0, 2, 5]
train_data = lgb.Dataset(
    X_train,
    label=y_train,
    categorical_feature=categorical_cols,
)
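
With a pandas DataFrame, columns of dtype category are picked up automatically when categorical_feature="auto" (the default), and monotone_constraints takes one value per feature (1 increasing, -1 decreasing, 0 unconstrained). A minimal sketch with a hypothetical three-column frame; no training is run here, the point is only how the options are wired up:

import lightgbm as lgb
import pandas as pd

# Hypothetical frame: "city" is a pandas categorical column.
df = pd.DataFrame({
    "age": [25, 40, 31, 58],
    "city": pd.Categorical(["tokyo", "osaka", "tokyo", "nagoya"]),
    "income": [320, 540, 410, 700],
})
y = [0, 1, 0, 1]

train_data = lgb.Dataset(df, label=y, categorical_feature="auto")

params = {
    "objective": "binary",
    # One entry per column: age and city unconstrained, income forced monotonically increasing.
    "monotone_constraints": [0, 0, 1],
}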

6. A practical tuning flow #

  1. Baseline: Start with learning_rate=0.1, num_leaves=31, feature_fraction=1.0.
  2. Capacity: Adjust num_leaves and min_data_in_leaf together while watching validation loss.
  3. Randomisation: Set feature_fraction or bagging_fraction to 0.6–0.9 for better generalisation.
  4. Regularisation: Add lambda_l1/lambda_l2 or min_gain_to_split if needed.
  5. Lower learning rate: If performance plateaus, use learning_rate=0.01 and increase num_boost_round.
  6. Ensemble: Combine multiple LightGBM models trained with different seeds/initialisations (a minimal sketch follows).
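
For the ensembling step, a common pattern is simply to average probabilities from runs that differ only in their random seed. A minimal sketch, reusing params, train_data, valid_data, X_valid, y_valid, and the imports from section 3:

import numpy as np

seed_preds = []
for seed in (7, 21, 42):
    seeded_params = {**params, "seed": seed}  # "seed" drives LightGBM's internal RNGs
    booster = lgb.train(
        seeded_params,
        train_data,
        num_boost_round=1000,
        valid_sets=[valid_data],
        callbacks=[lgb.early_stopping(stopping_rounds=50, verbose=False)],
    )
    seed_preds.append(booster.predict(X_valid))

# Average the per-seed probabilities.
y_pred_ensemble = np.mean(seed_preds, axis=0)
print("ensemble ROC-AUC:", roc_auc_score(y_valid, y_pred_ensemble))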

7. Summary #

  • LightGBM uses histogram bins and leaf-wise growth to achieve fast, accurate boosting.
  • Balance num_leaves and learning_rate, and rely on regularisation and subsampling for generalisation.
  • Its categorical, missing-value, and GPU support make it strong in production and ensembles.