Summary
- Decision trees split the feature space with simple questions, producing a set of human-readable rules that balance accuracy and interpretability.
- By combining classifier and regressor variants, gradient-boosted ensembles, and tooling such as dtreeviz, we can explain complex behaviour with intuitive diagrams.
- Hyperparameters (depth, leaf size, pruning) govern the bias–variance trade-off and should be tuned with validation data while monitoring the resulting rule set.
Decision Trees #
1. Overview #
A tree asks a sequence of if-then questions: each split routes samples left or right until they reach a leaf that outputs a prediction. Because every path corresponds to an explicit rule, trees are popular whenever explanations matter—credit scoring, operations, or any workflow that needs clear business logic.
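For instance, here is a minimal sketch of that rule extraction with scikit-learn; the iris dataset and the shallow depth are illustrative choices, not part of this chapter's walkthroughs:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Keep the tree shallow so the extracted rules stay readable.
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Every root-to-leaf path prints as an explicit if-then rule.
print(export_text(clf, feature_names=iris.feature_names))
```

Each printed branch reads as a plain if-then statement, which is what makes trees easy to audit in business settings.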
2. Impurity and pruning #
Impurity metrics such as Gini, entropy, MSE, or MAE quantify how mixed a node is. Greedy growth chooses, at each node, the split that maximises the impurity reduction, while pruning trims branches whose improvement does not justify their complexity, typically by adding a cost-complexity penalty (\alpha |T|) proportional to the number of leaves.
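As a sketch of how the cost-complexity penalty can be tuned in practice (the breast-cancer dataset and the simple hold-out split are assumptions made here for illustration):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Enumerate the effective alphas along the cost-complexity pruning path,
# i.e. the penalised objective R(T) + alpha * |leaves|.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit with each alpha and keep the value that scores best on held-out data.
scores = [
    DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train).score(X_val, y_val)
    for a in path.ccp_alphas
]
best = int(np.argmax(scores))
print(f"best ccp_alpha = {path.ccp_alphas[best]:.5f}, validation accuracy = {scores[best]:.3f}")
```

Larger alphas prune more aggressively, so scanning the path makes the bias–variance trade-off visible directly on validation data.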
3. Python walkthroughs #
Explore the sub-pages in this chapter to see:
- Decision Tree Classifier – learn about Gini/entropy, decision regions, and plotting the tree.
- Decision Tree Regressor – fit piecewise constant functions, evaluate (R^2), RMSE, MAE, and inspect prediction surfaces.
- Tree Parameters – compare the effect of max_depth, min_samples_leaf, ccp_alpha, and different split criteria.
- RuleFit – combine tree-derived rules with linear terms for sparse, interpretable models.
All examples rely on scikit-learn and can be reproduced by running the bundled Python notebooks or scripts.
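For example, a minimal sketch in the spirit of the Decision Tree Regressor page, using synthetic sine data (an assumption for illustration) to fit a piecewise constant function and report the metrics mentioned above:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D data: a noisy sine wave (illustrative only).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# A depth-limited regressor fits a piecewise constant approximation.
reg = DecisionTreeRegressor(max_depth=3, min_samples_leaf=5).fit(X, y)
pred = reg.predict(X)

print(f"R^2 : {r2_score(y, pred):.3f}")
print(f"RMSE: {mean_squared_error(y, pred) ** 0.5:.3f}")
print(f"MAE : {mean_absolute_error(y, pred):.3f}")
```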
4. References #
- Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth.
- scikit-learn developers. (2024). Decision Trees. https://scikit-learn.org/stable/modules/tree.html