RuleFit
Summary
- RuleFit extracts decision rules from tree ensembles and combines them with original features in a sparse linear model.
- L1 regularization selects informative rules and features, improving interpretability without giving up nonlinear structure.
- Rule depth, number of generated rules, and regularization strength strongly affect generalization and sparsity.
Intuition #
RuleFit turns tree paths into human-readable if-then indicators, then learns linear weights on those indicators. You get nonlinear interactions from trees and coefficient-level interpretability from sparse linear modeling.
Detailed Explanation #
1. Idea (with formulas) #
- Extract rules: each path from root to leaf becomes a binary indicator feature \(r_j(x) \in \{0, 1\}\).
- Add scaled linear terms \(z_k(x)\) for continuous features.
- L1-regularized linear fit (the Friedman & Popescu objective):

\[
\hat{f}(x) = \hat{\beta}_0 + \sum_{j} \hat{\alpha}_j r_j(x) + \sum_{k} \hat{\beta}_k z_k(x),
\qquad
(\hat{\alpha}, \hat{\beta}) = \arg\min \sum_{i} L\big(y_i, f(x_i)\big) + \lambda \Big( \sum_{j} |\alpha_j| + \sum_{k} |\beta_k| \Big)
\]

L1 promotes sparsity so only influential rules/terms remain.
2. Dataset (OpenML: house_sales) #
King County housing prices (OpenML data_id=42092). Numeric columns only for clarity.
3. Fit RuleFit #
Python implementation: christophM/rulefit
4. Inspect top rules #
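In christophM/rulefit, `get_rules()` returns a DataFrame with the columns described below. A sketch of the typical filtering step, using a small hand-made table (the rules and values are hypothetical) so the workflow is self-contained:

```python
# Sketch: filter and rank a RuleFit-style rule table.
import pandas as pd

rules = pd.DataFrame({
    "rule": [
        "sqft_living",                         # type=linear -> a linear term
        "grade > 8.5 & sqft_living > 2500.0",  # path through two tree splits
        "waterfront <= 0.5",
    ],
    "type": ["linear", "rule", "rule"],
    "coef": [120.5, 95000.0, 0.0],
    "support": [1.0, 0.12, 0.88],
    "importance": [30000.0, 31000.0, 0.0],
})

# Drop pruned terms (coef == 0) and rank by importance.
top = (rules[rules.coef != 0]
       .sort_values("importance", ascending=False)
       .reset_index(drop=True))
print(top[["rule", "coef", "support"]])
```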
- rule: the if-then condition (type=linear denotes a linear term)
- coef: regression coefficient (in target units)
- support: fraction of samples that satisfy the rule
- importance: scaled score combining coefficient magnitude and support
5. Validate via visualization #
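A visualization sketch for the validation step: a predicted-vs-actual scatter with the perfect-prediction diagonal. Synthetic data stands in for the house-sales fit, and the headless `Agg` backend is assumed so the script runs without a display:

```python
# Sketch: validate a fitted model visually with a predicted-vs-actual plot.
import matplotlib
matplotlib.use("Agg")  # headless-safe backend
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=6, noise=15.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
model = GradientBoostingRegressor(random_state=1).fit(X_tr, y_tr)
pred = model.predict(X_te)

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(y_te, pred, s=12, alpha=0.6)
lims = [min(y_te.min(), pred.min()), max(y_te.max(), pred.max())]
ax.plot(lims, lims, "k--", lw=1)  # perfect-prediction diagonal
ax.set_xlabel("actual")
ax.set_ylabel("predicted")
fig.savefig("pred_vs_actual.png", dpi=100)
```

Points hugging the diagonal indicate the selected rules capture the signal; systematic curvature suggests missing rules or an untransformed skewed target.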
6. Practical tips #
- Handle outliers (Winsorization, clipping) for stable rules.
- Clean categorical levels and encode only after grouping rare categories.
- Transform skewed targets (log(y) or Box-Cox) if necessary.
- Select rule counts/depths that stakeholders can read; cross-validate to pick limits.
- Summarize the top rules in plain language for business reports.
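The outlier-handling and target-transformation tips above can be sketched as follows; the percentile thresholds and prices are illustrative:

```python
# Sketch: winsorize/clip outliers and log-transform a skewed target
# before rule extraction.
import numpy as np

y = np.array([180_000, 250_000, 310_000, 420_000, 7_500_000.0])  # skewed prices

# Winsorize: clip to the 1st/99th percentiles for stable rule thresholds.
lo, hi = np.percentile(y, [1, 99])
y_clipped = np.clip(y, lo, hi)

# Log-transform the target to tame right skew.
y_log = np.log1p(y_clipped)
print(y_log.round(2))
```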
7. References #
- Friedman, J. H., & Popescu, B. E. (2008). Predictive Learning via Rule Ensembles. The Annals of Applied Statistics, 2(3), 916–954.
- Christoph Molnar. (2020). Interpretable Machine Learning. https://christophm.github.io/interpretable-ml-book/rulefit.html