Linear Discriminant Analysis (LDA)
Summary
- LDA is supervised dimensionality reduction that maximises between-class variance while minimising within-class variance.
- Because labels are used, LDA is often a strong preprocessing step for classification tasks.
- Performance depends on how well the data match LDA's assumptions: roughly Gaussian classes with similar covariance structure.
Intuition #
Unlike PCA, LDA optimises for class separability, not just overall spread. It searches for projection directions where classes are compact and well separated.
Detailed Explanation #
1. PCA vs LDA #
- PCA: unsupervised, keeps the directions of largest variance irrespective of class labels.
- LDA: supervised, searches for directions that maximise the ratio of between-class variance to within-class variance.
2. Formulation #
With labelled classes $C_1, \dots, C_k$:
- Within-class scatter $$ S_W = \sum_{j=1}^k \sum_{x_i \in C_j} (x_i - \mu_j)(x_i - \mu_j)^\top $$
- Between-class scatter $$ S_B = \sum_{j=1}^k n_j (\mu_j - \mu)(\mu_j - \mu)^\top $$
- Optimisation $$ J(w) = \frac{w^\top S_B w}{w^\top S_W w} $$ The eigenvectors of $S_W^{-1} S_B$ give the discriminant directions. At most $k-1$ components carry information, because $S_B$ is a sum of $k$ rank-one terms constrained by the overall mean and therefore has rank at most $k-1$.
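The eigenproblem above can be solved directly with NumPy. This is an illustrative sketch on a tiny two-class toy dataset (the data itself is an assumption, not from the text):

```python
import numpy as np

# Toy two-class data: 20 points around the origin, 20 shifted to (3, 3).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

mu = X.mean(axis=0)          # overall mean
S_W = np.zeros((2, 2))       # within-class scatter
S_B = np.zeros((2, 2))       # between-class scatter
for c in np.unique(y):
    X_c = X[y == c]
    mu_c = X_c.mean(axis=0)
    S_W += (X_c - mu_c).T @ (X_c - mu_c)
    d = (mu_c - mu).reshape(-1, 1)
    S_B += len(X_c) * (d @ d.T)

# Discriminant directions: eigenvectors of S_W^{-1} S_B,
# sorted by eigenvalue (largest ratio of between- to within-class variance first).
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
w = eigvecs[:, order[0]].real
print(w)
```

With $k = 2$ classes, only one eigenvalue is non-zero, matching the "at most $k-1$ components" rule.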
3. Build a dataset #
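The original listing for this step did not survive extraction; a minimal sketch using scikit-learn's `make_blobs` (the dataset parameters here are illustrative assumptions):

```python
from sklearn.datasets import make_blobs

# Three well-separated Gaussian clusters in four dimensions.
X, y = make_blobs(n_samples=500, n_features=4, centers=3,
                  cluster_std=2.0, random_state=42)
print(X.shape, y.shape)  # (500, 4) (500,)
```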
4. Apply LDA #
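The original listing here is also missing; a minimal sketch with scikit-learn's `LinearDiscriminantAnalysis` on a toy `make_blobs` dataset (illustrative parameters, standardised first as the practical notes recommend):

```python
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = make_blobs(n_samples=500, n_features=4, centers=3,
                  cluster_std=2.0, random_state=42)
X = StandardScaler().fit_transform(X)

# With 3 classes, at most 3 - 1 = 2 discriminant components exist.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print(X_lda.shape)                       # (500, 2)
print(lda.explained_variance_ratio_)     # share of discriminative variance per axis
```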
5. Compare with PCA #
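The comparison listing is missing as well; one way to sketch it is to project the same toy data with both methods and score class separability with a crude between/within spread ratio (the metric and dataset are assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_blobs(n_samples=500, n_features=4, centers=3,
                  cluster_std=2.0, random_state=42)

X_pca = PCA(n_components=2).fit_transform(X)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

def separation(Z, y):
    """Crude score: between-class spread over within-class spread."""
    mu = Z.mean(axis=0)
    between = sum(np.sum((Z[y == c].mean(axis=0) - mu) ** 2) for c in np.unique(y))
    within = sum(np.sum((Z[y == c] - Z[y == c].mean(axis=0)) ** 2) for c in np.unique(y))
    return between / within

print(f"PCA separation: {separation(X_pca, y):.4f}")
print(f"LDA separation: {separation(X_lda, y):.4f}")
```

On data like this, the LDA projection typically scores higher, since that ratio is exactly what LDA optimises.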
PCA mixes the classes because it ignores labels; LDA keeps them separated.
6. Practical notes #
- The number of useful discriminants is at most n_classes - 1.
- Standardise features before fitting, especially when different units are mixed.
- LDA assumes roughly equal covariance within classes; when that is violated, consider QDA or regularised LDA.
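On the last point, scikit-learn's `LinearDiscriminantAnalysis` supports Ledoit-Wolf shrinkage of the within-class covariance via `solver="lsqr", shrinkage="auto"`. A hedged sketch comparing plain and regularised LDA on a noisy high-dimensional toy problem (the dataset and parameters are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Few samples relative to features: the empirical covariance is noisy,
# which is the regime where shrinkage tends to help.
X, y = make_classification(n_samples=100, n_features=60, n_informative=10,
                           n_classes=2, random_state=0)

plain = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=None)
reg = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")

print("plain LDA CV accuracy:", cross_val_score(plain, X, y).mean())
print("shrinkage LDA CV accuracy:", cross_val_score(reg, X, y).mean())
```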