2.2.4 Linear Discriminant Analysis (LDA)

Summary
  • LDA is a supervised dimensionality-reduction method that maximizes between-class variance while minimizing within-class variance.
  • Because it uses class labels, LDA is often a strong preprocessing step for classification tasks.
  • Performance depends on how well the data matches LDA's assumptions of roughly Gaussian classes with similar covariances.

Intuition

Unlike PCA, LDA optimizes for class separability, not just overall spread. It searches for projection directions where classes are compact and well separated.

Detailed Explanation

1. PCA vs LDA

  • PCA: unsupervised, keeps the directions of largest variance irrespective of class labels.
  • LDA: supervised, searches for directions that maximise the ratio of between-class variance to within-class variance.

2. Formulation

With labelled classes \(C_1, \dots, C_k\):

  1. Within-class scatter $$ S_W = \sum_{j=1}^k \sum_{x_i \in C_j} (x_i - \mu_j)(x_i - \mu_j)^\top $$
  2. Between-class scatter $$ S_B = \sum_{j=1}^k n_j (\mu_j - \mu)(\mu_j - \mu)^\top $$
  3. Optimisation $$ J(w) = \frac{w^\top S_B w}{w^\top S_W w} $$ The eigenvectors of \(S_W^{-1} S_B\) give the discriminant directions. At most \(k-1\) components carry information.
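
To make the recipe concrete, here is a minimal NumPy sketch of the scatter matrices and the eigendecomposition, assuming \(S_W\) is invertible. The helper name fisher_directions is hypothetical; in practice use scikit-learn as shown below.

import numpy as np

def fisher_directions(X, y):
    """Hypothetical helper: eigenvectors of S_W^{-1} S_B, sorted by eigenvalue."""
    classes = np.unique(y)
    mu = X.mean(axis=0)                      # overall mean
    d = X.shape[1]
    S_W = np.zeros((d, d))                   # within-class scatter
    S_B = np.zeros((d, d))                   # between-class scatter
    for c in classes:
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)
        S_W += (X_c - mu_c).T @ (X_c - mu_c)
        diff = (mu_c - mu).reshape(-1, 1)
        S_B += len(X_c) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]   # largest eigenvalue first
    return eigvecs.real[:, order]            # only the first k - 1 columns are informative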

3. Build a dataset

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Generate three Gaussian blobs in 3-D feature space
X, y = make_blobs(
    n_samples=600,
    n_features=3,
    random_state=11711,
    cluster_std=4,
    centers=3,
)

# Visualise the raw data, coloured by class label
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(projection="3d")
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y)
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_zlabel("$x_3$")
plt.show()

[Figure: 3-D scatter of the three blobs, coloured by class]


4. Apply LDA

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Fit LDA with the class labels, then project onto the discriminant axes.
# With k = 3 classes, at most k - 1 = 2 components are available.
lda = LDA(n_components=2).fit(X, y)
X_lda = lda.transform(X)

plt.figure(figsize=(8, 8))
plt.scatter(X_lda[:, 0], X_lda[:, 1], c=y, alpha=0.5)
plt.xlabel("LD1")
plt.ylabel("LD2")
plt.title("2-D embedding via LDA")
plt.show()

[Figure: 2-D embedding via LDA]
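
To check how much between-class variance each discriminant axis captures, inspect the fitted estimator's explained_variance_ratio_ attribute (available for the default svd solver):

# Proportion of between-class variance carried by LD1 and LD2
print(lda.explained_variance_ratio_)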


5. Compare with PCA

from sklearn.decomposition import PCA

# PCA is fit without labels: it sees only the feature matrix X
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

plt.figure(figsize=(8, 8))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, alpha=0.5)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("2-D embedding via PCA")
plt.show()

[Figure: 2-D embedding via PCA]

In this example, PCA mixes the classes because it ignores the labels, while LDA keeps them separated.


6. Practical notes

  • The number of useful discriminants is at most n_classes - 1.
  • Standardise features before fitting, especially when features with different units are mixed (see the pipeline sketch below).
  • LDA assumes roughly equal covariance within classes; when that is violated, consider QDA or regularised LDA.
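
As a minimal sketch of the standardisation advice above, a scikit-learn Pipeline can chain scaling and LDA (X and y as built in section 3):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Scale each feature to zero mean and unit variance before fitting LDA,
# so features measured in different units contribute comparably
pipe = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(n_components=2))
X_scaled_lda = pipe.fit_transform(X, y)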

FAQ

What is Linear Discriminant Analysis (LDA)?

Linear Discriminant Analysis (LDA) is a supervised dimensionality reduction and classification technique. It finds linear combinations of features that best separate two or more classes by maximizing between-class scatter while minimizing within-class scatter. Unlike PCA, LDA uses class labels, making it ideal as a preprocessing step before classification.

What is the LDA formula?

LDA optimizes the Fisher criterion:

$$ J(w) = \frac{w^\top S_B w}{w^\top S_W w} $$

where \(S_W\) is the within-class scatter matrix and \(S_B\) is the between-class scatter matrix. The optimal projection directions \(w\) are the eigenvectors of \(S_W^{-1} S_B\). The maximum number of discriminant components is \(k - 1\) where \(k\) is the number of classes.

What is the difference between LDA and PCA?

                 PCA                               LDA
Supervision      Unsupervised                      Supervised (uses labels)
Goal             Maximize total variance           Maximize class separation
Max components   min(n_features, n_samples − 1)    k − 1 (k = number of classes)
Best for         Compression, denoising            Classification preprocessing

Use PCA when you have no labels or want general-purpose compression. Use LDA when you want projections that help a downstream classifier separate classes.

When does LDA fail or perform poorly?

LDA assumes that all classes share a roughly equal (homogeneous) covariance matrix. It can fail when:

  • Class covariances differ significantly (use Quadratic Discriminant Analysis, QDA, instead).
  • The number of features exceeds the number of samples (the within-class scatter matrix becomes singular; use regularised LDA or PCA first, as in the shrinkage sketch below).
  • Classes are not linearly separable (consider kernel LDA or nonlinear methods).
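
For the first two failure modes, scikit-learn offers shrinkage estimation of the within-class covariance, which keeps the problem well posed even when features outnumber samples. A minimal sketch (the eigen solver supports both shrinkage and transform):

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# shrinkage="auto" picks the shrinkage intensity via the Ledoit-Wolf estimate;
# the "eigen" solver accepts shrinkage and still provides transform()
lda = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto", n_components=2)
X_reg = lda.fit_transform(X, y)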

How do I use LDA in scikit-learn?

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)  # fit on labelled training data
X_test_lda = lda.transform(X_test)                 # reuse the fitted projection

n_components controls how many discriminant axes to keep (at most n_classes - 1). You can also use lda.predict(X_test) directly for classification without a separate classifier.
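
Because LinearDiscriminantAnalysis is also a classifier, it can be used end to end. A brief sketch on a hypothetical train/test split of the blobs data from section 3:

from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical split for illustration
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LinearDiscriminantAnalysis().fit(X_train, y_train)
y_pred = clf.predict(X_test)      # predicted class labels
print(clf.score(X_test, y_test))  # mean accuracy on the held-out split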