Linear Discriminant Analysis (LDA) in Python


Linear Discriminant Analysis (LDA) finds projections that maximise class separability. While PCA is unsupervised, LDA explicitly uses the labels to keep samples from the same class tight and push different classes apart.


1. PCA vs LDA #

  • PCA: unsupervised, keeps the directions of largest variance irrespective of class labels.
  • LDA: supervised, searches for directions that maximise the ratio of between-class variance to within-class variance.
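The distinction shows up directly in the scikit-learn API: PCA is fitted on the data alone, while LDA will not fit without the labels. A quick illustration (variable names are arbitrary):

from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X_demo, y_demo = make_blobs(n_samples=100, n_features=3, centers=3, random_state=0)

PCA(n_components=2).fit(X_demo)                                  # unsupervised: labels never seen
LinearDiscriminantAnalysis(n_components=2).fit(X_demo, y_demo)   # supervised: labels required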

2. Formulation #

With labelled classes $C_1, \dots, C_k$:

  1. Within-class scatter $$ S_W = \sum_{j=1}^k \sum_{x_i \in C_j} (x_i - \mu_j)(x_i - \mu_j)^\top $$
  2. Between-class scatter $$ S_B = \sum_{j=1}^k n_j (\mu_j - \mu)(\mu_j - \mu)^\top $$
  3. Optimisation $$ J(w) = \frac{w^\top S_B w}{w^\top S_W w} $$ The eigenvectors of $S_W^{-1} S_B$ give the discriminant directions. At most $k-1$ components carry information. A NumPy sketch of this eigen-decomposition follows below.
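To make the recipe concrete, here is a minimal NumPy sketch, assuming a feature matrix X and a label vector y; the function name and structure are illustrative, not how scikit-learn implements it internally:

import numpy as np

def lda_directions(X, y, n_components):
    mu = X.mean(axis=0)                        # overall mean
    d = X.shape[1]
    S_W = np.zeros((d, d))                     # within-class scatter
    S_B = np.zeros((d, d))                     # between-class scatter
    for c in np.unique(y):
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)
        S_W += (X_c - mu_c).T @ (X_c - mu_c)
        diff = (mu_c - mu).reshape(-1, 1)
        S_B += len(X_c) * diff @ diff.T
    # eigenvectors of S_W^{-1} S_B, sorted by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:n_components]]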

3. Build a dataset #

import numpy as np
import matplotlib.pyplot as plt
import japanize_matplotlib
from sklearn.datasets import make_blobs

X, y = make_blobs(
    n_samples=600,
    n_features=3,
    random_state=11711,
    cluster_std=4,
    centers=3,
)

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(projection="3d")
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y)
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_zlabel("$x_3$")
plt.show()

3D blobs


4. Apply LDA #

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

lda = LDA(n_components=2).fit(X, y)
X_lda = lda.transform(X)

plt.figure(figsize=(8, 8))
plt.scatter(X_lda[:, 0], X_lda[:, 1], c=y, alpha=0.5)
plt.xlabel("LD1")
plt.ylabel("LD2")
plt.title("2-D embedding via LDA")
plt.show()

LDA projection
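The fitted estimator is also a classifier, and with the default svd solver it exposes explained_variance_ratio_, the share of between-class variance captured by each discriminant. A quick check on the same lda object, added here for illustration:

print(lda.explained_variance_ratio_)   # between-class variance captured by LD1 and LD2
print(lda.score(X, y))                 # training accuracy of LDA used as a classifier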


5. Compare with PCA #

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

plt.figure(figsize=(8, 8))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, alpha=0.5)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("2-D embedding via PCA")
plt.show()

PCA comparison

PCA mixes the classes because it ignores labels; LDA keeps them separated.
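To put a rough number on this impression (an added heuristic, not part of the original comparison), the silhouette score of the true labels can be computed in each 2-D embedding; higher means better-separated classes:

from sklearn.metrics import silhouette_score

print("LDA:", silhouette_score(X_lda, y))
print("PCA:", silhouette_score(X_pca, y))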


6. Practical notes #

  • The number of useful discriminants is at most n_classes - 1.
  • Standardise features before fitting, especially when different units are mixed.
  • LDA assumes roughly equal covariance within classes; when that is violated, consider QDA or regularised LDA. Both the standardisation and the regularisation points are sketched in the pipeline below.
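As a minimal sketch of both notes, assuming the same X and y as above, a Pipeline can standardise first and then fit a shrinkage-regularised LDA (shrinkage requires the eigen or lsqr solver in scikit-learn):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# standardise, then fit LDA with Ledoit-Wolf shrinkage of the covariance estimate
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto", n_components=2),
)
X_reg = model.fit_transform(X, y)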

Summary #

  • LDA maximises $S_B$ while minimising $S_W$, yielding projections suitable for visualisation or as input to classifiers.
  • It complements PCA: PCA preserves variance, LDA preserves class structure.
  • Combine both ideas (e.g., PCA for denoising then LDA) when the number of features is huge, as in the sketch below.
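For illustration only, here is a hypothetical two-stage pipeline on synthetic high-dimensional data; the dataset and dimensions are made up for the example:

from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# synthetic data with 200 features and 3 classes (illustrative only)
X_hi, y_hi = make_classification(
    n_samples=600, n_features=200, n_informative=10,
    n_classes=3, random_state=0,
)

# PCA denoises down to 50 dimensions, then LDA finds the 2 discriminants
pipe = make_pipeline(PCA(n_components=50), LinearDiscriminantAnalysis(n_components=2))
X_2d = pipe.fit_transform(X_hi, y_hi)
print(X_2d.shape)   # (600, 2)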