Kernel PCA | Implementing in Python

Kernel PCA (kernel principal component analysis) extends ordinary PCA by mapping the data into a high-dimensional feature space and running PCA there implicitly. Thanks to the kernel trick, it can “straighten” nonlinear manifolds that vanilla PCA cannot separate.


1. Why go beyond plain PCA? #

  • Standard PCA captures variance only along linear directions (linear combinations of the original features), so curved, nonlinear structure gets flattened away.
  • Many real datasets lie on curved surfaces; we would like to capture that curvature while still enjoying PCA’s interpretability.
  • Kernel PCA uses a nonlinear feature map (\phi(x)) and computes inner products via kernels, enabling PCA-style analysis in that richer space.

2. Formulation #

Given samples (x_i \in \mathbb{R}^d) and a feature map (\phi(x)):

  1. Build the kernel (Gram) matrix $$ K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle = k(x_i, x_j) $$ No explicit (\phi) is needed as long as we can evaluate the kernel function.

  2. Centre the Gram matrix in feature space, $$ \bar{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n, $$ where (\mathbf{1}_n) is the (n \times n) matrix with every entry (1/n), and solve the eigenproblem $$ \bar{K} v = \lambda v $$ The eigenvectors give the principal components in feature space. After normalising them we can project samples the same way as in PCA (a from-scratch sketch follows the kernel examples below).

Typical kernels:

  • RBF kernel $$k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$$
  • Polynomial kernel $$k(x, x') = (\langle x, x' \rangle + c)^d$$
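
For intuition, here is a minimal from-scratch sketch of steps 1 and 2 with an RBF kernel: build the Gram matrix, centre it in feature space, then eigendecompose. The function name rbf_kernel_pca_sketch and the default gamma are our own choices for illustration; in practice you would use scikit-learn's KernelPCA as in section 4.

import numpy as np

def rbf_kernel_pca_sketch(X, gamma=3.0, n_components=2):
    # Pairwise squared Euclidean distances between all samples
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # RBF Gram matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
    K = np.exp(-gamma * sq_dists)
    # Centre K in feature space: K_bar = K - 1n K - K 1n + 1n K 1n
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_bar = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Eigendecomposition (eigh returns eigenvalues in ascending order)
    eigvals, eigvecs = np.linalg.eigh(K_bar)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Projections of the training samples: sqrt(lambda_k) * v_k for the top components
    return eigvecs[:, :n_components] * np.sqrt(np.maximum(eigvals[:n_components], 0.0))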

3. Create a toy dataset #

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles

# Two concentric circles: a classic dataset that linear PCA cannot unfold
X, y = make_circles(n_samples=400, factor=0.3, noise=0.15)

plt.figure(figsize=(8, 8))
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.title("Concentric circles")
plt.show()

Two-circle dataset


4. Run Kernel PCA #

from sklearn.decomposition import KernelPCA

# RBF kernel with gamma=3; keep the first two principal components
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=3)
X_kpca = kpca.fit_transform(X)

plt.figure(figsize=(8, 8))
plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=y)
plt.title("2-D embedding via Kernel PCA")
plt.show()

Kernel PCA result

Compared with linear PCA, which can only rotate these two-dimensional points and leaves the circles intertwined, the Kernel PCA embedding makes the two classes clearly separable.
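
To see the contrast, here is a quick sketch applying plain linear PCA to the same X and y from section 3; because the data are two-dimensional, linear PCA can do no more than rotate the axes.

from sklearn.decomposition import PCA

# Linear PCA on the same two-dimensional data: it can only rotate the axes,
# so the two circles remain concentric in the embedding
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

plt.figure(figsize=(8, 8))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.title("2-D embedding via linear PCA")
plt.show()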


5. Tips #

  • Nonlinearity: choose kernels (RBF, polynomial) that reflect the type of curvature in your data.
  • Scaling: standardise features before building the kernel matrix to keep (\gamma) meaningful, as in the pipeline sketch after this list.
  • Computation: the Gram matrix takes (O(n^2)) memory and its eigendecomposition (O(n^3)) time, so large (n) quickly becomes the bottleneck.
  • Downstream models: the same kernel ideas appear in SVMs; Kernel PCA can be a useful exploratory step.
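
As a concrete example of the scaling tip, one option is a scikit-learn Pipeline that standardises the features before Kernel PCA; the gamma and component counts here are only illustrative.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA

# Standardise each feature first, then build the RBF Gram matrix on the scaled data
scaled_kpca = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=2, kernel="rbf", gamma=3),
)
X_scaled_kpca = scaled_kpca.fit_transform(X)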

Notes #

  • Kernel PCA = kernel trick + PCA; it preserves nonlinear manifolds that ordinary PCA cannot.
  • Inspect how many components you actually need—large (k) can overfit noise.
  • If scalability is a concern, look into approximate kernels (Nyström, random Fourier features); a minimal sketch follows.
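
For example, with scikit-learn's Nystroem transformer you can approximate the RBF feature map and run ordinary linear PCA on the approximate features instead of eigendecomposing the full Gram matrix; the 100 landmark points below are an illustrative choice.

from sklearn.decomposition import PCA
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline

# Approximate the RBF feature map with 100 landmark points, then run linear PCA
# on the approximate features
approx_kpca = make_pipeline(
    Nystroem(kernel="rbf", gamma=3, n_components=100),
    PCA(n_components=2),
)
X_approx = approx_kpca.fit_transform(X)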