Kernel PCA (kernel principal component analysis) extends ordinary PCA by mapping the data into a high-dimensional feature space and running PCA there implicitly. Thanks to the kernel trick, it can “straighten” nonlinear manifolds that vanilla PCA cannot separate.
1. Why go beyond plain PCA? #
- Standard PCA only captures variance along linear directions (linear combinations of the original features), so curved, nonlinear structure gets projected away.
- Many real datasets lie on curved surfaces; we would like to capture that curvature while still enjoying PCA’s interpretability.
- Kernel PCA uses a nonlinear feature map $\phi(x)$ and computes inner products via kernels, enabling PCA-style analysis in that richer space.
2. Formulation #
Given samples $x_i \in \mathbb{R}^d$ and a feature map $\phi(x)$:
Build the kernel (Gram) matrix $$ K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle = k(x_i, x_j) $$ No explicit $\phi$ is needed as long as we can evaluate the kernel function.
Centre the Gram matrix in feature space, $\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n$ (where $\mathbf{1}_n$ is the $n \times n$ matrix with every entry $1/n$), then solve the eigenproblem $$ \tilde{K} v = \lambda v $$ Each eigenvector holds the expansion coefficients of one principal axis in feature space. After normalising so that $\lambda_k \lVert v_k \rVert^2 = 1$, the projection of sample $x_i$ onto the $k$-th component is $\lambda_k (v_k)_i$, the kernel-space analogue of an ordinary PCA score.
Typical kernels:
- RBF kernel $$k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$$
- Polynomial kernel of degree $p$ $$k(x, x') = (\langle x, x' \rangle + c)^p$$
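To make the recipe concrete, here is a minimal NumPy sketch of these steps (RBF kernel, centring, eigendecomposition, projection). The function names are just for illustration; up to sign flips of the components, the output should agree with scikit-learn's KernelPCA.

import numpy as np

def rbf_kernel_matrix(X, gamma):
    # Pairwise squared distances, then k(x, x') = exp(-gamma * ||x - x'||^2).
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)

def kernel_pca(X, gamma=3.0, n_components=2):
    n = X.shape[0]
    K = rbf_kernel_matrix(X, gamma)
    # Centre the Gram matrix in feature space.
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; flip to descending.
    eigvals, eigvecs = np.linalg.eigh(K_c)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # With unit-norm eigenvectors, the training projections are sqrt(lambda_k) * v_k.
    return eigvecs[:, :n_components] * np.sqrt(np.maximum(eigvals[:n_components], 0.0))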
3. Create a toy dataset #
import numpy as np
import matplotlib.pyplot as plt
import japanize_matplotlib  # enables Japanese fonts in matplotlib; harmless here
from sklearn.datasets import make_circles

# Two noisy concentric circles: a classic example linear PCA cannot untangle.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.15)

plt.figure(figsize=(8, 8))
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.title("Concentric circles")
plt.show()
4. Run Kernel PCA #
from sklearn.decomposition import KernelPCA

# RBF-kernel PCA; keep the first two components for a 2-D embedding.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=3)
X_kpca = kpca.fit_transform(X)

plt.figure(figsize=(8, 8))
plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=y)
plt.title("2-D embedding via Kernel PCA")
plt.show()
Unlike linear PCA, which can only rotate this 2-D data and leaves the rings entangled, the RBF-kernel embedding pulls the inner and outer circles apart into two clusters that are easy to separate. A quick baseline for comparison is sketched below.
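For reference, here is the linear-PCA baseline on the same data (reusing X, y and plt from above). Because the input is already two-dimensional, linear PCA amounts to a rotation, so the rings stay nested.

from sklearn.decomposition import PCA

# Linear PCA on the same circles: with 2-D input this is essentially a rotation,
# so the two rings remain concentric in the projected space.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

plt.figure(figsize=(8, 8))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.title("2-D embedding via linear PCA (baseline)")
plt.show()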
5. Tips #
- Nonlinearity: choose kernels (RBF, polynomial) that reflect the type of curvature in your data.
- Scaling: standardise features before building the kernel matrix so that a single $\gamma$ acts on comparably scaled inputs (see the pipeline sketch after this list).
- Computation: storing the Gram matrix costs $O(n^2)$ memory and its eigendecomposition $O(n^3)$ time, so large $n$ quickly becomes the bottleneck.
- Downstream models: the same kernel ideas appear in SVMs; Kernel PCA can be a useful exploratory step.
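As an illustration of the scaling tip, a minimal pipeline sketch using standard scikit-learn components; the gamma value is simply carried over from the example above.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA

# Standardise first so that one gamma value treats all features on the same scale.
embedder = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=2, kernel="rbf", gamma=3),
)
X_embedded = embedder.fit_transform(X)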
Notes #
- Kernel PCA = kernel trick + PCA; it can unfold nonlinear structure that ordinary PCA flattens.
- Inspect how many components you actually need; a large $k$ can overfit noise.
- If scalability is a concern, look into approximate kernels (Nyström, random Fourier features).
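For instance, a rough sketch of the Nyström route with scikit-learn: approximate the RBF feature map explicitly, then run ordinary PCA on the approximate features. This only approximates the kernel PCA above, and n_components=100 is an arbitrary choice here.

from sklearn.kernel_approximation import Nystroem
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

# Low-rank Nystroem approximation of the RBF feature map,
# followed by linear PCA in that explicit (approximate) feature space.
approx_kpca = make_pipeline(
    Nystroem(kernel="rbf", gamma=3, n_components=100),
    PCA(n_components=2),
)
X_approx = approx_kpca.fit_transform(X)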