Kernel PCA (kernel principal component analysis) extends ordinary PCA by mapping the data into a high-dimensional feature space and running PCA there implicitly. Thanks to the kernel trick, it can “straighten” nonlinear manifolds that vanilla PCA cannot separate.
1. Why go beyond plain PCA? #
- Standard PCA only captures variance along linear directions (linear combinations of the original features), so curved, nonlinear structure gets projected away.
- Many real datasets lie on curved surfaces; we would like to capture that curvature while still enjoying PCA’s interpretability.
- Kernel PCA uses a nonlinear feature map $\phi(x)$ and computes inner products via kernels, enabling PCA-style analysis in that richer space.
2. Formulation #
Given samples $x_i \in \mathbb{R}^d$ and a feature map $\phi(x)$:
Build the kernel (Gram) matrix $$ K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle = k(x_i, x_j) $$ No explicit $\phi$ is needed as long as we can evaluate the kernel function.
Centre the Gram matrix in feature space, $\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n$ (where $\mathbf{1}_n$ is the $n \times n$ matrix with every entry $1/n$), then solve the eigenproblem $$ \tilde{K} v = \lambda v $$ Each eigenvector holds the expansion coefficients of one principal axis in feature space. After normalising so that $\lambda_k \lVert v_k \rVert^2 = 1$, the projection of sample $x_i$ onto the $k$-th component is $\lambda_k (v_k)_i$, the kernel-space analogue of an ordinary PCA score.
Typical kernels:
- RBF kernel $$k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$$
- Polynomial kernel of degree $p$ $$k(x, x') = (\langle x, x' \rangle + c)^p$$
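To make the recipe concrete, here is a minimal NumPy sketch of these steps (RBF kernel, centring, eigendecomposition, projection). The function names are just for illustration; up to sign flips of the components, the output should agree with scikit-learn's KernelPCA.

import numpy as np

def rbf_kernel_matrix(X, gamma):
    # Pairwise squared distances, then k(x, x') = exp(-gamma * ||x - x'||^2).
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)

def kernel_pca(X, gamma=3.0, n_components=2):
    n = X.shape[0]
    K = rbf_kernel_matrix(X, gamma)
    # Centre the Gram matrix in feature space.
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; flip to descending.
    eigvals, eigvecs = np.linalg.eigh(K_c)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # With unit-norm eigenvectors, the training projections are sqrt(lambda_k) * v_k.
    return eigvecs[:, :n_components] * np.sqrt(np.maximum(eigvals[:n_components], 0.0))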
3. Create a toy dataset #
import numpy as np
import matplotlib.pyplot as plt
import japanize_matplotlib  # enables Japanese fonts in matplotlib; harmless here
from sklearn.datasets import make_circles

# Two noisy concentric circles: a classic example linear PCA cannot untangle.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.15)

plt.figure(figsize=(8, 8))
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.title("Concentric circles")
plt.show()
4. Run Kernel PCA #
from sklearn.decomposition import KernelPCA

# RBF-kernel PCA; keep the first two components for a 2-D embedding.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=3)
X_kpca = kpca.fit_transform(X)

plt.figure(figsize=(8, 8))
plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=y)
plt.title("2-D embedding via Kernel PCA")
plt.show()
Unlike linear PCA, which can only rotate this 2-D data and leaves the rings entangled, the RBF-kernel embedding pulls the inner and outer circles apart into two clusters that are easy to separate. A quick baseline for comparison is sketched below.
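For reference, here is the linear-PCA baseline on the same data (reusing X, y and plt from above). Because the input is already two-dimensional, linear PCA amounts to a rotation, so the rings stay nested.

from sklearn.decomposition import PCA

# Linear PCA on the same circles: with 2-D input this is essentially a rotation,
# so the two rings remain concentric in the projected space.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

plt.figure(figsize=(8, 8))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.title("2-D embedding via linear PCA (baseline)")
plt.show()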
5. Tips #
- Nonlinearity: choose kernels (RBF, polynomial) that reflect the type of curvature in your data.
- Scaling: standardise features before building the kernel matrix so that a single $\gamma$ acts on comparably scaled inputs (see the pipeline sketch after this list).
- Computation: storing the Gram matrix costs $O(n^2)$ memory and its eigendecomposition $O(n^3)$ time, so large $n$ quickly becomes the bottleneck.
- Downstream models: the same kernel ideas appear in SVMs; Kernel PCA can be a useful exploratory step.
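As an illustration of the scaling tip, a minimal pipeline sketch using standard scikit-learn components; the gamma value is simply carried over from the example above.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA

# Standardise first so that one gamma value treats all features on the same scale.
embedder = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=2, kernel="rbf", gamma=3),
)
X_embedded = embedder.fit_transform(X)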
Notes #
- Kernel PCA = kernel trick + PCA; it can unfold nonlinear structure that ordinary PCA flattens.
- Inspect how many components you actually need; a large $k$ can overfit noise.
- If scalability is a concern, look into approximate kernels (Nyström, random Fourier features).
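For instance, a rough sketch of the Nyström route with scikit-learn: approximate the RBF feature map explicitly, then run ordinary PCA on the approximate features. This only approximates the kernel PCA above, and n_components=100 is an arbitrary choice here.

from sklearn.kernel_approximation import Nystroem
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

# Low-rank Nystroem approximation of the RBF feature map,
# followed by linear PCA in that explicit (approximate) feature space.
approx_kpca = make_pipeline(
    Nystroem(kernel="rbf", gamma=3, n_components=100),
    PCA(n_components=2),
)
X_approx = approx_kpca.fit_transform(X)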