Kernel PCA
Summary
- Kernel PCA performs PCA in an implicit nonlinear feature space using kernels.
- Kernel choice and parameters (for example RBF gamma) determine which nonlinear structure becomes separable.
- It is useful when linear PCA misses curved or interaction-heavy patterns.
Intuition
Kernel PCA changes the similarity geometry before extracting principal components. Patterns that are nonlinear in input space can become linearly separable in kernel space.
Detailed Explanation
1. Why go beyond plain PCA?
- Standard PCA captures only variance along linear directions (linear combinations of the original axes), so nonlinear structure gets flattened away.
- Many real datasets lie on curved surfaces; we would like to capture that curvature while still enjoying PCA’s interpretability.
- Kernel PCA uses a nonlinear feature map $\phi(x)$ and computes inner products via kernels, enabling PCA-style analysis in that richer space.
2. Formulation
Given samples $x_i \in \mathbb{R}^d$ and a feature map $\phi(x)$:
Build the kernel (Gram) matrix
$$ K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle = k(x_i, x_j) $$
No explicit $\phi$ is needed as long as we can evaluate the kernel function. In practice $K$ is then centred in feature space, $\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n$ with $(\mathbf{1}_n)_{ij} = 1/n$, so that the mapped data have zero mean.
Solve the eigenproblem
$$ \tilde{K} \alpha = \lambda \alpha $$
Each eigenvector $\alpha$ encodes a principal component in feature space as $v = \sum_i \alpha_i \phi(x_i)$. After normalising so that $\lambda \lVert \alpha \rVert^2 = 1$, a sample $x$ projects onto the component via $\sum_i \alpha_i\, k(x_i, x)$, the same way as in ordinary PCA.
Typical kernels:
- RBF kernel $$k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$$
- Polynomial kernel of degree $p$ $$k(x, x') = (\langle x, x' \rangle + c)^p$$
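To make the steps above concrete, here is a minimal from-scratch sketch in NumPy. It is illustrative rather than production code: the RBF kernel, the $\gamma$ value, and the assumption that the leading eigenvalues are strictly positive are all simplifications.

```python
import numpy as np

def rbf_kernel_pca(X, gamma=10.0, n_components=2):
    # RBF Gram matrix from pairwise squared distances.
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

    # Centre the Gram matrix in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas, lambdas = eigvecs[:, idx], eigvals[idx]

    # Normalise each component to unit length in feature space
    # (lambda * ||alpha||^2 = 1), then project the training samples.
    return Kc @ (alphas / np.sqrt(lambdas))
```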
3. Create a toy dataset
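A minimal sketch of this step using scikit-learn's `make_circles` (the sample count, noise level, and random seed are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles

# Two concentric circles: structure that no single linear direction separates.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", s=15)
plt.title("Concentric circles (input space)")
plt.show()
```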
4. Run Kernel PCA
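A sketch of this step with scikit-learn's `KernelPCA`, shown next to linear PCA for comparison (`gamma=10.0` is illustrative and worth tuning):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import KernelPCA, PCA

# X, y come from the dataset sketch in step 3.
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0).fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, Z, title in [(axes[0], X_pca, "Linear PCA"),
                     (axes[1], X_kpca, "Kernel PCA (RBF)")]:
    ax.scatter(Z[:, 0], Z[:, 1], c=y, cmap="coolwarm", s=15)
    ax.set_title(title)
plt.show()
```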
Compared with linear PCA, the concentric circles are now clearly separable.
5. Tips
- Nonlinearity: choose kernels (RBF, polynomial) that reflect the type of curvature in your data.
- Scaling: standardise features before building the kernel matrix so that distances, and hence $\gamma$, stay meaningful (see the sketch after this list).
- Computation: the Gram matrix takes $O(n^2)$ memory and its eigendecomposition $O(n^3)$ time, so large $n$ quickly becomes the bottleneck.
- Downstream models: the same kernel ideas appear in SVMs; Kernel PCA can be a useful exploratory step.
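For the scaling point, a minimal sketch using scikit-learn's `StandardScaler` (the pipeline shape and parameter values are illustrative):

```python
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler

# Standardise features first so that squared distances, and hence gamma,
# are on a comparable scale across dimensions.
X_scaled = StandardScaler().fit_transform(X)
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0).fit_transform(X_scaled)
```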
Notes
- Kernel PCA = kernel trick + PCA; it preserves nonlinear manifolds that ordinary PCA cannot.
- Inspect how many components you actually need; a large $k$ (number of retained components) can overfit noise.
- If scalability is a concern, look into approximate kernels (Nyström, random Fourier features).
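As a sketch of the approximation route (`Nystroem` is scikit-learn's Nyström implementation; the landmark count here is illustrative):

```python
from sklearn.decomposition import PCA
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline

# Approximate the RBF feature map with 100 landmark points, then run
# ordinary linear PCA in that approximate feature space.
approx_kpca = make_pipeline(
    Nystroem(kernel="rbf", gamma=10.0, n_components=100, random_state=0),
    PCA(n_components=2),
)
X_approx = approx_kpca.fit_transform(X)
```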