PCA
Summary
- PCA finds orthogonal directions of maximum variance and projects data onto leading components.
- Explained-variance ratios provide a quantitative way to choose the number of components.
- Feature scaling strongly affects PCA; standardize features whose units or scales differ before fitting.
Intuition #
PCA rotates the coordinate system toward directions that capture most variation. Keeping only the strongest axes compresses data while retaining dominant structure.
Detailed Explanation #
1. Why PCA? #
- High-dimensional data is hard to interpret and to visualise; PCA finds orthogonal directions that summarise the bulk of the variance.
- The method is unsupervised: it does not use labels, only the covariance structure of the data.
- Once we project onto the leading components we can visualise, denoise, or feed the compressed features to downstream models.
2. Mathematics #
Given a zero-centred data matrix $X \in \mathbb{R}^{n \times d}$:
- Covariance matrix $$ \Sigma = \frac{1}{n} X^\top X $$
- Eigen-decomposition $$ \Sigma v_j = \lambda_j v_j $$ where $v_j$ are eigenvectors (principal axes) and $\lambda_j$ eigenvalues (explained variances).
- Projection $$ Z = X V_k $$ where $V_k$ stacks the top $k$ eigenvectors as columns.
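As a sanity check, the three steps above can be reproduced directly with NumPy (a minimal sketch on synthetic data; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with one dominant direction, then zero-centre it
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])
X = X - X.mean(axis=0)

n = X.shape[0]
Sigma = X.T @ X / n                        # covariance matrix Σ = (1/n) XᵀX
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]          # sort descending by eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
V_k = eigvecs[:, :k]                       # top-k principal axes
Z = X @ V_k                                # projection Z = X V_k
print(Z.shape)                             # (200, 2)
```

Note that the variance of each projected column equals the corresponding eigenvalue, which is exactly the "explained variance" interpretation of $\lambda_j$.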
3. Create a sample dataset #
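One simple option (an illustrative sketch using scikit-learn's `make_blobs`, not necessarily the original snippet) is a clustered dataset in six dimensions:

```python
from sklearn.datasets import make_blobs

# 500 samples, 6 features, 3 well-separated clusters; random_state fixes reproducibility
X, y = make_blobs(n_samples=500, n_features=6, centers=3, random_state=42)
print(X.shape, y.shape)   # (500, 6) (500,)
```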
4. Run PCA with scikit-learn #
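A minimal fit/transform sketch (assuming the `make_blobs` dataset from the previous step):

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=500, n_features=6, centers=3, random_state=42)

pca = PCA(n_components=2)
Z = pca.fit_transform(X)            # project onto the two leading components
print(Z.shape)                      # (500, 2)
print(pca.explained_variance_ratio_)  # fraction of variance per component
```

`fit_transform` centres the data internally, so no manual mean subtraction is needed here.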
5. Scaling matters #
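The effect can be demonstrated with two independent features on very different scales (a synthetic sketch; the scales are chosen only to exaggerate the effect):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two independent features: one with std 1, one with std 1000
X = np.column_stack([rng.normal(0, 1, 300), rng.normal(0, 1000, 300)])

raw = PCA(n_components=2).fit(X)
scaled = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

print(raw.explained_variance_ratio_)     # first PC ≈ 1.0: the large-variance feature dominates
print(scaled.explained_variance_ratio_)  # roughly balanced after standardisation
```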
PCA is dominated by features with large variance; scaling (or whitening) is essential when feature units differ.
6. Practical considerations #
- Explained variance ratio: $\lambda_j / \sum_i \lambda_i$ helps decide how many PCs to keep (often 80–90% cumulative variance).
- Computation: PCA is implemented via SVD under the hood; use `svd_solver='randomized'` for large datasets.
- Kernel PCA: when linear PCA is not enough, switch to kernels (see the dedicated section) or try UMAP/t-SNE for local structure.
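The first two points can be combined into a simple component-selection sketch (synthetic data with decaying per-feature variance, chosen so a handful of PCs dominate):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Decaying per-feature scales so that a few components carry most variance
X = rng.normal(size=(1000, 50)) * np.linspace(10, 0.1, 50)

# Randomized SVD is much faster when n_samples and n_features are large
pca = PCA(n_components=30, svd_solver='randomized', random_state=0).fit(X)

# Smallest k whose cumulative explained-variance ratio reaches 80%
cumulative = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.80) + 1)
print(k, round(cumulative[k - 1], 3))
```

scikit-learn also accepts a float for `n_components` (e.g. `PCA(n_components=0.80)` with the full SVD solver) to pick the component count by explained variance automatically.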