t-SNE
Summary
- t-SNE preserves local neighborhoods by matching pairwise similarities between high and low dimensions.
- It is optimized for visualization and often reveals cluster structure clearly.
- Results depend on perplexity, learning rate, and random initialization, so stability checks are important.
- Prerequisite: Principal Component Analysis (PCA); understanding it first makes t-SNE easier to learn.
Intuition #
t-SNE prioritizes who is close to whom, not faithful global distances. It is best interpreted as a neighborhood map for exploratory analysis.
Detailed Explanation #
1. Intuition #
- Construct pairwise similarities (P_{ij}) in the original space (Gaussian kernel per point, controlled by perplexity).
- Define similarities (Q_{ij}) in the low-dimensional space using a heavy-tailed Student-t distribution.
- Minimise the Kullback–Leibler divergence (\mathrm{KL}(P \parallel Q)) via gradient descent.
- The heavy-tailed distribution avoids crowding by giving far points non-negligible influence.
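The steps above can be sketched numerically. This is a minimal illustration, not a full t-SNE implementation: it uses a single fixed Gaussian bandwidth rather than tuning sigma per point to match the perplexity, and the tiny random dataset is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # six points in the original 3-D space
Y = rng.normal(size=(6, 2))   # a candidate 2-D embedding

def sq_dists(A):
    # Pairwise squared Euclidean distances
    s = (A * A).sum(axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2 * A @ A.T, 0.0)

# High-dimensional similarities P: Gaussian kernel. Real t-SNE tunes one
# sigma per point to hit the target perplexity; a single sigma is used
# here to keep the sketch short.
P = np.exp(-sq_dists(X) / 2.0)
np.fill_diagonal(P, 0.0)
P /= P.sum()

# Low-dimensional similarities Q: heavy-tailed Student-t with one
# degree of freedom, so far-apart points retain non-negligible weight
Q = 1.0 / (1.0 + sq_dists(Y))
np.fill_diagonal(Q, 0.0)
Q /= Q.sum()

# KL(P || Q): the objective t-SNE minimises by gradient descent
mask = P > 0
kl = float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))
print(kl >= 0.0)  # KL divergence is always non-negative
```

Gradient descent would now move the rows of Y to push this KL value down, pulling each point toward its high-dimensional neighbours.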
2. Python example #
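A minimal example using scikit-learn's TSNE; the digits dataset, the 300-sample slice, and the parameter values are illustrative choices, not requirements.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# A small slice of the digits dataset keeps the run fast
X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X[:300])

tsne = TSNE(
    n_components=2,   # 2-D output for plotting
    perplexity=30,    # effective neighbour count
    init="pca",       # PCA initialisation is more stable than random
    random_state=0,   # fix the seed for reproducibility
)
emb = tsne.fit_transform(X)
print(emb.shape)  # (300, 2)
```

Each row of `emb` is the 2-D coordinate of the corresponding input sample, ready to scatter-plot coloured by `y`.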
3. Hyperparameters #
- perplexity: effective neighbour count (typically 5–50). Higher values capture more global structure.
- learning_rate: too small traps the optimisation, too large causes points to fly apart; 100–1000 often works.
- n_iter: at least 1000 iterations plus an early_exaggeration warm-up phase.
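A quick way to gauge sensitivity to perplexity is to sweep a few values and inspect the final KL divergence, which scikit-learn exposes as kl_divergence_ after fitting. The blob dataset and the chosen perplexities are illustrative; note that KL values are not directly comparable across different perplexities.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Fit once per perplexity and record the final KL divergence
finals = {}
for perp in (5, 30, 50):
    tsne = TSNE(n_components=2, perplexity=perp, random_state=0)
    emb = tsne.fit_transform(X)
    finals[perp] = tsne.kl_divergence_
print(sorted(finals))  # [5, 30, 50]
```

In practice the more useful check is visual: plot the embedding for each perplexity and see whether the cluster structure you care about persists.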
4. Tips #
- Standardise features and remove duplicates; t-SNE is sensitive to scale and noise.
- For large datasets, use Barnes–Hut (method="barnes_hut") or FFT-accelerated implementations (openTSNE, FIt-SNE).
- Interpret distances qualitatively; t-SNE preserves local neighbours but not global geometry.
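The preprocessing tips above can be sketched as a short pipeline. The synthetic data with deliberate duplicates and a dominant-scale feature is an assumption for illustration; the angle value shown is scikit-learn's default trade-off.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.repeat(rng.normal(size=(200, 10)), 2, axis=0)  # deliberate duplicate rows
X[:, 0] *= 1000.0                                     # one feature dominates the scale

X = np.unique(X, axis=0)               # drop exact duplicates, which distort similarities
X = StandardScaler().fit_transform(X)  # equalise feature scales

# Barnes-Hut approximation (the scikit-learn default) trades a little
# accuracy, controlled by angle, for roughly O(N log N) runtime
emb = TSNE(method="barnes_hut", angle=0.5, random_state=0).fit_transform(X)
print(emb.shape)  # (200, 2)
```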
5. Notes #
- t-SNE is best for visual inspection rather than downstream modelling.
- Run it multiple times with different seeds/perplexities to confirm patterns are stable.
- Consider UMAP when you need faster embeddings or a transform defined for new samples.
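The stability check suggested above can be made quantitative with scikit-learn's trustworthiness score, which measures how well each point's local neighbourhood survives the embedding (1.0 means perfectly preserved). The blob dataset and seed choices here are illustrative.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE, trustworthiness

X, _ = make_blobs(n_samples=200, centers=4, random_state=0)

# Re-run the embedding with several seeds and score each run; roughly
# similar, high scores suggest the observed structure is stable
scores = [
    trustworthiness(
        X,
        TSNE(perplexity=30, random_state=seed).fit_transform(X),
        n_neighbors=5,
    )
    for seed in (0, 1, 2)
]
print([round(s, 3) for s in scores])
```

If the scores (or the plots themselves) vary wildly across seeds or perplexities, treat the apparent clusters with suspicion.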