t-SNE (t-Distributed Stochastic Neighbor Embedding) maps high-dimensional data down to 2D/3D while preserving local neighbourhood relationships, making digit clusters or embeddings easy to inspect.
1. Intuition #
- Construct pairwise similarities \(P_{ij}\) in the original space (a Gaussian kernel per point, with bandwidth set by the perplexity).
- Define similarities \(Q_{ij}\) in the low-dimensional space using a heavy-tailed Student-t distribution (one degree of freedom).
- Minimise the Kullback–Leibler divergence \(\mathrm{KL}(P \parallel Q)\) via gradient descent.
- The heavy tails of the Student-t distribution counteract the crowding problem: moderately dissimilar points can be placed far apart in the map without incurring a large penalty.
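The three steps above can be sketched in a few lines of NumPy. This is a simplified illustration, not the full algorithm: it uses a fixed Gaussian bandwidth instead of tuning a per-point sigma to match the perplexity, and it only evaluates the objective rather than running gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 points in 3-D (toy "high-dimensional" data)
Y = rng.normal(size=(5, 2))   # a candidate 2-D embedding

def sq_dists(A):
    # pairwise squared Euclidean distances
    s = (A ** 2).sum(axis=1)
    return s[:, None] + s[None, :] - 2 * A @ A.T

# P: Gaussian similarities in the original space (fixed bandwidth here;
# real t-SNE tunes a per-point sigma to hit the chosen perplexity)
P = np.exp(-sq_dists(X) / 2.0)
np.fill_diagonal(P, 0.0)
P /= P.sum()

# Q: heavy-tailed Student-t (1 d.o.f.) similarities in the embedding space
Q = 1.0 / (1.0 + sq_dists(Y))
np.fill_diagonal(Q, 0.0)
Q /= Q.sum()

# KL(P || Q): the objective that t-SNE minimises by gradient descent
mask = P > 0
kl = np.sum(P[mask] * np.log(P[mask] / Q[mask]))
print(kl)
```

Because both `P` and `Q` are normalised distributions, the printed KL divergence is always non-negative; t-SNE's optimiser moves the embedding `Y` to shrink it.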
2. Python example #
```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Load the 8x8 handwritten-digit images (64 features per sample)
X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)  # t-SNE is sensitive to feature scale

# Embed into 2-D; perplexity roughly sets the effective neighbourhood size
model = TSNE(n_components=2, perplexity=30, learning_rate=200, random_state=0)
emb = model.fit_transform(X)

plt.figure(figsize=(8, 6))
plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=15)
plt.colorbar(label="digit")
plt.title("t-SNE embedding of handwritten digits")
plt.tight_layout()
plt.show()
```
3. Hyperparameters #
- perplexity: effective neighbour count (typically 5–50). Higher values capture more global structure.
- learning_rate: too small traps the optimisation; too large makes points fly apart. Values of 100–1000 often work.
- n_iter (renamed max_iter in recent scikit-learn): run at least 1000 iterations, including the early_exaggeration warm-up phase.
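To see how perplexity shapes the result, you can fit several embeddings and inspect the final objective value via scikit-learn's `kl_divergence_` attribute. Note that KL values are not directly comparable across perplexities (the target distribution P changes), so treat them as a convergence check per setting, not a model-selection score. The subsample size of 300 is just to keep the run quick.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)[:300]  # subsample for speed

for perp in (5, 30, 50):
    # perplexity must stay below the number of samples
    tsne = TSNE(n_components=2, perplexity=perp, random_state=0)
    emb = tsne.fit_transform(X)
    print(f"perplexity={perp}: shape={emb.shape}, KL={tsne.kl_divergence_:.3f}")
```

Low perplexities tend to fragment the map into many small clumps; higher ones merge neighbourhoods and emphasise coarser structure.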
4. Tips #
- Standardise features and remove duplicates; t-SNE is sensitive to scale and noise.
- For large datasets, use the Barnes–Hut approximation (method="barnes_hut") or FFT-accelerated implementations (openTSNE, FIt-SNE).
- Interpret distances qualitatively; t-SNE preserves local neighbours but not global geometry.
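The scale/duplicate advice above amounts to a short preprocessing step. A minimal sketch with toy data (the array values here are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [1.0, 200.0],   # exact duplicate row
              [2.0, 100.0],
              [3.0, 300.0]])

# Drop exact duplicates: t-SNE stacks them on top of each other and they
# distort the perplexity-based neighbourhood calibration.
X_unique = np.unique(X, axis=0)

# Put features on a common scale so no single dimension dominates distances.
X_ready = StandardScaler().fit_transform(X_unique)
print(X_unique.shape)  # (3, 2)
```

Note that `np.unique(..., axis=0)` also sorts the rows; keep the original order (e.g. via `return_index=True`) if row identity matters for your labels.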
5. Notes #
- t-SNE is best for visual inspection rather than downstream modelling.
- Run it multiple times with different seeds/perplexities to confirm patterns are stable.
- Consider UMAP when you need faster embeddings or a transform defined for new samples.
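One way to make the "run it multiple times" check concrete is to score each embedding with scikit-learn's `trustworthiness`, which measures (on a 0–1 scale) how well local neighbourhoods survive the projection. Stable, high values across seeds suggest the visual clusters are not artefacts of one random initialisation. The subsample of 300 points is only to keep the loop fast.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, trustworthiness
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)[:300]

for seed in (0, 1, 2):
    emb = TSNE(n_components=2, perplexity=30, random_state=seed).fit_transform(X)
    score = trustworthiness(X, emb, n_neighbors=5)
    print(f"seed={seed}: trustworthiness={score:.3f}")
```

If the scores swing widely between seeds, or clusters appear and vanish, treat the structure with suspicion before drawing conclusions from the plot.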