t-SNE | Visualising high-dimensional structure

t-SNE (t-Distributed Stochastic Neighbor Embedding) maps high-dimensional data down to 2D/3D while preserving local neighbourhood relationships, making structure such as digit clusters or learned embeddings easy to inspect.


1. Intuition #

  • Construct pairwise similarities (P_{ij}) in the original space (Gaussian kernel per point, controlled by perplexity).
  • Define similarities (Q_{ij}) in the low-dimensional space using a heavy-tailed Student-t distribution.
  • Minimise the Kullback–Leibler divergence (\mathrm{KL}(P \parallel Q)) via gradient descent.
  • The heavy tails mitigate the crowding problem: moderately distant pairs can be placed far apart in the embedding without a large penalty, so clusters stay separated. (Exact definitions follow this list.)
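
Writing (p_{ij}) and (q_{ij}) for the entries of (P) and (Q), the standard definitions (van der Maaten & Hinton, 2008) are

\[
p_{j \mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2n},
\]

\[
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
\mathrm{KL}(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\]

where each (\sigma_i) is tuned (by binary search) so that the perplexity of the conditional distribution (p_{\cdot \mid i}) matches the user-chosen value.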

2. Python example #

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# 8x8 handwritten digits: 1,797 samples with 64 features each
X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)  # t-SNE is sensitive to feature scale

# Embed into 2D; perplexity ~30 is a reasonable default at this sample size
model = TSNE(n_components=2, perplexity=30, learning_rate=200, random_state=0)
emb = model.fit_transform(X)

plt.figure(figsize=(8, 6))
plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=15)
plt.colorbar(label="digit")
plt.title("t-SNE embedding of handwritten digits")
plt.tight_layout()
plt.show()

[Figure: t-SNE embedding of handwritten digits, coloured by class]
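
Because the objective is non-convex, different seeds can give different layouts. scikit-learn's fitted estimator exposes the final loss as kl_divergence_, which gives a rough way to compare restarts (a lower value means a tighter fit to the neighbourhood structure):

# after fit_transform, the final objective value is available
print(f"final KL divergence: {model.kl_divergence_:.3f}")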


3. Hyperparameters #

  • perplexity: effective neighbour count (typically 5–50). Higher values emphasise more global structure; a sweep is sketched after this list.
  • learning_rate: too small and the optimisation gets stuck; too large and points scatter into a diffuse ball. Values of 100–1000 often work, and recent scikit-learn versions default to "auto".
  • n_iter (renamed max_iter in newer scikit-learn): allow at least ~1000 iterations; the first ~250 run with early_exaggeration, a warm-up that inflates (P) so well-separated clusters form early.
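
As referenced above, a minimal sketch of a perplexity sweep, reusing X and y from the example in section 2:

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Fit one embedding per perplexity and plot them side by side
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, perp in zip(axes, [5, 30, 50]):
    emb = TSNE(n_components=2, perplexity=perp, random_state=0).fit_transform(X)
    ax.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=10)
    ax.set_title(f"perplexity={perp}")
plt.tight_layout()
plt.show()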

4. Tips #

  • Standardise features and remove duplicate rows; t-SNE is sensitive to scale, and identical points (zero distance) can distort the perplexity calibration (see the sketch after this list).
  • For large datasets, use Barnes–Hut (method="barnes_hut", the scikit-learn default, roughly O(N log N)) or FFT-accelerated implementations (openTSNE, FIt-SNE).
  • Interpret distances qualitatively; t-SNE preserves local neighbours but not global geometry.
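
A minimal sketch of the preprocessing and Barnes–Hut settings above; angle is scikit-learn's Barnes–Hut speed/accuracy trade-off (0.2–0.8 is the documented range):

import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Drop duplicate rows, then standardise each feature
X_clean = StandardScaler().fit_transform(np.unique(X, axis=0))

# Barnes–Hut approximation; larger angle is faster but less accurate
model = TSNE(n_components=2, method="barnes_hut", angle=0.5, random_state=0)
emb = model.fit_transform(X_clean)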

5. Notes #

  • t-SNE is best for visual inspection rather than downstream modelling.
  • Run it multiple times with different seeds/perplexities to confirm patterns are stable.
  • Consider UMAP when you need faster embeddings or a transform defined for new samples (see the sketch below).
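
A minimal sketch of that last point, assuming the umap-learn package is installed; X_train and X_new are hypothetical stand-ins for your own arrays:

import umap

reducer = umap.UMAP(n_components=2, random_state=0)
emb_train = reducer.fit_transform(X_train)  # fit on training data
emb_new = reducer.transform(X_new)          # UMAP can embed unseen samples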