t-SNE

Last updated 2026-02-16 Read time 2 min
Summary
  • t-SNE preserves local neighborhoods by matching pairwise similarities between high and low dimensions.
  • It is optimized for visualization and often reveals cluster structure clearly.
  • Results depend on perplexity, learning rate, and random initialization, so stability checks are important.

Intuition #

t-SNE prioritizes who is close to whom, not faithful global distances. It is best interpreted as a neighborhood map for exploratory analysis.

Detailed Explanation #

1. Intuition #

  • Construct pairwise similarities (P_{ij}) in the original space (Gaussian kernel per point, controlled by perplexity).
  • Define similarities (Q_{ij}) in the low-dimensional space using a heavy-tailed Student-t distribution.
  • Minimise the Kullback–Leibler divergence (\mathrm{KL}(P \parallel Q)) via gradient descent.
  • The heavy-tailed Student-t gives moderately distant points non-negligible probability in the map, which counteracts the crowding problem: dissimilar points no longer need to be squeezed into a small region.
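The steps above can be sketched numerically. This is a minimal toy illustration of the objective, not a full t-SNE implementation: it uses a single fixed Gaussian bandwidth instead of tuning one per point to match a target perplexity, and it only evaluates the KL divergence rather than running gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))          # toy high-dimensional points
Y = rng.normal(size=(6, 2)) * 1e-2   # toy low-dimensional embedding

def squared_dists(A):
    """Pairwise squared Euclidean distances."""
    s = (A ** 2).sum(axis=1)
    return s[:, None] + s[None, :] - 2 * A @ A.T

# High-dimensional similarities P: Gaussian kernel, normalised to sum to 1.
# (Real t-SNE fits one bandwidth per point to hit the chosen perplexity.)
sigma = 1.0
P = np.exp(-squared_dists(X) / (2 * sigma ** 2))
np.fill_diagonal(P, 0.0)
P /= P.sum()

# Low-dimensional similarities Q: heavy-tailed Student-t with 1 d.o.f.
num = 1.0 / (1.0 + squared_dists(Y))
np.fill_diagonal(num, 0.0)
Q = num / num.sum()

# KL(P || Q) is the quantity t-SNE minimises by gradient descent.
eps = 1e-12
kl = np.sum(P * np.log((P + eps) / (Q + eps)))
print(f"KL(P || Q) = {kl:.4f}")
```

Because the embedding Y is nearly collapsed, Q is close to uniform while P is not, so the divergence is clearly positive; gradient descent would move the points in Y to shrink it.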

2. Python example #

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Load the 8x8 digit images and standardise the 64 pixel features
X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)

# 2-D embedding; fix random_state for reproducibility
model = TSNE(n_components=2, perplexity=30, learning_rate=200, random_state=0)
emb = model.fit_transform(X)

plt.figure(figsize=(8, 6))
plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=15)
plt.colorbar(label="digit")
plt.title("t-SNE embedding of handwritten digits")
plt.tight_layout()
plt.show()

[Figure: t-SNE embedding of the digits dataset, points coloured by digit class]

3. Hyperparameters #

  • perplexity: effective neighbour count (typically 5–50). Higher values capture more global structure.
  • learning_rate: too small traps the optimisation, too large causes points to fly apart; 100–1000 often works.
  • n_iter (renamed max_iter in recent scikit-learn): allow at least 1000 iterations, including the early_exaggeration warm-up phase.

4. Tips #

  • Standardise features and remove duplicates; t-SNE is sensitive to scale and noise.
  • For large datasets, use Barnes–Hut (method="barnes_hut") or FFT-accelerated implementations (openTSNE, FIt-SNE).
  • Interpret distances qualitatively; t-SNE preserves local neighbours but not global geometry.
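The preprocessing advice above can be sketched as a short pipeline. The data here is synthetic and only illustrates the two steps, deduplication and standardisation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
X = np.vstack([X, X[:10]])  # simulate 10 duplicate rows

# Remove exact duplicates before embedding: identical points collapse
# onto each other and can distort the similarity distribution.
X_unique = np.unique(X, axis=0)

# Standardise so no single feature dominates the pairwise distances.
X_scaled = StandardScaler().fit_transform(X_unique)

print(X.shape, X_unique.shape)
```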

5. Notes #

  • t-SNE is best for visual inspection rather than downstream modelling.
  • Run it multiple times with different seeds/perplexities to confirm patterns are stable.
  • Consider UMAP when you need faster embeddings or a transform defined for new samples.