Isomap

2.2.6

Isomap

Last updated 2020-06-01 Read time 4 min
Summary
  • Isomap builds a neighborhood graph, estimates geodesic distances, and embeds points to preserve those distances.
  • It is effective when data lies on curved manifolds such as Swiss-roll-like structures.
  • n_neighbors is critical because it controls graph connectivity and embedding quality.

Intuition #

Isomap replaces straight-line distance with distance measured along the data manifold. By preserving manifold distances, it can unroll curved geometry into a meaningful low-dimensional map.

Detailed Explanation #

1. Idea #

  1. Build a neighbour graph with either a fixed number of neighbours (k) or a radius (\varepsilon).
  2. Compute the shortest-path (geodesic) distances along that graph for every pair of samples.
  3. Feed the resulting distance matrix to classical MDS to obtain the low-dimensional embedding.

This workflow keeps far-apart regions separated while unrolling the curved surface.


2. Python example #

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
import numpy as np
import matplotlib.pyplot as plt
import japanize_matplotlib
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)
iso = Isomap(n_neighbors=10, n_components=2)
emb = iso.fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].scatter(X[:, 0], X[:, 2], c=color, cmap="Spectral", s=5)
axes[0].set_title("Original Swiss roll")
axes[1].scatter(emb[:, 0], emb[:, 1], c=color, cmap="Spectral", s=5)
axes[1].set_title("Isomap embedding")
for ax in axes:
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
plt.show()

Isomap demo


3. Hyperparameters #

  • n_neighbors: governs the local neighbourhood; too small breaks the graph, too large washes out the manifold structure.
  • n_components: usually 2 or 3 for visualisation, but higher values are possible.
  • Handle duplicates/noisy samples by scaling features or adding slight noise before building the graph.

4. Pros and cons #

ProsCons
Preserves manifold structure and distancesSensitive to noisy neighbour graphs
Produces intuitive visualisationsRequires computing all-pairs shortest paths

5. Notes #

  • Isomap = neighbourhood graph + MDS; pick neighbours carefully to reflect the true topology.
  • Inspect the connected components of the graph: isolated points will distort the embedding.
  • Consider UMAP or t-SNE if you need faster embeddings or better preservation of local densities.

FAQ #

What is Isomap? #

Isomap (Isometric Mapping) is a nonlinear dimensionality reduction algorithm that preserves the geodesic distances between points — distances measured along the surface of the data manifold rather than through empty space. It extends classical Multidimensional Scaling (MDS) to nonlinear geometry by first building a neighbourhood graph, then computing shortest-path distances along that graph.

Isomap is particularly effective for data that lies on a curved surface (a manifold), such as a Swiss roll, where straight-line (Euclidean) distances are misleading.

How does Isomap work step by step? #

  1. Build a neighbourhood graph: connect each point to its \(k\) nearest neighbours (or all points within radius \(\varepsilon\)).
  2. Compute geodesic distances: run Floyd-Warshall or Dijkstra’s algorithm to find the shortest path between every pair of points along the graph.
  3. Apply classical MDS: project points into a low-dimensional space that best preserves those geodesic distances.

The result is an embedding that “unrolls” the manifold into a flat, interpretable space.

What is the difference between Isomap, PCA, t-SNE, and UMAP? #

PCAIsomapt-SNEUMAP
TypeLinearNonlinear (global)Nonlinear (local)Nonlinear (local/global)
PreservesGlobal varianceGeodesic distancesLocal neighborhoodsLocal and some global
SpeedFastModerate (O(n²))SlowFast
Out-of-sampleEasyModerateNot nativelySupported
Best forLinear structure, compressionCurved manifoldsVisualisationVisualisation, clustering

Use Isomap when your data lies on a smooth manifold and global structure matters. Use t-SNE or UMAP for large-scale visualisation.

How do I choose n_neighbors in Isomap? #

n_neighbors controls the trade-off between local and global structure:

  • Too small (e.g., 2–3): the neighbourhood graph may split into disconnected components, causing the embedding to fail or produce distorted results.
  • Too large: distant points become direct neighbours, short-circuiting the manifold and collapsing the geodesic distances.

A typical starting range is 5–15. Check that the graph is fully connected (iso.dist_matrix_ has no infinity values) and vary n_neighbors to see which value best preserves the known structure. For noisy data, increase n_neighbors or apply smoothing before fitting.

How do I use Isomap in scikit-learn? #

1
2
3
4
5
6
7
from sklearn.manifold import Isomap

iso = Isomap(n_neighbors=10, n_components=2)
X_embedded = iso.fit_transform(X)

# Transform new points (approximate geodesic distances)
X_new_embedded = iso.transform(X_new)

Isomap supports out-of-sample extension via transform(), which approximates the geodesic distances for new points based on their nearest neighbours in the training graph.