2.2.6
Isomap
- Isomap builds a neighborhood graph, estimates geodesic distances, and embeds points to preserve those distances.
- It is effective when data lies on curved manifolds such as Swiss-roll-like structures.
n_neighborsis critical because it controls graph connectivity and embedding quality.
- Principal Component Analysis (PCA) — understanding this concept first will make learning smoother
Intuition #
Isomap replaces straight-line distance with distance measured along the data manifold. By preserving manifold distances, it can unroll curved geometry into a meaningful low-dimensional map.
Detailed Explanation #
1. Idea #
- Build a neighbour graph with either a fixed number of neighbours (k) or a radius (\varepsilon).
- Compute the shortest-path (geodesic) distances along that graph for every pair of samples.
- Feed the resulting distance matrix to classical MDS to obtain the low-dimensional embedding.
This workflow keeps far-apart regions separated while unrolling the curved surface.
2. Python example #
| |
3. Hyperparameters #
n_neighbors: governs the local neighbourhood; too small breaks the graph, too large washes out the manifold structure.n_components: usually 2 or 3 for visualisation, but higher values are possible.- Handle duplicates/noisy samples by scaling features or adding slight noise before building the graph.
4. Pros and cons #
| Pros | Cons |
|---|---|
| Preserves manifold structure and distances | Sensitive to noisy neighbour graphs |
| Produces intuitive visualisations | Requires computing all-pairs shortest paths |
5. Notes #
- Isomap = neighbourhood graph + MDS; pick neighbours carefully to reflect the true topology.
- Inspect the connected components of the graph: isolated points will distort the embedding.
- Consider UMAP or t-SNE if you need faster embeddings or better preservation of local densities.
FAQ #
What is Isomap? #
Isomap (Isometric Mapping) is a nonlinear dimensionality reduction algorithm that preserves the geodesic distances between points — distances measured along the surface of the data manifold rather than through empty space. It extends classical Multidimensional Scaling (MDS) to nonlinear geometry by first building a neighbourhood graph, then computing shortest-path distances along that graph.
Isomap is particularly effective for data that lies on a curved surface (a manifold), such as a Swiss roll, where straight-line (Euclidean) distances are misleading.
How does Isomap work step by step? #
- Build a neighbourhood graph: connect each point to its \(k\) nearest neighbours (or all points within radius \(\varepsilon\)).
- Compute geodesic distances: run Floyd-Warshall or Dijkstra’s algorithm to find the shortest path between every pair of points along the graph.
- Apply classical MDS: project points into a low-dimensional space that best preserves those geodesic distances.
The result is an embedding that “unrolls” the manifold into a flat, interpretable space.
What is the difference between Isomap, PCA, t-SNE, and UMAP? #
| PCA | Isomap | t-SNE | UMAP | |
|---|---|---|---|---|
| Type | Linear | Nonlinear (global) | Nonlinear (local) | Nonlinear (local/global) |
| Preserves | Global variance | Geodesic distances | Local neighborhoods | Local and some global |
| Speed | Fast | Moderate (O(n²)) | Slow | Fast |
| Out-of-sample | Easy | Moderate | Not natively | Supported |
| Best for | Linear structure, compression | Curved manifolds | Visualisation | Visualisation, clustering |
Use Isomap when your data lies on a smooth manifold and global structure matters. Use t-SNE or UMAP for large-scale visualisation.
How do I choose n_neighbors in Isomap?
#
n_neighbors controls the trade-off between local and global structure:
- Too small (e.g., 2–3): the neighbourhood graph may split into disconnected components, causing the embedding to fail or produce distorted results.
- Too large: distant points become direct neighbours, short-circuiting the manifold and collapsing the geodesic distances.
A typical starting range is 5–15. Check that the graph is fully connected (iso.dist_matrix_ has no infinity values) and vary n_neighbors to see which value best preserves the known structure. For noisy data, increase n_neighbors or apply smoothing before fitting.
How do I use Isomap in scikit-learn? #
| |
Isomap supports out-of-sample extension via transform(), which approximates the geodesic distances for new points based on their nearest neighbours in the training graph.