Summary
- Spectral clustering uses the spectrum (eigenvalues/eigenvectors) of a similarity graph to embed data into a space where k-means can easily separate the clusters.
- It handles non-linearly separable structures (e.g., concentric circles, moons) by capturing connectivity rather than raw Euclidean distance.
- Key hyperparameters include the affinity definition (`rbf`, nearest-neighbour graph, cosine), the scale parameter (`gamma` or `n_neighbors`), and the label assignment strategy.
- After computing the first \(k\) eigenvectors of the normalised Laplacian, row-normalise them and run k-means to obtain cluster labels.
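To make the contrast with raw Euclidean distance concrete, here is a minimal sketch comparing plain k-means against spectral clustering on concentric circles, scored with the adjusted Rand index against the generating labels (assumes scikit-learn is available; `gamma=50` is an illustrative choice, not a tuned value):

```python
from sklearn.datasets import make_circles
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import adjusted_rand_score

# Two concentric circles: not linearly separable in the input space
X, y = make_circles(n_samples=500, factor=0.4, noise=0.05, random_state=0)

# Plain k-means partitions the plane into half-spaces and cuts both rings
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering follows graph connectivity and recovers the rings
sc = SpectralClustering(
    n_clusters=2, affinity="rbf", gamma=50, random_state=0
).fit_predict(X)

print("k-means ARI:  ", adjusted_rand_score(y, km))   # near 0
print("spectral ARI: ", adjusted_rand_score(y, sc))   # near 1
```

The gap in scores is the whole point of the method: similarity is propagated along the graph, so points on the same ring end up close in the spectral embedding even when they are far apart in the original coordinates.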
1. Overview #
Given data points \(x_i\), we create an affinity matrix \(W\) whose entries reflect pairwise similarity. The (unnormalised) graph Laplacian is
$$ L = D - W, \quad D = \mathrm{diag}(d_1, \dots, d_n), \quad d_i = \sum_j W_{ij}. $$
Spectral clustering typically uses the symmetric normalised Laplacian
$$ L_{\mathrm{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}. $$
Take the \(k\) eigenvectors associated with the smallest eigenvalues of \(L_{\mathrm{sym}}\), stack them into a matrix \(U\), normalise each row, and run k-means on those rows. The resulting assignments translate back to the original samples.
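The recipe above can be written out directly with NumPy as a sketch (the dataset and `gamma=50` are illustrative assumptions; scikit-learn's `SpectralClustering` wraps essentially these steps with more robust eigensolvers):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

# Affinity matrix W via an RBF kernel; zero the diagonal (no self-loops)
gamma = 50.0
W = rbf_kernel(X, gamma=gamma)
np.fill_diagonal(W, 0.0)

# Symmetric normalised Laplacian: L_sym = I - D^{-1/2} W D^{-1/2}
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt

# eigh returns eigenvalues in ascending order: take the k smallest
k = 2
_, eigvecs = np.linalg.eigh(L_sym)
U = eigvecs[:, :k]

# Row-normalise the embedding, then cluster the rows with k-means
U = U / np.linalg.norm(U, axis=1, keepdims=True)
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

print("ARI vs. true rings:", adjusted_rand_score(y, labels))
```

The row normalisation is the step from Ng, Jordan, and Weiss: it projects the embedded points onto the unit sphere, where the clusters form tight, well-separated groups for k-means.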
2. Python example #
```python
import numpy as np
import matplotlib.pyplot as plt
import japanize_matplotlib
from sklearn.datasets import make_circles
from sklearn.cluster import SpectralClustering

X, _ = make_circles(n_samples=500, factor=0.4, noise=0.05, random_state=0)

model = SpectralClustering(
    n_clusters=2,
    affinity="rbf",
    gamma=50,
    assign_labels="kmeans",
    random_state=0,
)
labels = model.fit_predict(X)

plt.figure(figsize=(6, 5))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="coolwarm", s=20)
plt.title("Spectral clustering result")
plt.tight_layout()
plt.show()
```
3. Practical notes #
- For sparse graphs, use `affinity="nearest_neighbors"` and tune `n_neighbors`.
- `assign_labels="cluster_qr"` can be more stable than the default k-means for high-dimensional embeddings.
- Always standardise features before computing similarities, otherwise `gamma` / distance scales become meaningless.
4. References #
- Shi, J., & Malik, J. (2000). Normalized Cuts and Image Segmentation. IEEE TPAMI.
- Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On Spectral Clustering: Analysis and an Algorithm. NeurIPS.
- scikit-learn developers. (2024). Spectral clustering. https://scikit-learn.org/stable/modules/clustering.html#spectral-clustering