Hierarchical Clustering

Summary
  • Hierarchical clustering creates a tree (dendrogram) that shows how samples merge, so you can explore cluster structures at multiple resolutions.
  • In the agglomerative setting every point starts as its own cluster and pairs are merged step by step according to a linkage criterion.
  • Linkage options (ward, complete, average, single) and distance metrics (euclidean, cosine, etc.) change the shape of the dendrogram.
  • You can visualise the hierarchy with scipy.cluster.hierarchy and extract flat clusters with scikit-learn’s AgglomerativeClustering.

1. Overview #

Agglomerative clustering proceeds as follows:

  1. Treat each sample as its own cluster.
  2. Find the pair of clusters with the smallest distance under the chosen linkage rule.
  3. Merge the pair, record the merge height, and repeat until everything becomes one cluster.

Cutting the dendrogram at a particular height yields a flat clustering. Ward’s linkage tends to produce compact clusters, while single linkage often forms chain-like clusters.
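As a minimal illustration of such a cut (the toy data and the cut height of 2.0 are made up for this example), SciPy's fcluster extracts the flat clustering directly from the linkage matrix:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two obvious groups in the plane (toy data)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

Z = linkage(X, method="ward")

# criterion="distance" cuts the dendrogram at height t;
# every merge above that height is undone
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)  # expected: two flat clusters, e.g. [1 1 1 2 2 2]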

2. Linkage criteria #

Let $C_i$ and $C_j$ be clusters with centroids $\mu_i$ and $\mu_j$. Common linkage rules:

  • Ward: minimises the increase in within-cluster variance $$ \Delta = \frac{|C_i||C_j|}{|C_i| + |C_j|} \lVert \mu_i - \mu_j \rVert^2. $$
  • Complete: distance between the farthest pair of points across clusters.
  • Average: average pairwise distance between points in the two clusters.
  • Single: distance between the closest pair of points (prone to chaining).

Pick the rule that matches the shapes you expect and scale your features beforehand.
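To make the Ward increment concrete, here is a small NumPy check (the two toy clusters are made up for illustration) that evaluates $\Delta$ straight from the formula above:

import numpy as np

# Two toy clusters C_i and C_j
C_i = np.array([[0.0, 0.0], [1.0, 0.0]])
C_j = np.array([[4.0, 0.0], [5.0, 0.0]])

mu_i, mu_j = C_i.mean(axis=0), C_j.mean(axis=0)
n_i, n_j = len(C_i), len(C_j)

# Ward merge cost: (|C_i||C_j| / (|C_i|+|C_j|)) * ||mu_i - mu_j||^2
delta = n_i * n_j / (n_i + n_j) * np.sum((mu_i - mu_j) ** 2)
print(delta)  # 16.0: centroids are 4 apart, so (2*2/4) * 4**2

Because $\Delta$ grows with squared distances, features on larger scales dominate the merges, which is why standardising beforehand matters.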

3. Python example #

The snippet below builds a dendrogram with SciPy and then performs agglomerative clustering with scikit-learn:

import numpy as np
from sklearn.datasets import make_blobs
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Toy data: three well-separated Gaussian blobs
X, _ = make_blobs(n_samples=50, centers=3, cluster_std=0.7, random_state=42)

# Z records the merge history: each row is (cluster a, cluster b, distance, new size)
Z = linkage(X, method="ward")

plt.figure(figsize=(8, 4))
# truncate_mode="level" with p=5 shows only the top five levels of the tree
dendrogram(Z, truncate_mode="level", p=5)
plt.title("Hierarchical clustering with Ward linkage")
plt.xlabel("sample index or cluster ID")
plt.ylabel("distance")
plt.show()

Figure: dendrogram produced by Ward linkage on the blob data.

from sklearn.cluster import AgglomerativeClustering

clustering = AgglomerativeClustering(
    n_clusters=3,
    linkage="average",  # average linkage here, unlike the Ward dendrogram above
    metric="euclidean",  # the "affinity" parameter was renamed to "metric" in scikit-learn 1.2
)
labels = clustering.fit_predict(X)

Set distance_threshold (with n_clusters=None) instead of n_clusters if you want to cut the dendrogram at a custom height, as in the sketch below.
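A minimal sketch of that variant, reusing the blobs from above; the threshold of 10.0 is an arbitrary value picked for this toy dataset, not a recommended default:

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, cluster_std=0.7, random_state=42)

# Cut the tree at a fixed height instead of asking for k clusters;
# n_clusters must be None whenever distance_threshold is set.
clustering = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=10.0,
    linkage="ward",
)
labels = clustering.fit_predict(X)
print(clustering.n_clusters_)  # number of flat clusters produced by this cut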

4. References #

  • Ward, J. H. (1963). Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association.
  • Müllner, D. (2011). Modern Hierarchical, Agglomerative Clustering Algorithms. arXiv:1109.2378.
  • scikit-learn developers. (2024). Hierarchical clustering. https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering