Clustering | Machine Learning Basics

Chapter 5 #

Clustering #

Group similar observations to discover structure, summarize datasets, or power downstream tasks. Choose algorithms based on shape assumptions, robustness needs, and scalability.

Algorithms at a glance #

  • k‑means / k‑means++ / X‑means: fast for roughly spherical, similarly sized clusters; sensitive to feature scale and initialization (k‑means++ seeding reduces the latter; X‑means also estimates k).
  • DBSCAN / HDBSCAN: density‑based; finds arbitrary shapes and flags outliers as noise; DBSCAN needs sensible eps/minPts, while HDBSCAN replaces eps with a minimum cluster size.
  • Gaussian Mixture (GMM): probabilistic clusters; soft assignments and ellipsoids.
  • Hierarchical clustering: dendrograms for multi‑scale structure; linkage matters.
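
To make the sensitivity to initialization concrete, here is a minimal standard-library sketch of Lloyd's k‑means with k‑means++ seeding. It is an illustration under stated assumptions, not a reference implementation: the names `kmeans_pp_init`, `assign`, and `kmeans` and the toy `points` are hypothetical, points are tuples of floats, and the data is assumed to contain at least k distinct points.

```python
import math
import random

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: draw each new center with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean updates until stable."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # An empty cluster keeps its previous center (one simple convention).
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers

# Toy data (assumed for illustration): two well-separated blobs.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

Cluster indices are arbitrary, so compare groupings rather than label values. Rerunning with different `seed` values is a cheap way to see how stable a solution is on harder data.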

Practice tips #

  • Standardize features so that no single scale dominates distance computations; use PCA/UMAP for visualization and to denoise.
  • Pick k via silhouette, elbow, or stability across resamples.
  • Validate against held‑out labels when available (e.g., with the adjusted Rand index); otherwise, report internal indices such as silhouette plus qualitative inspection.
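
The first two tips can be sketched together: standardize, cluster for several candidate k, and keep the k with the highest mean silhouette. This is a hedged standard-library sketch. The helpers `standardize`, `silhouette`, and `simple_kmeans` and the toy `raw` data are hypothetical, and `simple_kmeans` (Lloyd's algorithm with deterministic farthest-point seeding) is only a stand-in for whichever clusterer you actually use.

```python
import math
import statistics

def standardize(points):
    """Z-score each feature: subtract the column mean, divide by its std."""
    cols = list(zip(*points))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points]

def silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); singleton clusters score 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items() if m != l
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return statistics.fmean(scores)

def simple_kmeans(points, k, n_iter=50):
    """Stand-in clusterer: Lloyd's algorithm with farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    nearest = lambda p: min(range(k), key=lambda j: math.dist(p, centers[j]))
    for _ in range(n_iter):
        labels = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(statistics.fmean(c) for c in zip(*members))
    return [nearest(p) for p in points]

# Toy data (assumed): three groups whose two features live on very different scales.
raw = [(1.0, 1000.0), (1.2, 1100.0), (0.8, 900.0),
       (5.0, 1050.0), (5.2, 950.0), (4.8, 1000.0),
       (3.0, 5000.0), (3.1, 5100.0), (2.9, 4900.0)]
scaled = standardize(raw)
scores = {k: silhouette(scaled, simple_kmeans(scaled, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On the raw values the second feature is roughly a thousand times larger than the first, so it would dominate Euclidean distance; standardizing puts both on comparable footing before any index is computed. A single silhouette maximum is only one signal, so it is worth checking against the elbow curve or stability across resamples as well.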