k-means

2.5.1

Last updated 2020-01-29
Summary
  • k-means follows an intuitive rule—group nearby points together—by repeatedly updating cluster representatives (centroids) until assignments stabilise.
  • The objective function is the within-cluster sum of squares (WCSS), i.e. the sum over all samples of the squared distance between each sample and its assigned centroid.
  • With scikit-learn’s KMeans you can visualise convergence, experiment with initialisation schemes, and inspect how assignments change.
  • Choosing \(k\) typically involves diagnostics such as the elbow method or silhouette scores, balanced with domain knowledge.
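
The points in this summary can be tried out with a short script. The following is a minimal sketch, not a definitive recipe: the synthetic data (three well-separated blobs with hand-picked centres), the range of \(k\) values, and the variable names are all assumptions made for illustration. It fits scikit-learn's KMeans, recomputes the WCSS by hand to compare with the `inertia_` attribute, and collects the inertia and silhouette score for several values of \(k\).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumed toy data: three well-separated blobs with hand-picked centres.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.8,
    random_state=0,
)

# Fit k-means with k-means++ initialisation and 10 restarts.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)

# WCSS by hand: squared distance of each sample to its assigned centroid, summed.
manual_wcss = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
print(f"inertia_ = {km.inertia_:.3f}, manual WCSS = {manual_wcss:.3f}")

# Diagnostics for choosing k: inertia (elbow method) and silhouette score.
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores[k] = (model.inertia_, silhouette_score(X, model.labels_))
    print(f"k={k}: inertia={scores[k][0]:.1f}, silhouette={scores[k][1]:.3f}")

# The k with the highest silhouette score is one candidate, not a final answer.
best_k = max(scores, key=lambda k: scores[k][1])
print("best k by silhouette:", best_k)
```

Inertia tends to keep falling as \(k\) grows, so the elbow method looks for the point where the decrease flattens rather than for a minimum. The silhouette score gives a second opinion, and both should be weighed against domain knowledge.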

Intuition

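k-means is best read through its assumptions. Because it minimises squared Euclidean distance to centroids, it favours compact, roughly spherical clusters of similar size and is sensitive to feature scaling. The result also depends on the choice of \(k\) and on the initial centroids, since the iterations are only guaranteed to reach a local minimum of the WCSS.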
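
The rule of grouping nearby points and repeatedly updating centroids can be written in a few lines of NumPy. This is a minimal sketch of the standard alternating scheme (often called Lloyd's algorithm), not scikit-learn's implementation. The function name `lloyd_kmeans`, the random initialisation from data points, and the toy data are assumptions for illustration.

```python
import numpy as np

def lloyd_kmeans(X, k, max_iter=100, seed=0):
    """Hypothetical helper: plain k-means with random initialisation."""
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct samples drawn from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    history = []  # WCSS recorded after each assignment step
    for _ in range(max_iter):
        # Assignment step: squared distance of every sample to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        history.append(d2[np.arange(len(X)), new_labels].sum())
        # Stop once assignments no longer change.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels, history

# Assumed toy data: three Gaussian blobs in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(50, 2))
    for c in [(-5, -5), (0, 5), (5, -5)]
])
centroids, labels, history = lloyd_kmeans(X, k=3)
print("WCSS per iteration:", np.round(history, 2))
```

Each assignment step and each update step can only lower or keep the WCSS, which is why the recorded values never increase and the loop terminates. A different seed can lead to a different local minimum, which is the motivation for restarts and smarter initialisation schemes.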

Detailed Explanation