The Math Behind K-means Clustering

May 13, 2021

A step towards statistical modelling…

K-means clustering is a popular unsupervised machine learning algorithm that solves the well-known clustering problem without any predefined labels. It is often referred to as Lloyd’s algorithm.

The K-means algorithm tries to find K centroids and then assigns every data point to the cluster of its nearest centroid. The ‘means’ in K-means refers to averaging the data; that is, each centroid is the mean of the points in its cluster. K is the total number of clusters to be found in the dataset, and it is chosen before the algorithm runs.
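To make the 'means' concrete, the quantity K-means is usually described as minimizing can be written out. The notation here is an assumption introduced for illustration and does not come from the original text: x_i is a data point, C_k is the set of points assigned to cluster k, and mu_k is that cluster's centroid.

```latex
% Within-cluster sum of squared Euclidean distances (the usual K-means objective)
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2

% Each centroid is the mean of the points currently assigned to its cluster
\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x_i \in C_k} x_i
```

For a fixed assignment of points to clusters, the mean is the choice of mu_k that makes the inner sum smallest, which is why averaging is the natural way to update a centroid.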

Algorithm:

Step 1: Randomly pick K of the data points as the initial cluster centres (centroids).
Step 2: For each data point, compute the Euclidean distance to every centroid and assign the point to the cluster of its nearest centroid.
Step 3: Recompute each centroid as the average of all the data points that belong to its cluster.
Step 4: Repeat Steps 2 and 3 until no data point changes its cluster.
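
The four steps can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, and the rule that an empty cluster keeps its previous centroid are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick k of the data points as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None

    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once no point has changed cluster.
        if new_labels == labels:
            break
        labels = new_labels

        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    return centroids, labels


# Toy data (made up for this example): two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

On this toy data the first three points should end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the random initialization, and on less cleanly separated data different seeds can lead to different final clusters.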