The Math Behind K-means Clustering

EliteAI
2 min read · May 13, 2021


A step towards statistical modelling…

K-means with k = 3

K-means clustering is a popular unsupervised machine learning algorithm for the well-known clustering problem: grouping data points when no predefined labels are available. Its standard iterative procedure is often referred to as Lloyd’s algorithm.

The K-means algorithm looks for K centroids and assigns every data point to the cluster whose centroid is nearest. The ‘means’ in K-means refers to averaging the data points in a cluster, which is how each centroid is found. K is the total number of clusters to find in the dataset, and it is chosen before the algorithm runs.
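In symbols, K-means is usually described as minimizing the within-cluster sum of squared distances. The notation is introduced here only for illustration: x_i is a data point, C_j is the set of points assigned to cluster j, and \mu_j is the centroid of that cluster.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

The second formula is the ‘means’ part: each centroid is simply the average of the points currently in its cluster.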

Algorithm:

  • Step1: Randomly initialize the K cluster centres (centroids) by picking them from the data points.
  • Step2: For each data point, compute the Euclidean distance to every centroid and assign the point to the cluster of its nearest centroid.
  • Step3: Recompute each centroid by taking the average of all the data points that belong to its cluster.
  • Step4: Repeat steps 2 and 3 until no data point changes cluster.
The Algorithm

Implementation

1) Dataset Creation:
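A minimal sketch of this step, assuming a toy 2-D dataset made of Gaussian blobs. The function name `make_blobs`, the blob centres, the spread and the seed are all assumptions made for illustration, and only the Python standard library is used so that it runs without extra dependencies.

```python
import random


def make_blobs(centers, points_per_blob=50, spread=0.6, seed=0):
    """Draw 2-D points from a Gaussian around each centre in `centers`."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    points = []
    for cx, cy in centers:
        for _ in range(points_per_blob):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return points


# Hypothetical blob centres, chosen only so that three groups are visible.
data = make_blobs([(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)])
print(len(data), "points, first one:", data[0])
```

Three centres with 50 points each give 150 points, which matches a K-means run with k = 3.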

2) Initialize Cluster Centers:
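One simple way to do Step1 is to sample k of the data points at random and use them as the starting centroids. The sketch below assumes points are stored as tuples; the name `init_centroids` and the sample data are hypothetical.

```python
import random


def init_centroids(points, k, seed=0):
    """Pick k data points (at distinct positions in the list) as starting centroids."""
    rng = random.Random(seed)
    return rng.sample(points, k)


# Small made-up dataset for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]
centroids = init_centroids(points, k=2)
print(centroids)
```

Because the result depends on the random draw, different seeds can lead to different final clusters.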

3) Compute Euclidean distance
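Step2 can be sketched as follows: compute the Euclidean distance from a point to every centroid and keep the index of the smallest one. The helper names `euclidean` and `assign_clusters` and the sample data are assumptions made for this example.

```python
import math


def euclidean(p, q):
    """Euclidean distance: square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [euclidean(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))  # ties go to the lower index
    return labels


points = [(1.0, 1.0), (1.5, 2.0), (5.0, 7.0), (4.5, 5.0)]
centroids = [(1.0, 1.0), (5.0, 7.0)]
print(assign_clusters(points, centroids))  # prints [0, 0, 1, 1]
```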

4) Recompute centroids
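For Step3, each centroid becomes the coordinate-wise average of the points assigned to it. In this sketch the name `recompute_centroids` is hypothetical, and keeping the old centroid when a cluster ends up empty is an assumption of the example, not the only possible choice.

```python
def recompute_centroids(points, labels, old_centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    new_centroids = []
    for j, old in enumerate(old_centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        # zip(*members) groups the coordinates dimension by dimension.
        mean = tuple(sum(coord) / len(members) for coord in zip(*members))
        new_centroids.append(mean)
    return new_centroids


points = [(1.0, 1.0), (2.0, 3.0), (5.0, 7.0), (4.0, 5.0)]
labels = [0, 0, 1, 1]
print(recompute_centroids(points, labels, [(1.0, 1.0), (5.0, 7.0)]))
# prints [(1.5, 2.0), (4.5, 6.0)]
```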

5) Repeat till Convergence
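Putting the steps together, the loop below repeats assignment and centroid update until the labels stop changing. It is a compact, self-contained sketch rather than a reference implementation: the function name `kmeans`, the `max_iters` safety cap and the sample data are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd-style iteration: assign, update, stop when assignments are stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Step1: random data points as centroids
    labels = None
    for _ in range(max_iters):
        # Step2: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # Step4: no point changed cluster
            break
        labels = new_labels
        # Step3: move each non-empty cluster's centroid to the mean of its points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Two clearly separated made-up groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Which group ends up as cluster 0 depends on the random initialization, so only the grouping itself should be compared between runs.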

6) Plot Results

Thanks for reading. If you have any feedback, please feel free to reach out by commenting on this post.

Check out our website! https://eliteaihub.com/

We keep adding blogs, tools and videos to help you understand the math behind ML and AI!
