The Math Behind PCA (Principal Component Analysis)

EliteAI
Jul 18, 2023

A step towards statistical modelling

Principal Component Analysis (PCA) is a multivariate data-analysis technique based on projection methods.

PCA performs a coordinate transformation (i.e., a rotation) from the arbitrary axes (or “features”) you started with to a set of axes aligned with the data itself.

The central idea of PCA is to reduce the dimensionality of a dataset consisting of many interrelated variables while retaining as much of the original variance as possible.

The Mathematics…

Let the original dataset be (k+1)-dimensional; ignoring the class-label dimension leaves k dimensions. To reduce the dimensionality further to d, where d < k, the following steps of PCA must be followed:

  • Step 1: Standardize the dataset.
  • Step 2: Calculate the covariance matrix of the features in the dataset.
  • Step 3: Calculate the eigenvalues and eigenvectors of the covariance matrix.
  • Step 4: Sort the eigenvalues and their corresponding eigenvectors.
  • Step 5: Pick the d largest eigenvalues and form a matrix of their eigenvectors.
  • Step 6: Transform the original matrix.

The Implementation…

  1. Define Dataset
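The post’s original data points aren’t shown here, so the sketch below builds a small synthetic stand-in: ten samples with four correlated features (four, to match the four eigenvectors plotted later). The names `rng` and `X` are illustrative.

```python
import numpy as np

# Hypothetical stand-in for the post's data points: 10 samples x 4 features.
# Mixing independent noise through a random matrix makes the features
# correlated, which is exactly the situation PCA is designed for.
rng = np.random.default_rng(42)
X = rng.normal(size=(10, 4)) @ rng.normal(size=(4, 4))
```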

2. Standardize dataset
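A minimal sketch, continuing from the synthetic `X` above: subtract each feature’s mean and divide by its standard deviation, so every feature ends up with zero mean and unit variance.

```python
# Standardize each feature (column) to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```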

3. Calculate the covariance matrix for the features in the dataset

  • Variance: The variance is the average of the squared differences from the mean. If you’re familiar with the standard deviation, usually denoted by 𝜎, the variance is just the square of the standard deviation. Think of the variance as the “spread” or “extent” of the data about some particular axis (or input, or “feature”).
  • Covariance: “Covariance indicates the level to which two variables vary together.” Computing it is much like computing the regular variance, except that instead of squaring one variable’s deviation from the mean, we multiply together the deviations of the two variables:
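cov(x, y) = (1 / (n − 1)) · Σᵢ (xᵢ − x̄)(yᵢ − ȳ)

A sketch of this step, using NumPy’s built-in covariance estimator on the standardized data from Step 2:

```python
# Covariance matrix of the standardized features.
# rowvar=False: columns are variables (features), rows are observations.
cov_matrix = np.cov(X_std, rowvar=False)  # shape: (4, 4)
```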

4. Calculate the eigenvalues and eigenvectors for the covariance matrix

Given a matrix 𝐀 with dimensions 𝑛×𝑛 (i.e., 𝑛 rows and 𝑛 columns), there exists a set of 𝑛 vectors 𝑣⃗ᵢ (each of dimension 𝑛, with 𝑖 = 1…𝑛) such that multiplying one of these vectors by 𝐀 results in a vector (anti)parallel to 𝑣⃗ᵢ, with its length multiplied by some constant 𝜆ᵢ. In equation form:

𝐀𝑣⃗ᵢ = 𝜆ᵢ𝑣⃗ᵢ   (1)

where the constants 𝜆ᵢ are called eigenvalues and the vectors 𝑣⃗ᵢ are called eigenvectors.

“An eigenvector is a vector that a linear operator sends to a multiple of itself”
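A sketch of this step, assuming the `cov_matrix` computed above. Because a covariance matrix is symmetric, NumPy’s `np.linalg.eigh` is the natural routine:

```python
# Eigendecomposition of the symmetric covariance matrix.
# eigh returns real eigenvalues in ascending order; column i of
# `eigenvectors` is the eigenvector belonging to eigenvalues[i].
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
```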

5. Sort eigenvalues and their corresponding eigenvectors
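Since `eigh` returns eigenvalues in ascending order, a sketch of the sort just reverses that order, reordering the eigenvector columns to match:

```python
# Sort eigenvalues (and their eigenvectors) in descending order,
# so the direction of biggest variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]
```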

Plot of the eigenvectors:
  • The first (blue) eigenvector points along the direction of biggest variance.
  • The second (orange) eigenvector points along the direction of second-biggest variance.
  • The third (green) eigenvector points along the direction of third-biggest variance.
  • The fourth (red) eigenvector points along the direction of smallest variance.
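The original figure isn’t reproduced here; a hypothetical Matplotlib sketch that produces a comparable picture plots each eigenvector’s components, and the default color cycle happens to assign blue, orange, green, and red in variance order:

```python
import matplotlib.pyplot as plt

# One line per eigenvector, showing its components across the four features.
for i in range(eigenvectors.shape[1]):
    plt.plot(eigenvectors[:, i], marker="o",
             label=f"eigenvector {i + 1} (λ = {eigenvalues[i]:.2f})")
plt.xlabel("Feature index")
plt.ylabel("Component value")
plt.legend()
plt.show()
```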

6. Pick the d largest eigenvalues and form a matrix of eigenvectors

a) Finding Principal Components

We will eliminate one or more of the less significant directions of variance and pick the d most significant eigenvectors. In other words, we will project the data onto the principal components we keep, discarding the less significant ones.
Or even simpler: we will “squish” the data along the smallest-variance directions.
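A sketch of this step: keep the first d eigenvector columns (already sorted above) as the projection matrix. The choice d = 2 is purely illustrative; the post doesn’t fix a specific value.

```python
# Keep the d most significant eigenvectors as a (k x d) projection matrix W.
d = 2  # hypothetical choice of reduced dimension
W = eigenvectors[:, :d]
```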

7. Transform the original matrix

a) Project data to the new coordinate system:
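Projecting onto all principal axes is a pure rotation of the standardized data, so no information is lost; a one-line sketch:

```python
# Rotate the standardized data into the eigenvector basis.
X_rotated = X_std @ eigenvectors  # shape: (n_samples, k)
```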

b) Truncated data projected onto the principal axes of the coordinate system:
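Using only the top-d axes gives the dimensionality-reduced data, the “squished” version described above:

```python
# Project onto just the d most significant axes.
X_reduced = X_std @ W  # shape: (n_samples, d)
```

`X_reduced` is the final PCA output: the same samples described by d coordinates instead of k.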

Thanks for reading. If you have any feedback, please feel free to reach out by commenting on this post.

Check out our website!

We keep adding blogs, tools, and videos to help you understand the math behind ML and AI!
