PCA Correlation Circle

A function to plot a correlation circle for principal component analysis (PCA).

from mlxtend.plotting import plot_pca_correlation_graph

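In a so-called correlation circle, the correlations between the original dataset features and the principal components are shown as coordinates. Each feature is drawn as an arrow whose x and y coordinates are its correlations with the two plotted principal components, so every arrow ends inside the unit circle.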
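The coordinates can be illustrated with a minimal NumPy sketch. It is not mlxtend's implementation: it assumes PCA via an SVD of the centered data, and the helper name `pc_feature_correlations` is hypothetical. Each entry is the Pearson correlation between one original feature and one principal component score. Summed over all components, a feature's squared correlations equal 1, which is why the arrows cannot leave the unit circle.

```python
import numpy as np

def pc_feature_correlations(X):
    """Hypothetical helper: Pearson correlations between the columns of X
    and all principal component scores (rows = features, cols = PCs)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                       # PCA operates on centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                            # principal component scores
    cov = Xc.T @ scores / (n - 1)                 # feature-by-component covariances
    return cov / np.outer(Xc.std(axis=0, ddof=1), scores.std(axis=0, ddof=1))

# Synthetic data (an assumption, not the iris set) with two correlated features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] += 2 * X[:, 0]
corr = pc_feature_correlations(X)

# Squared correlations of each feature, summed over all components, should be 1
print(np.round((corr ** 2).sum(axis=1), 6))
```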

Example

The following correlation circle example visualizes the correlations between the first two principal components and the four original features of the iris dataset.

from mlxtend.data import iris_data
from mlxtend.plotting import plot_pca_correlation_graph
import numpy as np
X, y = iris_data()

X_norm = X / X.std(axis=0)  # scaling each feature column to unit variance is recommended

feature_names = [
  'sepal length',
  'sepal width',
  'petal length',
  'petal width']

figure, correlation_matrix = plot_pca_correlation_graph(X_norm, 
                                                        feature_names,
                                                        pc_dimensions=(1, 2),
                                                        figure_axis_size=10)

[Figure: correlation circle of the four iris features on principal components 1 and 2]

correlation_matrix

              Principal Component 1  Principal Component 2
sepal length              -0.891224              -0.357352
sepal width                0.449313              -0.888351
petal length              -0.991684              -0.020247
petal width               -0.964996              -0.062786
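For columns scaled to unit variance, each entry of such a table reduces to the component loading times the square root of the component's variance (its eigenvalue): r_jk = v_jk * sqrt(lambda_k). The NumPy check below verifies this identity on synthetic data rather than the iris set, and is a sketch, not mlxtend's code. Because the sign of a principal component is arbitrary, a whole column of the table may come out with flipped signs in a different implementation.

```python
import numpy as np

# Synthetic correlated data (an assumption; not the iris dataset)
rng = np.random.default_rng(42)
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))
X_scaled = X / X.std(axis=0, ddof=1)   # unit variance, same ddof as np.cov below

Xc = X_scaled - X_scaled.mean(axis=0)
cov = np.cov(Xc, rowvar=False)          # equals the correlation matrix here
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # sort components by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Correlations predicted from the loadings: r_jk = v_jk * sqrt(lambda_k)
r_from_loadings = eigvecs * np.sqrt(eigvals)

# Correlations measured directly between each feature and each PC score
scores = Xc @ eigvecs
r_direct = np.array([[np.corrcoef(Xc[:, j], scores[:, k])[0, 1]
                      for k in range(4)] for j in range(4)])

print(np.allclose(r_from_loadings, r_direct))
```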

Note also that the percentage values shown on the x and y axes denote how much of the variance in the original dataset is explained by each principal component. For example, if PC1 lists 72.7% and PC2 lists 23.0%, as in the plot above, then together the two principal components explain 95.7% of the total variance.
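These percentages are explained-variance ratios: each component's eigenvalue divided by the sum of all eigenvalues. A minimal NumPy sketch on synthetic data computes them from the singular values of the centered data (the helper name `explained_variance_ratio` is hypothetical), together with the cumulative share that yields a combined figure such as the 95.7% quoted above.

```python
import numpy as np

def explained_variance_ratio(X):
    """Hypothetical helper: fraction of total variance per principal
    component, ordered from largest to smallest."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    S = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = S ** 2                             # proportional to the eigenvalues
    return var / var.sum()

# Synthetic features with very different spreads (an assumption)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
ratios = explained_variance_ratio(X)
cumulative = np.cumsum(ratios)

print(np.round(100 * ratios, 1))      # percent explained by each component
print(round(100 * cumulative[1], 1))  # percent explained by PC1 and PC2 together
```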

API

plot_pca_correlation_graph(X, variables_names, pc_dimensions=(1, 2), figure_axis_size=6, X_pca=None)

Computes the PCA for X and plots the correlation circle.

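Parameters

X : 2D array-like, shape = [n_samples, n_features]
    Data matrix; rows are samples and columns are the original features.

variables_names : list of str
    Names of the columns of X, used to label the features in the plot and in the returned correlation matrix.

pc_dimensions : tuple of two ints (default: (1, 2))
    The principal components to plot on the x and y axes, numbered from 1.

figure_axis_size : int or float (default: 6)
    Size of the square figure; its width and height are both equal to figure_axis_size.

X_pca : array-like, shape = [n_samples, n_components] (default: None)
    Optional matrix of principal component scores already computed from X. If None, the PCA is computed from X.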
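Returns

matplotlib_figure, correlation_matrix
    The figure containing the correlation circle, and the table of correlations between each original feature (rows) and the selected principal components (columns), as in the example above.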
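To make the roles of `pc_dimensions` and `X_pca` concrete, here is a NumPy-only sketch. It is not mlxtend's code and draws nothing; the helper `circle_coordinates` is hypothetical and assumes only what the signature and the example suggest: components are numbered from 1, and precomputed component scores may be passed instead of running a PCA internally.

```python
import numpy as np

def circle_coordinates(X, pc_dimensions=(1, 2), X_pca=None):
    """Hypothetical helper mirroring the documented parameter names: returns an
    (n_features, 2) array of correlations between each column of X and the two
    principal components named in pc_dimensions (numbered from 1)."""
    X = np.asarray(X, dtype=float)
    if X_pca is None:
        # No precomputed scores supplied: run PCA via SVD of the centered data
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        X_pca = Xc @ Vt.T
    X_pca = np.asarray(X_pca, dtype=float)
    cols = [d - 1 for d in pc_dimensions]  # 1-indexed -> 0-indexed
    coords = np.empty((X.shape[1], len(cols)))
    for out_k, k in enumerate(cols):
        for j in range(X.shape[1]):
            # np.corrcoef centers both inputs, so uncentered X is fine here
            coords[j, out_k] = np.corrcoef(X[:, j], X_pca[:, k])[0, 1]
    return coords

# Synthetic correlated data (an assumption, not the iris set)
rng = np.random.default_rng(7)
X = rng.normal(size=(120, 5)) @ rng.normal(size=(5, 5))
coords_12 = circle_coordinates(X)                        # PC1 vs PC2
coords_13 = circle_coordinates(X, pc_dimensions=(1, 3))  # PC1 vs PC3
print(coords_12.shape)  # (5, 2)
```

Passing scores computed elsewhere through `X_pca` should give the same coordinates as the internal PCA, as long as both use the same components with the same signs.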