plot_pca_correlation_graph: plot correlations between original features and principal components


A function to provide a correlation circle for PCA.

> from mlxtend.plotting import plot_pca_correlation_graph

In a so-called correlation circle, the correlations between the original dataset features and the principal component(s) are shown via their coordinates.

Example

The following correlation circle example visualizes the correlations between the first two principal components and the four original iris dataset features.

  • Features with a positive correlation will be grouped together.
  • Totally uncorrelated features are orthogonal to each other.
  • Features with a negative correlation will be plotted in the opposing quadrants of this plot.

from mlxtend.data import iris_data
from mlxtend.plotting import plot_pca_correlation_graph
import numpy as np
X, y = iris_data()

X_norm = X / X.std(axis=0) # Normalizing the feature columns is recommended

feature_names = [
  'sepal length',
  'sepal width',
  'petal length',
  'petal width']

figure, correlation_matrix = plot_pca_correlation_graph(X_norm, 
                                                        feature_names,
                                                        dimensions=(1, 2),
                                                        figure_axis_size=10)

[Figure: correlation circle of the four iris features plotted against the first two principal components]

correlation_matrix
                  Dim 1      Dim 2
sepal length  -0.891224  -0.357352
sepal width    0.449313  -0.888351
petal length  -0.991684  -0.020247
petal width   -0.964996  -0.062786
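
The coordinates in this matrix are plain Pearson correlations between each feature column and the principal component scores. The following minimal NumPy sketch illustrates how such values can be reproduced by hand; it is not the function's internal code, and the component signs may be flipped relative to the table above.

import numpy as np
from mlxtend.data import iris_data

X, y = iris_data()
X_norm = X / X.std(axis=0)
X_centered = X_norm - X_norm.mean(axis=0)   # center before the SVD

# PCA via SVD: the columns of Vt.T are the principal axes
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
scores = X_centered @ Vt.T                  # PC scores, shape (n_samples, n_features)

# Coordinate of feature j on dimension k = Pearson correlation between
# the original feature column and the k-th principal component scores
coords = np.array([[np.corrcoef(X_norm[:, j], scores[:, k])[0, 1]
                    for k in range(2)]
                   for j in range(X_norm.shape[1])])
print(np.round(coords, 6))                  # rows: features, columns: Dim 1, Dim 2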

Further, note that the percentage values shown on the x- and y-axes denote how much of the variance in the original dataset is explained by each principal component axis. That is, if PC1 lists 72.7% and PC2 lists 23.0% as shown above, then combined, the two principal components explain 95.7% of the total variance.
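
As a quick sanity check of those percentages (again a standalone sketch, not part of plot_pca_correlation_graph), the explained-variance ratios are the eigenvalues of the covariance matrix of the normalized data divided by their sum:

import numpy as np
from mlxtend.data import iris_data

X, y = iris_data()
X_norm = X / X.std(axis=0)

# Eigenvalues of the covariance matrix of the normalized data, largest first
eig_vals = np.sort(np.linalg.eigvalsh(np.cov(X_norm, rowvar=False)))[::-1]
ratios = eig_vals / eig_vals.sum()          # explained-variance ratios

print(np.round(ratios[:2], 3))              # roughly the percentages on the two axes
print(round(float(ratios[:2].sum()), 3))    # their combined share of the total variance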

API

plot_pca_correlation_graph(X, variables_names, dimensions=(1, 2), figure_axis_size=6, X_pca=None, explained_variance=None)

Computes the PCA for X and plots the correlation graph.

Parameters

  • X : 2d array like

    The columns represent the different variables and the rows are the samples of those variables.

  • variables_names : array like

    Names of the columns (the variables) of X

  • dimensions : tuple with two elements

    Dimensions to be plotted (x, y)

  • figure_axis_size : size of the final frame

    The figure created is a square with length and width equal to figure_axis_size.

  • X_pca : np.ndarray, shape = [n_samples, n_components]

    Optional. X_pca is the matrix of the transformed components from X. If not provided, the function computes PCA automatically using mlxtend.feature_extraction.PrincipalComponentAnalysis. Expected: n_components >= max(dimensions).

  • explained_variance : 1 dimensional np.ndarray, length = n_components

    Optional. explained_variance are the eigenvalues from the diagonalized covariance matrix of the PCA transformation. If not provided, the function computes PCA independently. Expected: n_components == X.shape[1]. A sketch showing how both optional arguments can be supplied follows this parameter list.
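
For example, both optional arguments can be filled from an externally computed PCA. The sketch below uses scikit-learn's PCA purely as an illustration; any implementation that returns the transformed scores and the covariance-matrix eigenvalues would work.

from sklearn.decomposition import PCA
from mlxtend.data import iris_data
from mlxtend.plotting import plot_pca_correlation_graph

X, y = iris_data()
X_norm = X / X.std(axis=0)

pca = PCA()                         # keep all components so that
X_pca = pca.fit_transform(X_norm)   # len(explained_variance) == X.shape[1]

figure, correlation_matrix = plot_pca_correlation_graph(
    X_norm,
    ['sepal length', 'sepal width', 'petal length', 'petal width'],
    dimensions=(1, 2),
    X_pca=X_pca,                                   # precomputed component scores
    explained_variance=pca.explained_variance_)    # eigenvalues of the covariance matrix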

Returns

matplotlib_figure, correlation_matrix

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/plot_pca_correlation_graph/