Plot a Correlation Circle in Python

I've been doing some Geometrical Data Analysis (GDA) such as Principal Component Analysis (PCA). I'm looking to plot a Correlation Circle... these look a bit like this:

[image: example correlation circle]

Basically, it lets you measure the extent to which each original variable is correlated with the principal components (dimensions) of a dataset.
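
For context: when the data are standardized, the correlation between an original variable and a principal component can be read off the PCA loadings, so the circle can be built from sklearn output alone. A minimal sketch, assuming standardized input (the random data is only a stand-in):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data: rows are observations, columns are variables.
X = np.random.default_rng(0).normal(size=(100, 4))

# Standardize so every variable has unit variance.
X_std = StandardScaler().fit_transform(X)

pca = PCA().fit(X_std)

# Correlation of variable j with component k is
# loading[k, j] * sqrt(eigenvalue[k]) when the variables are standardized.
correlations = pca.components_.T * np.sqrt(pca.explained_variance_)
print(correlations)  # shape (n_variables, n_components), values roughly in [-1, 1]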

Does anyone know of a Python package that plots this kind of data visualization?

Migratory answered 14/6, 2016 at 15:15 Comment(3)
Possible duplicate of PCA Scaling with ggbiplot – Kazimir
Actually it's not the same; here I'm trying to use Python, not R. – Migratory
Yes, the PCA circle is possible with the mlxtend package: rasbt.github.io/mlxtend/user_guide/plotting/… – Myke
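
For reference, the mlxtend route mentioned in the last comment looks roughly like this; the call assumes the plot_pca_correlation_graph function described in the linked user guide, so check that page for the exact signature:

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from mlxtend.plotting import plot_pca_correlation_graph

iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data)

# Draw the correlation circle for the first two principal components.
figure, correlation_matrix = plot_pca_correlation_graph(X,
                                                        iris.feature_names,
                                                        dimensions=(1, 2))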

Here is a simple example using sklearn and the iris dataset. It includes both the variable factor map for the first two dimensions and a scree plot:

from sklearn.decomposition import PCA
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
 
df = sns.load_dataset('iris')
 
n_components = 4
 
# Do the PCA.
pca = PCA(n_components=n_components)
reduced = pca.fit_transform(df[['sepal_length', 'sepal_width',
                                'petal_length', 'petal_width']])

# Append the principal components for each entry to the dataframe
for i in range(0, n_components):
    df['PC' + str(i + 1)] = reduced[:, i]

print(df.head())  # use display(df.head()) instead in a Jupyter notebook

# Do a scree plot
ind = np.arange(0, n_components)
(fig, ax) = plt.subplots(figsize=(8, 6))
sns.pointplot(x=ind, y=pca.explained_variance_ratio_, ax=ax)
ax.set_title('Scree plot')
ax.set_xticks(ind)
ax.set_xticklabels(ind)
ax.set_xlabel('Component Number')
ax.set_ylabel('Explained Variance')
plt.show()

# Show the points in terms of the first two PCs
g = sns.lmplot(x='PC1',
               y='PC2',
               hue='species', data=df,
               fit_reg=False,
               scatter=True,
               height=7)  # `height` replaces the older `size` argument

plt.show()

# Plot a variable factor map for the first two dimensions.
(fig, ax) = plt.subplots(figsize=(8, 8))
# Loop over the original features (the columns of pca.components_).
for i in range(0, pca.components_.shape[1]):
    ax.arrow(0,
             0,  # Start the arrow at the origin
             pca.components_[0, i],  #0 for PC1
             pca.components_[1, i],  #1 for PC2
             head_width=0.1,
             head_length=0.1)

    plt.text(pca.components_[0, i] + 0.05,
             pca.components_[1, i] + 0.05,
             df.columns.values[i])


an = np.linspace(0, 2 * np.pi, 100)
plt.plot(np.cos(an), np.sin(an))  # Add a unit circle for scale
plt.axis('equal')
ax.set_title('Variable factor map')
plt.show()

It would be a good exercise to extend this to further PCs, to rescale the arrows when all the loadings are small, and to avoid plotting factors with minimal contributions.
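
As a starting point for those extensions, one possible sketch (reusing the pca and df objects from the example above; the 0.3 cutoff is just an illustrative threshold) draws the arrows as variable/PC correlations, so they always fit inside the unit circle, and skips weak contributors:

import numpy as np
import matplotlib.pyplot as plt

feature_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
X = df[feature_names].to_numpy()

# Correlation of each feature with each PC: loading * sqrt(eigenvalue) / feature std.
corr = pca.components_.T * np.sqrt(pca.explained_variance_) / X.std(axis=0, ddof=1)[:, None]

fig, ax = plt.subplots(figsize=(8, 8))
for name, (x, y) in zip(feature_names, corr[:, :2]):
    if np.hypot(x, y) < 0.3:  # skip features with a minimal contribution to PC1/PC2
        continue
    ax.arrow(0, 0, x, y, head_width=0.05, head_length=0.05)
    ax.text(x * 1.1, y * 1.1, name)

an = np.linspace(0, 2 * np.pi, 100)
ax.plot(np.cos(an), np.sin(an))  # unit circle: correlations cannot exceed 1 in magnitude
ax.set_aspect('equal')
ax.set_title('Correlation circle (PC1 vs PC2)')
plt.show()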

Maneuver answered 30/4, 2018 at 10:21 Comment(1)
Thanks for this - one change: the loop for plotting the variable factor map should be over the number of features, not the number of components. Instead of range(0, len(pca.components_)), it should be range(pca.components_.shape[1]). – Favela

I agree it's a pity this isn't available in a mainstream package such as sklearn.

Here is a home-made implementation: https://github.com/mazieres/analysis/blob/master/analysis.py#L19-34

Condorcet answered 16/6, 2016 at 8:28 Comment(1)
Yeah, this would fit perfectly in mlxtend. Why not submit a PR, Christophe? – Beckner
