Factor Analysis in sklearn: Explained Variance
PCA in scikit-learn has an attribute called explained_variance_ which captures the variance explained by each component. I don't see anything similar for FactorAnalysis in scikit-learn. How can I compute the variance explained by each component for factor analysis?
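For reference, this is the PCA attribute in question (a minimal sketch with hypothetical random data):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 5))  # hypothetical data
pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_)        # variance explained by each component
print(pca.explained_variance_ratio_)  # same, as a proportion of total variance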

Alfaro answered 30/12, 2016 at 0:18 Comment(0)

Here is how you can do it:

First, once you have performed the factor analysis, get the components matrix and the noise variance. Let fa be your fitted FactorAnalysis model:

import numpy as np

m = fa.components_      # loadings matrix, shape (n_components, n_features)
n = fa.noise_variance_  # per-feature noise variance, shape (n_features,)

Square this matrix element-wise:

m1 = m**2

Compute the sum of each row of m1, which gives one value per factor (components_ has shape (n_components, n_features)):

m2 = np.sum(m1, axis=1)

Now the percentage of variance explained by the first factor is

pvar1 = (100*m2[0])/np.sum(m2)

and similarly for the second factor:

pvar2 = (100*m2[1])/np.sum(m2)

However, there is also variance attributed to the noise component. If you want to account for it in your variance explained, you will need to compute

pvar1_with_noise = (100*m2[0])/(np.sum(m2)+np.sum(n))
pvar2_with_noise = (100*m2[1])/(np.sum(m2)+np.sum(n))

and so on for the remaining factors. Hope this helps.
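Putting these steps together, a minimal self-contained sketch (assuming X is your data matrix of shape (n_samples, n_features); the helper name pct_variance_explained is hypothetical):

import numpy as np
from sklearn.decomposition import FactorAnalysis

def pct_variance_explained(fa, include_noise=False):
    # sum of squared loadings per factor
    ssl = np.sum(fa.components_**2, axis=1)
    denom = np.sum(ssl)
    if include_noise:
        denom += np.sum(fa.noise_variance_)
    return 100 * ssl / denom

fa = FactorAnalysis(n_components=2).fit(X)
print(pct_variance_explained(fa))                      # relative to shared variance only
print(pct_variance_explained(fa, include_noise=True))  # noise variance in the denominator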

Delisadelisle answered 14/2, 2017 at 11:5 Comment(1)
This is actually not working for me. The figures I get for pvar1 etc. all add up to 100%, and the last factor always has the remainder. Aren't they supposed to be sorted by the amount of variance explained, descending? – Queensland

In terms of the usual nomenclature of FA/PCA, the components_ output by scikit-learn may be referred to as loadings elsewhere. For example, the package FactorAnalyzer outputs loadings_, which are equivalent once you change the settings to match scikit-learn (i.e. set rotation=None, set method='ml', and standardize your data before passing it to the scikit-learn function, since FactorAnalyzer standardizes the data internally).
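A quick way to check this equivalence (a sketch, assuming factor_analyzer is installed; the data here is hypothetical random noise, and with real, correlated data the two sets of loadings should agree up to the signs of the columns):

import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler
from factor_analyzer import FactorAnalyzer

X = np.random.default_rng(0).normal(size=(500, 6))  # hypothetical data

Xz = StandardScaler().fit_transform(X)  # scikit-learn does not standardize internally
sk_loadings = FactorAnalysis(n_components=2).fit(Xz).components_.T

fa = FactorAnalyzer(n_factors=2, rotation=None, method='ml')
fa.fit(X)

print(np.round(sk_loadings, 3))   # shape (n_features, n_factors)
print(np.round(fa.loadings_, 3))  # compare; columns may differ in sign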

Compared to the components_ output of PCA from scikit-learn, which are unit-length eigenvectors, the FA ones are already scaled, so the explained variance can be extracted by summing the squares. Note that the proportion of variance explained is expressed here in terms of the total variance of the original variables, not the variance extracted by the factors, as in the answer from @Delisadelisle.

import numpy as np
from sklearn.decomposition import FactorAnalysis

k_fa = 2   # e.g.

fa_k = FactorAnalysis(n_components=k_fa).fit(X_in)

fa_loadings = fa_k.components_.T    # loadings, shape (n_features, k_fa)

# variance explained
total_var = X_in.var(axis=0).sum()  # total variance of the original variables,
                                    # equal to no. of vars if they are standardized

var_exp = np.sum(fa_loadings**2, axis=0)
prop_var_exp = var_exp / total_var
cum_prop_var_exp = np.cumsum(var_exp / total_var)

print(f"variance explained: {var_exp.round(2)}")
print(f"proportion of variance explained: {prop_var_exp.round(3)}")
print(f"cumulative proportion of variance explained: {cum_prop_var_exp.round(3)}")

# e.g. output:
#   variance explained: [3.51 0.73]
#   proportion of variance explained: [0.351 0.073]
#   cumulative proportion of variance explained: [0.351 0.425]
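If you want to see where the rest of the total variance goes (cf. the comments below), the per-variable noise variances can be added back in; a small sketch reusing fa_k and total_var from above:

shared_var = np.sum(fa_k.components_**2)     # total shared (common) variance
noise_var = np.sum(fa_k.noise_variance_)     # total unique (noise) variance
print((shared_var + noise_var) / total_var)  # approximately 1.0 when the model fits well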
Sedgemoor answered 25/3, 2022 at 20:40 Comment(2)
If I set n_components = n_features of the dataset X_in, I would expect cum_prop_var_exp to reach 100%, but it only ever gets into the high 90s. – Polyphyletic
@Polyphyletic No, your expectation does not hold. Even if you set n_components = n_features, there may still be some variance explained by the private noise component. cum_prop_var_exp only captures shared variance, even when n_components = n_features. – Mercury
