Using Pearson correlation in sklearn FeatureAgglomeration

I have a pandas DataFrame with 100 rows and 10,000 features. I want to fit hierarchical clustering on the features, using Pearson correlation as the affinity argument of sklearn.cluster.FeatureAgglomeration.

I've tried two ways to make it work so far. The first is:

feature_agglomator = FeatureAgglomeration(n_clusters=10, affinity=np.corrcoef, linkage='average')

The second one:

from scipy.spatial.distance import correlation
feature_agglomator = FeatureAgglomeration(n_clusters=10, affinity='correlation', linkage='average')

After running:

feature_agglomator.fit_transform(X)

Both ended with the same exception:

ValueError: The condensed distance matrix must contain only finite values.

What can I do to make it work properly?

Southeastwards answered 14/8, 2018 at 18:38 Comment(4)
I think you should read these two GitHub threads related to your issue: github.com/scikit-learn/scikit-learn/issues/7689 and github.com/scikit-learn/scikit-learn/issues/10076. Both seem to point to scipy refusing to perform agglomerative clustering when using cosine distance with zero vectors. - Periodontal
I think that the correlation is giving you NaN. Check your input values. - Psychographer
@Psychographer you were right, I had columns filled with 0's. Thanks! - Southeastwards
I'm voting to close this question because the author has already fixed the error. - Cowberry
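
For anyone landing here with the same error: the comments trace it to all-zero columns. A constant feature has zero variance, so its Pearson correlation with any other feature works out to 0/0, which is NaN, and the distance matrix is no longer finite. Below is a minimal sketch of one possible fix, assuming it is acceptable to simply drop constant columns before clustering. The helper name drop_constant_columns and the synthetic data are made up for illustration. The keyword lookup is there because newer scikit-learn releases take metric where older ones took affinity.

```python
import inspect

import numpy as np
from sklearn.cluster import FeatureAgglomeration


def drop_constant_columns(X):
    """Hypothetical helper: drop columns whose values are all identical."""
    X = np.asarray(X, dtype=float)
    keep = X.max(axis=0) != X.min(axis=0)
    return X[:, keep], keep


# Synthetic stand-in for the real data: 100 rows, 50 features,
# two of which are filled with zeros (as in the question).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
X[:, [3, 17]] = 0.0

# A constant column has zero variance, so its correlations are NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    corr_raw = np.corrcoef(X, rowvar=False)
print("NaN in correlation matrix:", np.isnan(corr_raw).any())

# Drop the offending columns and report which ones were removed.
X_clean, keep = drop_constant_columns(X)
print("dropped column indices:", np.flatnonzero(~keep))

# Assumption: pick whichever keyword the installed scikit-learn accepts
# (`metric` in newer releases, `affinity` in older ones).
params = inspect.signature(FeatureAgglomeration).parameters
distance_kw = "metric" if "metric" in params else "affinity"
agglo = FeatureAgglomeration(
    n_clusters=10, linkage="average", **{distance_kw: "correlation"}
)
X_reduced = agglo.fit_transform(X_clean)
print("reduced shape:", X_reduced.shape)
```

If the data lives in a DataFrame df, something like df.loc[:, df.nunique() > 1] should do the same filtering while keeping the column names. Whether dropping is the right choice depends on the data; the constant columns carry no correlation information either way.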
