Cosine similarity TSNE in sklearn.manifold

Asked 11/4, 2016 at 9:58 Answered 30/1, 2018 at 10:48

I have a small problem performing TSNE on my dataset using cosine similarity.

I have calculated the cosine similarity between all of my vectors, so I have a square matrix containing the cosine similarities:

A = [[  1    0.7   0.5   0.6  ]
     [  0.7   1    0.3   0.4  ]
     [  0.5  0.3    1    0.1  ]
     [  0.6  0.4   0.1    1   ]]

Then I use TSNE like this:

import numpy as np
from sklearn import manifold

A = np.matrix([[1, 0.7,0.5,0.6],[0.7,1,0.3,0.4],[0.5,0.3,1,0.1],[0.6,0.4,0.1,1]])
model = manifold.TSNE(metric="precomputed")
Y = model.fit_transform(A)

But I'm not sure that using the precomputed metric preserves the meaning of my cosine similarity:

From the documentation:
If metric is “precomputed”, X is assumed to be a distance matrix

But when I try to use the cosine metric, I get an error:

A = np.matrix([[1, 0.7,0.5,0.6],[0.7,1,0.3,0.4],[0.5,0.3,1,0.1],[0.6,0.4,0.1,1]])
model = manifold.TSNE(metric="cosine")
Y = model.fit_transform(A) 

raise ValueError("All distances should be positive, either "
ValueError: All distances should be positive, either the metric or 
precomputed distances given as X are not correct

So my question is: how can I perform TSNE with the cosine metric on an existing dataset (a similarity matrix)?

Wilkerson answered 11/4, 2016 at 9:58 Comment(2)

Which version of scikit-learn are you using? The code works for me. – Nitrobacteria 11/4, 2016 at 10:35

Sorry, I updated my code: I use the fit_transform function to transform my input, and the error seems to come from there. Here is a small snippet that doesn't work:

from sklearn import manifold
import numpy as np

A = np.matrix([[1, 0.7,0.5,0.6],[1, 0.7,0.5,0.6],[0.5,0.3,1,0.1],[0.6,0.4,0.1,1]])
model = manifold.TSNE(metric="cosine")
Y = model.fit_transform(A)

– Wilkerson 11/4, 2016 at 11:56

I can answer most of your question; however, I'm not quite sure why that error appears in your second example.

You have calculated the cosine similarity between your vectors, but scikit-learn assumes a distance matrix as the input to TSNE when metric="precomputed". Fortunately, the conversion is simple: distance = 1 - similarity. So for your example:

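import numpy as np
from sklearn import manifold

# Cosine similarity matrix from the question (plain array rather than np.matrix).
A = np.array([[1, 0.7,0.5,0.6],[0.7,1,0.3,0.4],[0.5,0.3,1,0.1],[0.6,0.4,0.1,1]])
A = 1. - A  # convert similarities to cosine distances
# Newer scikit-learn releases need init="random" with a precomputed metric
# and a perplexity smaller than the number of samples (4 here).
model = manifold.TSNE(metric="precomputed", init="random", perplexity=2)
Y = model.fit_transform(A)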

This should give you the transformation you want.
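
As a quick sanity check on the conversion above, here is a minimal sketch using NumPy only. The helper name `similarity_to_distance` is hypothetical (it is not part of scikit-learn), and clipping at zero is my own assumption to guard against tiny negative values caused by floating-point rounding. It checks the properties a precomputed distance matrix is expected to have: non-negative entries, a zero diagonal, and symmetry.

```python
import numpy as np


def similarity_to_distance(sim):
    """Convert a cosine similarity matrix to a cosine distance matrix.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    sim = np.asarray(sim, dtype=float)  # plain ndarray, not np.matrix
    dist = 1.0 - sim                    # cosine distance = 1 - cosine similarity
    dist = np.clip(dist, 0.0, None)     # assumption: clamp tiny negative round-off
    np.fill_diagonal(dist, 0.0)         # each point is at distance 0 from itself
    return dist


# Similarity matrix from the question.
S = [[1.0, 0.7, 0.5, 0.6],
     [0.7, 1.0, 0.3, 0.4],
     [0.5, 0.3, 1.0, 0.1],
     [0.6, 0.4, 0.1, 1.0]]

D = similarity_to_distance(S)

# Properties expected of a precomputed distance matrix.
is_non_negative = bool((D >= 0).all())
has_zero_diagonal = bool(np.allclose(np.diag(D), 0.0))
is_symmetric = bool(np.allclose(D, D.T))
```

If all three flags are true, `D` can be passed to TSNE with metric="precomputed".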

Bengali answered 11/4, 2016 at 13:1 Comment(3)

Thanks! I have just read a paper on that. You are right, it works. To be more precise, we can apply the square root to this value. Do you agree? – Wilkerson 12/4, 2016 at 6:33

Why distance = 1 - similarity ? – Deist 30/1, 2018 at 9:44

That is how the cosine distance is defined; you can see it on the wiki page. – Bengali 30/1, 2018 at 9:47

This can be done with sklearn's pairwise_distances:

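from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

# X: your data, an array of shape (n_samples, n_features), assumed to exist already.
distance_matrix = pairwise_distances(X, X, metric='cosine', n_jobs=-1)
# Newer scikit-learn releases need init="random" with a precomputed metric.
model = TSNE(metric="precomputed", init="random")
Xpr = model.fit_transform(distance_matrix)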

Values in distance_matrix will be in the range [0, 2], because the cosine distance is 1 - cosine similarity and the similarity lies in [-1, 1].
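
To illustrate that range, here is a small NumPy-only sketch. The function `cosine_distance` is a hypothetical helper written for this example; it follows the definition 1 - cosine similarity and assumes non-zero input vectors. It evaluates the three extreme cases: same direction, right angle, and opposite direction.

```python
import numpy as np


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors: 1 - cos(angle)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return 1.0 - cos_sim


same = cosine_distance([1.0, 0.0], [2.0, 0.0])        # same direction: lower end of the range
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])  # right angle: middle of the range
opposite = cosine_distance([1.0, 0.0], [-4.0, 0.0])   # opposite direction: upper end of the range
```

Vector length does not matter here; only the angle between the vectors determines the distance.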

Deist answered 30/1, 2018 at 10:48 Comment(0)

There is currently a bug; see here: https://github.com/scikit-learn/scikit-learn/issues/5772

However, scikit-learn's t-SNE uses the squared Euclidean distance, which is proportional to the cosine distance if your data is L2-normalized: for unit vectors u and v, ||u - v||^2 = 2 * (1 - cos(u, v)).
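
The proportionality can be checked numerically. The sketch below uses NumPy only, with random data chosen purely for illustration: it L2-normalizes the rows, then compares the pairwise squared Euclidean distances with twice the cosine distances.

```python
import numpy as np

# Random illustrative data: 5 vectors in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# L2-normalize every row so each vector has unit length.
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)

# Cosine distance: for unit vectors the dot product is the cosine similarity.
cosine_dist = 1.0 - X_unit @ X_unit.T

# Squared Euclidean distance between every pair of rows.
diff = X_unit[:, None, :] - X_unit[None, :, :]
sq_euclidean = np.sum(diff ** 2, axis=-1)

# For unit vectors: ||u - v||^2 = 2 - 2 * (u . v) = 2 * cosine distance.
max_gap = np.max(np.abs(sq_euclidean - 2.0 * cosine_dist))
```

Here `max_gap` should be at the level of floating-point round-off, which suggests that normalizing the data and using the Euclidean metric preserves the cosine geometry up to a constant factor.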

Thegn answered 7/9, 2016 at 12:46 Comment(0)
