Cosine similarity calculation between two matrices

I have code that calculates the cosine similarity between two matrices:

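import numpy as np
import scipy.spatial as sp


def cos_cdist_1(matrix, vector):
    # Cosine distance from every row of matrix to a single vector.
    v = vector.reshape(1, -1)
    return sp.distance.cdist(matrix, v, 'cosine').reshape(-1)


def cos_cdist_2(matrix1, matrix2):
    # Cosine distance between every pair of rows; reshape(-1) flattens the 2-D result.
    return sp.distance.cdist(matrix1, matrix2, 'cosine').reshape(-1)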

list1 = [[1,1,1],[1,2,1]]
list2 = [[1,1,1],[1,2,1]]

matrix1 = np.asarray(list1)
matrix2 = np.asarray(list2)

results = []
for vector in matrix2:
    distance = cos_cdist_1(matrix1,vector)
    distance = np.asarray(distance)
    similarity = (1-distance).tolist()
    results.append(similarity)


dist_all = cos_cdist_2(matrix1, matrix2)
results2 = []
for item in dist_all:
    distance_result = np.asarray(item)
    similarity_result = (1-distance_result).tolist()
    results2.append(similarity_result)

results is

[[1.0000000000000002, 0.9428090415820635],
 [0.9428090415820635, 1.0000000000000002]]

However, results2 is [1.0000000000000002, 0.9428090415820635, 0.9428090415820635, 1.0000000000000002]

My ideal result is results, that is, a list of lists of similarity values. However, I want to keep the calculation between two matrices rather than between a matrix and a vector. Any good ideas?

Geosphere answered 10/5, 2015 at 14:33 Comment(2)
Could you normalize the matrix rows? Then AB' would be the similarity matrix. Use np.dot(A, B.T). - Agent
Thanks for your comment, but I have to keep the matrix as it is, since it has another meaning. - Geosphere
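The normalization idea from the first comment can be tried without changing the original matrices, by normalizing copies of the rows. The following is only a minimal NumPy sketch: the helper name `cosine_similarity_matrix` is hypothetical, and it assumes every row is a vector with a non-zero norm.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b.

    Hypothetical helper for illustration; assumes no row is all zeros.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Dividing by the row norms creates new arrays,
    # so the caller's matrices are left unchanged.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    # Entry (i, j) is the dot product of unit row i of a and unit row j of b.
    return a_unit @ b_unit.T


matrix1 = np.asarray([[1, 1, 1], [1, 2, 1]])
matrix2 = np.asarray([[1, 1, 1], [1, 2, 1]])
similarity = cosine_similarity_matrix(matrix1, matrix2)
print(similarity)
```

Because only normalized copies are multiplied, matrix1 and matrix2 keep their original values and meaning.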
In [75]: import scipy.spatial as sp
In [76]: 1 - sp.distance.cdist(matrix1, matrix2, 'cosine')
Out[76]: 
array([[ 1.        ,  0.94280904],
       [ 0.94280904,  1.        ]])

Therefore, you could eliminate the for-loops and replace them all with

results2 = 1 - sp.distance.cdist(matrix1, matrix2, 'cosine')
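Put together as a standalone script, the one-liner above might look like the sketch below. It reuses the sample data from the question; the variable name `results2_as_lists` is introduced here only for illustration.

```python
import numpy as np
import scipy.spatial as sp

matrix1 = np.asarray([[1, 1, 1], [1, 2, 1]])
matrix2 = np.asarray([[1, 1, 1], [1, 2, 1]])

# cdist returns a 2-D array of shape (len(matrix1), len(matrix2)).
# Leaving out .reshape(-1) keeps one row of similarities per row of matrix1.
results2 = 1 - sp.distance.cdist(matrix1, matrix2, 'cosine')

# Convert to nested lists if the list-of-lists form from the question is needed.
results2_as_lists = results2.tolist()
print(results2.shape)
print(results2_as_lists)
```

Note that the loop in the question iterates over the rows of matrix2, so each inner list there holds the similarities against one row of matrix2. When the two matrices differ, that layout corresponds to cdist(matrix2, matrix1), which is the transpose of the result above.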
Puberulent answered 10/5, 2015 at 14:39 Comment(2)
Is there any faster way? I have 2 sets of vectors and I want to calculate the cosine similarity between these 2 sets. - Gardal
Since the cosine function can assume values in the range [-1, 1], this approach provides incorrect values when the cosine similarity is negative. - Protease
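The concern about negative similarities can be checked directly. SciPy documents the 'cosine' metric as 1 - u·v / (||u|| ||v||), so the distance ranges over [0, 2], and 1 - distance should still recover a negative similarity. A minimal sketch with made-up vectors (opposite, orthogonal, and identical to the reference vector):

```python
import numpy as np
from scipy.spatial import distance

# Reference vector and three test vectors chosen for illustration:
# opposite direction, orthogonal, and identical.
a = np.array([[1.0, 0.0]])
b = np.array([[-1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

dist = distance.cdist(a, b, 'cosine')
sim = 1 - dist
print(dist)
print(sim)
```

For these inputs the distance for the opposite vector is 2, so the recovered similarity is -1 rather than a clipped or wrong value.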

You can have a look at scikit-learn's API for calculating cosine similarity: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html

Cosine similarity, or the cosine kernel, computes similarity as the normalized dot product of X and Y:

K(X, Y) = <X, Y> / (||X||*||Y||)

X: ndarray or sparse array, shape (n_samples_X, n_features)

Y: ndarray or sparse array, shape (n_samples_Y, n_features). If None, the output will be the pairwise similarities between all samples in X.
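Applied to the matrices from the question, a call might look like the sketch below. It assumes scikit-learn is installed and treats each row as one sample.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

matrix1 = np.asarray([[1, 1, 1], [1, 2, 1]])
matrix2 = np.asarray([[1, 1, 1], [1, 2, 1]])

# Rows are samples and columns are features; the result has shape
# (n_samples_X, n_samples_Y).
similarity = cosine_similarity(matrix1, matrix2)

# With Y omitted, similarities are computed between all rows of X.
self_similarity = cosine_similarity(matrix1)
print(similarity)
```

The function returns the similarity itself, so no 1 - distance conversion is needed.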

Letti answered 11/8, 2020 at 8:43 Comment(0)
