Difference between cosine similarity and cosine distance

1

24

It looks like the scipy.spatial.distance.cdist cosine similarity distance:

link to cos distance 1

1 - u*v/(||u||||v||)

is different from sklearn.metrics.pairwise.cosine_similarity which is

link to cos similarity 2

 u*v/(||u||||v||)

Does anybody know the reason for the different definitions?

Mali answered 14/10, 2019 at 16:56 Comment(3)
The link that you labeled "link to cos similarity 1" is not cosine similarity, and it is not called that in the link. It is cosine distance.Sucker
Think of the trivial case: distance(X, X) should be 0, because the distance from X to X is 0. similarity(X, X) should be the maximum of the function that measures similarity (1 in this case), because X and X are as similar as two things can be.Sucker
@WarrenWeckesser, thank you, I fixed the name.Mali
42

Good question. Yes, these are two different things, but they are connected by the following equation:

Cosine_distance = 1 - cosine_similarity


Why?

Usually, people use the cosine similarity as a similarity metric between vectors. Now, the distance can be defined as 1-cos_similarity.
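
As a quick check of this relation against the two functions from the question, here is a minimal sketch. The example vectors are arbitrary (made up for illustration), and it assumes numpy, scipy and scikit-learn are installed.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics.pairwise import cosine_similarity

# Arbitrary non-zero example vectors, one per row (illustrative data only).
X = np.array([[1.0, 0.0], [1.0, 1.0], [-2.0, 0.5]])

D = cdist(X, X, metric="cosine")  # pairwise cosine distances
S = cosine_similarity(X)          # pairwise cosine similarities

# If distance = 1 - similarity, the two matrices agree up to rounding.
print(np.allclose(D, 1.0 - S))
```

Zero vectors are left out on purpose: their norm is 0, so the cosine is undefined for them.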

The intuition behind this is that if two vectors point in exactly the same direction, then the similarity is 1 (angle = 0) and thus the distance is 0 (1 - 1 = 0).

The same formula maps the whole similarity range onto distances: a similarity in [−1, 1] corresponds to a cosine distance in [0, 2].

Cosine similarity range: −1 meaning exactly opposite, 1 meaning exactly the same, 0 indicating orthogonality.
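
To make the three cases concrete, here is a small plain-Python sketch. The helper names `cosine_similarity` and `cosine_distance` are defined here for illustration only (they are not the sklearn or scipy functions), and the vectors are made up.

```python
import math

def cosine_similarity(u, v):
    """u.v / (||u|| ||v||) for two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)

u = [1.0, 2.0]
cases = {
    "same direction": [2.0, 4.0],   # positive multiple of u
    "orthogonal": [-2.0, 1.0],      # dot product with u is 0
    "opposite": [-1.0, -2.0],       # negative multiple of u
}

for name, v in cases.items():
    sim = cosine_similarity(u, v)
    dist = cosine_distance(u, v)
    print(f"{name}: similarity={sim:.3f}, distance={dist:.3f}")
```

Note that "same direction" uses a vector twice as long as u: only the angle matters, so the similarity is still 1 and the distance 0 (up to floating-point rounding).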


References: Scipy wolfram


Ricebird answered 14/10, 2019 at 16:58 Comment(6)
Thank you for the explanation. The terminology is a bit confusing. I feel like cosine distance should be called simply cosine. Cosine similarity distance should be called cosine distance.Mali
I agree but this is how it is defined in the engineering/math community.Ricebird
Yeah, does not make sense to change it now.Mali
@Mali see the first bullet point here: for something to be a distance it must satisfy "d(x,y) = 0 if and only if x = y, i.e. is zero precisely from a point to itself". The cosine distance gives d(x,x) = 0, cosine similarity does not. Hence the terminology.Pavement
@Pavement Thank you Dan. Your explanation makes sense. Interesting how cosine_similarity is under sklearn.metrics while not being a metricMali
Take a look at the second sentence in this article: while not strictly mathematical metrics, in stats similarities are colloquially referred to as metrics because they fill similar roles. sklearn's metrics are more like measurements (colloquially).Pavement
