Is there any reason to (not) L2-normalize vectors before using cosine similarity?

I was reading the paper "Improving Distributional Similarity with Lessons Learned from Word Embeddings" by Levy et al., and while discussing their hyperparameters, they say:

Vector Normalization (nrm) As mentioned in Section 2, all vectors (i.e. W’s rows) are normalized to unit length (L2 normalization), rendering the dot product operation equivalent to cosine similarity.

I then recalled that the default for the sim2 vector similarity function in the R text2vec package is to L2-norm vectors first:

sim2(x, y = NULL, method = c("cosine", "jaccard"), norm = c("l2", "none"))

So I'm wondering what the motivation is for combining normalization with cosine similarity (both in text2vec and in general). I tried to read up on the L2 norm, but it mostly comes up in the context of normalizing before using Euclidean distance. Surprisingly, I could not find anything on whether L2 normalization is recommended for or against when using cosine similarity on word vector spaces/embeddings. And I don't quite have the math skills to work out the analytic differences.

So here is my question, meant in the context of word vector spaces learned from textual data (either plain co-occurrence matrices, possibly weighted by tf-idf, PPMI, etc., or embeddings like GloVe) and of calculating word similarity (the goal being, of course, to use a vector space and metric that best reflect real-world word similarities).
Is there, in simple words, any reason to (not) use L2 norm on a word-feature matrix/term-co-occurrence matrix before calculating cosine similarity between the vectors/words?

Dichotomous answered 11/7, 2018 at 17:10 Comment(0)
E
2

text2vec handles everything automatically: it scales the rows to unit L2 norm and then takes the dot product to calculate cosine similarity.

But if the matrix already has rows with unit L2 norm, the user can specify norm = "none" and sim2 will skip the normalization step (which saves some computation).

I understand the confusion. I should probably remove the norm option (it doesn't take much time to normalize a matrix).
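
To make the effect of skipping the normalization step concrete, here is a minimal pure-Python sketch. It is not text2vec code: `sim_sketch` and its helpers are hypothetical names that only mimic the behaviour described above (normalize rows, then take the dot product), so treat it as an illustration under that assumption.

```python
import math

def l2_normalize(row):
    """Scale a vector to unit L2 length (a zero vector is returned unchanged)."""
    length = math.sqrt(sum(v * v for v in row))
    return [v / length for v in row] if length > 0 else list(row)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sim_sketch(a, b, norm="l2"):
    """Hypothetical stand-in for the behaviour described above, not the real sim2."""
    if norm == "l2":
        a, b = l2_normalize(a), l2_normalize(b)
    return dot(a, b)

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice as long

with_norm = sim_sketch(a, b, norm="l2")       # cosine similarity, close to 1
without_norm = sim_sketch(a, b, norm="none")  # raw dot product, depends on length
print(with_norm, without_norm)
```

Under these assumptions, skipping normalization on rows that are not already unit length gives a plain dot product, which grows with vector length and is no longer bounded by 1.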

Epochal answered 13/7, 2018 at 15:49 Comment(1)
Thanks. Let me get this straight: in sim2, if method is cosine but norm = "none" (and no prior normalization has been carried out), then what sim2 actually computes is only the dot product, not cosine similarity per se? - Dichotomous
M
7

If you want cosine similarity, you DON'T need to normalize to unit L2 norm first. Cosine similarity already divides the dot product of the two vectors by the product of their L2 norms, which is the same as normalizing each vector and then taking their dot product.
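
A small check of this claim, as a sketch in plain Python (the helper names `dot`, `l2_norm`, `unit` and `cosine` are made up for illustration): cosine similarity on the raw vectors, cosine similarity on pre-normalized vectors, and the dot product of the normalized vectors should all agree up to floating-point error.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_norm(v):
    return math.sqrt(dot(v, v))

def unit(v):
    """Return v scaled to unit L2 length (assumes v is not the zero vector)."""
    n = l2_norm(v)
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    return dot(a, b) / (l2_norm(a) * l2_norm(b))

a = [1.0, 2.0, 3.0]
b = [4.0, 0.0, -1.0]

raw = cosine(a, b)                       # no prior normalization
pre = cosine(unit(a), unit(b))           # normalized first, then cosine
as_dot = dot(unit(a), unit(b))           # normalized first, then dot product
scaled = cosine([10 * x for x in a], b)  # rescaling a does not change the result

print(raw, pre, as_dot, scaled)
```

Because the result does not change when either vector is rescaled, normalizing beforehand affects only how the number is computed, not its value.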

If you are calculating Euclidean distance, then you NEED to normalize when vector length (magnitude) is not an important distinguishing factor. If vector length is a distinguishing factor, don't normalize; calculate the Euclidean distance on the vectors as they are.
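
The Euclidean case can be sketched the same way (again plain Python with made-up helper names). Two vectors pointing in the same direction are far apart before normalization and coincide after it. For unit vectors u and v, the squared Euclidean distance equals 2 - 2 * cos(u, v), so distance on normalized vectors ranks pairs the same way cosine similarity does, just in reverse order.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Return v scaled to unit L2 length (assumes v is not the zero vector)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the length
c = [3.0, -1.0, 0.5]  # a different direction

raw_dist = euclidean(a, b)               # large: length difference dominates
unit_dist = euclidean(unit(a), unit(b))  # about 0: directions are identical

# Identity for unit vectors: squared distance = 2 - 2 * cosine similarity
lhs = euclidean(unit(a), unit(c)) ** 2
rhs = 2 - 2 * cosine(a, c)
print(raw_dist, unit_dist, lhs, rhs)
```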

Micra answered 25/9, 2018 at 3:8 Comment(0)
