According to several posts I found on Stack Overflow (for instance, Why does word2Vec use cosine similarity?), it's common practice to calculate the cosine similarity between two word vectors after training a word2vec (either CBOW or Skip-gram) model. However, this seems a little odd to me, since the model is actually trained with the dot product as its similarity score. One piece of evidence for this is that the norms of the word vectors we get after training are actually meaningful. So why do people still use cosine similarity instead of the dot product when measuring the similarity between two words?
Why use cosine similarity in Word2Vec when it's trained using dot-product similarity?
Cosine similarity and the dot product are both similarity measures, but the dot product is magnitude-sensitive while cosine similarity is not. Depending on a word's occurrence count, its vector can have a large or small norm, which inflates or deflates its dot product with other words. We normally normalize the vectors to unit length to remove this effect. If your particular downstream task needs occurrence count as a feature, the dot product might be the way to go; but if you don't care about counts, you can simply compute the cosine similarity, which normalizes the vectors for you.
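A minimal sketch of the difference, using made-up vectors rather than real word2vec embeddings: the two vectors below point in exactly the same direction but have different norms, so the dot product is large while the cosine similarity just reports their directional agreement.

```python
import numpy as np

# Hypothetical word vectors, invented purely for illustration.
# v_frequent has a larger norm (as a frequent word often would),
# v_rare points in the same direction but with a smaller norm.
v_frequent = np.array([4.0, 2.0, 6.0])
v_rare = np.array([2.0, 1.0, 3.0])

def dot_similarity(a, b):
    # Magnitude-sensitive: grows with the norms of both vectors.
    return np.dot(a, b)

def cosine_similarity(a, b):
    # Magnitude-invariant: normalize both vectors to unit length first.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(dot_similarity(v_frequent, v_rare))     # 28.0 -- scales with the norms
print(cosine_similarity(v_frequent, v_rare))  # 1.0  -- identical direction, norms ignored
```

Doubling either vector doubles the dot product but leaves the cosine similarity at 1.0, which is why cosine is preferred when you only want to compare directions (semantic similarity) and not frequencies.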