word2vec - what is best? add, concatenate or average word vectors?
T

4

21

I am working on a recurrent language model. To learn word embeddings that can be used to initialize my language model, I am using gensim's word2vec model. After training, the word2vec model holds two vectors for each word in the vocabulary: the word embedding (rows of input/hidden matrix) and the context embedding (columns of hidden/output matrix).

As outlined in this post there are at least three common ways to combine these two embedding vectors:

  1. summing the context and word vector for each word
  2. summing and averaging, i.e. taking the mean of the context and word vector
  3. concatenating the context and word vector
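To make the three options concrete, here is a minimal NumPy sketch. The matrices `W` and `C` are random toy stand-ins (not real word2vec output), and it assumes both have one row per word with rows aligned by word index.

```python
import numpy as np

# Toy stand-ins for the two trained matrices (assumption: both are
# vocab_size x dim, one row per word, rows aligned by word index).
rng = np.random.default_rng(0)
vocab_size, dim = 5, 4
W = rng.normal(size=(vocab_size, dim))  # word (input) embeddings
C = rng.normal(size=(vocab_size, dim))  # context (output) embeddings

summed = W + C                                 # option 1: same dimensionality
averaged = (W + C) / 2.0                       # option 2: the sum scaled by 1/2
concatenated = np.concatenate([W, C], axis=1)  # option 3: doubles the dimensionality

print(summed.shape, averaged.shape, concatenated.shape)
# → (5, 4) (5, 4) (5, 8)
```

Note that the sum and the average differ only by a constant factor, so they give identical cosine similarities. Concatenation is the only option that changes the embedding size the downstream model has to accept.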

However, I couldn't find proper papers or reports on the best strategy. So my questions are:

  1. Is there a commonly accepted choice between summing, averaging and concatenating the vectors?
  2. Or does the best way depend entirely on the task in question? If so, what strategy is best for a word-level language model?
  3. Why combine the vectors at all? Why not use the "original" word embeddings for each word, i.e. those contained in the weight matrix between input and hidden neurons?

Related (but unanswered) questions:

Thrush answered 23/10, 2017 at 12:44 Comment(5)
You might want to add what you are trying to do, e.g. build a sentence or paragraph level vector. (Gensim for example offers doc2vec for that)Bourn
I want to initialize my recurrent language model with the word embeddings produced by gensim. So my goal is to learn an embedding for each word in my vocabulary. After training the word2vec model, I can use the original embeddings or modify them further (as outlined in the post). I want to know which strategy yields the "best" word embeddingsThrush
In the first post you linked, the question is about creating a sentence vector, i.e. combining the word vectors into a single vector representing the sentence (or paragraph). That is where the question of how to combine the vectors seems to be most relevant. Is that what you want to do?Bourn
Not sure whether I understand your question. I am building a language model that is fed with sequential words and trained to predict the next word in a sentence. Each input word is mapped to an embedding. I use gensim to learn these word embeddings. My goal is to get the best possible word embeddings.Thrush
Okay, then it doesn't sound like you are trying to do that. As far as I know, the combinations of vectors you referred to are used to create a single vector out of a number of vectors, not to improve the word vectors themselves. But perhaps someone else knows better. To get better vectors you could obviously look into the training data, the size of the embedding or alternative methods such as GloVe. Including the type of word within the sentence could also potentially improve the vectors (see Sense2Vec).Bourn
T
8

I have found an answer in the Stanford lecture "Deep Learning for Natural Language Processing" (Lecture 2, March 2016). It's available here. At minute 46, Richard Socher states that the common way is to average the two word vectors.

Thrush answered 18/1, 2018 at 11:51 Comment(1)
He does say "average or concatenate". In this context, is "concatenate" a synonym for "average"? Or does he mean that one can choose either the "mean" or the "sum" of the two vectors?Crinite
A
2

You should read this research work at least once to get the whole idea of combining word embeddings using different algebraic operators. It was my research.

In this paper you can also see the other methods to combine word vectors.

In short, L1-normalized averages of word vectors and sums of word vectors are good representations.

Alsatia answered 30/10, 2018 at 18:4 Comment(3)
First of all please state you're the primary author (i.e. conflict of interest). Secondly, would be useful to summarize the relevant parts rather than just linking to your paper here.Pickup
Actually, the work he is interested in is in the research paper. And I have explained that. No reason to downvote this answer. It's related to the post.Alsatia
Nice paper, but the question is about combining the two vectors word2vec holds for one particular word, not about combining the word vectors of a given sentence.Joashus
G
0

I don't know of any work that empirically tests different ways of combining the two vectors, but there is a highly influential paper comparing 1) using just the word vector and 2) adding up the word and context vectors. The paper is here: https://www.aclweb.org/anthology/Q15-1016/.

First, note that the metric is analogy and similarity tests, NOT downstream tasks.

Here is a quote from the paper:

for both SGNS and GloVe, it is worthwhile to experiment with the w + c variant [adding up word and context vectors], which is cheap to apply (does not require retraining) and can result in substantial gains (as well as substantial losses).

So I guess you just need to try it out on your specific task.
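As a rough illustration of what "trying it out" could look like, the sketch below compares the cosine similarity of one word pair under the plain word vectors and under the w + c variant. The matrices are random toy data and the indices `i`, `j` are arbitrary, so the numbers mean nothing; with real embeddings you would run your actual evaluation on both variants.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for trained matrices (assumption: rows aligned by word index).
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))  # word vectors
C = rng.normal(size=(3, 4))  # context vectors

i, j = 0, 1  # hypothetical indices of two words to compare
sim_w = cosine(W[i], W[j])                   # variant 1: word vectors only
sim_wc = cosine(W[i] + C[i], W[j] + C[j])    # variant 2: w + c

print(f"w only: {sim_w:.3f}   w + c: {sim_wc:.3f}")
```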

By the way, here is a post on how to get context vectors from gensim: link
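For reference, a minimal sketch of pulling both matrices out of a model. It assumes gensim 4.x attribute names and a model trained with negative sampling, where `model.wv.vectors` holds the word vectors and `model.syn1neg` the context vectors; other versions or training modes may differ. The helper name `word_and_context_matrices` is made up for this example, and the stand-in object only exists so the snippet runs without gensim installed.

```python
from types import SimpleNamespace

import numpy as np

def word_and_context_matrices(model):
    """Return (W, C) from a trained gensim Word2Vec-like model.

    Assumes gensim 4.x attribute names and negative-sampling training:
    model.wv.vectors holds the word (input) vectors and model.syn1neg
    the context (output) vectors, one row per vocabulary word.
    """
    W = np.asarray(model.wv.vectors)
    C = np.asarray(model.syn1neg)
    if W.shape != C.shape:
        raise ValueError("word and context matrices differ in shape")
    return W, C

# Stand-in object so the sketch runs without gensim; with a real model
# you would pass the trained Word2Vec instance instead.
fake_model = SimpleNamespace(
    wv=SimpleNamespace(vectors=np.ones((3, 2))),
    syn1neg=np.zeros((3, 2)),
)
W, C = word_and_context_matrices(fake_model)
combined = W + C  # the "w + c" variant from the paper quoted above
print(combined.shape)
# → (3, 2)
```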

Galilee answered 10/4, 2020 at 2:42 Comment(0)
B
-1

I thought I would attempt an answer based on the comments.

The question you are linking to is: "WordVectors How to concatenate word vectors to form sentence vector"

Word vectors can be compared on their own. But often one wants to put a sentence, paragraph or document in context, i.e. a collection of words. Then the question arises of how to combine those into a single vector (gensim provides doc2vec for that use case).

That doesn't seem to be applicable in your case and I would just work with the given word vectors. You can adjust parameters like the size of the embedding, the training data, other algorithms. You could even combine vectors from different algorithms to create a kind of 'ensemble vector' (e.g. word2vec with GloVe). But it may not be more efficient.

Sometimes in language the same word has a different meaning depending on the type of word within a sentence or a combination of words. e.g. 'game' has a different meaning to 'fair game'. Sense2Vec offers a proposal to generate word vectors for those compound words: https://explosion.ai/blog/sense2vec-with-spacy (Of course, in that case you already need something that understands the sentence structure, such as SpaCy)

Bourn answered 23/10, 2017 at 14:8 Comment(3)
I think you have misunderstood my question. The word2vec model holds two word vectors for each word - one from each weight matrix. My question is about why and how to combine these two vectors for individual words. I know about other techniques for creating word vectors and/or how to tweak the word2vec model. But my question is specifically about word2vec and its output matrices.Thrush
I may very well have and I can later delete this answer. You mentioned predicting the next word before but I probably missed anything relating to your model holding vectors of two words. I outlined reasons for combining vectors to create a sentence vector. But your use case seems to be different. It might be worth expanding your question a bit more to explain your use case in more detail?Bourn
My model isn't holding two vectors for each word. The word2vec model is!Thrush
