I am going through this paper: http://cs.stanford.edu/~quocle/paragraph_vector.pdf
and it states that
" Theparagraph vector and word vectors are averaged or concatenated to predict the next word in a context. In the experiments, we use concatenation as the method to combine the vectors."
How does concatenation or averaging work?
Example (if paragraph 1 contains word1 and word2):
word1 vector =[0.1,0.2,0.3]
word2 vector =[0.4,0.5,0.6]
Concatenation method:
Does paragraph vector = [0.1+0.4, 0.2+0.5, 0.3+0.6]?
Averaging method:
Does paragraph vector = [(0.1+0.4)/2, (0.2+0.5)/2, (0.3+0.6)/2]?
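To make my confusion concrete, here is a tiny numpy sketch of plain vector averaging versus plain vector concatenation applied to the two example word vectors above (this is only my attempt to spell out the two operations, not necessarily what the paper does with them):

```python
import numpy as np

word1 = np.array([0.1, 0.2, 0.3])
word2 = np.array([0.4, 0.5, 0.6])

# Element-wise average: the result keeps the original dimensionality (3).
averaged = (word1 + word2) / 2                 # -> [0.25, 0.35, 0.45]

# Concatenation: the vectors are stacked end to end, so the result has dimensionality 6.
concatenated = np.concatenate([word1, word2])  # -> [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

print(averaged)
print(concatenated)
```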
Also, from the model figure in the paper, it is stated that:
"The paragraph token can be thought of as another word. It acts as a memory that remembers what is missing from the current context – or the topic of the paragraph. For this reason, we often call this model the Distributed Memory Model of Paragraph Vectors (PV-DM)."
Is the paragraph token equal to the paragraph vector, which is equal to on?
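For this second part, here is my current mental model written out as a small numpy sketch. The matrix names D and W follow the paper's notation, but the sizes, indices, and the lookup code are placeholders I made up purely to illustrate the question:

```python
import numpy as np

rng = np.random.default_rng(0)

embedding_dim = 3
num_paragraphs = 10   # hypothetical number of paragraphs, just for illustration
vocab_size = 100      # hypothetical vocabulary size

# D: one learned vector per paragraph, W: one learned vector per word (names follow the paper).
D = rng.normal(size=(num_paragraphs, embedding_dim))
W = rng.normal(size=(vocab_size, embedding_dim))

paragraph_id = 1            # the "paragraph token" as I understand it: just an index into D
context_word_ids = [5, 7]   # e.g. word1 and word2 from the example above

paragraph_vector = D[paragraph_id]      # is this what the paper calls the paragraph vector?
context_vectors = W[context_word_ids]

# Combine the paragraph vector with the context word vectors, as in the quote above:
combined = np.concatenate([paragraph_vector, *context_vectors])  # length 9 in this toy setup

print(combined.shape)
```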