What is different between doc2vec models when the dbow_words is set to 1 or 0?

I read this page, but I do not understand the difference between models built with the following code. I know that when dbow_words is 0, training of doc-vectors is faster.

First model

from gensim.models import doc2vec

model = doc2vec.Doc2Vec(documents1, size=100, window=300, min_count=10, workers=4)

Second model

model = doc2vec.Doc2Vec(documents1, size=100, window=300, min_count=10, workers=4, dbow_words=1)
Lachlan asked 16/5, 2017 at 21:15 Comment(0)

The dbow_words parameter only has an effect when training a DBOW model – that is, with the non-default dm=0 parameter.

So, between your two example lines of code, which both leave the default dm=1 value unchanged, there's no difference.

If you instead switch to DBOW training, dm=0, then with a default dbow_words=0 setting, the model is pure PV-DBOW as described in the original 'Paragraph Vectors' paper. Doc-vectors are trained to be predictive of text example words, but no word-vectors are trained. (There'll still be some randomly-initialized word-vectors in the model, but they're not used or improved during training.) This mode is fast and still works pretty well.
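For example, a pure PV-DBOW setup might look like this (a minimal sketch in the question's parameter style, using the same documents1 corpus; window has no effect in this mode, and newer gensim releases rename size to vector_size):

from gensim.models import doc2vec

# Pure PV-DBOW: dm=0 with the default dbow_words=0; only doc-vectors are trained.
model_dbow = doc2vec.Doc2Vec(documents1, dm=0, size=100, min_count=10, workers=4)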

If you add the dbow_words=1 setting, then skip-gram word-vector training will be added to the training, in an interleaved fashion. (For each text example, both doc-vectors over the whole text, then word-vectors over each sliding context window, will be trained.) Since this adds more training examples, as a function of the window parameter, it will be significantly slower. (For example, with window=5, adding word-training will make training about 5x slower.)
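A sketch of that combined mode, again assuming the question's corpus, might be:

# DBOW doc-vector training plus interleaved skip-gram word-vector training.
# Here window matters again, since it governs the added word-vector training.
model_dbow_words = doc2vec.Doc2Vec(documents1, dm=0, dbow_words=1, size=100, window=5, min_count=10, workers=4)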

This has the benefit of placing both the DBOW doc-vectors and the word-vectors into the "same space" - perhaps making the doc-vectors more interpretable by their closeness to words.
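Continuing the sketch above, and assuming one of the training documents was tagged 'doc_0' (a hypothetical tag), you could look up the words nearest to a doc-vector; in gensim 4.x the doc-vectors live under model.dv, while older versions use model.docvecs:

# Words closest to a trained doc-vector in the shared space.
doc_vector = model_dbow_words.dv['doc_0']   # model_dbow_words.docvecs['doc_0'] in older gensim
print(model_dbow_words.wv.most_similar(positive=[doc_vector], topn=10))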

This mixed training might serve as a sort of corpus-expansion – turning each context-window into a mini-document – that helps improve the expressiveness of the resulting doc-vector embeddings. (Though, especially with sufficiently large and diverse document sets, it may be worth comparing against pure-DBOW with more passes.)
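Such a comparison might be as simple as training a pure-DBOW model with more passes over the data (a sketch; recent gensim calls this parameter epochs, older versions used iter):

# Pure PV-DBOW again, but with more training passes instead of word-training.
model_dbow_more = doc2vec.Doc2Vec(documents1, dm=0, size=100, min_count=10, workers=4, epochs=20)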

Wakeless answered 17/5, 2017 at 1:7 Comment(6)
@Wakeless You wrote "This has the benefit of placing both the DBOW doc-vectors and the word-vectors into the 'same space'". Does that mean that with other methods of building a doc2vec model, the word2vec vectors and doc2vec vectors are not in the same space?Lachlan
In PV-DBOW (dm=0) without dbow_words=1, the word-vectors aren't trained - remaining random. In PV-DM (dm=1), the doc-vectors and word-vectors are averaged together, so they are again in the "same space" for comparability. In the advanced/experimental dm_concat=1 mode (added to dm=1 & not recommended) doc-vectors & word-vectors are input to the prediction-neural-network in separate places and so may not be comparable – essentially coming from different spaces.Wakeless
@Wakeless What is the benefit of having both docvecs and wv in the same model? Shouldn't we use Word2Vec and Doc2Vec separately?Anniceannie
Some Doc2Vec modes inherently make word-vectors concurrently with the doc-vectors. (And in such cases, the gensim implementation shares a lot of code.) And no mode of Paragraph-Vectors Doc2Vec requires word-vectors as an input at the beginning. (The 'Paragraph Vector' algorithm is not a two-stage process that does word-vectors 1st, then doc-vectors. If it uses word-vectors at all, they are co-trained from the beginning with doc-vectors.)Wakeless
So, if you only need word-vectors, sure, just use Word2Vec. If you only need doc-vectors, use Doc2Vec in a mode that doesn't create word-vectors (pure PV-DBOW: dm=0, dbow_words=0), or a Doc2Vec mode that also happens to create word-vectors and just choose to ignore them. If you need both from the same data, use a Doc2Vec mode that also creates word-vectors (like PV-DM, dm=1, or PV-DBOW with interleaved skip-gram word-training, dm=0, dbow_words=1). If you need both but do it in two separate steps, you'll spend more time training, and the vectors won't be inherently compatible.Wakeless
@Wakeless In DBOW (dm=0) mode with dbow_words=0 (pure PV-DBOW), I agree that there is no input matrix of word vectors (it is replaced by the document vector matrix). But isn't the output matrix still trained? This matrix could be used as word vectors, no? Or am I missing something?Bezonian
