The `dbow_words` parameter only has an effect when training a DBOW model – that is, with the non-default `dm=0` parameter. So, between your two example lines of code, which both leave the default `dm=1` value unchanged, there's no difference.
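For illustration, here's a minimal sketch of two such calls (hypothetical, since your exact lines aren't reproduced here). With `dm` left at its default of 1, both create the same PV-DM model, and `dbow_words` is simply ignored:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# tiny toy corpus, just to make the example runnable
docs = [TaggedDocument(words=['human', 'interface', 'computer'], tags=[0]),
        TaggedDocument(words=['survey', 'user', 'computer', 'system'], tags=[1])]

# both leave dm at its default of 1 (PV-DM), so dbow_words has no effect
model_a = Doc2Vec(docs, vector_size=50, min_count=1, epochs=20)
model_b = Doc2Vec(docs, vector_size=50, min_count=1, epochs=20, dbow_words=1)
```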
If you instead switch to DBOW training, `dm=0`, then with the default `dbow_words=0` setting, the model is pure PV-DBOW as described in the original 'Paragraph Vectors' paper. Doc-vectors are trained to be predictive of each text example's words, but no word-vectors are trained. (There'll still be some randomly-initialized word-vectors in the model, but they're not used or improved during training.) This mode is fast and still works pretty well.
If you add the `dbow_words=1` setting, then skip-gram word-vector training is interleaved with the doc-vector training. (For each text example, the doc-vector is trained against the whole text, then word-vectors are trained over each sliding context window.) Since this adds more training examples, as a function of the `window` parameter, it will be significantly slower. (For example, with `window=5`, adding word-training will make training about 5x slower.)
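A corresponding sketch with the interleaved word-training enabled (again, illustrative parameter values only):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=['human', 'interface', 'computer'], tags=[0]),
        TaggedDocument(words=['survey', 'user', 'computer', 'system'], tags=[1])]

# PV-DBOW plus interleaved skip-gram word training;
# expect training to be roughly `window` times slower than dbow_words=0
model = Doc2Vec(docs, dm=0, dbow_words=1, window=5, vector_size=50,
                min_count=1, epochs=20)
```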
This has the benefit of placing both the DBOW doc-vectors and the word-vectors into the "same space" - perhaps making the doc-vectors more interpretable by their closeness to words.
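One way to probe that shared space, sketched here with the same hypothetical toy setup, is to ask for the words whose vectors lie closest to a given doc-vector:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=['human', 'interface', 'computer'], tags=[0]),
        TaggedDocument(words=['survey', 'user', 'computer', 'system'], tags=[1])]
model = Doc2Vec(docs, dm=0, dbow_words=1, window=5, vector_size=50,
                min_count=1, epochs=40)

# words nearest to a doc-vector, meaningful because both vector sets
# were trained into the same coordinate space
print(model.wv.similar_by_vector(model.dv[0], topn=3))
```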
This mixed training might serve as a sort of corpus-expansion – turning each context-window into a mini-document – that helps improve the expressiveness of the resulting doc-vector embeddings. (Though, especially with sufficiently large and diverse document sets, it may be worth comparing against pure-DBOW with more passes.)