Is there a pre-trained doc2vec model?

Is there a pre-trained doc2vec model with a large data set, like Wikipedia or similar?

Impressible asked 2/7, 2018 at 9:25 Comment(1)
I just wanted to add a link to other pretrained gensim models: nilc.icmc.usp.br/embeddings – Caddoan

I don't know of any good one. There's one linked from this project, but:

  • it's based on a custom fork from an older gensim, so won't load in recent code
  • it's not clear what parameters or data it was trained with, and the associated paper may have made uninformed choices about the effects of parameters
  • it doesn't appear to be the right size to include actual doc-vectors for either Wikipedia articles (4-million-plus) or article paragraphs (tens-of-millions), or a significant number of word-vectors, so it's unclear what's been discarded

While it takes a long time and a significant amount of working RAM, there is a Jupyter notebook included in gensim that demonstrates creating a Doc2Vec model from Wikipedia:

https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-wikipedia.ipynb
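For reference, the core of that notebook boils down to something like the following sketch (not the notebook verbatim; the dump filename and all training parameters here are illustrative placeholders you would tune yourself):

    import multiprocessing

    from gensim.corpora.wikicorpus import WikiCorpus
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Placeholder path to a Wikipedia XML dump you have downloaded yourself.
    wiki = WikiCorpus("enwiki-latest-pages-articles.xml.bz2")
    wiki.metadata = True  # make get_texts() also yield (page_id, title)

    class TaggedWikiDocument:
        """Stream Wikipedia articles as TaggedDocuments, tagged by article title."""
        def __init__(self, wiki_corpus):
            self.wiki_corpus = wiki_corpus

        def __iter__(self):
            for tokens, (page_id, title) in self.wiki_corpus.get_texts():
                yield TaggedDocument(words=list(tokens), tags=[title])

    documents = TaggedWikiDocument(wiki)

    # PV-DBOW with simultaneous word-vector training; parameters are illustrative.
    model = Doc2Vec(dm=0, dbow_words=1, vector_size=200, window=8,
                    min_count=19, epochs=10,
                    workers=multiprocessing.cpu_count())
    model.build_vocab(documents)
    model.train(documents, total_examples=model.corpus_count, epochs=model.epochs)
    model.save("doc2vec_wikipedia.model")

(As noted above, expect this to take a long time and a lot of RAM on a full English Wikipedia dump.)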

So, I would recommend training a model yourself along those lines. (And, if you succeed in creating a model, and want to document it for others, you could upload it somewhere for others to re-use.)

Uniseptate answered 10/7, 2018 at 3:48 Comment(2)
I know this is a very old answer, but do you think it is possible to train a Doc2Vec model on Google Colab? – Fire
I'm not a user of Google Colab, but if I understand correctly that it lets you run Python code, in a notebook, with enough RAM to do common ML tasks – sure, why not? – Uniseptate

Yes! I could find two pre-trained doc2vec models at this link, but I still could not find any pre-trained doc2vec model trained on tweets.
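For anyone who downloads such a model (saved with a compatible gensim version), loading and querying it only takes a couple of calls; here is a minimal sketch, where the filename and the example tokens are placeholders:

    from gensim.models.doc2vec import Doc2Vec

    # Placeholder filename – substitute the actual downloaded/saved model file.
    model = Doc2Vec.load("doc2vec_wikipedia.model")

    # Infer a vector for a new, already-tokenized document.
    tokens = ["machine", "learning", "with", "large", "text", "corpora"]
    vector = model.infer_vector(tokens)

    # Look up the most similar trained document tags
    # (model.dv in gensim 4.x, model.docvecs in older 3.x releases).
    print(model.dv.most_similar([vector], topn=5))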

Enisle answered 15/11, 2018 at 19:14 Comment(0)
