Where can I download a pretrained word2vec map?

I have been learning about NLP models and came across word embeddings, and saw examples in which it is possible to see relations between words by computing dot products between their vectors.

What I am looking for is just a dictionary mapping words to their representative vectors, so I can play around with it. I know that I can build and train a model and create my own map, but I just want the already-trained map as a Python variable.
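
Something like this toy sketch is the kind of interface I mean (the vectors here are made up, and real pretrained embeddings typically have 100-300 dimensions rather than 3):

    import numpy as np

    # Toy word-to-vector map with made-up numbers; a real pretrained map
    # would cover hundreds of thousands of words in 100-300 dimensions.
    word_vectors = {
        "king":  np.array([0.50, 0.68, 0.12]),
        "queen": np.array([0.54, 0.71, 0.35]),
        "apple": np.array([0.91, 0.02, 0.44]),
    }

    def cosine_similarity(a, b):
        # Dot product of the two vectors, normalized by their lengths.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine_similarity(word_vectors["king"], word_vectors["queen"]))  # high
    print(cosine_similarity(word_vectors["king"], word_vectors["apple"]))  # lower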

Obduliaobdurate answered 4/1, 2020 at 13:10 Comment(1)
Thanks to the answerer and the commenters, I finally found exactly what I was looking for here: web.stanford.edu/class/cs224n/materials/… – Obduliaobdurate

You can try out Google's pretrained word2vec model, trained on about 100 billion words from Google News articles.

An interesting property of word vectors: w2v(king) - w2v(man) + w2v(woman) ≈ w2v(queen)
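
As a rough sketch (assuming gensim is installed and the GoogleNews-vectors-negative300.bin file has been downloaded into the working directory), you could load the model and reproduce that analogy like this:

    from gensim.models import KeyedVectors

    # Load the pretrained GoogleNews vectors; the full file needs 3+ GB of RAM.
    kv = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True
    )

    # king - man + woman should land closest to queen.
    print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))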

Denote answered 4/1, 2020 at 13:21 Comment(2)
And you can specifically load that file with a library like gensim that supports word vectors, using its KeyedVectors.load_word2vec_format() method: radimrehurek.com/gensim/models/keyedvectors.html – the KeyedVectors object will behave like a Python dict, though it's not literally a dict, for performance reasons. – Location
If that GoogleNews set of 3 million words and short phrases is too large to be convenient to work with – it takes 3+ GB of RAM to load, and several more GB to do most_similar() operations – you can load a subset using the limit parameter, e.g. goog_wordvecs = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True, limit=100000) to load just the first 100,000 words. That's less than 4% of all its words, but still enough to cover most common words. – Location
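
Putting those two comments together, here is a minimal sketch of loading a 100,000-word subset and then using the KeyedVectors object like a dict (same assumption that the GoogleNews file has been downloaded):

    from gensim.models import KeyedVectors

    # limit=100000 keeps only the first (most frequent) 100,000 words,
    # which loads faster and in far less RAM than the full 3 million.
    kv = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True, limit=100000
    )

    vec = kv["queen"]     # dict-style lookup; returns a 300-dimensional numpy array
    print(vec.shape)      # (300,)
    print("queen" in kv)  # membership testing also works like a dict: True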
