Gensim 3.8.0 to Gensim 4.0.0
V

4

26

I trained a Word2Vec model using Gensim 3.8.0. Later I tried to load the pretrained model using Gensim 4.0.0 on GCP. I used the following code:

model = KeyedVectors.load_word2vec_format(wv_path, binary= False)
words = model.wv.vocab.keys()
self.word2vec = {word:model.wv[word]%EMBEDDING_DIM for word in words}

I got an error saying that "model.wv" has been removed in Gensim 4.0.0. Then I used the following code:

model = KeyedVectors.load_word2vec_format(wv_path, binary= False)
words = model.vocab.keys()
word2vec = {word:model[word]%EMBEDDING_DIM for word in words}

And now I get the following error:

AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

Can anyone please suggest how I can use the pretrained model and return a dictionary in Gensim 4.0.0?

Villous answered 30/3, 2021 at 9:28 Comment(0)
U
32

The changes introduced between Gensim 3.x and 4 are all documented in the migration guide on GitHub:

https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

For the above problem, the solution that worked for me:

    words = list(model.wv.index_to_key)
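
If the same code has to run under both Gensim 3.x and 4.x, a small helper along these lines might do. This is only a sketch: `vocabulary_words` is a made-up name, and the `SimpleNamespace` objects are stand-ins so the snippet runs without a model file. It assumes a full Word2Vec model exposes its vectors as `.wv`, that 4.x vectors have `index_to_key`, and that 3.x vectors have `index2word`.

```python
from types import SimpleNamespace


def vocabulary_words(model):
    """Return the vocabulary as a list for Gensim 3.x or 4.x style objects."""
    # A full Word2Vec model keeps its vectors in .wv; a KeyedVectors
    # object (e.g. from load_word2vec_format) is used directly.
    kv = getattr(model, "wv", model)
    if hasattr(kv, "index_to_key"):  # Gensim 4.x naming
        return list(kv.index_to_key)
    return list(kv.index2word)  # Gensim 3.x naming


# Stand-ins that mimic only the attributes used above (not real Gensim objects).
new_style_kv = SimpleNamespace(index_to_key=["king", "queen"])
full_model = SimpleNamespace(wv=new_style_kv)
old_style_kv = SimpleNamespace(index2word=["king", "queen"])

words_from_kv = vocabulary_words(new_style_kv)
words_from_model = vocabulary_words(full_model)
words_from_old = vocabulary_words(old_style_kv)
```

With a real model you would pass the loaded object instead of the stand-ins.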
Uveitis answered 9/5, 2021 at 6:8 Comment(0)
D
9

The migration notes explain major changes & how to adapt your code:

https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

Per the guidance there, to just get a list of the words, since your model variable is already an instance of KeyedVectors, you can use:

model.index_to_key

Your code doesn't show a need for a dict, but there is a slightly-different word-to-index-position dict in model.key_to_index. However, you can just use model[key] like before to get individual vectors.
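
If you do want a plain word-to-vector dict, something like the following sketch should work. `vectors_to_dict` and `TinyVectors` are hypothetical names; `TinyVectors` is only a stand-in that mimics `index_to_key`, `key_to_index` and `[]` lookup so the example runs without a model file.

```python
def vectors_to_dict(kv):
    """Map every word in a KeyedVectors-like object to its vector."""
    return {word: kv[word] for word in kv.index_to_key}


class TinyVectors:
    """Hypothetical stand-in exposing only the members used above."""

    def __init__(self, words, vectors):
        self.index_to_key = list(words)
        self.key_to_index = {w: i for i, w in enumerate(self.index_to_key)}
        self._vectors = list(vectors)

    def __getitem__(self, word):
        # Look up the row position, then return that row.
        return self._vectors[self.key_to_index[word]]


# With Gensim 4 you would instead load the real vectors, e.g.:
# kv = KeyedVectors.load_word2vec_format(wv_path, binary=False)
kv = TinyVectors(["king", "queen"], [[0.1, -0.2], [0.3, 0.4]])
word2vec = vectors_to_dict(kv)
```

Note that this copies nothing you couldn't already get from `model[key]`, so it is only worth doing if downstream code really expects a dict.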

(Separately: I can't imagine your %EMBEDDING_DIM is doing anything useful. Why would you want to perform an elementwise % modulus operation, using the integer count of dimensions, against individual dimensions that are often small floating-point numbers? Positive values pass through unchanged, as the EMBEDDING_DIM will usually be far larger than the individual values, but negative values get wrapped into the positive range, so it is more likely to distort your vectors than to serve any good purpose.)
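
One thing worth checking with that modulus: Python's `%` returns a result with the sign of the divisor, and NumPy's `%` on arrays follows the same convention, so negative components do not pass through unchanged. A tiny illustration with plain floats (the value 100 for `EMBEDDING_DIM` and the sample numbers are made up for the example):

```python
EMBEDDING_DIM = 100  # illustrative value, not taken from the question

vector = [0.25, -0.25, 0.5]  # made-up vector components

# Elementwise modulus, as the dict comprehension in the question applies it.
wrapped = [x % EMBEDDING_DIM for x in vector]

# Positive components are unchanged; the negative one wraps to 99.75.
print(wrapped)
```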

Dosser answered 31/3, 2021 at 0:28 Comment(0)
G
3

On Gensim 4.0.0 you need to use the key_to_index dict of your model's KeyedVectors. Calling .keys() on it returns a dict_keys object with all the words (keys) in the model, so you can still iterate through your whole vocabulary :).

Your code should now look like this:

model = KeyedVectors.load_word2vec_format(wv_path, binary=False)
# load_word2vec_format returns a KeyedVectors object, which has no .wv in Gensim 4
words = list(model.key_to_index.keys())
self.word2vec = {word: model[word] % EMBEDDING_DIM for word in words}
Gymno answered 25/8, 2021 at 22:58 Comment(0)
C
1
word_vocab= model.wv.vocab

I tried this, but after upgrading Gensim it raises the following error:

The vocab attribute was removed from KeyedVector in Gensim 4.0.0. Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.

So I tried key_to_index.keys() wrapped in list():

word_vocab= list(model.wv.key_to_index.keys())

and it works well.

Cimmerian answered 9/12, 2023 at 9:40 Comment(0)
