How to load a pre-trained Word2vec .model file and reuse it?
I want to use a pre-trained word2vec model, but I don't know how to load it in Python.

The file is a .model file (703 MB) and can be downloaded here:
http://devmount.github.io/GermanWordEmbeddings/

Skyway answered 17/9, 2016 at 16:40 Comment(0)
Just for loading:

import gensim

# Load pre-trained Word2Vec model.
model = gensim.models.Word2Vec.load("modelName.model")

Now you can train the model as usual. Also, if you want to be able to save it and retrain it multiple times, here's what you should do:

model.train(...)  # insert the proper training parameters here
"""
If you don't plan to train the model any further, calling
init_sims() will make the model much more memory-efficient.
If `replace` is set, forget the original vectors and only keep the normalized
ones, which saves lots of memory!
Use replace=True if you want to reuse the model.
"""
model.init_sims(replace=True)

# save the model for later use
# for loading, call Word2Vec.load()

model.save("modelName.model")
Botts answered 23/9, 2016 at 14:2 Comment(4)
I get this error: File "C:\...\Python\Python35\lib\site-packages\gensim\utils.py", line 911, in unpickle return _pickle.loads(f.read()) _pickle.UnpicklingError: invalid load key, '6'. - Skyway
_pickle.UnpicklingError: invalid load key, '3'. It looks like in some cases .load_word2vec_format() can help. - Atabrine
gensim.models.KeyedVectors.load_word2vec_format works fine. - Dysgenics
load_word2vec_format() and gensim.models.KeyedVectors.load_word2vec_format() gave either an unpickling error or a UnicodeDecodeError. What solved it for me was adding the binary=True argument (see the post below). So this is what worked for me: model = KeyedVectors.load_word2vec_format('german.model', binary=True), where the model is saved in the previously defined working directory. - Encyclopedia
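
Putting these comments together, a minimal sketch of the fix for the German model from the question (assuming 'german.model' is the downloaded file in the working directory):

from gensim.models import KeyedVectors

# german.model is stored in the binary word2vec format, so binary=True is required;
# Word2Vec.load() on this file raises the UnpicklingError shown above.
model = KeyedVectors.load_word2vec_format('german.model', binary=True)
print(model.most_similar('Frau', topn=3))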
Use KeyedVectors to load the pre-trained model.

from gensim.models import KeyedVectors

word2vec_path = 'path/GoogleNews-vectors-negative300.bin.gz'
w2v_model = KeyedVectors.load_word2vec_format(word2vec_path, binary=True)
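
A quick usage check on the loaded vectors (assuming the word 'king' is in the GoogleNews vocabulary):

# KeyedVectors acts as a read-only mapping from words to vectors.
print(w2v_model['king'][:5])                   # first five dimensions of the vector
print(w2v_model.most_similar('king', topn=3))  # nearest neighbours by cosine similarity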
Kerr answered 5/10, 2021 at 9:33 Comment(0)
I used the same model in my code, and since I couldn't load it, I asked the author about it. His answer was that the model has to be loaded in binary format:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format(w2v_path, binary=True)

This worked for me, and I think it should work for you, too.

Cheung answered 25/4, 2022 at 9:18 Comment(1)
Reaches out to the developer, gets help, and then posts the solution here. Bravo sir, thanks a lot! - Recha
I met the same issue, and I downloaded GoogleNews-vectors-negative300 from Kaggle. I saved and extracted the file on my desktop. Then I ran this code in Python and it worked well:

from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format(r'C:/Users/juana/desktop/archive/GoogleNews-vectors-negative300.bin', binary=True)
Jabberwocky answered 25/7, 2022 at 21:23 Comment(0)
Since you specifically mentioned the German Word2Vec model, here's an up-to-date example with gensim 4.3.0:

from gensim.models import KeyedVectors

# NOTE: 'german.model' is available at https://devmount.github.io/GermanWordEmbeddings/
word2vec_path = 'german.model'
model = KeyedVectors.load_word2vec_format(word2vec_path, binary=True)
model.most_similar(model['Frau'] + model['Kind'])
# [('Kind', 0.8979102969169617), ('Frau', 0.8766001462936401),  ('Mutter', 0.8282196521759033), ...
model.most_similar(model['Obama'] - model['USA'] + model['Russland'])
# [('Obama', 0.8849074840545654), ('US-Praesident_Obama', 0.8133699893951416),  ('Putin', 0.7943856120109558), ...
model['Frankreich']
# array([-0.0014747 ,  0.09541887,  0.10959213,  0.12412726,  0.06772646, ....
model['NotAWord']
# KeyError
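
A minimal sketch of guarding against that KeyError, assuming the same loaded model:

# KeyedVectors supports membership tests, so out-of-vocabulary words can be checked first.
word = 'NotAWord'
if word in model:  # equivalently: word in model.key_to_index in gensim 4.x
    print(model[word][:5])
else:
    print(word, 'is not in the vocabulary')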
Fingerboard answered 15/12, 2023 at 11:18 Comment(0)
