In spacy, how to use your own word2vec model created in gensim?

3

19

I have trained my own word2vec model in gensim and I am trying to load it in spaCy. As I understand it, I first need to save the model to disk and then load it through spaCy's init-model, but I can't figure out exactly how.

gensimmodel
Out[252]:
<gensim.models.word2vec.Word2Vec at 0x110b24b70>

import spacy
spacy.load(gensimmodel)

OSError: [E050] Can't find model 'Word2Vec(vocab=250, size=1000, alpha=0.025)'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Gorgerin answered 22/5, 2018 at 11:32 Comment(1)
The binary solution has been answered here: https://mcmap.net/q/560327/-spacy-how-to-load-google-news-word2vec-vectors - Vrablik
26

Train and save your model in plain-text format:

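import os

from gensim.models import Word2Vec
from gensim.test.utils import common_texts

# Create the output directory first; save_word2vec_format does not create it.
os.makedirs("./data", exist_ok=True)

# gensim < 4.0 uses size=...; in gensim 4.x the parameter is named vector_size.
model = Word2Vec(common_texts, size=100, window=5, min_count=1, workers=4)
# Write the vectors in the plain-text word2vec format.
model.wv.save_word2vec_format("./data/word2vec.txt", binary=False)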
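For reference, the plain-text word2vec format starts with a header line giving the vocabulary size and the vector dimensionality, followed by one line per word with its space-separated values. The sketch below is one way to sanity-check such a file before handing it to spaCy. The helper names read_word2vec_header and check_word2vec_text are hypothetical (they are not part of gensim or spaCy), and the tiny hand-written file only stands in for ./data/word2vec.txt.

```python
import os
import tempfile


def read_word2vec_header(path):
    """Return (vocab_size, dimensions) from the first line of a text-format file."""
    with open(path, encoding="utf-8") as handle:
        vocab_size, dimensions = handle.readline().split()
    return int(vocab_size), int(dimensions)


def check_word2vec_text(path):
    """Raise ValueError if the rows do not match what the header promises."""
    vocab_size, dimensions = read_word2vec_header(path)
    rows = 0
    with open(path, encoding="utf-8") as handle:
        next(handle)  # skip the header line
        for line in handle:
            parts = line.rstrip().split(" ")
            # Each row is the word followed by `dimensions` numbers.
            if len(parts) != dimensions + 1:
                raise ValueError(f"row for {parts[0]!r} has {len(parts) - 1} values")
            rows += 1
    if rows != vocab_size:
        raise ValueError(f"header says {vocab_size} rows, found {rows}")
    return vocab_size, dimensions


# Tiny hand-written file standing in for ./data/word2vec.txt (assumption for the demo).
sample_path = os.path.join(tempfile.mkdtemp(), "word2vec.txt")
with open(sample_path, "w", encoding="utf-8") as handle:
    handle.write("2 3\n")
    handle.write("human 0.1 0.2 0.3\n")
    handle.write("computer 0.4 0.5 0.6\n")

header = check_word2vec_text(sample_path)
print(header)
```

If the check raises, the file is probably in the binary format or was truncated while being written.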

Gzip the text file:

gzip ./data/word2vec.txt

This produces ./data/word2vec.txt.gz.

Run the following command (spaCy v2 syntax):

python -m spacy init-model en ./data/spacy.word2vec.model --vectors-loc ./data/word2vec.txt.gz

Load the vectors using:

nlp = spacy.load('./data/spacy.word2vec.model/')
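If the gzip command-line tool is not available (for example on Windows), the compression step can probably be done from Python instead. This is a minimal sketch using only the standard library; gzip_file is a hypothetical helper name, and the demo file is a stand-in for the real vectors file.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(source_path):
    """Compress source_path to source_path + '.gz' and return the new path."""
    target_path = source_path + ".gz"
    with open(source_path, "rb") as source, gzip.open(target_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return target_path


# Stand-in file so the sketch runs on its own; in practice pass "./data/word2vec.txt".
demo_path = os.path.join(tempfile.mkdtemp(), "word2vec.txt")
with open(demo_path, "w", encoding="utf-8") as handle:
    handle.write("1 2\nhello 0.1 0.2\n")

compressed_path = gzip_file(demo_path)

# Read the header back to confirm the archive is readable.
with gzip.open(compressed_path, "rt", encoding="utf-8") as handle:
    first_line = handle.readline().strip()
```

Unlike the gzip command, this helper leaves the original uncompressed file in place.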
Dance answered 7/11, 2018 at 16:41 Comment(5)
The last command didn't work for me, since spaCy interpreted the 'en' parameter as a file path. What worked was simply running nlp = spacy.load('./data/spacy.word2vec.model/') as suggested in the spaCy docs. - Algernon
The bridge that solved my problem was the line model.wv.save_word2vec_format("./data/word2vec.txt") - Panda
Doesn't work for me! I followed your steps but get the following error when running the python -m spacy ... command: FileNotFoundError: [Errno 2] No such file or directory: 'data/spacy.word2vec.model' - Wendell
Fixed it by changing the path here: w2v_model.wv.save_word2vec_format("word2vec.txt", binary=False) and by adjusting the spaCy command to reflect the change in path: python3 -m spacy init-model en spacy.word2vec.model --vectors-loc word2vec.txt.gz. I then read in the standard model along with my new vectors: nlp = spacy.load('en_core_web_sm', vectors='spacy.word2vec.model') - Wendell
The 'init-model' command was changed to init; see the docs: spacy.io/api/cli#init-model - Equivocation
4

As explained here, you can import custom word vectors that were trained using Gensim, FastText, or Tomas Mikolov's original word2vec implementation by creating a model like this:

wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz
python -m spacy init-model en your_model --vectors-loc cc.la.300.vec.gz

Then you can load your model with nlp = spacy.load('your_model') and use it!

Also see the similar question that was answered here.

Holst answered 29/5, 2018 at 4:41 Comment(1)
The question is rather: how do you save a gensim model so it's readable by spaCy? - Piaffe
3

All of these answers are for an older version of spaCy. Since spaCy v3, the command is:

python -m spacy init vectors [OPTIONS] LANG VECTORS_LOC OUTPUT_DIR

You can learn more about the options by running python -m spacy init --help in your command prompt.
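When the conversion is part of a Python script, the same command can be assembled and run with subprocess. This is only a sketch: build_init_vectors_cmd and run_init_vectors are hypothetical helper names, the paths are borrowed from the earlier answer as assumptions, and actually running the command requires spaCy v3 to be installed in the current environment.

```python
import subprocess
import sys


def build_init_vectors_cmd(lang, vectors_loc, output_dir):
    """Build the argument list for `python -m spacy init vectors`."""
    return [sys.executable, "-m", "spacy", "init", "vectors",
            lang, vectors_loc, output_dir]


def run_init_vectors(lang, vectors_loc, output_dir):
    """Run the command; raises CalledProcessError if spaCy reports a failure."""
    subprocess.run(build_init_vectors_cmd(lang, vectors_loc, output_dir), check=True)


# Paths are assumptions taken from the earlier answer, not fixed names.
cmd = build_init_vectors_cmd(
    "en", "./data/word2vec.txt.gz", "./data/spacy.word2vec.model"
)
print(" ".join(cmd[1:]))
```

Calling run_init_vectors with the same arguments would then write the pipeline directory, which can be loaded with spacy.load as in the answers above.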

Peluso answered 28/9, 2021 at 11:24 Comment(0)