How to export a fasttext model created by gensim, to a binary file?

Asked 15/11, 2019 at 12:1 Answered 15/11, 2019 at 21:42

Solved python nlp gensim fasttext

I'm trying to export the fasttext model created by gensim to a binary file, but the docs are unclear about how to achieve this. What I've done so far:

model.wv.save_word2vec_format('model.bin')

But this does not seem like the best solution, since later, when I try to load the model using:

fasttext.load_facebook_model('model.bin')

I get into an infinite loop, whereas loading the fasttext.model file created by model.save('fasttext.model') completes in around 30 seconds.

Unawares answered 15/11, 2019 at 12:1 Comment(2)

read this: radimrehurek.com/gensim/models/… did you try model.save? – Lupelupee 15/11, 2019 at 13:55

@Anakin87 You cannot save the model as a .bin file using the model.save method. – Unawares 15/11, 2019 at 14:5

Using .save_word2vec_format() saves just the full-word vectors, to a simple format that was used by Google's original word2vec.c release. It doesn't save unique things about a full FastText model. Such files would be reloaded with the matched .load_word2vec_format().
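To make the limitation concrete, here is a minimal sketch of the plain-text variant of the word2vec format: a header line with the vocabulary size and vector dimensionality, then one line per word with its vector. The function names are hypothetical and this is not gensim's implementation; it only illustrates that the layout has room for full-word vectors and nothing else (no subword n-gram buckets, no training state). Note also that .save_word2vec_format() writes this text variant unless binary=True is passed, regardless of the '.bin' file name.

```python
import io

def write_word2vec_text(vectors, stream):
    """Write {word: [floats]} in the plain-text word2vec layout (hypothetical helper)."""
    dim = len(next(iter(vectors.values())))
    # Header: "<vocab_size> <vector_size>"
    stream.write(f"{len(vectors)} {dim}\n")
    for word, vec in vectors.items():
        # One line per full word: the word, then its vector components.
        stream.write(word + " " + " ".join(repr(float(x)) for x in vec) + "\n")

def read_word2vec_text(stream):
    """Read the layout written above back into {word: [floats]}."""
    vocab_size, dim = map(int, stream.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        parts = stream.readline().rstrip("\n").split(" ")
        vec = [float(x) for x in parts[1:]]
        assert len(vec) == dim
        vectors[parts[0]] = vec
    return vectors

# Toy data standing in for a model's full-word vectors.
toy = {"cat": [0.1, 0.2, 0.3], "dog": [0.4, 0.5, 0.6]}
buf = io.StringIO()
write_word2vec_text(toy, buf)
buf.seek(0)
restored = read_word2vec_text(buf)
```

Nothing in this layout can hold FastText's subword information, which is why a loader that expects Facebook's binary format cannot read such a file.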

The .load_facebook_model() method loads files in the format saved by Facebook's original (non-Python) FastText code release. (The name of this method is pretty misguided, since 'facebook' could mean so many different things other than a specific data format.) Gensim doesn't have a matched method for saving to this same format, though it probably wouldn't be very hard to implement, and it would make symmetric sense to support this export option.

Gensim's models typically implement gensim-native .save() and .load() options, which make use of a mix of Python 'pickle' serialization and raw large-array files. These are your best options if you want to save the full model state, for later reloading back into Gensim.

(Such files can't be loaded by other FastText implementations.)

Be sure to keep the multiple related files written by this .save() (all with the same user-supplied prefix) together when moving the saved model to a new location.
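As a rough sketch of keeping those files together, the helper below moves every file whose name starts with the prefix that was passed to .save(). The helper name and the demo file names are hypothetical stand-ins; the actual side-file names depend on the gensim version and the model's size.

```python
import glob
import os
import shutil
import tempfile

def move_saved_model(prefix, dest_dir):
    """Move the main saved file and any side files sharing its prefix (hypothetical helper)."""
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    # glob.escape guards against special characters in the prefix itself.
    for path in sorted(glob.glob(glob.escape(prefix) + "*")):
        target = os.path.join(dest_dir, os.path.basename(path))
        shutil.move(path, target)
        moved.append(target)
    return moved

# Demo with placeholder files instead of a real saved model.
src = tempfile.mkdtemp()
dst = os.path.join(tempfile.mkdtemp(), "deployed")
for name in ("fasttext.model", "fasttext.model.wv.vectors_ngrams.npy", "unrelated.txt"):
    with open(os.path.join(src, name), "w") as fh:
        fh.write("placeholder")

moved = move_saved_model(os.path.join(src, "fasttext.model"), dst)
```

A prefix match will also pick up unrelated files that happen to start with the same name, so saving each model under a distinctive prefix (or in its own directory) is the safer habit.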

Update (May 2020): Recent versions of gensim, 3.8.3 and later, include a newly contributed save_facebook_model() function (in gensim.models.fasttext) which saves to the original Facebook FastText binary format.
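A minimal sketch of that export path, assuming gensim 3.8.3 or later and that `model` is an already-trained gensim FastText instance. The wrapper name export_to_facebook_bin is hypothetical; the two gensim functions it calls live in gensim.models.fasttext.

```python
def export_to_facebook_bin(model, path="model.bin"):
    """Save a gensim FastText model in Facebook's binary format and reload it as a check."""
    # Imported lazily so the sketch can be defined even where gensim is not installed.
    from gensim.models.fasttext import load_facebook_model, save_facebook_model

    # Writes the full model (including subword information) in Facebook's .bin layout.
    save_facebook_model(model, path)
    # Reload with the matching loader; this is the call from the question.
    return load_facebook_model(path)
```

Usage would look like `reloaded = export_to_facebook_bin(model, "model.bin")`, after which the .bin file should also be readable by other tools that understand Facebook's format.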

Orji answered 15/11, 2019 at 21:42 Comment(6)

Thanks for your explanation. So what should I do if I wanted to make my model file as lightweight as possible and get rid of trainable weights? – Unawares 15/11, 2019 at 22:35

There's not yet a built-in option supporting that (though if there was a matching .save_facebook_format() it would fit the bill). You could try nulling-out some of the full model's no-longer-needed properties before doing a .save(), though there'd be some risk that results in a model that doesn't do some things it still should, or errors on re-load (so experiment carefully). Discarding the model.trainables.syn1neg might work & offer the biggest savings – unsure if other parts of trainables could be discarded. – Orji 15/11, 2019 at 23:46

Can't we save the model as a pickle file and load it whenever needed? – Surtax 15/5, 2020 at 17:24

Pickle will break for models over a few GB in size, which are quite common. And the separate arrays saved by .save() will load more quickly/efficiently, and give the option of read-only memory-mapped loading, which can offer memory savings in some multi-process deployment scenarios. But you could try it for small models! Also: note my update at the bottom of the answer: recently gensim added a save_facebook_model() option (in gensim.models.fasttext). – Orji 15/5, 2020 at 20:52

I am trying to use save_facebook_model but I get an error. I cannot find the documentation on gensim's web page. Do you know where to find it? – Pristine 10/6, 2020 at 17:55

It appears the online docs at radimrehurek.com/gensim/models/fasttext.html haven't yet been regenerated to show this new option. Until they are, you can read about it directly from the source code, at: github.com/RaRe-Technologies/gensim/blob/… – Orji 10/6, 2020 at 18:37
