How to find similar words with FastText?
A

6

15

I am playing around with FastText (https://pypi.python.org/pypi/fasttext), which is quite similar to Word2Vec. Since it seems to be a pretty new library without many built-in functions yet, I was wondering how to extract morphologically similar words.

For example: model.similar_word("dog") -> dogs. But there is no such built-in function.

If I type model["dog"]

I only get the vector, which could be used to compute cosine similarity: model.cosine_similarity(model["dog"], model["dogs"]).

Do I have to write some sort of loop and run cosine_similarity on all possible pairs in a text? That would take a long time!

Aerophyte answered 13/2, 2017 at 14:33 Comment(3)
When fasttext.skipgram('train.txt','model') is run, it creates a .bin & .vec file. Use these generated files and follow the process mentioned in the accepted answer.Aerometry
@Prometheus Any ideas how to do something similar in Java?Primero
Nope. Have never touched Java. However FYI, the .bin and .vec files are cross compatible.Aerometry
D
16

Use Gensim: load the fastText-trained .vec file with load_word2vec_format and use the most_similar() method to find similar words.

Delighted answered 15/2, 2017 at 18:36 Comment(1)
Is there any API in fasttext that takes two words as input and returns their cosine similarity? Say, something like (car, vehicle) that returns something like 0.8?Gloriagloriana
B
11

You can install the pyfasttext library to extract the most similar (nearest) words to a particular word.

from pyfasttext import FastText
model = FastText('model.bin')
model.nearest_neighbors('dog', k=2000)

Alternatively, install the latest development version of fasttext from the GitHub repository:

import fasttext
model = fasttext.load_model('model.bin')
model.get_nearest_neighbors('dog', k=100)
Bloodthirsty answered 18/9, 2019 at 14:54 Comment(0)
S
7

You can install the gensim library and use it to extract the most similar words from the model you downloaded from FastText.

Use this:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('model.vec')
similar = model.most_similar(positive=['man'],topn=10)

The topn parameter sets how many of the most similar words are returned (here, the top 10).
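
For intuition, a query like most_similar amounts to ranking every vocabulary word by cosine similarity to the query word's vector, so one pass over the vocabulary is enough and no loop over all pairs is needed. Here is a minimal pure-Python sketch of that idea with made-up toy vectors. The `vectors` dict and both helper functions are illustrative assumptions, not gensim's actual implementation:

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
vectors = {
    "dog": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "cat": [0.1, 0.9, 0.0],
    "car": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, topn=10):
    """Rank all other words by cosine similarity to `word` (hypothetical helper)."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word  # never return the query word itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Prints the two words closest to "dog" along with their scores.
print(most_similar("dog", vectors, topn=2))
```

With a real vocabulary this loop is slow in pure Python, which is why libraries do the same ranking with vectorized matrix operations.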

Scottiescottish answered 8/7, 2018 at 1:29 Comment(0)
W
6

You should use gensim to load the model.vec and then get similar words:

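import gensim
# Recent gensim versions provide load_word2vec_format on KeyedVectors; calling it on Word2Vec is deprecated
m = gensim.models.KeyedVectors.load_word2vec_format('model.vec')
m.most_similar(...)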
Wolgast answered 14/2, 2017 at 9:50 Comment(0)
J
3

Use gensim:

from gensim.models import FastText

model = FastText.load(PATH_TO_MODEL)
model.wv.most_similar(positive=['dog'])

More info here

Joeyjoffre answered 3/1, 2021 at 2:39 Comment(0)
A
2

fastText has a method called get_nearest_neighbors for nearest-neighbor queries. It requires the model's .bin file.


Ansley answered 7/4, 2022 at 10:16 Comment(0)
