Fasttext algorithm use only word and subword? or sentences too?

Asked 13/4, 2018 at 7:22 Answered 14/4, 2018 at 3:22

Solved nlp vectorization word2vec word-embedding fasttext

I read the paper and googled as well if there is any good example of the learning method(or more likely learning procedure)

For word2vec, suppose there is corpus sentence

I go to school with lunch box that my mother wrapped every morning

Then with window size 2, it will try to obtain the vector for 'school' by using surrounding words

['go', 'to', 'with', 'lunch']

Now, FastText says that it uses the subword to obtain the vector, so it is definitely use n gram subword, for example with n=3,

['sc', 'sch', 'cho', 'hoo', 'ool', 'school']

Up to here, I understood. But it is not clear that if the other words are being used for learning for 'school'. I can only guess that other surrounding words are used as well like the word2vec, since the paper mentions

=> the terms Wc and Wt are both used in functions

where Wc is context word and Wt is word at sequence t.

However, it is not clear that how FastText learns the vectors for word.

Please clearly explain how FastText learning process goes in procedure?

More precisely I want to know that if FastText also follows the same procedure as Word2Vec while it learns the n-gram characterized subword in addition. Or only n-gram characterized subword with word being used?

How does it vectorize the subword at initial? etc

Sharpeyed answered 13/4, 2018 at 7:22 Comment(0)

Any context word has its candidate input vector assembled from the combination of both its full-word token and all its character-n-grams. So if the context word is 'school', and you're using 3-4 character n-grams, the in-training input vector is a combination of the full-word vector for school, and all the n-gram vectors for ['sch', 'cho', 'hoo', 'ool', 'scho', 'choo', 'hool'].)

When that candidate vector is adjusted by training, all the constituent vectors are adjusted. (This is a little like how in word2vec CBOW, mode, all the words of the single average context input vector get adjusted together, when their ability to predict a single target output word is evaluated and improved.)

As a result, those n-grams that happen to be meaningful hints across many similar words – for example, common word-roots or prefixes/suffixes – get positioned where they confer that meaning. (Other n-grams may remain mostly low-magnitude noise, because there's little meaningful pattern to where they appear.)

After training, reported vectors for individual in-vocabulary words are also constructed by combining the full-word vector and all n-grams.

Then, when you also encounter an out-of-vocabulary word, to the extent it shares some or many n-grams with morphologically-similar in-training words, it will get a similar calculated vector – and thus be better than nothing, in guessing what that word's vector should be. (And in the case of small typos or slight variants of known words, the synthesized vector may be pretty good.)

Poitiers answered 14/4, 2018 at 3:22 Comment(3)

you again. and you answered me what I want to know about. Thanks for your explanation. I'll ask more if I have further questions. – Sharpeyed 14/4, 2018 at 6:53

By briefly explain, you are saying word - subword train separately from word - word train (which is word2vec) and combine the vectors together. Did I understand correctly? Or are they train at once in same vector spaces, just using two different methods altogether when it is being trained? – Sharpeyed 14/4, 2018 at 6:55

To calculate the vector for any single word, FastText models look into its known items for both the entire exact word, and all its subwords (character n-grams)– and combine the two to make the effective vector for that word. During training, that means the entire-word, and all its n-grams, are contributing to the NN forward-propagation prediction step for every training example. Then when the backpropagated correction is applied, it affects the entire-word and the subwords. So nothing it 'trained separately'. – Poitiers 14/4, 2018 at 18:25

The fastText site states that at least 2 of implemented algorithms do use surrounding words in sentences.

Moreover, the original fastText implementation is open source so you can check how exactly it works exploring the code.

Newmodel answered 13/4, 2018 at 19:6 Comment(0)

Recommended topics

Hot tags