I'm interested in using tf-idf with FastText library, but have found a logical way to handle the ngrams. I have used tf-idf with SpaCy vectors already for what I have found several examples like these ones:
But for FastText library is not that clear to me, since it has a granularity that isn't that intuitive, E.G.
For a general word2vec aproach I will have one vector for each word, I can count the term frequency of that vector and divide its value accordingly.
But for fastText same word will have several n-grams,
"Listen to the latest news summary" will have n-grams generated by a sliding windows like:
lis ist ste ten tot het...
These n-grams are handled internally by the model so when I try:
model["Listen to the latest news summary"]
I get the final vector directly, hence what I have though is to split the text into n-grams before feeding the model like:
model['lis']
model['ist']
model['ten']
And make the tf-idf from there, but that seems like an inefficient approach both, is there a standar way to apply tf-idf to vector n-grams like these.