How can I use the TF-IDF vectorizer from the scikit-learn library to extract unigrams and bigrams of tweets? I want to train a classifier with the output.

This is the code from scikit-learn:
from sklearn.feature_extraction.text import TfidfVectorizer
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
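
For reference, here is a minimal sketch of what I think the unigram-plus-bigram version would look like. It assumes the `ngram_range` parameter of `TfidfVectorizer` is the right knob, with `(1, 2)` meaning "n-grams of length 1 through 2", and it reuses the toy corpus above in place of real tweets. The `vocabulary` variable is just a name I picked for inspecting the result.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus from the snippet above; real tweets would go here instead.
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

# ngram_range=(1, 2) requests both unigrams and bigrams as features.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(corpus)

# vocabulary_ maps each extracted n-gram to its column index in X.
vocabulary = sorted(vectorizer.vocabulary_)
print(X.shape)
print(vocabulary)
```

If this is right, X is a sparse matrix with one row per document and one column per unigram or bigram, and it could be passed together with a list of labels to a classifier's fit method.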