I am using the TfidfVectorizer from the sklearn package in Python 2.7.
As I was getting comfortable with the arguments, I became a bit confused about use_idf, as in:
TfidfVectorizer(use_idf=False).fit_transform(<corpus goes here>)
What exactly does use_idf do when false or true?
Since we are already generating a sparse Tfidf matrix, it doesn't make sense to have an argument to choose a sparse Tfidf matrix; that seems redundant.
This post was interesting but didn't seem to nail it.
The documentation says only, "Enable inverse-document-frequency reweighting", which isn't very illuminating.
Any comments appreciated.
EDIT
I think I figured it out. It's real simple:
Text --> counts
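Counts --> TF, meaning we just have the term counts (scaled to unit length per document by default, since norm='l2')
or
Counts --> TFIDF, meaning the counts are weighted by inverse document frequency.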
What was confusing me was the name: since it's called TfidfVectorizer, I didn't realize the output is a TFIDF only if you choose it to be. You can also use it to create just a TF.
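Here is a quick sketch to check this. The three-document corpus is a toy one I made up, and I pass norm=None so the default L2 normalization doesn't hide the difference between the two settings. The comparison against idf_ assumes the other defaults (smooth_idf=True, sublinear_tf=False) are left alone.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus, made up purely for illustration.
corpus = ["the cat sat", "the dog sat", "the cat ran"]

# Text --> counts
counts = CountVectorizer().fit_transform(corpus).toarray()

# Counts --> TF: with use_idf=False and norm=None the output is just the counts.
tf = TfidfVectorizer(use_idf=False, norm=None).fit_transform(corpus).toarray()

# Counts --> TFIDF: each column of counts is multiplied by that term's idf weight.
tfidf_vec = TfidfVectorizer(use_idf=True, norm=None)
tfidf = tfidf_vec.fit_transform(corpus).toarray()

# With the default norm='l2', each row is additionally scaled to unit length.
tf_l2 = TfidfVectorizer(use_idf=False).fit_transform(corpus).toarray()

print(np.allclose(tf, counts))                           # True
print(np.allclose(tfidf, counts * tfidf_vec.idf_))       # True
print(np.allclose(np.linalg.norm(tf_l2, axis=1), 1.0))   # True
```

So use_idf only switches the idf weighting on or off. The matrix is sparse either way, and any normalization is controlled separately by norm.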