scikit-learn and NLTK: Naive Bayes classifier performance highly different

I am comparing two Naive Bayes classifiers: one from NLTK and one from scikit-learn. I'm dealing with a multi-class classification problem (3 classes: positive (1), negative (-1), and neutral (0)).

Without performing any feature selection (that is, using all available features), and using a training dataset of 70,000 instances (noisy-labeled, with an instance distribution of 17% positive, 4% negative and 78% neutral), I train two classifiers: the first is an nltk.NaiveBayesClassifier, and the second is a sklearn.naive_bayes.MultinomialNB (with fit_prior=True).
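For concreteness, here is a minimal sketch of this setup; the toy documents, the labels, and the `featurize` helper are hypothetical stand-ins for the real 70,000-instance data, but the two training calls are the ones described above:

```python
# Minimal sketch of the setup: boolean word-presence features fed to
# both nltk.NaiveBayesClassifier and sklearn MultinomialNB.
# train_docs/train_labels are toy stand-ins for the real data.
import nltk
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = [["good", "movie"], ["bad", "plot"], ["fine", "film"]]
train_labels = [1, -1, 0]

def featurize(tokens):
    # Each word present in the document becomes a feature with value True.
    return {word: True for word in tokens}

# NLTK takes (feature_dict, label) pairs directly.
nltk_clf = nltk.NaiveBayesClassifier.train(
    [(featurize(doc), label) for doc, label in zip(train_docs, train_labels)])

# scikit-learn takes a feature matrix; DictVectorizer maps the same
# dicts to a sparse 0/1 matrix (True is treated as 1).
vec = DictVectorizer()
X_train = vec.fit_transform([featurize(doc) for doc in train_docs])
sk_clf = MultinomialNB(fit_prior=True).fit(X_train, train_labels)
```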

After training, I evaluate the classifiers on my test set of 30,000 instances and get the following results:

**NLTK's NaiveBayes**
accuracy: 0.568740
class: 1
     precision: 0.331229
     recall: 0.331565
     F-Measure: 0.331355
class: -1
     precision: 0.079253 
     recall: 0.446331 
     F-Measure: 0.134596 
class: 0
     precision: 0.849842 
     recall: 0.628126 
     F-Measure: 0.722347 


**Scikit's MultinomialNB (with fit_prior=True)**
accuracy: 0.834670
class: 1
     precision: 0.400247
     recall: 0.125359
     F-Measure: 0.190917
class: -1
     precision: 0.330836
     recall: 0.012441
     F-Measure: 0.023939
class: 0
     precision: 0.852997
     recall: 0.973406
     F-Measure: 0.909191

**Scikit's MultinomialNB (with fit_prior=False)**
accuracy: 0.834680
class: 1
     precision: 0.400380
     recall: 0.125361
     F-Measure: 0.190934
class: -1
     precision: 0.330836
     recall: 0.012441
     F-Measure: 0.023939
class: 0
     precision: 0.852998
     recall: 0.973418
     F-Measure: 0.909197

I have noticed that while scikit-learn's classifier has better overall accuracy and precision, its recall is very low compared to NLTK's, at least with my data. Given that they are supposed to be (almost) the same classifier, isn't this strange?
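For reference, per-class numbers like those in the tables above can be computed with sklearn.metrics; a minimal sketch with toy labels standing in for the real gold and predicted labels:

```python
# Sketch of the per-class evaluation; y_true/y_pred are toy stand-ins
# for the real 30,000 gold and predicted labels.
from sklearn.metrics import accuracy_score, classification_report

y_true = [1, -1, 0, 0, 1, 0]
y_pred = [1,  0, 0, 0, 0, 0]

print("accuracy:", accuracy_score(y_true, y_pred))
# Per-class precision, recall and F-measure for classes 1, -1 and 0.
print(classification_report(y_true, y_pred, labels=[1, -1, 0]))
```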

Galliwasp asked 2/5, 2012 at 3:19
What are the features? Did you try a BernoulliNB as well? That should be closer to the NLTK Naive Bayes. – Kendakendal
Thanks for the reply. The features are words with value 1 if they exist in the document (boolean). The results for scikit-learn's BernoulliNB are very close to MultinomialNB: accuracy 0.834680; class 1: precision 0.400380, recall 0.125361, F-Measure 0.190934; class -1: precision 0.330836, recall 0.012441, F-Measure 0.023939; class 0: precision 0.852998, recall 0.973418, F-Measure 0.909197 – Galliwasp
The only thing I can see in the documentation is that NLTK's NB classifier apparently doesn't do smoothing. I wouldn't expect that to cause a big difference, though... – Kendakendal

"Naive Bayes classifier" usually means a Bayesian classifier over binary features that are assumed to be independent. This is what NLTK's Naive Bayes classifier implements. The corresponding scikit-learn classifier is BernoulliNB.

The restriction to boolean-valued features is not actually necessary; it is just the simplest to implement. A naive Bayes classifier can be defined for (assumed) independent features from any parametric distribution.

MultinomialNB is for data with integer-valued input features that are assumed to be multinomially distributed.

scikit-learn also has GaussianNB, for continuous-valued features that are assumed to be independently Gaussian-distributed.
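A toy sketch of the three variants side by side (the matrix and labels here are made up purely for illustration):

```python
# Toy comparison of the three scikit-learn naive Bayes variants on the
# same made-up data.
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

X = np.array([[1, 0, 1],   # binary word-presence features
              [0, 1, 0],
              [1, 1, 1]])
y = np.array([1, -1, 0])

# BernoulliNB models each feature as present/absent, like NLTK's classifier.
print(BernoulliNB().fit(X, y).predict(X))
# MultinomialNB models integer counts (0/1 data is a degenerate case).
print(MultinomialNB().fit(X, y).predict(X))
# GaussianNB models each feature as an independent Gaussian (dense input).
print(GaussianNB().fit(X, y).predict(X))
```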

Gertrudis answered 12/5, 2014 at 3:43

Is the default behavior for class weights the same in both libraries? The difference in precision for the rare class (-1) suggests that might be the cause...
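One way to probe this directly is to pin the priors by hand; a sketch with toy data (X and y are hypothetical stand-ins for the real training matrix and labels):

```python
# Sketch of how class priors can be controlled in sklearn MultinomialNB;
# X/y are toy stand-ins for the real data.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[1, 0], [0, 1], [1, 1], [1, 1]])
y = np.array([1, -1, 0, 0])

# fit_prior=True: priors estimated from the (skewed) label distribution.
skewed = MultinomialNB(fit_prior=True).fit(X, y)
# fit_prior=False: uniform priors, so rare classes are not penalized.
uniform = MultinomialNB(fit_prior=False).fit(X, y)
# class_prior overrides both; entries follow sorted class order ([-1, 0, 1]).
manual = MultinomialNB(class_prior=[1/3, 1/3, 1/3]).fit(X, y)

print(skewed.class_log_prior_)
print(uniform.class_log_prior_)
print(manual.class_log_prior_)
```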

Manns answered 6/5, 2012 at 3:25
Naive Bayes in NLTK takes into account the prior label probability, and (I think that) scikit-learn does the same when using the fit_prior=True parameter... – Galliwasp
My (possibly false) understanding is that fit_prior=True will use the actual class weights so that, e.g., assigning all the negative examples (4% of the sample) to the neutral class will only result in an accuracy hit of -4% (which is what it appears to be doing). Try running it with fit_prior=False. – Manns
Thanks. I tried running it with fit_prior=False and surprisingly it gives almost the same results (I updated the main post). – Galliwasp
