What does a Bayesian Classifier score represent?
I'm using the ruby classifier gem whose classifications method returns the scores for a given string classified against the trained model.

Is the score a percentage? If so, is the maximum difference 100 points?
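
For reference, this is roughly how I'm getting the scores (the training strings here are just placeholders, not my real corpus):

    require 'classifier'

    # Two example categories; my real model is trained on my own data.
    b = Classifier::Bayes.new 'Interesting', 'Uninteresting'
    b.train_interesting 'here are some good words, I hope you love them'
    b.train_uninteresting 'here are some bad words, I hate you'

    # classifications returns a hash of category => score for the given string.
    p b.classifications 'I hate bad words and you'
    # e.g. {"Uninteresting"=>-4.85, "Interesting"=>-6.09} (values illustrative)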

Tympanic answered 4/2, 2011 at 3:4 Comment(0)
It's the logarithm of a probability. With a large training set, the actual probabilities are very small numbers, so the logarithms are easier to compare. Theoretically, scores range from infinitesimally close to zero down to negative infinity. 10**score * 100.0 will give you the actual probability as a percentage, which indeed has a maximum difference of 100.
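
A quick sketch of that conversion (the score value is just an illustrative one):

    score = -8.84                       # a log10 score returned by classifications
    probability_pct = 10**score * 100.0
    # => roughly 1.4e-07, i.e. a vanishingly small percentage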

Inessa answered 4/2, 2011 at 4:15 Comment(4)
+1 I just checked the source for the classifications method, and you're right on track. – Trapezium
This makes sense, but I'm still struggling with the formula for actual probability. A typical score for my set is something like -8.84. So 10*(-8.84)*100 = 840. I'm still missing something. – Tympanic
It seems like you multiplied 10 by -8.84. You have to raise 10 to the -8.84th power. – Parkway
The classifier does not give a probability, nor the logarithm of one. When calculating the score for every class, the denominator in the naive Bayes equation is dropped because it does not affect the result of a classification. This can also be seen in the source code of the Classifier gem here. It calculates a relative probability, not an absolute one. – Deport
Actually, to convert the score of a typical naive Bayes classifier to a probability, where b is the base of the logarithm, compute b^score / (1 + b^score). This is the inverse logit (http://en.wikipedia.org/wiki/Logit). However, given the independence assumptions of the NBC, these scores tend to be too high or too low, and probabilities calculated this way will accumulate at the boundaries. It is better to calculate the scores on a holdout set and run a logistic regression of correct (1 or 0) on score to get a better feel for the relationship between score and probability.
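
A minimal sketch of that inverse-logit conversion (base defaults to 10 to match the gem's log10 scores; the example score values are made up):

    # Convert a naive Bayes log score to a probability via the inverse logit.
    # base is the base of the logarithm used to compute the scores.
    def score_to_probability(score, base = 10)
      odds = base**score.to_f
      odds / (1.0 + odds)
    end

    score_to_probability(-8.84)  # => ~1.4e-09, piles up near 0
    score_to_probability(3.5)    # => ~0.9997, piles up near 1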

From a Jason Rennie paper:

2.7 Naive Bayes Outputs Are Often Overconfident

Text databases frequently have 10,000 to 100,000 distinct vocabulary words; documents often contain 100 or more terms. Hence, there is great opportunity for duplication. To get a sense of how much duplication there is, we trained a MAP Naive Bayes model with 80% of the 20 Newsgroups documents. We produced p(c|d;D) (posterior) values on the remaining 20% of the data and show statistics on max_c p(c|d;D) in table 2.3. The values are highly overconfident. 60% of the test documents are assigned a posterior of 1 when rounded to 9 decimal digits. Unlike logistic regression, Naive Bayes is not optimized to produce reasonable probability values. Logistic regression performs joint optimization of the linear coefficients, converging to the appropriate probability values with sufficient training data. Naive Bayes optimizes the coefficients one-by-one. It produces realistic outputs only when the independence assumption holds true. When the features include significant duplicate information (as is usually the case with text), the posteriors provided by Naive Bayes are highly overconfident.

Wow answered 22/2, 2011 at 18:5 Comment(0)
