I'm using the ruby classifier gem whose classifications method returns the scores for a given string classified against the trained model.
Is the score a percentage? If so, is the maximum difference 100 points?
It's the logarithm of a probability. With a large training set, the actual probabilities are very small numbers, so their logarithms are easier to compare. Theoretically, scores range from infinitesimally close to zero down to negative infinity.

    10**score * 100.0

will give you the actual probability as a percentage, which indeed has a maximum difference of 100.
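As a minimal sketch of applying that conversion, assuming the scores are base-10 logs as described above (the categories and training strings below are made-up examples; if the gem uses natural logs internally, substitute Math::E for 10):

    require 'classifier'

    # Hypothetical two-category model; the training sentences are just examples.
    b = Classifier::Bayes.new 'Interesting', 'Uninteresting'
    b.train_interesting   'here are some good words, I hope you love them'
    b.train_uninteresting 'here are some bad words, I hate you'

    # classifications returns a hash of category => log score
    b.classifications('I love good words').each do |category, score|
      puts "#{category}: #{10**score * 100.0}%"  # assumes base-10 log scores
    end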
Actually, to calculate the probability from a typical naive Bayes classifier, where b is the base of the logarithm, it is b^score / (1 + b^score). This is the inverse logit (http://en.wikipedia.org/wiki/Logit). However, given the independence assumptions of naive Bayes, these scores tend to be too high or too low, and probabilities calculated this way will pile up at the boundaries (near 0 and 1). It is better to compute the scores on a holdout set and fit a logistic regression of accuracy (1 or 0) on score to get a better feel for the relationship between score and probability.
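A rough sketch of that inverse-logit conversion (score_to_probability is just an illustrative name, and the base b is assumed to match whatever base the gem's logs actually use):

    # Inverse logit: treats the score as log-odds in base b.
    def score_to_probability(score, b = 10)
      b**score / (1.0 + b**score)
    end

    score_to_probability(-0.5)  # => ~0.24
    score_to_probability(2.0)   # => ~0.99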
From a Jason Rennie paper:

2.7 Naive Bayes Outputs Are Often Overconfident

Text databases frequently have 10,000 to 100,000 distinct vocabulary words; documents often contain 100 or more terms. Hence, there is great opportunity for duplication. To get a sense of how much duplication there is, we trained a MAP Naive Bayes model with 80% of the 20 Newsgroups documents. We produced p(c|d;D) (posterior) values on the remaining 20% of the data and show statistics on max_c p(c|d;D) in table 2.3. The values are highly overconfident. 60% of the test documents are assigned a posterior of 1 when rounded to 9 decimal digits. Unlike logistic regression, Naive Bayes is not optimized to produce reasonable probability values. Logistic regression performs joint optimization of the linear coefficients, converging to the appropriate probability values with sufficient training data. Naive Bayes optimizes the coefficients one-by-one. It produces realistic outputs only when the independence assumption holds true. When the features include significant duplicate information (as is usually the case with text), the posteriors provided by Naive Bayes are highly overconfident.