libSVM different result with/without probabilities
Asked Answered
I was wondering why libSVM gives different accuracy results depending on whether I predict with or without probability estimates, and I found a FAQ entry on this page which says:

Q: Why using svm-predict -b 0 and -b 1 gives different accuracy values?

Let's just consider two-class classification here. After probability information is obtained in training, we do not have prob >= 0.5 if and only if decision value >= 0. So predictions may be different with -b 0 and -b 1.

I read and re-read it a dozen times but still do not understand it. Can someone explain it more clearly?

Marston answered 17/4, 2015 at 19:56 Comment(0)
A "normal" SVM model calculates a decision value for each given data point, which is essentially the signed distance of that point from the separating hyperplane. Everything on one side of the hyperplane (dec_value >= 0) is predicted as class A, everything on the other side (dec_value < 0) as class B.

If you now calculate class probabilities, there may be a point with a decision value of (for example) 0.1, which would make it class A. But the probability calculation for class A could be 45% and for class B 55%, so the algorithm would now predict it as B.
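This mismatch can be sketched with Platt scaling, the sigmoid mapping libSVM uses to turn decision values into probabilities (the parameters A and B below are made-up illustrative values, not ones fitted on real data):

```python
import math

def platt_probability(decision_value, A, B):
    """Platt scaling: map an SVM decision value f to P(class A) = 1 / (1 + exp(A*f + B))."""
    return 1.0 / (1.0 + math.exp(A * decision_value + B))

# Hypothetical Platt parameters "fitted" during training (assumed for illustration).
# Because B != 0, the probability crossover no longer sits at decision value 0.
A, B = -1.5, 0.3

f = 0.1                          # positive decision value: -b 0 predicts class A
p = platt_probability(f, A, B)   # ~0.463, i.e. below 0.5: -b 1 predicts class B
```

With B = 0 the 0.5-probability point would coincide with decision value 0 and both modes would agree; a nonzero fitted B shifts that crossover, which is exactly why the FAQ says prob >= 0.5 does not hold if and only if the decision value is >= 0.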

Possible algorithms for calculating said class probabilities are described in their paper, Section 8.

The sentence in question

After probability information is obtained in training, we do not have prob >= 0.5 if and only if decision value >= 0. So predictions may be different with -b 0 and -b 1.

basically says: "A decision value >= 0 does not imply probA > probB, or vice versa."

Physiologist answered 22/4, 2015 at 16:9 Comment(0)
I think this is because the probabilities are computed using cross-validation (at least in Python's scikit-learn, but since it uses libSVM behind the scenes, this may answer your question).

Moreover, the documentation indicates that this cross-validation step may produce probability estimates that are inconsistent with the scores:

Needless to say, the cross-validation involved in Platt scaling is an expensive operation for large datasets. In addition, the probability estimates may be inconsistent with the scores, in the sense that the "argmax" of the scores may not be the argmax of the probabilities.
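As a rough illustration of that inconsistency (the dataset and hyperparameters here are arbitrary assumptions, chosen only to show the mechanism), you can compare scikit-learn's `predict`, which uses decision values, against the argmax of `predict_proba`, which uses the cross-validated Platt probabilities:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy two-class dataset (arbitrary choice, for illustration only)
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# probability=True triggers the internal cross-validated Platt scaling
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

decision_pred = clf.predict(X)                   # analogous to svm-predict -b 0
proba = clf.predict_proba(X)
proba_pred = clf.classes_[proba.argmax(axis=1)]  # analogous to svm-predict -b 1

# The two rules need not agree on every point, hence different accuracies
n_disagree = int((decision_pred != proba_pred).sum())
```

Whether `n_disagree` is zero depends on the data and the fitted sigmoid, but any point where the two rules differ changes the reported accuracy between `-b 0` and `-b 1`.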

Impotence answered 22/4, 2015 at 15:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.