Decision values in Libsvm
I'm new to SVM. I used LibSVM for Matlab, and after the prediction phase I got an array of decision values. From SVM theory, each test record z is assigned to the positive class if

f(z) = +1

where f(z) is defined as

f(z)=sign(w*z+b)

So how can I relate the decision value from the array for an instance z to f(z)? Is the prediction based on the decision value like this: if dec_value > 0 then z is positive, otherwise z is negative?

Coheman asked 14/6, 2012 at 9:28 Comment(0)
Yes, you are correct: if f(z) is positive, the instance belongs to class +1; if it's negative, it belongs to class -1. The value of f(z) itself is not interpretable.

While the function:

f(z) = sign(w*z+b)

looks like the equation of a hyperplane, it differs in that w is not a unit normal vector - its length is not 1 - so the value of f(z) is not the distance from the hyperplane. This is why it is written as sign(..): to make it clear that the value is only used to determine which side of the hyperplane the instance falls on.
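To make that concrete, here is a minimal sketch using libsvm's Matlab interface (Xtest, ytest and model are hypothetical placeholders); it relies on libsvm's convention that a positive decision value corresponds to the first training label, model.Label(1):

[pred, acc, dec] = svmpredict(ytest, Xtest, model); % third output = decision values
lab = model.Label;                             % class labels in the order libsvm saw them
mypred = lab(1)*(dec > 0) + lab(2)*(dec <= 0); % rebuild predictions from the signs alone
isequal(mypred, pred)                          % should print 1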

Some background:

The goal is to find the hyperplane which gives the maximum margin between the two classes:

[figure: the maximum-margin hyperplane separating the two classes]

So, the purpose is to maximize the margin, which is 2/||w||, and therefore to minimize ||w||. Remember, usually when w is used to represent a hyperplane as its normal vector, ||w|| is 1. That clearly isn't the case here, as there would be no optimization problem. Instead of keeping ||w|| = 1 and varying the width of the margin, we've fixed the width of the margin to 2 and are allowing ||w|| to vary in size instead.

This gives us the primal optimization problem (with soft margins):

min over w, b, xi of  (1/2)*||w||^2 + C * sum_i xi_i
subject to  y_i*(w*x_i + b) >= 1 - xi_i  and  xi_i >= 0  for all i
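Just to make the pieces concrete, here is a minimal sketch that solves this primal directly with MATLAB's quadprog (Optimization Toolbox); X (n-by-d), y (entries -1/+1) and C are assumed inputs, and this is purely illustrative since libsvm actually solves the dual given below:

% stack the variables as [w; b; xi]
[n, d] = size(X);
H = blkdiag(eye(d), 0, zeros(n));   % objective penalizes (1/2)*||w||^2 only
f = [zeros(d+1,1); C*ones(n,1)];    % ... plus C * sum(xi)
A = [-(y.*X), -y, -eye(n)];         % encodes y_i*(w*x_i + b) >= 1 - xi_i  (R2016b+ expansion)
bineq = -ones(n,1);
lb = [-inf(d+1,1); zeros(n,1)];     % slacks must stay non-negative
sol = quadprog(H, f, A, bineq, [], [], lb);
w = sol(1:d); b = sol(d+1);         % the separating hyperplane parameters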

This primal seems to be what you are referring to. However, it comes from the basic soft-margin maximum-margin classifier, which is the foundation of the SVM. The full SVM is formulated as a Lagrangian dual in order to allow the use of kernels. The neat thing is that when the above problem (and its constraints) is written as a Lagrangian, all the variables except the Lagrange multipliers drop out, leaving us with the following problem:

max over alpha of  sum_i alpha_i - (1/2) * sum_{i,j} alpha_i*alpha_j*y_i*y_j*(x_i * x_j)
subject to  0 <= alpha_i <= C  and  sum_i alpha_i*y_i = 0

Notice there is no w. The training points x_i (the y_i are the labels, +1 or -1) now appear only inside a dot product, which is what allows us to employ the kernel trick to obtain a non-linear model.

But if we don't have w, what is our decision function? It becomes a function of the support vectors and the Lagrange multipliers we found:

f(z) = sign( sum_i alpha_i*y_i*K(x_i, z) + b )

This is what libsvm produces and what it stores as the model you have trained: the support vectors and the associated alphas. For a linear SVM you can recover the primal w - this is explained in the LibSVM FAQ - but it is not what you get back automatically from LibSVM, and it can only be done for the linear kernel.
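For the linear kernel, the FAQ recipe boils down to this short sketch (the struct fields are those of libsvm's Matlab model; the sign flip accounts for which label libsvm happened to see first):

w = model.SVs' * model.sv_coef;   % sv_coef already holds alpha_i * y_i
b = -model.rho;
if model.Label(1) == -1           % libsvm internally calls the first label it saw +1
    w = -w; b = -b;
end
% the decision value for a row-vector instance z is then z*w + b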

The value of the SVM decision function based on the Lagrange multipliers and support vectors should likewise only be interpreted by its sign.
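As a sanity check, you can evaluate that stored decision function yourself; a sketch for the RBF kernel (assuming the model was trained with '-t 2' and z is a single test instance as a row vector - the result should match the decision value svmpredict returns):

gamma = model.Parameters(4);           % kernel width as stored by libsvm
SVs = full(model.SVs);                 % support vectors, one per row
d2 = sum((SVs - z).^2, 2);             % squared distances to z (R2016b+ expansion)
f = model.sv_coef' * exp(-gamma*d2) - model.rho;
sign(f)                                % the only part you should interpret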

Antidisestablishmentarianism answered 14/6, 2012 at 21:18 Comment(0)
Reading the docs tells me that:

The third [return value] is a matrix containing decision values or probability estimates (if '-b 1' is specified). If k is the number of classes in training data, for decision values, each row includes results of predicting k(k-1)/2 binary-class SVM's.

So for a two-class problem, what you get is a vector containing the decision values f(z). Everything libsvm assigns to its first class has d > 0 and everything assigned to its second class has d < 0.

In general: libsvm takes the first label it encounters in the training data as its first class, and so on. So to get consistent results you'd need to sort your data first.

In the binary case this holds too: whatever labels you feed svmtrain, it will take the first one it encounters as class +1 and the second as class -1. This is easy to verify with a trivial dataset:

Y = [-ones(100,1); ones(100,1)];
m = svmtrain(Y, Y); % train with the labels themselves as data (never do this in practice - neither the training-on-labels part nor skipping a held-out test set ;)
[a,b,c] = svmpredict(Y, Y, m); % of course this gives 100% accuracy
c' % decision values: the first label encountered (-1 here) gets the internal representation +1, so its decision values are positive

For multi-class classification this is different: the vector then contains k(k-1)/2 entries, corresponding to all one-against-one class pairings for each sample. This means that for e.g. a 4-class problem you'd have 4*3/2 = 6 values for each sample:

[ f12(z) f13(z) f14(z) f23(z) f24(z) f34(z)]

Now, exactly how these function values map to a final class I could not easily deduce from looking at the code... But I guess you were mostly interested in the 2-class case anyhow, no?
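Edit: for completeness, the libsvm documentation does say the predicted class is chosen by majority voting over these pairwise classifiers; roughly, as a sketch (dec ordered as above for k classes, winner indexing into model.Label):

vote = zeros(1, k);
p = 0;
for i = 1:k-1
    for j = i+1:k
        p = p + 1;
        if dec(p) > 0
            vote(i) = vote(i) + 1;  % the (i vs j) classifier votes for class i
        else
            vote(j) = vote(j) + 1;  % ... or for class j
        end
    end
end
[~, winner] = max(vote);            % predicted class index (ties broken arbitrarily)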

Guib answered 14/6, 2012 at 12:50 Comment(7)
I'm interested in binary classification. My problem is that I'd like to know the meaning of the decision value. So does d << 1 mean that z is certainly a member of the first class, while a d close to 1 is an uncertain instance? And what do you mean by first class, -1 or +1?Coheman
Edited post for more explanation.Guib
And I fixed some errors where I misunderstood the docs myself :s.Guib
So the decision value is f(z) right? And what meaning could I give to a decision value? Is the distance of the example z from the hyperplane (decision boundary)?Coheman
I'm not quite sure; you should either look through the docs or dig into the source code, but I think it's actually the distance to the hyperplane, so that if abs(f(z)) == 1 the sample lies on the margin. For more in-depth information on SVMs I'd suggest this book: An Introduction to Support Vector Machines by John Shawe-Taylor & Nello Cristianini.Guib
You cannot give any meaning to the decision value - it is a signed value only. If the kernel is linear it may be possible to derive a distance from the decision function; however, w is not a unit normal vector, so some rescaling would have to be done there too. In non-linear cases you cannot determine the distance at all, because it is only defined in the kernel-induced feature space and is meaningless in the lower-dimensional input space.Antidisestablishmentarianism
But shouldn't there be a difference between a decision value of +2 and +7? Is it exactly the same? Is there any way to interpret decision values as values of a membership function for the class? I ask because the -b option gives poor performance, so I'd like to obtain a [0 1] value for each example z that tells me how "sure" the model is about labeling it +1 (or -1).Coheman
