scikit learn svc coef0 parameter range
See the scikit-learn SVC documentation.

I'm wondering how important the coef0 parameter is for SVCs with the polynomial and sigmoid kernels. As I understand it, it is an intercept term, a constant that offsets the function from zero, much like in linear regression. To my knowledge, though, the SVM (scikit-learn uses libsvm) should find this value on its own.

What's a good general range to test over (is there one?)? For example, with C a safe choice is generally 10^-5 ... 10^5, going up in exponential steps.
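For reference, the exponential grid for C described above can be written with numpy (the number of steps is an assumption, one per power of ten):

```python
import numpy as np

# Exponential grid for C: 10^-5, 10^-4, ..., 10^5
C_grid = np.logspace(-5, 5, num=11)
```

Such a grid can be passed directly to `GridSearchCV` as `{"C": C_grid}`.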

But for coef0, the value seems highly data-dependent, and I'm not sure how to automate choosing good ranges for each grid search on each dataset. Any pointers?

Tache answered 27/1, 2014 at 20:6 Comment(0)
First, the sigmoid function is rarely a valid kernel. In fact, for almost no values of its parameters is it known to induce a valid kernel (in Mercer's sense).

Second, coef0 is not an intercept term; it is a parameter of the kernel projection, which can be used to overcome an important issue with the polynomial kernel. In general, coef0=0 is just fine, but the polynomial kernel has one problem: as p -> inf, it more and more strongly separates pairs of points for which <x,y> is smaller than 1 from pairs with a bigger value. This is because powers of values smaller than one get closer and closer to 0, while the same powers of values bigger than one grow to infinity. You can use coef0 to "scale" your data so there is no such distinction: adding 1 - min <x,y> guarantees that no values are smaller than 1. If you really feel the need to tune this parameter, I would suggest searching the range [min(1 - min <x,y>, 0), max <x,y>], where the min and max are computed over the whole training set.
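The shift and search range described above can be sketched in numpy; the toy data and the grid size are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # stand-in for a real training set

# Pairwise dot products <x, y> over the training set
G = X @ X.T
g_min, g_max = G.min(), G.max()

# Adding 1 - min<x,y> guarantees every <x, y> + coef0 is at least 1,
# so high polynomial degrees no longer crush some pairs toward 0
# while blowing others up toward infinity.
shift = 1.0 - g_min

# Candidate coef0 grid over the quoted range [min(1 - min<x,y>, 0), max<x,y>]
lo, hi = min(shift, 0.0), g_max
coef0_grid = np.linspace(lo, hi, num=5)
```

The resulting `coef0_grid` could then be fed to a grid search over `SVC(kernel="poly")`.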

Naturally answered 27/1, 2014 at 22:21 Comment(5)
I don't fully understand your first sentence: are you saying that for the sigmoid kernel this is a useless parameter to optimize over? And from the rest of your comment, would you say it is fair to always use 1 for polynomial kernels to avoid values < 1? Tache
My first statement says that sigmoid is not a valid kernel for most parameter values. It is not a matter of tuning; it is generally the wrong function to use with kernel machines. It was introduced to make the neural network community more familiar with SVMs, but that was not a good idea. For poly, I would say it is safe to check the coef0 values 0 and 1, as both can have good properties, but I would skip checking other values. Naturally
I disagree with the general argument that sigmoid is the wrong kernel choice. Everything starts from the data. If your data maps well with sigmoid, then sigmoid is your choice. I have had cases where sigmoid proved to be the right function to model my data. Maynor
@Milad There is nothing to disagree with here: sigmoid is not a kernel from a mathematical perspective. That is a fact, not an opinion. Of course things can work from time to time even when they are incorrect, but that does not make them valid. Naturally
It is correct that sigmoid does not fit the definition of a kernel function; that is, sigmoid is not a positive definite function. Maynor

© 2022 - 2024 — McMap. All rights reserved.