libsvm (C++) Always Outputting Same Prediction
I have implemented an OpenCV/C++ wrapper for libsvm. When doing a grid search for SVM parameters (RBF kernel), the prediction always returns the same label. I have created artificial data sets with very easily separable data (and even tried predicting on the same data I trained on), but it still returns the same label.

I have used the MATLAB implementation of libsvm and achieved high accuracy on the same data set. I must be doing something wrong with setting up the problem but I've gone through the README many times and I can't quite find the issue.

Here is how I set up the libsvm problem, where data is an OpenCV Mat:

    const int rowSize = data.rows;
    const int colSize = data.cols;

    this->_svmProblem = new svm_problem;
    std::memset(this->_svmProblem,0,sizeof(svm_problem));

    //dynamically allocate the X matrix...
    this->_svmProblem->x = new svm_node*[rowSize];
    for(int row = 0; row < rowSize; ++row)
        this->_svmProblem->x[row] = new svm_node[colSize + 1];

    //...and the y vector
    this->_svmProblem->y = new double[rowSize];
    this->_svmProblem->l = rowSize;

    for(int row = 0; row < rowSize; ++row)
    {
        for(int col = 0; col < colSize; ++col)
        {
            //set the index and the value. indexing starts at 1.
            this->_svmProblem->x[row][col].index = col + 1;
            double tempVal = (double)data.at<float>(row,col);
            this->_svmProblem->x[row][col].value = tempVal;
        }

        this->_svmProblem->x[row][colSize].index = -1;
        this->_svmProblem->x[row][colSize].value = 0;

        //add the label to the y array, and feature vector to X matrix
        double tempVal = (double)labels.at<float>(row);
        this->_svmProblem->y[row] = tempVal;
    }


}/*createProblem()*/

Here is how I set up the parameters, where svmParams is my own struct for C/Gamma and such:

    this->_svmParameter = new svm_parameter;
    std::memset(this->_svmParameter,0,sizeof(svm_parameter));
    this->_svmParameter->svm_type = svmParams.svmType;
    this->_svmParameter->kernel_type = svmParams.kernalType;
    this->_svmParameter->C = svmParams.C;
    this->_svmParameter->gamma = svmParams.gamma;
    this->_svmParameter->nr_weight = 0;
    this->_svmParameter->eps = 0.001;
    this->_svmParameter->degree = 1;
    this->_svmParameter->shrinking = 0;
    this->_svmParameter->probability = 1;
    this->_svmParameter->cache_size = 100;  

I use the provided param/problem checking function and no errors are returned.

I then train as such:

this->_svmModel = svm_train(this->_svmProblem, this->_svmParameter);

And then predict like so:

float pred = (float)svm_predict(this->_svmModel, x[i]);

If anyone could point out where I'm going wrong here I would greatly appreciate it. Thank you!

EDIT:

Using this code I printed the contents of the problem

    for(int i = 0; i < rowSize; ++i)
    {
        std::cout << "[";
        for(int j = 0; j < colSize + 1; ++j)
        {
            std::cout << " (" << this->_svmProblem->x[i][j].index << ", " << this->_svmProblem->x[i][j].value << ")";
        }
        std::cout << "]" << " <" << this->_svmProblem->y[i] << ">" << std::endl;
    }

Here is the output:

[ (1, -1) (2, 0) (-1, 0)] <1>
[ (1, -0.92394) (2, 0) (-1, 0)] <1>
[ (1, -0.7532) (2, 0) (-1, 0)] <1>
[ (1, -0.75977) (2, 0) (-1, 0)] <1>
[ (1, -0.75337) (2, 0) (-1, 0)] <1>
[ (1, -0.76299) (2, 0) (-1, 0)] <1>
[ (1, -0.76527) (2, 0) (-1, 0)] <1>
[ (1, -0.74631) (2, 0) (-1, 0)] <1>
[ (1, -0.85153) (2, 0) (-1, 0)] <1>
[ (1, -0.72436) (2, 0) (-1, 0)] <1>
[ (1, -0.76485) (2, 0) (-1, 0)] <1>
[ (1, -0.72936) (2, 0) (-1, 0)] <1>
[ (1, -0.94004) (2, 0) (-1, 0)] <1>
[ (1, -0.92756) (2, 0) (-1, 0)] <1>
[ (1, -0.9688) (2, 0) (-1, 0)] <1>
[ (1, 0.05193) (2, 0) (-1, 0)] <3>
[ (1, -0.048488) (2, 0) (-1, 0)] <3>
[ (1, 0.070436) (2, 0) (-1, 0)] <3>
[ (1, 0.15191) (2, 0) (-1, 0)] <3>
[ (1, -0.07331) (2, 0) (-1, 0)] <3>
[ (1, 0.019786) (2, 0) (-1, 0)] <3>
[ (1, -0.072793) (2, 0) (-1, 0)] <3>
[ (1, 0.16157) (2, 0) (-1, 0)] <3>
[ (1, -0.057188) (2, 0) (-1, 0)] <3>
[ (1, -0.11187) (2, 0) (-1, 0)] <3>
[ (1, 0.15886) (2, 0) (-1, 0)] <3>
[ (1, -0.0701) (2, 0) (-1, 0)] <3>
[ (1, -0.17816) (2, 0) (-1, 0)] <3>
[ (1, 0.12305) (2, 0) (-1, 0)] <3>
[ (1, 0.058615) (2, 0) (-1, 0)] <3>
[ (1, 0.80203) (2, 0) (-1, 0)] <1>
[ (1, 0.734) (2, 0) (-1, 0)] <1>
[ (1, 0.9072) (2, 0) (-1, 0)] <1>
[ (1, 0.88061) (2, 0) (-1, 0)] <1>
[ (1, 0.83903) (2, 0) (-1, 0)] <1>
[ (1, 0.86604) (2, 0) (-1, 0)] <1>
[ (1, 1) (2, 0) (-1, 0)] <1>
[ (1, 0.77988) (2, 0) (-1, 0)] <1>
[ (1, 0.8578) (2, 0) (-1, 0)] <1>
[ (1, 0.79559) (2, 0) (-1, 0)] <1>
[ (1, 0.99545) (2, 0) (-1, 0)] <1>
[ (1, 0.78376) (2, 0) (-1, 0)] <1>
[ (1, 0.72177) (2, 0) (-1, 0)] <1>
[ (1, 0.72619) (2, 0) (-1, 0)] <1>
[ (1, 0.80149) (2, 0) (-1, 0)] <1>
[ (1, 0.092327) (2, -1) (-1, 0)] <2>
[ (1, 0.019054) (2, -1) (-1, 0)] <2>
[ (1, 0.15287) (2, -1) (-1, 0)] <2>
[ (1, -0.1471) (2, -1) (-1, 0)] <2>
[ (1, -0.068182) (2, -1) (-1, 0)] <2>
[ (1, -0.094567) (2, -1) (-1, 0)] <2>
[ (1, -0.17071) (2, -1) (-1, 0)] <2>
[ (1, -0.16646) (2, -1) (-1, 0)] <2>
[ (1, -0.030421) (2, -1) (-1, 0)] <2>
[ (1, 0.094346) (2, -1) (-1, 0)] <2>
[ (1, -0.14408) (2, -1) (-1, 0)] <2>
[ (1, 0.090025) (2, -1) (-1, 0)] <2>
[ (1, 0.043706) (2, -1) (-1, 0)] <2>
[ (1, 0.15065) (2, -1) (-1, 0)] <2>
[ (1, -0.11751) (2, -1) (-1, 0)] <2>
[ (1, -0.02324) (2, 1) (-1, 0)] <2>
[ (1, 0.0080356) (2, 1) (-1, 0)] <2>
[ (1, -0.17752) (2, 1) (-1, 0)] <2>
[ (1, 0.011135) (2, 1) (-1, 0)] <2>
[ (1, -0.029063) (2, 1) (-1, 0)] <2>
[ (1, 0.15398) (2, 1) (-1, 0)] <2>
[ (1, 0.097746) (2, 1) (-1, 0)] <2>
[ (1, 0.01018) (2, 1) (-1, 0)] <2>
[ (1, 0.015592) (2, 1) (-1, 0)] <2>
[ (1, -0.062793) (2, 1) (-1, 0)] <2>
[ (1, 0.014444) (2, 1) (-1, 0)] <2>
[ (1, -0.1205) (2, 1) (-1, 0)] <2>
[ (1, -0.18011) (2, 1) (-1, 0)] <2>
[ (1, 0.010521) (2, 1) (-1, 0)] <2>
[ (1, 0.036914) (2, 1) (-1, 0)] <2>

Here, the data is printed in the format [ (index, value) ... ] &lt;label&gt;.

The artificial dataset I created has just 3 classes, all of which are easily separable with a non-linear decision boundary. Each row is a feature vector (observation) with 2 features (x coord, y coord). libsvm requires each vector to be terminated with a node whose index is -1, so I do that.

EDIT2:

This edit pertains to the C and Gamma values used for training, as well as data scaling. I normally scale data between 0 and 1 (as suggested here: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf). I will scale this fake dataset as well and try again, although I used this exact same dataset with the MATLAB implementation of libsvm and it could separate the unscaled data with 100% accuracy.

For C and Gamma, I also use the values recommended in the guide. I create two vectors and use a double nested loop to try all combinations:

std::vector<double> CList, GList;
double baseNum = 2.0;
for(double j = -5; j <= 15; j += 2) //-5 and 15
    CList.push_back(pow(baseNum,j));
for(double j = -15; j <= 3; j += 2) //-15 and 3
    GList.push_back(pow(baseNum,j));

And the loop looks like:

    for(auto CIt = CList.begin(); CIt != CList.end(); ++CIt) //for all C's
    {
        double C = *CIt;
        for(auto GIt = GList.begin(); GIt != GList.end(); ++GIt) //for all gamma's
        {
            double gamma = *GIt;
            svmParams.svmType = C_SVC;
            svmParams.kernalType = RBF;
            svmParams.C = C;
            svmParams.gamma = gamma;

        ......training code etc..........

EDIT3:

Since I keep referencing MATLAB, I will show the accuracy differences. Here is a heat map of the accuracy libsvm yields:

[heat map image: libsvm C++ accuracy]

And here is the accuracy map MATLAB yields using the same parameters and same C/Gamma grid:

[heat map image: libsvm MATLAB accuracy]

Here is the code used to generate the C/Gamma lists, and how I train:

CList = 2.^(-15:2:15);%(-5:2:15);
GList = 2.^(-15:2:15);%(-15:2:3);
cmd = ['-q -s 0 -t 2 -c ', num2str(C), ' -g ', num2str(gamma)];
model = ovrtrain(yTrain,xTrain,cmd);

EDIT4

As a sanity check, I reformatted my fake scaled dataset to conform to the file format used by libsvm's Unix/Linux command-line tools. I trained and predicted using a C/Gamma found in the MATLAB accuracy map. The prediction accuracy was 100%. Thus I am absolutely doing something wrong in the C++ implementation.

EDIT5

I loaded the model trained from the Linux terminal into my C++ wrapper class. I then tried predicting the same exact dataset used for training. The accuracy in C++ was still awful! However, I'm very close to narrowing the source of the problem. If MATLAB/Linux both agree in terms of 100% accuracy, and the model it produces has already been proven to yield 100% accuracy on the same dataset that was trained on, and now my C++ wrapper class shows poor performance with the verified model... there are three possible situations:

  1. The method I use to transform cv::Mats into the svm_node* it requires for prediction has a problem in it.
  2. The method I use to predict labels has a problem in it.
  3. BOTH 1 and 2!

The code to really inspect now is how I create the svm_node. Here it is again:

svm_node** LibSVM::createNode(INPUT const cv::Mat& data)
{
    const int rowSize = data.rows;
    const int colSize = data.cols;

    //dynamically allocate the X matrix...
    svm_node** x = new svm_node*[rowSize];
    if(x == NULL)
        throw MLInterfaceException("Could not allocate SVM Node Array.");

    for(int row = 0; row < rowSize; ++row)
    {
        x[row] = new svm_node[colSize + 1]; //+1 here for the index-terminating -1
        if(x[row] == NULL)
            throw MLInterfaceException("Could not allocate SVM Node.");
    }

    for(int row = 0; row < rowSize; ++row)
    {
        for(int col = 0; col < colSize; ++col)
        {
            double tempVal = data.at<double>(row,col);
            x[row][col].value = tempVal;
        }

        x[row][colSize].index = -1;
        x[row][colSize].value = 0;
    }

    return x;
} /*createNode()*/

And prediction:

cv::Mat LibSVM::predict(INPUT const cv::Mat& data)
{
    if(this->_svmModel == NULL)
        throw MLInterfaceException("Cannot predict; no model has been trained or loaded.");

    cv::Mat predMat;

    //create the libsvm representation of data
    svm_node** x = this->createNode(data);

    //perform prediction for each feature vector
    for(int i = 0; i < data.rows; ++i)
    {
        double pred = svm_predict(this->_svmModel, x[i]);
        predMat.push_back<double>(pred);
    }        

    //delete all rows and columns of x
    for(int i = 0; i < data.rows; ++i)
        delete[] x[i];
    delete[] x;


    return predMat;
}

EDIT6:

For those of you tuning in at home, I trained a model (using optimal C/Gamma found in MATLAB) in C++, saved it to file, and then tried predicting on the training data via Linux terminal. It scored 100%. Something is wrong with my prediction. o_0

EDIT7:

I finally found the issue. I had tremendous bug-tracking help in finding it. I printed the contents of the svm_node** 2D array used for prediction. The conversion code was a subset of the createProblem() method, and there was a piece of it that I failed to copy + paste over to the new function: the index of each feature was never written. There should have been 1 more line:

x[row][col].index = col + 1; //indexing starts at 1

And the prediction works fine now.

Bertsche answered 15/9, 2013 at 19:37
Are you feeding libsvm the right labels? Try printing this->_svmProblem->y and this->_svmProblem->x to verify you are indeed giving it the training problem you think you are. — Liverpudlian
Hello, thank you for your comment. I will post the code I used to display the problem data, as well as the output. — Bertsche

It would be useful to see your gamma value; since your data is not normalized, that could make a huge difference.

The gamma in libsvm is inversely related to the hypersphere radius, so if those spheres are too small with respect to the input range, everything gets activated all the time and the model will always output the same value.

So, the two recommendations would be: 1) scale your input values to the range [-1, 1], and 2) experiment with the gamma values.

Marcmarcano answered 16/9, 2013 at 14:3
Hi, I normally scale my data between 0 and 1 as they recommend here: csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf. I will update my post to show what I use for C and Gamma. — Bertsche
@trianta2 Doing a grid search is good, but if your search range is off you won't find a suitable value. Have you tried increasing the range? Since the problem is likely caused by spheres that are too big, you might try searching over bigger values of gamma, maybe between -10 and 10? — Marcmarcano
I tried a bigger grid search, but the max accuracy I achieve is only 60%. I used the libsvm implementation in MATLAB and it achieves 100% accuracy on this dataset, even without scaling. I will update my post to show my findings. — Bertsche
@trianta2 Did you use the same range when using the Matlab wrapper? Do you know which C and gamma values are used in the Matlab experiment? — Marcmarcano
I used the same range but MATLAB yielded much higher accuracies. I can look up which C and Gamma values give me the best accuracy via the heat map (updated in my original post). — Bertsche
@trianta2 Hmm... given those two heatmaps, something seems to be really off. Would you try scaling manually and using that data in both places? The only thing I can think of is that Matlab is scaling the data differently, hence the difference. In general terms, Matlab should be just a wrapper for libsvm, so this difference should not exist given the same input. — Marcmarcano
Could you elaborate on what you mean by "scaling manually"? What I'm already doing is scaling each feature column from -1 to 1, storing the data in a file (.csv), and then having the C++/MATLAB implementations read from the .csv. — Bertsche
Sorry, I had the impression that you were scaling the inputs in your program; disregard my last comment then. — Marcmarcano
Can you try to use svm_predict with that data but with a set of values that work well on Matlab? I am very curious about this because the behaviour you are describing is very odd. — Marcmarcano
The artificial dataset works great (100%) in MATLAB, but when used in my C++ wrapper class it yields 60% accuracy. The accuracy map above is for the artificial dataset. I used the Unix terminal API to train and predict on the artificial dataset (using a C and Gamma found in MATLAB), and it can predict the data with 100% accuracy. — Bertsche
@trianta2 OK, if the command-line output is consistent with Matlab, there must be a little bug in your code (since the command-line tools are built on the same code as the library). I will try to compile your code and see if you have any problem. — Marcmarcano
Isn't this line reducing the precision of your input? "double tempVal = (double)data.at<float>(row,col);" — Marcmarcano
From a precision perspective, I suppose. The data I work with is floating-point type, but libsvm requires double. I must cast it with the .at<type> template method, with type being the type of the data in the matrix (OpenCV Mats can store many different types of data). Would the difference between float and double really matter? The artificial dataset I'm working with doesn't have that many significant figures; you could easily round it off to the 2nd decimal place. — Bertsche
Also, I have a new "breakthrough". I will describe it in my original post. — Bertsche
Well, sometimes it actually makes a significant difference whether you use float or double. In any case, the easier way to check whether your problem is in the prediction code or in the training code is to save the model (svm_save_model) from your code and load it from the command line. If after that both have poor performance, then your problem is on the training side; otherwise your problem is on the training side. — Marcmarcano
That is a great idea. I will do just that. Also, I converted the .csv data to type double in my C++ code and removed the cast from float. There was no improvement. — Bertsche
I meant to say "otherwise your problem is on the testing side" but I am glad you got the idea :) — Marcmarcano
I trained using the artificial data (with the C and Gamma found via the MATLAB heat map), saved the model, and then ran the terminal prediction on the same artificial data. It scored 100%. The problem is in prediction. But I still can't find where... — Bertsche
Cool... we are getting closer! :) Can you share the code in "this->createNode(data);"? I guess that would be the first place I would look for a bug (assuming that you loaded the model properly). — Marcmarcano
I posted that already, but foolishly not the method signature. Check EDIT5 above in a second. — Bertsche
OK... odd... that code looks fine to me. Are you using the model resulting from training, or are you saving it to a file and then loading it? — Marcmarcano
There is a subtle missing piece of code that has a huge impact. It is in the prediction method. I will post it in my final edit. — Bertsche
You were a tremendous help. Thank you very much. — Bertsche
OH!!!! How could I forget that! I had the same issue the first time I worked with libsvm (~12 years ago). Good catch! I am glad you found the issue :) — Marcmarcano