I have implemented an OpenCV/C++ wrapper for libsvm. When doing a grid search over SVM parameters (RBF kernel), prediction always returns the same label. I have created artificial datasets with very easily separable data (and even tried predicting the data I just trained on), but it still returns the same label.
I have used the MATLAB implementation of libsvm and achieved high accuracy on the same dataset. I must be doing something wrong in setting up the problem, but I've gone through the README many times and can't quite find the issue.
Here is how I set up the libsvm problem, where data is an OpenCV Mat:
const int rowSize = data.rows;
const int colSize = data.cols;
this->_svmProblem = new svm_problem;
std::memset(this->_svmProblem, 0, sizeof(svm_problem));
//dynamically allocate the X matrix...
this->_svmProblem->x = new svm_node*[rowSize];
for(int row = 0; row < rowSize; ++row)
    this->_svmProblem->x[row] = new svm_node[colSize + 1];
//...and the y vector
this->_svmProblem->y = new double[rowSize];
this->_svmProblem->l = rowSize;
for(int row = 0; row < rowSize; ++row)
{
    for(int col = 0; col < colSize; ++col)
    {
        //set the index and the value. indexing starts at 1.
        this->_svmProblem->x[row][col].index = col + 1;
        double tempVal = (double)data.at<float>(row, col);
        this->_svmProblem->x[row][col].value = tempVal;
    }
    this->_svmProblem->x[row][colSize].index = -1;
    this->_svmProblem->x[row][colSize].value = 0;
    //add the label to the y array, and feature vector to X matrix
    double tempVal = (double)labels.at<float>(row);
    this->_svmProblem->y[row] = tempVal;
}
}/*createProblem()*/
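As an aside, the same layout could also be built with std::vector owning the memory, which avoids leaks if anything throws along the way. A minimal sketch (the containers must outlive the svm_problem, since it only stores raw pointers):
//Sketch: identical memory layout, but std::vector manages the lifetime.
//svm_problem borrows pointers into these containers.
std::vector<std::vector<svm_node>> nodeRows(rowSize, std::vector<svm_node>(colSize + 1));
std::vector<svm_node*> rowPtrs(rowSize);
std::vector<double> yValues(rowSize);
for(int row = 0; row < rowSize; ++row)
    rowPtrs[row] = nodeRows[row].data();
svm_problem prob = {}; //zero-initialize all fields
prob.l = rowSize;
prob.x = rowPtrs.data();
prob.y = yValues.data();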
Here is how I set up the parameters, where svmParams is my own struct for C/Gamma and such:
this->_svmParameter = new svm_parameter;
std::memset(this->_svmParameter, 0, sizeof(svm_parameter));
this->_svmParameter->svm_type = svmParams.svmType;
this->_svmParameter->kernel_type = svmParams.kernalType;
this->_svmParameter->C = svmParams.C;
this->_svmParameter->gamma = svmParams.gamma;
this->_svmParameter->nr_weight = 0;    //no per-class weighting
this->_svmParameter->eps = 0.001;      //stopping tolerance
this->_svmParameter->degree = 1;       //unused by the RBF kernel
this->_svmParameter->shrinking = 0;
this->_svmParameter->probability = 1;  //enable probability estimates
this->_svmParameter->cache_size = 100; //kernel cache size in MB
I pass the problem and parameters through libsvm's provided svm_check_parameter() function and no errors are returned.
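For reference, a minimal sketch of that check, using the wrapper's own MLInterfaceException to report the error (svm_check_parameter() returns NULL on success, or a string describing the first inconsistency it finds):
//Sketch: validate the problem/parameter pair before calling svm_train().
const char* errMsg = svm_check_parameter(this->_svmProblem, this->_svmParameter);
if(errMsg != NULL)
    throw MLInterfaceException(errMsg); //errMsg describes what is wrong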
I then train as such:
this->_svmModel = svm_train(this->_svmProblem, this->_svmParameter);
And then predict like so:
float pred = (float)svm_predict(this->_svmModel, x[i]);
If anyone could point out where I'm going wrong here I would greatly appreciate it. Thank you!
EDIT:
Using this code, I printed the contents of the problem:
for(int i = 0; i < rowSize; ++i)
{
    std::cout << "[";
    for(int j = 0; j < colSize + 1; ++j)
    {
        std::cout << " (" << this->_svmProblem->x[i][j].index << ", " << this->_svmProblem->x[i][j].value << ")";
    }
    std::cout << "]" << " <" << this->_svmProblem->y[i] << ">" << std::endl;
}
Here is the output:
[ (1, -1) (2, 0) (-1, 0)] <1>
[ (1, -0.92394) (2, 0) (-1, 0)] <1>
[ (1, -0.7532) (2, 0) (-1, 0)] <1>
[ (1, -0.75977) (2, 0) (-1, 0)] <1>
[ (1, -0.75337) (2, 0) (-1, 0)] <1>
[ (1, -0.76299) (2, 0) (-1, 0)] <1>
[ (1, -0.76527) (2, 0) (-1, 0)] <1>
[ (1, -0.74631) (2, 0) (-1, 0)] <1>
[ (1, -0.85153) (2, 0) (-1, 0)] <1>
[ (1, -0.72436) (2, 0) (-1, 0)] <1>
[ (1, -0.76485) (2, 0) (-1, 0)] <1>
[ (1, -0.72936) (2, 0) (-1, 0)] <1>
[ (1, -0.94004) (2, 0) (-1, 0)] <1>
[ (1, -0.92756) (2, 0) (-1, 0)] <1>
[ (1, -0.9688) (2, 0) (-1, 0)] <1>
[ (1, 0.05193) (2, 0) (-1, 0)] <3>
[ (1, -0.048488) (2, 0) (-1, 0)] <3>
[ (1, 0.070436) (2, 0) (-1, 0)] <3>
[ (1, 0.15191) (2, 0) (-1, 0)] <3>
[ (1, -0.07331) (2, 0) (-1, 0)] <3>
[ (1, 0.019786) (2, 0) (-1, 0)] <3>
[ (1, -0.072793) (2, 0) (-1, 0)] <3>
[ (1, 0.16157) (2, 0) (-1, 0)] <3>
[ (1, -0.057188) (2, 0) (-1, 0)] <3>
[ (1, -0.11187) (2, 0) (-1, 0)] <3>
[ (1, 0.15886) (2, 0) (-1, 0)] <3>
[ (1, -0.0701) (2, 0) (-1, 0)] <3>
[ (1, -0.17816) (2, 0) (-1, 0)] <3>
[ (1, 0.12305) (2, 0) (-1, 0)] <3>
[ (1, 0.058615) (2, 0) (-1, 0)] <3>
[ (1, 0.80203) (2, 0) (-1, 0)] <1>
[ (1, 0.734) (2, 0) (-1, 0)] <1>
[ (1, 0.9072) (2, 0) (-1, 0)] <1>
[ (1, 0.88061) (2, 0) (-1, 0)] <1>
[ (1, 0.83903) (2, 0) (-1, 0)] <1>
[ (1, 0.86604) (2, 0) (-1, 0)] <1>
[ (1, 1) (2, 0) (-1, 0)] <1>
[ (1, 0.77988) (2, 0) (-1, 0)] <1>
[ (1, 0.8578) (2, 0) (-1, 0)] <1>
[ (1, 0.79559) (2, 0) (-1, 0)] <1>
[ (1, 0.99545) (2, 0) (-1, 0)] <1>
[ (1, 0.78376) (2, 0) (-1, 0)] <1>
[ (1, 0.72177) (2, 0) (-1, 0)] <1>
[ (1, 0.72619) (2, 0) (-1, 0)] <1>
[ (1, 0.80149) (2, 0) (-1, 0)] <1>
[ (1, 0.092327) (2, -1) (-1, 0)] <2>
[ (1, 0.019054) (2, -1) (-1, 0)] <2>
[ (1, 0.15287) (2, -1) (-1, 0)] <2>
[ (1, -0.1471) (2, -1) (-1, 0)] <2>
[ (1, -0.068182) (2, -1) (-1, 0)] <2>
[ (1, -0.094567) (2, -1) (-1, 0)] <2>
[ (1, -0.17071) (2, -1) (-1, 0)] <2>
[ (1, -0.16646) (2, -1) (-1, 0)] <2>
[ (1, -0.030421) (2, -1) (-1, 0)] <2>
[ (1, 0.094346) (2, -1) (-1, 0)] <2>
[ (1, -0.14408) (2, -1) (-1, 0)] <2>
[ (1, 0.090025) (2, -1) (-1, 0)] <2>
[ (1, 0.043706) (2, -1) (-1, 0)] <2>
[ (1, 0.15065) (2, -1) (-1, 0)] <2>
[ (1, -0.11751) (2, -1) (-1, 0)] <2>
[ (1, -0.02324) (2, 1) (-1, 0)] <2>
[ (1, 0.0080356) (2, 1) (-1, 0)] <2>
[ (1, -0.17752) (2, 1) (-1, 0)] <2>
[ (1, 0.011135) (2, 1) (-1, 0)] <2>
[ (1, -0.029063) (2, 1) (-1, 0)] <2>
[ (1, 0.15398) (2, 1) (-1, 0)] <2>
[ (1, 0.097746) (2, 1) (-1, 0)] <2>
[ (1, 0.01018) (2, 1) (-1, 0)] <2>
[ (1, 0.015592) (2, 1) (-1, 0)] <2>
[ (1, -0.062793) (2, 1) (-1, 0)] <2>
[ (1, 0.014444) (2, 1) (-1, 0)] <2>
[ (1, -0.1205) (2, 1) (-1, 0)] <2>
[ (1, -0.18011) (2, 1) (-1, 0)] <2>
[ (1, 0.010521) (2, 1) (-1, 0)] <2>
[ (1, 0.036914) (2, 1) (-1, 0)] <2>
Here, the data is printed in the format [ (index, value) ... ] <label>.
The artificial dataset I created has just 3 classes, all of which are easily separable with a non-linear decision boundary. Each row is a feature vector (observation) with 2 features (x coord, y coord). libsvm requires each vector to be terminated with a node whose index is -1, so I do that.
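For illustration, here is what one terminated vector looks like in memory, using values from the first few rows above:
//One 2-feature vector in libsvm's sparse representation.
//Indices are 1-based; a node with index = -1 marks the end.
svm_node vec[3];
vec[0].index = 1;  vec[0].value = -0.92394; //feature 1 (x coord)
vec[1].index = 2;  vec[1].value = 0;        //feature 2 (y coord)
vec[2].index = -1; vec[2].value = 0;        //terminator (value ignored)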
EDIT2:
This edit pertains to the C and gamma values used for training, as well as data scaling. I normally scale data to between 0 and 1 (as suggested here: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf). I will scale this fake dataset as well and try again, although I used this exact same dataset with the MATLAB implementation of libsvm and it separated the unscaled data with 100% accuracy.
For C and gamma, I also use the values recommended in the guide. I create two vectors and use a doubly nested loop to try all combinations:
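As a side note, here is a minimal sketch of the per-feature scaling I have in mind, assuming data is a non-const CV_32F cv::Mat with one observation per row:
//Sketch: scale each feature (column) independently to [0, 1].
for(int col = 0; col < data.cols; ++col)
{
    double minVal, maxVal;
    cv::minMaxLoc(data.col(col), &minVal, &maxVal);
    if(maxVal > minVal) //skip constant features to avoid division by zero
    {
        cv::Mat scaled = (data.col(col) - minVal) / (maxVal - minVal);
        scaled.copyTo(data.col(col)); //write back through the column view
    }
}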
std::vector<double> CList, GList;
double baseNum = 2.0;
for(double j = -5; j <= 15; j += 2)  //C = 2^-5, 2^-3, ..., 2^15
    CList.push_back(pow(baseNum, j));
for(double j = -15; j <= 3; j += 2)  //gamma = 2^-15, 2^-13, ..., 2^3
    GList.push_back(pow(baseNum, j));
And the loop looks like:
for(auto CIt = CList.begin(); CIt != CList.end(); ++CIt) //for all C's
{
    double C = *CIt;
    for(auto GIt = GList.begin(); GIt != GList.end(); ++GIt) //for all gammas
    {
        double gamma = *GIt;
        svmParams.svmType = C_SVC;
        svmParams.kernalType = RBF;
        svmParams.C = C;
        svmParams.gamma = gamma;
        ......training code etc..........
EDIT3:
Since I keep referencing MATLAB, I will show the accuracy differences. Here is a heat map of the accuracy libsvm yields:
And here is the accuracy map MATLAB yields using the same parameters and same C/Gamma grid:
Here is the code used to generate the C/Gamma lists, and how I train:
CList = 2.^(-15:2:15);%(-5:2:15);
GList = 2.^(-15:2:15);%(-15:2:3);
cmd = ['-q -s 0 -t 2 -c ', num2str(C), ' -g ', num2str(gamma)]; % -s 0: C-SVC, -t 2: RBF kernel
model = ovrtrain(yTrain, xTrain, cmd);
EDIT4:
As a sanity check, I reformatted my fake scaled dataset to conform to the file format used by libsvm's Unix/Linux command-line tools. I trained and predicted using a C/gamma pair found in the MATLAB accuracy map. The prediction accuracy was 100%. Thus I am absolutely doing something wrong in the C++ implementation.
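For reference, each line of that file format is "label index:value index:value ...", so a few of the rows printed above become:
1 1:-0.92394 2:0
3 1:0.05193 2:0
2 1:0.092327 2:-1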
EDIT5:
I loaded the model trained from the Linux terminal into my C++ wrapper class. I then tried predicting the exact same dataset used for training. The accuracy in C++ was still awful! However, I'm very close to narrowing down the source of the problem. If MATLAB and the Linux tools both agree at 100% accuracy, and the model they produce has already been proven to yield 100% accuracy on the very dataset it was trained on, yet my C++ wrapper class shows poor performance with that verified model, there are three possible situations:
1. The method I use to transform cv::Mats into the svm_node* required for prediction has a problem in it.
2. The method I use to predict labels has a problem in it.
3. Both 1 and 2!
The code to really inspect now is how I create the svm_node. Here it is again:
svm_node** LibSVM::createNode(INPUT const cv::Mat& data)
{
    const int rowSize = data.rows;
    const int colSize = data.cols;
    //dynamically allocate the X matrix...
    svm_node** x = new svm_node*[rowSize];
    if(x == NULL)
        throw MLInterfaceException("Could not allocate SVM Node Array.");
    for(int row = 0; row < rowSize; ++row)
    {
        x[row] = new svm_node[colSize + 1]; //+1 here for the index-terminating -1
        if(x[row] == NULL)
            throw MLInterfaceException("Could not allocate SVM Node.");
    }
    for(int row = 0; row < rowSize; ++row)
    {
        for(int col = 0; col < colSize; ++col)
        {
            double tempVal = data.at<double>(row, col);
            x[row][col].value = tempVal;
        }
        x[row][colSize].index = -1;
        x[row][colSize].value = 0;
    }
    return x;
} /*createNode()*/
And prediction:
cv::Mat LibSVM::predict(INPUT const cv::Mat& data)
{
    if(this->_svmModel == NULL)
        throw MLInterfaceException("Cannot predict; no model has been trained or loaded.");
    cv::Mat predMat;
    //create the libsvm representation of data
    svm_node** x = this->createNode(data);
    //perform prediction for each feature vector
    for(int i = 0; i < data.rows; ++i)
    {
        double pred = svm_predict(this->_svmModel, x[i]);
        predMat.push_back<double>(pred);
    }
    //delete all rows and columns of x
    for(int i = 0; i < data.rows; ++i)
        delete[] x[i];
    delete[] x;
    return predMat;
}
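A hypothetical usage sketch for this method (svm, data, and labels are placeholder names; labels is assumed to be a CV_64F column Mat aligned with data's rows):
//Sketch: score predictions against ground-truth labels.
cv::Mat preds = svm.predict(data);
int correct = 0;
for(int i = 0; i < preds.rows; ++i)
    if(preds.at<double>(i) == labels.at<double>(i))
        ++correct;
double accuracy = 100.0 * correct / preds.rows;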
EDIT6:
For those of you tuning in at home: I trained a model in C++ (using the optimal C/gamma found in MATLAB), saved it to file, and then tried predicting on the training data via the Linux terminal. It scored 100%. Something is wrong with my prediction. o_0
EDIT7:
I finally found the issue, with tremendous bug-tracking help. I printed the contents of the svm_node** 2D array used for prediction. createNode() was copied from a subset of the createProblem() method, but there was one piece I failed to copy and paste over to the new function: the index of each feature was never written. There should have been one more line:
x[row][col].index = col + 1; //indexing starts at 1
And the prediction works fine now.
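With that line restored, the fill loop in createNode() mirrors createProblem():
//Corrected fill loop: every node gets both its 1-based index and its
//value before the terminating node (index = -1) is appended.
for(int row = 0; row < rowSize; ++row)
{
    for(int col = 0; col < colSize; ++col)
    {
        x[row][col].index = col + 1; //indexing starts at 1
        x[row][col].value = data.at<double>(row, col);
    }
    x[row][colSize].index = -1;
    x[row][colSize].value = 0;
}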
Comment: Try printing the contents of this->_svmProblem->y and this->_svmProblem->x to verify you are indeed giving it the training problem you think you are. – Liverpudlian