Retraining after Cross Validation with libsvm

I know that cross-validation is used for selecting good parameters. After finding them, I need to retrain on the whole data set without the -v option.

But the problem I face is that after I train with the -v option, I only get the cross-validation accuracy (e.g. 85%). There is no model, and I can't see the values of C and gamma. In that case, how do I retrain?

By the way, I am applying 10-fold cross-validation, e.g.:

optimization finished, #iter = 138
nu = 0.612233
obj = -90.291046, rho = -0.367013
nSV = 165, nBSV = 128
Total nSV = 165
Cross Validation Accuracy = 98.1273%

I need some help with this.

To get the best C and gamma, I use this code, which is available in the LIBSVM FAQ:

bestcv = 0;
% grid search over log2(C) and log2(gamma), with 5-fold cross-validation
for log2c = -6:10,
  for log2g = -6:3,
    cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
    cv = svmtrain(TrainLabel, TrainVec, cmd);   % with -v, returns CV accuracy (a scalar)
    if (cv >= bestcv),
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
    end
    fprintf('(best c=%g, g=%g, rate=%g)\n', bestc, bestg, bestcv);
  end
end
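
Once the loop finishes, I retrain without the -v option using the best values found, and test on my held-out set, roughly like this (TestLabel and TestVec are placeholders for my separate test data):

cmd = ['-c ', num2str(bestc), ' -g ', num2str(bestg)];       % no -v this time
model = svmtrain(TrainLabel, TrainVec, cmd);                 % returns an actual model
[predicted, accuracy, decision_values] = svmpredict(TestLabel, TestVec, model);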

Another question: is the cross-validation accuracy obtained with the -v option similar to the accuracy we get when we train without -v and use that model to predict? Are the two accuracies similar?

Another question: cross-validation basically improves the accuracy of the model by avoiding overfitting. So it needs to have a model in place before it can improve it, am I right? Besides that, if I have a different model, will the cross-validation accuracy be different? Am I right?

One more question: for that cross-validation accuracy, what are the values of C and gamma then?

The graph is something like this (image of my grid-search results):

Then the values are C = 2 and gamma = 0.0078125. But when I retrain the model with these new parameters, the value is not the same as 99.63%. Could there be any reason? Thanks in advance.

Peridot answered 28/1, 2012 at 17:48 Comment(0)

The -v option here is really meant to be used as a way to avoid the overfitting problem (instead of using the whole data for training, it performs an N-fold cross-validation: training on N-1 folds and testing on the remaining fold, one at a time, then reporting the average accuracy). Thus it only returns the cross-validation accuracy (assuming you have a classification problem; for regression it would be the mean squared error) as a scalar number instead of an actual SVM model.
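
To make that concrete, a minimal sketch (the -c/-g values are arbitrary placeholders, and labels/data stand for a training set):

%# with -v: svmtrain returns only a scalar (the CV accuracy; MSE for regression),
%# there is no model to keep
cv_acc = svmtrain(labels, data, '-c 1 -g 0.07 -v 5');

%# without -v: svmtrain returns a model struct that can be passed to svmpredict
model = svmtrain(labels, data, '-c 1 -g 0.07');
[pred, acc, dec] = svmpredict(labels, data, model);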

If you want to perform model selection, you have to implement a grid search using cross-validation (similar to what the grid.py helper Python script does) to find the best values of C and gamma.

This shouldn't be hard to implement: create a grid of values using MESHGRID, iterate over all (C, gamma) pairs, train an SVM model with, say, 5-fold cross-validation for each, and choose the pair with the best CV accuracy...

Example:

%# read some training data
[labels,data] = libsvmread('./heart_scale');

%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);

%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
    cv_acc(i) = svmtrain(labels, data, ...
                    sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end

%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);

%# contour plot of parameter selection
contour(C, gamma, reshape(cv_acc,size(C))), colorbar
hold on
plot(C(idx), gamma(idx), 'rx')
text(C(idx), gamma(idx), sprintf('Acc = %.2f %%',cv_acc(idx)), ...
    'HorizontalAlignment','left', 'VerticalAlignment','top')
hold off
xlabel('log_2(C)'), ylabel('log_2(\gamma)'), title('Cross-Validation Accuracy')

%# now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
%# ...

(image: the resulting contour plot of cross-validation accuracy, with the best point marked)
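
For completeness, a sketch of that last step (not part of the snippet above; it predicts back on the training data purely to illustrate the calls, in practice you would evaluate on a held-out test set):

%# train the final model on all the data with the selected parameters (no -v)
final_model = svmtrain(labels, data, sprintf('-c %f -g %f', best_C, best_gamma));

%# predict: here on the training data itself just to show the API;
%# use a separate test set to estimate real performance
[pred, acc, dec] = svmpredict(labels, data, final_model);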

Fdic answered 28/1, 2012 at 22:34 Comment(10)
Awesome code, thanks... One more question: the point where the accuracy value is displayed is the location of the best c and gamma, am I right? – Peridot
@lakesh: correct, just remember that the graph is drawn on a log2 scale (so the best values here are C=2^9 and gamma=2^-11). – Fdic
Awesome... I edited my question above. Basically I have added a few more minor questions... I would like to know your answers to those as well. – Peridot
@lakesh: I suggest you refer to a proper machine learning book and read up more on overfitting, training/testing/validation sets, bias/variance, etc. (these topics are not SVM-specific). – Fdic
@Fdic: you wrote [~,idx] = max(cv_acc); did you mean [C,idx] = max(cv_acc)? – Servia
@kamaci: no, I simply get the index corresponding to the highest accuracy. To get the actual values, use C(idx) and gamma(idx). – Fdic
@Fdic: I am new to MATLAB; I use 7.7.0 and it says that is invalid syntax. Do you mean I should write something else there because I will not use that variable? – Servia
@kamaci: ah, the ~ syntax was introduced in R2009b (I think); your version is older. Use a dummy variable instead: [dummy,idx] = max(cv_acc); to ignore the first output. – Fdic
@Fdic: why did you use 2^C(i) and 2^gamma(i) instead of C(i) and gamma(i)? – Servia
@kamaci: it's just the range of values I chose to search (the grid could instead have been created as [C,gamma] = meshgrid(2.^(-5:2:15), 2.^(-15:2:3))). I was trying to make it look like the figure on the libsvm homepage. – Fdic

If you use your entire dataset to determine your parameters, and then train on that same dataset, you are going to overfit your data. Ideally, you would divide the dataset, do the parameter search on one portion (with CV), then use the other portion to train and test with CV. Will you get better results if you use the whole dataset for both? Of course, but your model is likely not to generalize well. If you want to determine the true performance of your model, you need to do parameter selection separately.
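
A rough sketch of that workflow (illustrative only; the variable names, split ratio, and fold counts are placeholders):

% random split: one portion for parameter selection, the rest for training/testing
n = length(labels);
idx = randperm(n);
searchIdx = idx(1:round(0.3*n));        % used only for the (C, gamma) grid search
restIdx   = idx(round(0.3*n)+1:end);    % never seen during parameter selection

% 1) run the cross-validated grid search (as in the other answer) on
%    labels(searchIdx), data(searchIdx,:) to obtain best_C and best_gamma

% 2) train and test with CV on the remaining portion using those parameters
final_cv_acc = svmtrain(labels(restIdx), data(restIdx,:), ...
                        sprintf('-c %f -g %f -v 10', best_C, best_gamma));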

Warble answered 27/2, 2012 at 8:22 Comment(11)
In the last statement, what do you mean by parameter selection? Do you mean determining the parameters on a certain portion of the data? – Peridot
I apologize for being unclear. Parameter selection is the act of determining which parameters work best for your dataset (really, what works best for the whole domain of the dataset and the future data you want to be able to classify). My last statement was just meant to summarize what I said above: doing parameter selection separately means using a separate portion of the dataset to figure out the best parameters, then using those parameters when you train on the unused portion. – Warble
One question: should I divide the entire data set so that 10% is used for the grid search, then train the model with those parameters on 70% and test it on the remaining 20%? Do you think this is a good idea? – Peridot
Actually I have a total of 384 samples, of which 268 are for training and 116 for testing. Initially I used libsvm with random parameters to get a model based on the training examples and tested it on the test data to find the accuracy. Next I carried out 10-fold cross-validation on the entire set with the same random parameters to find the accuracy. Then I carried out a grid search on the entire set to find the best parameters, trained a model with those parameters, and tested it on the test data to get my accuracy. Is that wrong or correct? – Peridot
Let me take each question individually. On how to divide the data set (10%, etc.): there is no right answer to this question. The more data you have for parameter selection, the better your parameters are going to be, but then you are taking data away from training, so your final model will suffer. The reverse happens as well: less data for parameter selection means poorer parameters, and your model suffers. In some ways, where to divide is yet another unknown to determine. – Warble
2nd part: I think you need to look more deeply into the concept of overfitting; this picture is a nice illustration. Your data will never be a perfect representation of the data universe. When you build your model, you can create one that has 100% accuracy. You don't want that; that's like the green line in the picture. It won't generalize well, meaning it will make more mistakes on future data. This is called overfitting: it means your model is too tightly fitted to your training data. – Warble
This is why we keep the test data separate, sacred. It's our best guess at what the rest of the data universe might look like, and we don't want to use it to build our model, because then we won't know if our model is really any good on NEW data. If you use this data to do a parameter search, you don't know whether your classifier will be accurate on future data or whether you just found the perfect parameters for this specific set of data. – Warble
Since you have a test set and a training set, I would cut the training set in half, keeping the class proportions the same in each half; use the first half to do a parameter grid search (ideally with CV), then use those parameters to train a model on the second half and test on the test set. – Warble
First of all, thanks for the comments, help, and advice you are providing. I am busy with school work, which is why I was not able to respond immediately. I will read through all your comments. I will definitely need your help again. Thanks once again. – Peridot
I understood everything you said and understand the need as well. I have a question: like I said, I split my data 70/30, 70% for training and 30% for testing. Is it OK if, within the 70%, I use 35% to do the parameter search and then use the resulting C and gamma to train a model on the full 70% and test it on the 30%? – Peridot
One more: looking at Amro's answer, when I implement his method I get values for best_C and best_gamma, but when I use those values I get worse-than-random results on the test cases... why is that so? Any idea? – Peridot
