Retraining after Cross Validation with libsvm

I know that cross-validation is used for selecting good parameters. After finding them, I need to retrain on the whole data set without the -v option.

But the problem I face is that after I train with the -v option, I only get the cross-validation accuracy (e.g. 85%). There is no model, and I can't see the values of C and gamma. In that case, how do I retrain?

By the way, I am applying 10-fold cross-validation, e.g.:

optimization finished, #iter = 138
nu = 0.612233
obj = -90.291046, rho = -0.367013
nSV = 165, nBSV = 128
Total nSV = 165
Cross Validation Accuracy = 98.1273%

I need some help with this.

To get the best C and gamma, I use this code, which is available in the LIBSVM FAQ:

bestcv = 0;
% grid search over log2(C) and log2(gamma), with 5-fold cross-validation
for log2c = -6:10,
  for log2g = -6:3,
    cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
    cv = svmtrain(TrainLabel, TrainVec, cmd);   % with -v, returns CV accuracy (a scalar)
    if (cv >= bestcv),
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
    end
    fprintf('(best c=%g, g=%g, rate=%g)\n', bestc, bestg, bestcv);
  end
end
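
Once the loop finishes, I retrain without the -v option using the best values found, and test on my held-out set, roughly like this (TestLabel and TestVec are placeholders for my separate test data):

cmd = ['-c ', num2str(bestc), ' -g ', num2str(bestg)];       % no -v this time
model = svmtrain(TrainLabel, TrainVec, cmd);                 % returns an actual model
[predicted, accuracy, decision_values] = svmpredict(TestLabel, TestVec, model);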

Another question: is the cross-validation accuracy obtained with the -v option similar to the accuracy we get when we train without -v and use that model to predict? Are the two accuracies similar?

Another question: cross-validation basically improves the accuracy of the model by avoiding overfitting. So it needs to have a model in place before it can improve it, am I right? Besides that, if I have a different model, will the cross-validation accuracy be different? Am I right?

One more question: for that cross-validation accuracy, what are the values of C and gamma then?

The graph is something like this (image of my grid-search results):

Then the values are C = 2 and gamma = 0.0078125. But when I retrain the model with these new parameters, the value is not the same as 99.63%. Could there be any reason? Thanks in advance.

Peridot answered 28/1, 2012 at 17:48 Comment(0)

The -v option here is really meant to be used as a way to avoid the overfitting problem (instead of using the whole data for training, it performs an N-fold cross-validation: training on N-1 folds and testing on the remaining fold, one at a time, then reporting the average accuracy). Thus it only returns the cross-validation accuracy (assuming you have a classification problem; for regression it would be the mean squared error) as a scalar number instead of an actual SVM model.
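
To make that concrete, a minimal sketch (the -c/-g values are arbitrary placeholders, and labels/data stand for a training set):

%# with -v: svmtrain returns only a scalar (the CV accuracy; MSE for regression),
%# there is no model to keep
cv_acc = svmtrain(labels, data, '-c 1 -g 0.07 -v 5');

%# without -v: svmtrain returns a model struct that can be passed to svmpredict
model = svmtrain(labels, data, '-c 1 -g 0.07');
[pred, acc, dec] = svmpredict(labels, data, model);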

If you want to perform model selection, you have to implement a grid search using cross-validation (similar to what the grid.py helper Python script does) to find the best values of C and gamma.

This shouldn't be hard to implement: create a grid of values using MESHGRID, iterate over all (C, gamma) pairs, train an SVM model with, say, 5-fold cross-validation for each, and choose the pair with the best CV accuracy...

Example:

%# read some training data
[labels,data] = libsvmread('./heart_scale');

%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);

%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
    cv_acc(i) = svmtrain(labels, data, ...
                    sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end

%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);

%# contour plot of parameter selection
contour(C, gamma, reshape(cv_acc,size(C))), colorbar
hold on
plot(C(idx), gamma(idx), 'rx')
text(C(idx), gamma(idx), sprintf('Acc = %.2f %%',cv_acc(idx)), ...
    'HorizontalAlignment','left', 'VerticalAlignment','top')
hold off
xlabel('log_2(C)'), ylabel('log_2(\gamma)'), title('Cross-Validation Accuracy')

%# now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
%# ...

(image: the resulting contour plot of cross-validation accuracy, with the best point marked)
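
For completeness, a sketch of that last step (not part of the snippet above; it predicts back on the training data purely to illustrate the calls, in practice you would evaluate on a held-out test set):

%# train the final model on all the data with the selected parameters (no -v)
final_model = svmtrain(labels, data, sprintf('-c %f -g %f', best_C, best_gamma));

%# predict: here on the training data itself just to show the API;
%# use a separate test set to estimate real performance
[pred, acc, dec] = svmpredict(labels, data, final_model);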

Fdic answered 28/1, 2012 at 22:34 Comment(10)
Awesome code, thanks... One more question: the point where the accuracy value is displayed is the location of the best c and gamma, am I right? – Peridot
@lakesh: correct, just remember that the graph is drawn on a log2 scale (so the best values here are C=2^9 and gamma=2^-11). – Fdic
Awesome... I edited my question above. Basically I have added a few more minor questions... I would like to know your answers to those as well. – Peridot
@lakesh: I suggest you refer to a proper machine learning book and read up more on overfitting, training/testing/validation sets, bias/variance, etc. (these topics are not SVM-specific). – Fdic
@Fdic: you wrote [~,idx] = max(cv_acc); did you mean [C,idx] = max(cv_acc)? – Servia
@kamaci: no, I simply get the index corresponding to the highest accuracy. To get the actual values, use C(idx) and gamma(idx). – Fdic
@Fdic: I am new to MATLAB; I use 7.7.0 and it says that is invalid syntax. Do you mean I should write something else there because I will not use that variable? – Servia
@kamaci: ah, the ~ syntax was introduced in R2009b (I think); your version is older. Use a dummy variable instead: [dummy,idx] = max(cv_acc); to ignore the first output. – Fdic
@Fdic: why did you use 2^C(i) and 2^gamma(i) instead of C(i) and gamma(i)? – Servia
@kamaci: it's just the range of values I chose to search (the grid could instead have been created as [C,gamma] = meshgrid(2.^(-5:2:15), 2.^(-15:2:3))). I was trying to make it look like the figure on the libsvm homepage. – Fdic

If you use your entire dataset to determine your parameters, and then train on that same dataset, you are going to overfit your data. Ideally, you would divide the dataset, do the parameter search on one portion (with CV), then use the other portion to train and test with CV. Will you get better results if you use the whole dataset for both? Of course, but your model is likely not to generalize well. If you want to determine the true performance of your model, you need to do parameter selection separately.
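
A rough sketch of that workflow (illustrative only; the variable names, split ratio, and fold counts are placeholders):

% random split: one portion for parameter selection, the rest for training/testing
n = length(labels);
idx = randperm(n);
searchIdx = idx(1:round(0.3*n));        % used only for the (C, gamma) grid search
restIdx   = idx(round(0.3*n)+1:end);    % never seen during parameter selection

% 1) run the cross-validated grid search (as in the other answer) on
%    labels(searchIdx), data(searchIdx,:) to obtain best_C and best_gamma

% 2) train and test with CV on the remaining portion using those parameters
final_cv_acc = svmtrain(labels(restIdx), data(restIdx,:), ...
                        sprintf('-c %f -g %f -v 10', best_C, best_gamma));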

Warble answered 27/2, 2012 at 8:22 Comment(11)
In the last statement, what do you mean by parameter selection? Do you mean determining the parameters on a certain portion of the data? – Peridot
I apologize for being unclear. Parameter selection is the act of determining which parameters work best for your dataset (really, what works best for the whole domain of the dataset and the future data you want to be able to classify). My last statement was just meant to summarize what I said above: doing parameter selection separately means using a separate portion of the dataset to figure out the best parameters, then using those parameters when you train on the unused portion. – Warble
One question: should I divide the entire data set so that 10% is used for the grid search, then train the model with those parameters on 70% and test it on the remaining 20%? Do you think this is a good idea? – Peridot
Actually I have a total of 384 samples, of which 268 are for training and 116 for testing. Initially I used libsvm with random parameters to get a model based on the training examples and tested it on the test data to find the accuracy. Next I carried out 10-fold cross-validation on the entire set with the same random parameters to find the accuracy. Then I carried out a grid search on the entire set to find the best parameters, trained a model with those parameters, and tested it on the test data to get my accuracy. Is that wrong or correct? – Peridot
Let me take each question individually. On how to divide the data set (10%, etc.): there is no right answer to this question. The more data you have for parameter selection, the better your parameters are going to be, but then you are taking data away from training, so your final model will suffer. The reverse happens as well: less data for parameter selection means poorer parameters, and your model suffers. In some ways, where to divide is yet another unknown to determine. – Warble
2nd part: I think you need to look more deeply into the concept of overfitting; this picture is a nice illustration. Your data will never be a perfect representation of the data universe. When you build your model, you can create one that has 100% accuracy. You don't want that; that's like the green line in the picture. It won't generalize well, meaning it will make more mistakes on future data. This is called overfitting: it means your model is too tightly fitted to your training data. – Warble
This is why we keep the test data separate, sacred. It's our best guess at what the rest of the data universe might look like, and we don't want to use it to build our model, because then we won't know if our model is really any good on NEW data. If you use this data to do a parameter search, you don't know whether your classifier will be accurate on future data or whether you just found the perfect parameters for this specific set of data. – Warble
Since you have a test set and a training set, I would cut the training set in half, keeping the class proportions the same in each half; use the first half to do a parameter grid search (ideally with CV), then use those parameters to train a model on the second half and test on the test set. – Warble
First of all, thanks for the comments, help, and advice you are providing. I am busy with school work, which is why I was not able to respond immediately. I will read through all your comments. I will definitely need your help again. Thanks once again. – Peridot
I understood everything you said and understand the need as well. I have a question: like I said, I split my data 70/30, 70% for training and 30% for testing. Is it OK if, within the 70%, I use 35% to do the parameter search and then use the resulting C and gamma to train a model on the full 70% and test it on the 30%? – Peridot
One more: looking at Amro's answer, when I implement his method I get values for best_C and best_gamma, but when I use those values I get worse-than-random results on the test cases... why is that so? Any idea? – Peridot
