I need a reasonably descriptive example showing how to do 10-fold SVM classification on a two-class data set. There is just one example in the MATLAB documentation, but it does not use 10-fold cross-validation. Can someone help me?
Example of 10-fold SVM classification in MATLAB
Here's a complete example, using the following functions from the Bioinformatics Toolbox: SVMTRAIN, SVMCLASSIFY, CLASSPERF, CROSSVALIND.
load fisheriris %# load iris dataset
groups = ismember(species,'setosa'); %# create a two-class problem
%# number of cross-validation folds:
%# If you have 50 samples, divide them into 10 groups of 5 samples each,
%# then train with 9 groups (45 samples) and test with 1 group (5 samples).
%# This is repeated ten times, with each group used exactly once as a test set.
%# Finally the 10 results from the folds are averaged to produce a single
%# performance estimation.
k=10;
cvFolds = crossvalind('Kfold', groups, k); %# get indices of 10-fold CV
cp = classperf(groups); %# init performance tracker
for i = 1:k                                  %# for each fold
    testIdx = (cvFolds == i);                %# get indices of test instances
    trainIdx = ~testIdx;                     %# get indices of training instances

    %# train an SVM model over training instances
    svmModel = svmtrain(meas(trainIdx,:), groups(trainIdx), ...
        'Autoscale',true, 'Showplot',false, 'Method','QP', ...
        'BoxConstraint',2e-1, 'Kernel_Function','rbf', 'RBF_Sigma',1);

    %# test using test instances
    pred = svmclassify(svmModel, meas(testIdx,:), 'Showplot',false);

    %# evaluate and update performance object
    cp = classperf(cp, pred, testIdx);
end
%# get accuracy
cp.CorrectRate
%# get confusion matrix
%# columns:actual, rows:predicted, last-row: unclassified instances
cp.CountingMatrix
with the output:
ans =
0.99333
ans =
100 1
0 49
0 0
We obtained 99.33% accuracy, with only one 'setosa' instance misclassified as 'non-setosa'.
UPDATE: The SVM functions moved to the Statistics Toolbox in R2013a.
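In later MATLAB releases, svmtrain and svmclassify were removed in favor of fitcsvm and predict from the Statistics and Machine Learning Toolbox. A rough equivalent of the loop above might look like the sketch below. The parameter mapping (Standardize for Autoscale, KernelScale for RBF_Sigma), the fixed random seed, and the manual accuracy count are assumptions of this sketch rather than part of the original answer, so the result need not match the output shown above exactly.

```matlab
load fisheriris                           % load iris dataset
groups = ismember(species, 'setosa');     % two-class problem: setosa vs. rest

k = 10;
rng(0);                                   % assumption: fixed seed for repeatable folds
cvp = cvpartition(groups, 'KFold', k);    % stratified 10-fold partition

nCorrect = 0;
for i = 1:k
    trainIdx = training(cvp, i);          % logical index of training instances
    testIdx  = test(cvp, i);              % logical index of test instances

    % train an RBF-kernel SVM on the training folds
    mdl = fitcsvm(meas(trainIdx,:), groups(trainIdx), ...
        'Standardize', true, 'KernelFunction', 'rbf', ...
        'KernelScale', 1, 'BoxConstraint', 0.2);

    % predict the held-out fold and count correct predictions
    pred = predict(mdl, meas(testIdx,:));
    nCorrect = nCorrect + sum(pred == groups(testIdx));
end

accuracy = nCorrect / numel(groups)       % pooled accuracy over all folds
```

Pooling the correct counts over all folds plays the same role here as cp.CorrectRate after the loop in the original code.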
Thanks for the nice example. I'm a little confused about one thing. Suppose I have 50 entries in all. The above code divides them into 10 sets of 5 entries each and then uses 9 sets to train and 1 to test in each iteration. But maybe the usual flow should be a little different, i.e. (1) train, (2) cross-validate, repeat these, and then test? Or doesn't it make a difference? –
Hoashis
@user488652: I'm not clear on your question, but the code above follows the standard method of n-fold cross-validation –
Sosa
@Sosa Can you please explain groups = ismember(species,'setosa');? Why did you use 'setosa' and not the other two types of output? Also, how can I use it for my own dataset, stored in a 25x5 matrix, with the results in a 25x1 matrix with two outputs? –
Cattima
@MaxSteel: SVM at its core is a binary classification algorithm, so you can't have more than two classes (I arbitrarily chose setosa vs. non-setosa classes). Fortunately there are methods to extend SVM to support multi-class cases. See here for an example: https://mcmap.net/q/540117/-support-vector-machines-in-matlab –
Sosa
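As one possible illustration of the multi-class extension mentioned above, newer releases of the Statistics and Machine Learning Toolbox provide fitcecoc, which combines several binary SVMs. The sketch below uses it with a one-vs-all coding on all three iris species. This is a swapped-in built-in approach, not necessarily what the linked answer does, and the kernel settings and seed are assumptions carried over for illustration.

```matlab
load fisheriris                           % meas: 150x4 features, species: 3 classes

rng(0);                                   % assumption: fixed seed for repeatable folds

% binary SVM learner used for each one-vs-all subproblem
t = templateSVM('Standardize', true, 'KernelFunction', 'rbf', ...
    'KernelScale', 1, 'BoxConstraint', 0.2);

% multi-class model built from binary SVMs (one-vs-all coding)
mdl = fitcecoc(meas, species, 'Learners', t, 'Coding', 'onevsall');

% 10-fold cross-validation of the multi-class model
cvMdl = crossval(mdl, 'KFold', 10);
accuracy = 1 - kfoldLoss(cvMdl)           % kfoldLoss returns the misclassification rate
```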
Thank you for the explanation, Mr. Amro. Could you tell me how to plot a graph of cp.CountingMatrix? @Sosa –
Rallentando
@TARIQ: a bit off topic, but you could simply use bar3 to plot the confusion matrix. If you have the Neural Networks toolbox, there's the plotconfusion function; otherwise you could do it manually like this: https://mcmap.net/q/541375/-coloring-a-matrix-in-matlab-duplicate –
Sosa
I would also like to add that if you want to combine CV with PCA, you should perform PCA within each fold (on each training set) and apply the same transform to the test data. See Cross Validated: stats.stackexchange.com/questions/73032/… –
Anabelanabella
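A minimal sketch of the PCA-inside-each-fold idea, assuming the newer fitcsvm/predict functions and the pca function from the Statistics and Machine Learning Toolbox. The number of retained components (nComp), the seed, and the kernel settings are hypothetical choices made only for illustration.

```matlab
load fisheriris
groups = ismember(species, 'setosa');     % two-class problem: setosa vs. rest

k = 10;
nComp = 2;                                % hypothetical number of components to keep
rng(0);                                   % assumption: fixed seed for repeatable folds
cvp = cvpartition(groups, 'KFold', k);

nCorrect = 0;
for i = 1:k
    trainIdx = training(cvp, i);
    testIdx  = test(cvp, i);

    % fit PCA on the training fold only (mu is the training mean)
    [coeff, scoreTrain, ~, ~, ~, mu] = pca(meas(trainIdx,:));
    Xtrain = scoreTrain(:, 1:nComp);

    % apply the SAME centering and projection to the test fold
    Xtest = bsxfun(@minus, meas(testIdx,:), mu) * coeff(:, 1:nComp);

    mdl  = fitcsvm(Xtrain, groups(trainIdx), ...
        'KernelFunction', 'rbf', 'KernelScale', 'auto');
    pred = predict(mdl, Xtest);
    nCorrect = nCorrect + sum(pred == groups(testIdx));
end

accuracy = nCorrect / numel(groups)       % pooled accuracy over all folds
```

Fitting PCA on the full data set before splitting would let information from the test fold leak into the transform, which is the reason for repeating it per fold.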
Arno, this is from some time ago, but in case you still see this, I have a question: why do you read the correct rate with cp.CorrectRate only outside the classification loop? As far as I can see, this only gives you the accuracy of the last cross-validation iteration (and not a mean over all the folds)...? –
Guck
@Pegah: you should read the CLASSPERF doc page; my usage of the function is the same as the example shown in the docs. First we initialize the cp object before the loop. Then inside the loop we update the cp object with the predictions of the current validation fold. The function accumulates results each time you call it, so when we finish the loop the results returned are the average over the K folds. By the way, the name is Amro, not Arno :) –
Sosa
Sorry for the misspelling, Amro (much nicer name btw)! I read it but I'm still confused. E.g. when I put CorrectRate(i) = cp.CorrectRate; inside the CV loop, the CorrectRate array is obviously updated i times, and its last element (and not the mean of the array or so) equals what cp.CorrectRate returns outside the loop. –
Guck
@Pegah: cp.CorrectRate returns the current running (i.e. cumulative) average of the correct classification rate and NOT the rate of the current fold only. If you want the latter, use cp.LastCorrectRate –
Sosa
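The difference between a per-fold rate and a running rate can be seen with plain arithmetic. The sketch below does not call classperf. It uses hypothetical per-fold counts (10 folds of 15 test samples, one error in fold 3) chosen only to be consistent with the 149/150 result above; perFold is meant as an analogue of cp.LastCorrectRate and running as an analogue of cp.CorrectRate read after each fold.

```matlab
% hypothetical per-fold results, not taken from an actual run
nTest    = 15 * ones(1, 10);                  % test samples in each fold
nCorrect = [15 15 14 15 15 15 15 15 15 15];   % correct predictions in each fold

perFold = nCorrect ./ nTest;                  % accuracy of each fold on its own
running = cumsum(nCorrect) ./ cumsum(nTest);  % cumulative accuracy after each fold

disp(perFold(3))                              % fold 3 alone: 14/15
disp(running(3))                              % after 3 folds: 44/45
disp(running(end))                            % after all folds: 149/150
```

With equally sized folds the last running value equals the mean of the per-fold values, which is why reading the rate once after the loop already gives the cross-validated figure.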