Background
I am answering this because I need to work through the content, and a question like this is a great excuse. Thank you for the good opportunity.
I use the built-in Fisher iris data set:
http://archive.ics.uci.edu/ml/datasets/Iris
I also use code snippets from the MathWorks tutorials on classification and on plotroc.
Problem Description
There is a clear boundary within the domain for classifying "setosa", but "versicolor" and "virginica" overlap. This is a two-dimensional plot (sepal length vs. sepal width), and the other two measurements have been discarded to produce it. The ambiguity in the classification boundaries is useful in this case, because a perfectly separable problem would give a trivial ROC.
%load data
load fisheriris
%show raw data
figure(1); clf
gscatter(meas(:,1), meas(:,2), species,'rgb','osd');
xlabel('Sepal length');
ylabel('Sepal width');
axis equal
axis tight
title('Raw Data')
Analysis
Let's say that we want to determine the bounds for a linear classifier that separates "virginica" from "non-virginica". We could look at "self vs. not-self" for the other classes, but they would have their own
So now we fit several discriminant classifiers (linear, quadratic, and Mahalanobis variants) and plot the ROC for them:
%load data
load fisheriris     %provides meas (150x4) and species
load iris_dataset   %provides irisInputs (4x150) and irisTargets (3x150)
%keep only the third target row: 1 = virginica, 0 = not virginica
irisTargets=irisTargets(3,:);
%train and evaluate each discriminant type on the same two features
ldaClass1 = classify(meas(:,1:2),meas(:,1:2),irisTargets,'linear')';
ldaClass2 = classify(meas(:,1:2),meas(:,1:2),irisTargets,'diaglinear')';
ldaClass3 = classify(meas(:,1:2),meas(:,1:2),irisTargets,'quadratic')';
ldaClass4 = classify(meas(:,1:2),meas(:,1:2),irisTargets,'diagquadratic')';
ldaClass5 = classify(meas(:,1:2),meas(:,1:2),irisTargets,'mahalanobis')';
%stack one row per classifier so that plotroc draws one curve per row
myinput=repmat(irisTargets,5,1);
myoutput=[ldaClass1;ldaClass2;ldaClass3;ldaClass4;ldaClass5];
plotroc(myinput,myoutput)
The result is shown below, though I had to delete the repeated copies of the diagonal line to get it:
Note that in the code I stack "myinput" and "myoutput", one row per classifier, and feed them to the "plotroc" function. Its first argument is the targets (the ideal output, here the true virginica labels) and its second is the actual output of the classifier; the ROC compares the two. If you pass your own classifier's targets and outputs in the same way, you should get similar results.
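One thing to keep in mind is that the outputs stacked above are hard 0/1 labels, so each curve has only a single operating point between the corners. As a rough sketch of how to get a full curve, you can take the posterior probabilities that classify returns as its third output and pass them to perfcurve (Statistics Toolbox). The variable names here are my own, and I build the target from "species" instead of iris_dataset; I am also assuming the posterior columns follow the sorted group values (0, then 1).

```matlab
% Sketch: ROC from continuous scores instead of hard labels.
load fisheriris
X = meas(:,1:2);                                   % sepal length, sepal width
isVirginica = double(strcmp(species,'virginica')); % 150x1, 1 = virginica

% Third output of classify is the posterior probability per class.
[~, ~, posterior] = classify(X, X, isVirginica, 'linear');

% Assumption: column 1 corresponds to group 0, column 2 to group 1.
scores = posterior(:,2);

% perfcurve sweeps a threshold over the scores; 1 is the positive class.
[fpr, tpr, ~, auc] = perfcurve(isVirginica, scores, 1);

figure; plot(fpr, tpr, '-'); hold on
plot([0 1], [0 1], 'k--');                         % chance diagonal
xlabel('False positive rate'); ylabel('True positive rate');
title(sprintf('Linear discriminant, AUC = %.3f', auc));
```

Note that this still evaluates on the training data, just like the code above, so the curve is optimistic.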
So this will give you a "built-in" ROC, which is useful for quick work but does not make you work through every step in detail.
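If you do want to see the steps, here is a minimal sketch that builds an ROC by hand. To keep it simple I use sepal length alone as the score for "virginica", which is purely an illustrative choice of mine and not one of the discriminants above; the variable names are also my own.

```matlab
% Sketch: ROC computed step by step, no plotroc or perfcurve.
load fisheriris
labels = strcmp(species,'virginica');   % 150x1 logical, true = virginica
scores = meas(:,1);                     % illustrative score: sepal length

% One threshold per distinct score, plus the two extremes.
thresholds = [Inf; sort(unique(scores),'descend'); -Inf];
nPos = sum(labels);
nNeg = sum(~labels);

tpr = zeros(size(thresholds));
fpr = zeros(size(thresholds));
for k = 1:numel(thresholds)
    predicted = scores >= thresholds(k);        % call it virginica
    tpr(k) = sum(predicted &  labels) / nPos;   % true positive rate
    fpr(k) = sum(predicted & ~labels) / nNeg;   % false positive rate
end

auc = trapz(fpr, tpr);                  % area under the curve

figure; plot(fpr, tpr, '-o'); hold on
plot([0 1], [0 1], 'k--');              % chance diagonal
xlabel('False positive rate'); ylabel('True positive rate');
title(sprintf('Sepal length threshold, AUC = %.3f', auc));
```

Each threshold gives one confusion matrix and therefore one point on the curve; the built-in functions do the same sweep for you.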
Questions you can ask at this point include:
- Which classifier is best? How do I determine what "best" is in this case?
- What is the convex hull of the classifiers? Is there some mixture of classifiers that is more informative than any pure method? Bagging perhaps?
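As a starting point for the convex hull question, here is a rough sketch that reduces each classifier to its single (FPR, TPR) operating point and takes the hull with convhull. I add the corners (0,0), (1,1) and (1,0) so the polygon is well defined; the part of the hull running from (0,0) to (1,1) above the diagonal is the ROC convex hull. Variable names are my own, and the target is built from "species" instead of iris_dataset.

```matlab
% Sketch: operating points of the five classifiers and their convex hull.
load fisheriris
X = meas(:,1:2);
isVirginica = double(strcmp(species,'virginica'));   % 150x1
types = {'linear','diaglinear','quadratic','diagquadratic','mahalanobis'};

tpr = zeros(numel(types),1);
fpr = zeros(numel(types),1);
for k = 1:numel(types)
    pred = classify(X, X, isVirginica, types{k});
    pred = pred(:);                                  % force column shape
    tpr(k) = sum(pred==1 & isVirginica==1) / sum(isVirginica==1);
    fpr(k) = sum(pred==1 & isVirginica==0) / sum(isVirginica==0);
end

% Add the trivial corners and drop duplicate points before taking the hull.
P = unique([0 0; fpr tpr; 1 1; 1 0], 'rows');
hullIdx = convhull(P(:,1), P(:,2));

figure; plot(fpr, tpr, 'o'); hold on
plot(P(hullIdx,1), P(hullIdx,2), '-');
plot([0 1], [0 1], 'k--');                           % chance diagonal
xlabel('False positive rate'); ylabel('True positive rate');
title('Operating points and convex hull');
```

Classifiers whose points fall strictly inside the hull are dominated by some mixture of the ones on it, which is one way to approach the "mixture of classifiers" question.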