sklearn: AUC score for LinearSVC and OneSVM

One option of the SVM classifier (SVC) is probability, which is False by default. The documentation does not say what it does. Looking at the libsvm source code, it seems to do some sort of cross-validation.

This option exists for neither LinearSVC nor OneSVM.

I need to calculate AUC scores for several SVM models, including these last two. Should I calculate the AUC score using decision_function(X) as the thresholds?

Cannery answered 5/1, 2016 at 20:55 Comment(0)

Answering my own question.

Firstly, it is a common "myth" that you need probabilities to draw the ROC curve. No, you need some kind of threshold in your model that you can change. The ROC curve is then drawn by varying this threshold. The point of the ROC curve is, of course, to see how well your model reproduces the hypothesis by seeing how well it orders the observations.

In the case of SVMs, I see people drawing ROC curves for them in two ways:

  1. using the distance to the decision boundary, as I mentioned in my own question
  2. using the bias term as your threshold in the SVM: http://researchgate.net/post/How_can_I_plot_determine_ROC_AUC_for_SVM. In fact, if you use SVC(probability=True) then probabilities will be calculated for you in this manner, using CV, which you can then use to draw the ROC curve. But as mentioned in the link above, it is much faster to draw the ROC curve directly by varying the bias.

I think #2 is the same as #1 if we are using a linear kernel, as in my own case, because varying the bias is varying the distance in this particular case.
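
A minimal sketch of approach #1: pass the raw decision_function scores to sklearn's ROC utilities, which accept any real-valued score. The synthetic dataset and the LinearSVC settings below are illustrative assumptions, not from the original post.

    # Sketch of approach #1: use decision_function scores (signed distances to
    # the separating hyperplane) as the quantity whose threshold is varied.
    # The synthetic dataset and LinearSVC settings are illustrative only.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC
    from sklearn.metrics import roc_curve, roc_auc_score

    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LinearSVC().fit(X_train, y_train)

    # One score per test sample; sweeping a threshold over these scores
    # traces out the ROC curve.
    scores = clf.decision_function(X_test)
    fpr, tpr, thresholds = roc_curve(y_test, scores)
    print("AUC:", roc_auc_score(y_test, scores))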

Cannery answered 17/2, 2016 at 11:55 Comment(1)
Do you have code that does this? I've been at this all day failing miserably. – Maggio

In order to calculate AUC using sklearn, you need a predict_proba method on your classifier; this is what the probability parameter on SVC enables (you are correct that it's calculated using cross-validation). From the docs:

probability : boolean, optional (default=False)

Whether to enable probability estimates. This must be enabled prior to calling fit, and will slow down that method.
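
For illustration, a minimal sketch of this probability-based route; the toy dataset here is an assumption, not something from the question.

    # Sketch of the predict_proba route; the toy dataset is illustrative only.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # probability=True triggers the internal cross-validated calibration,
    # which slows down fit.
    clf = SVC(kernel="linear", probability=True, random_state=0).fit(X_train, y_train)

    # Column 1 of predict_proba holds the estimated probability of the positive class.
    proba = clf.predict_proba(X_test)[:, 1]
    print("AUC:", roc_auc_score(y_test, proba))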

You can't use the decision function directly to compute AUC, since it's not a probability. I suppose you could scale the decision function to take values in the range [0, 1] and compute AUC; however, I'm not sure what statistical properties this will have, and you certainly won't be able to use it to compare with an ROC calculated using probabilities.
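
A minimal sketch of that rescaling idea, assuming clf is an already-fitted classifier exposing decision_function and that X_test and y_test exist; note that a min-max rescaling is monotone, so it does not change the ordering of the scores.

    # Sketch of the rescaling idea above; clf, X_test and y_test are assumed
    # to already exist (clf fitted, exposing decision_function).
    from sklearn.metrics import roc_auc_score

    scores = clf.decision_function(X_test)
    # Min-max rescale the scores into [0, 1].
    scaled = (scores - scores.min()) / (scores.max() - scores.min())
    print("AUC:", roc_auc_score(y_test, scaled))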

Bountiful answered 5/1, 2016 at 21:20 Comment(2)
That is not accurate. You need thresholds, and these thresholds need not be probabilities. Usually you use probabilities, but they can be scores, as when evaluating the ROC for a ranking classifier. – Cannery
Anyhow, I see my question is answered here: researchgate.net/post/How_can_I_plot_determine_ROC_AUC_for_SVM – Cannery
