scikit-learn roc_auc_score() returns accuracy values

I am trying to compute the area under the ROC curve using sklearn.metrics.roc_auc_score, called as follows:

roc_auc = sklearn.metrics.roc_auc_score(actual, predicted)

where actual is a binary vector with ground truth classification labels and predicted is a binary vector with classification labels that my classifier has predicted.

However, the value of roc_auc I am getting is exactly the same as the accuracy (the proportion of samples whose labels are correctly predicted). This is not a one-off: I have tried my classifier with various parameter values, and I get the same result every time.
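For illustration, a minimal sketch (synthetic data, not my actual setup) that reproduces the effect with balanced classes and hard 0/1 predictions:

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.RandomState(0)
actual = np.array([0] * 50 + [1] * 50)            # balanced ground truth
predicted = actual.copy()
flip = rng.choice(100, size=20, replace=False)    # mislabel 20 samples
predicted[flip] = 1 - predicted[flip]

print(accuracy_score(actual, predicted))          # 0.8
print(roc_auc_score(actual, predicted))           # also exactly 0.8

With equal class counts the two numbers coincide exactly, which matches what is described above.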

What am I doing wrong here?

Pennebaker asked 11/3, 2014 at 7:21 Comment(0)

This is because you are passing in the decisions of your classifier instead of the scores it calculated. There was a question about this on SO recently, and a related pull request to scikit-learn.

The point of a ROC curve (and the area under it) is that you study the tradeoff between the true positive rate and the false positive rate as the classification threshold is varied. By default in a binary classification task, if your classifier's score is > 0.5, class 1 is predicted, otherwise class 0. As you change that threshold, you get a curve like this. The higher the curve (and the more area under it), the better the classifier. However, to get this curve you need access to the scores of a classifier, not its decisions. Otherwise, whatever the decision threshold is, the decisions stay the same, and the AUC degenerates into an accuracy-like number.
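As an illustration (using a scikit-learn classifier on synthetic data as a stand-in, since your own classifier is not shown), compare passing hard decisions versus continuous scores:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

hard = clf.predict(X_test)                  # 0/1 decisions at the default threshold
scores = clf.predict_proba(X_test)[:, 1]    # probability of class 1, varies continuously

print(roc_auc_score(y_test, hard))      # degenerate: only one threshold is represented
print(roc_auc_score(y_test, scores))    # the actual AUC, computed over all thresholds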

Which classifier are you using?

Hephzipah answered 11/3, 2014 at 10:36 Comment(3)
I am not using any built-in classifier. It is rather a heuristic that applies to my particular experiment, and it does not give any confidence values, only the classification label. Do you have any suggestions here? – Pennebaker
Also, the problem only seems to occur when I pass balanced data (the same number of +ve and -ve examples) to roc_auc_score(). If I pass unbalanced data (but still binary vectors), the accuracy and AUC results are different (see the sketch after these comments). – Pennebaker
You can't meaningfully calculate AUC if you don't have confidence values, AFAIK. There are other performance metrics, though. – Hephzipah
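A short note on the balanced-data observation above: with only hard 0/1 predictions, the ROC curve has a single interior point, and the trapezoidal area reduces to (TPR + TNR) / 2, i.e. balanced accuracy. Balanced accuracy equals plain accuracy exactly when the two classes have equal counts, which is why the two numbers matched only on balanced data. A quick numeric check with hypothetical counts:

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

actual = np.array([0] * 80 + [1] * 20)   # imbalanced: 80 negatives, 20 positives
predicted = actual.copy()
predicted[:10] = 1                       # 10 false positives
predicted[80:85] = 0                     # 5 false negatives

tpr = 15 / 20                            # true positive rate
tnr = 70 / 80                            # true negative rate

print(accuracy_score(actual, predicted))   # 0.85
print(roc_auc_score(actual, predicted))    # 0.8125
print((tpr + tnr) / 2)                     # 0.8125, matches the AUC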
