scikit-learn roc_curve: why does it sometimes return a threshold value of 2?

Asked 21/4, 2014 at 15:35 Answered 12/4, 2015 at 7:41

Correct me if I'm wrong: the "thresholds" returned by scikit-learn's roc_curve should be an array of numbers in [0,1]. However, it sometimes gives me an array whose first number is close to 2. Is it a bug, or did I do something wrong? Thanks.

In [1]: import numpy as np

In [2]: from sklearn.metrics import roc_curve

In [3]: np.random.seed(11)

In [4]: aa = np.random.choice([True, False],100)

In [5]: bb = np.random.uniform(0,1,100)

In [6]: fpr,tpr,thresholds = roc_curve(aa,bb)

In [7]: thresholds
Out[7]: 
array([ 1.97396826,  0.97396826,  0.9711752 ,  0.95996265,  0.95744405,
    0.94983331,  0.93290463,  0.93241372,  0.93214862,  0.93076592,
    0.92960511,  0.92245024,  0.91179548,  0.91112166,  0.87529458,
    0.84493853,  0.84068543,  0.83303741,  0.82565223,  0.81096657,
    0.80656679,  0.79387241,  0.77054807,  0.76763223,  0.7644911 ,
    0.75964947,  0.73995152,  0.73825262,  0.73466772,  0.73421299,
    0.73282534,  0.72391126,  0.71296292,  0.70930102,  0.70116428,
    0.69606617,  0.65869235,  0.65670881,  0.65261474,  0.6487222 ,
    0.64805644,  0.64221486,  0.62699782,  0.62522484,  0.62283401,
    0.61601839,  0.611632  ,  0.59548669,  0.57555854,  0.56828967,
    0.55652111,  0.55063947,  0.53885029,  0.53369398,  0.52157349,
    0.51900774,  0.50547317,  0.49749635,  0.493913  ,  0.46154029,
    0.45275916,  0.44777116,  0.43822067,  0.43795921,  0.43624093,
    0.42039077,  0.41866343,  0.41550367,  0.40032843,  0.36761763,
    0.36642721,  0.36567017,  0.36148354,  0.35843793,  0.34371331,
    0.33436415,  0.33408289,  0.33387442,  0.31887024,  0.31818719,
    0.31367915,  0.30216469,  0.30097917,  0.29995201,  0.28604467,
    0.26930354,  0.2383461 ,  0.22803687,  0.21800338,  0.19301808,
    0.16902881,  0.1688173 ,  0.14491946,  0.13648451,  0.12704826,
    0.09141459,  0.08569481,  0.07500199,  0.06288762,  0.02073298,
    0.01934336])

Tolerance answered 21/4, 2014 at 15:35 Comment(0)

Most of the time these thresholds are not used, for example when calculating the area under the curve or plotting the true positive rate against the false positive rate.

Yet to plot what looks like a reasonable curve, one needs a threshold at which no data points are predicted positive, which gives the (0, 0) point of the curve. Since scikit-learn's ROC curve function does not require the scores to be normalised probabilities (any score is fine), setting this point's threshold to 1 isn't sufficient; setting it to inf is sensible, but coders often expect finite data (and it's possible the implementation also works for integer thresholds). Instead the implementation uses max(score) + epsilon, where epsilon = 1. This may be cosmetically deficient, but you haven't given any reason why it's a problem!
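To make the role of that extra threshold concrete, here is a minimal pure-Python sketch. It is an illustrative re-implementation, not scikit-learn's actual code: the function name roc_points and the toy data are made up for this example, and a sample is assumed to be predicted positive when its score is >= the threshold.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr, thresholds) with a leading sentinel threshold.

    Hypothetical helper for illustration only; not scikit-learn's code.
    A sample counts as predicted positive when score >= threshold.
    """
    n_pos = sum(1 for y in y_true if y)
    n_neg = len(y_true) - n_pos
    # Sentinel above every score, so that no sample is predicted positive.
    thresholds = [max(y_score) + 1] + sorted(set(y_score), reverse=True)
    fpr, tpr = [], []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and not y)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr, thresholds


# Toy data, chosen only to keep the example small.
y_true = [True, False, True, False]
y_score = [0.9, 0.8, 0.35, 0.1]
fpr, tpr, thresholds = roc_points(y_true, y_score)
```

Without the sentinel, the highest threshold would be max(y_score) itself, which already predicts at least one sample as positive, so the curve would not start at (0, 0).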

Ardyth answered 22/4, 2014 at 9:55 Comment(3)

Hi, I need to plot thresholds (x-axis) against (tpr - fpr) (y-axis) to see how different thresholds affect (tpr - fpr). I understand this is not as common as plotting the ROC curve or computing AUC, but I still find it useful. Thanks for clarifying what scikit-learn actually does. Good to know I didn't do anything wrong, and I can easily manipulate the array to get the real thresholds I need for my plot. I do think better comments on this would benefit other users like me who don't want to dig into the source code. Thanks – Tolerance 22/4, 2014 at 22:21

There is now a comment in the function documentation. – Ardyth 24/4, 2014 at 0:19

"cosmetically deficient" love it – Tryst 16/4, 2021 at 21:18

From the documentation:

thresholds : array, shape = [n_thresholds] Decreasing thresholds on the decision function used to compute fpr and tpr. thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1.

So the first element of thresholds is close to 2 because it is max(y_score) + 1, which in your case is thresholds[1] + 1.
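If you only want the thresholds that correspond to actual scores, for instance to look at tpr - fpr per threshold, one option is to drop the first element by position. This is a rough sketch, assuming numpy and scikit-learn are installed; the variable names are illustrative, and slicing by position is used rather than testing for max(y_score) + 1 because the sentinel value may differ in scikit-learn versions other than the one quoted here.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Random labels and scores, similar in spirit to the question's example.
rng = np.random.RandomState(11)
y_true = rng.choice([True, False], 100)
y_score = rng.uniform(0, 1, 100)

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# thresholds[0] is the sentinel for "no instances predicted positive";
# drop it by position and keep the matching tpr - fpr values.
real_thresholds = thresholds[1:]
gap = (tpr - fpr)[1:]

# Threshold at which tpr - fpr is largest.
best = real_thresholds[np.argmax(gap)]
```

real_thresholds and gap can then be passed directly to a plotting function as the x and y values.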

Auramine answered 12/4, 2015 at 7:41 Comment(0)

This seems like a bug to me: in roc_curve(aa,bb), 1 is added to the first threshold. You should create an issue here: https://github.com/scikit-learn/scikit-learn/issues

Bollix answered 21/4, 2014 at 16:12 Comment(0)
