Manually calculate AUC

T

3

8

How can I obtain the AUC value given fpr and tpr? Here fpr and tpr are just two floats obtained from these formulas:

my_fpr = fp / (fp + tn)
my_tpr = tp / (tp + fn)
my_roc_auc = auc(my_fpr, my_tpr)

I know this can't work, because fpr and tpr are single floats and auc needs arrays, but I can't figure out how to do it. I also know that I can compute AUC this way:

y_predict_proba = model.predict_proba(X_test)
probabilities = np.array(y_predict_proba)[:, 1]
fpr, tpr, _ = roc_curve(y_test, probabilities)
roc_auc = auc(fpr, tpr)

but I want to avoid using predict_proba. So my question is: how can I obtain AUC given fp, tp, fn, tn, fpr and tpr? In other words, is it possible to obtain AUC without roc_curve?

Tarragon answered 14/6, 2018 at 0:50 Comment(3)

Are you sure fpr and tpr are really "just" two floats, or are they numpy arrays? – Ils 14/6, 2018 at 5:39

Yes, they are 2 float values. – Tarragon 14/6, 2018 at 15:48

Then you cannot calculate a ROC curve. You need to get the values at all thresholds like roc_curve(y_test, probabilities) returns (whether it's a numpy array, pandas Series or just a list doesn't matter). – Ils 14/6, 2018 at 15:51

N

4

If you treat the ROC curve as straight lines through (0, 0), (FPR, TPR) and (1, 1), you can divide the area under it into 2 parts: a triangle and a trapezium. The triangle has area TPR*FPR/2, and the trapezium has area (1-FPR)*(1+TPR)/2 = 1/2 - FPR/2 + TPR/2 - TPR*FPR/2. The total area is 1/2 - FPR/2 + TPR/2. This is how you can get it, having just the two values FPR and TPR.
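As a rough sketch of that arithmetic in Python: the function name single_point_auc and the confusion-matrix counts are made up for illustration, and the result is only the area under the three-point polyline, not the AUC you would get from a full ROC curve built on scores.

```python
def single_point_auc(tp, fp, fn, tn):
    """Area under the ROC polyline (0, 0) -> (FPR, TPR) -> (1, 1)."""
    fpr = fp / (fp + tn)
    tpr = tp / (tp + fn)
    # Triangle under the first segment, from (0, 0) to (FPR, TPR).
    triangle = tpr * fpr / 2
    # Trapezium under the second segment, from (FPR, TPR) to (1, 1).
    trapezium = (1 - fpr) * (1 + tpr) / 2
    return triangle + trapezium


# Hypothetical counts, chosen only for illustration (FPR = 0.2, TPR = 0.8).
area = single_point_auc(tp=40, fp=10, fn=10, tn=40)
print(area)  # about 0.8, up to floating-point rounding
```

The sum simplifies to 1/2 - FPR/2 + TPR/2, which for these counts is 0.5 - 0.1 + 0.4 = 0.8.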

Noisome answered 17/6, 2018 at 0:30 Comment(1)

what about the total area of AUPRC? is this right? 1/2- recall/2+precision/2. – Bac 13/3 at 14:32

P

17

Yes, it is possible to obtain the AUC without calling roc_curve.

You first need to create the ROC (Receiver Operating Characteristic) curve. To be able to use the ROC curve, your classifier should be able to rank examples such that the ones with higher rank are more likely to be positive (e.g. fraudulent). As an example, Logistic Regression outputs probabilities, which are scores that you can use for ranking. The ROC curve is created by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings.

The model performance is determined by looking at the area under the ROC curve (or AUC).
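A minimal pure-Python sketch of that procedure, assuming binary 0/1 labels with both classes present and any real-valued score per example. The helper names roc_points and trapezoid_area and the sample data are made up for illustration; this is not how scikit-learn implements roc_curve.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over the scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_area(x, y):
    """Area under the polyline through the points (x[i], y[i])."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2 for i in range(len(x) - 1))


# Made-up labels and scores, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
print(trapezoid_area(fpr, tpr))  # 0.75
```

Each distinct score acts as one threshold, so the curve gets one point per distinct score plus the starting point (0, 0).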

You can find a more detailed explanation here.

Packsaddle answered 28/5, 2019 at 5:58 Comment(0)

L

-1

from sklearn import metrics

my_fpr = fp / (fp + tn)
my_tpr = tp / (tp + fn)
my_roc_auc = metrics.auc([0, my_fpr, 1], [0, my_tpr, 1])

The key idea is to add two more points, (FPR=0, TPR=0) and (FPR=1, TPR=1), for the AUC calculation. These two points always exist on the ROC curve.
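If you only have hard class predictions (from predict rather than predict_proba), the same three-point calculation can be sketched without scikit-learn. The function name auc_from_hard_predictions and the sample labels are made up for illustration; the sum inside is the trapezoidal rule, which is what metrics.auc applies to its x and y inputs.

```python
def auc_from_hard_predictions(y_true, y_pred):
    """Three-point ROC AUC from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fpr = fp / (fp + tn)
    tpr = tp / (tp + fn)
    # Same three points as in the snippet above.
    xs = [0.0, fpr, 1.0]
    ys = [0.0, tpr, 1.0]
    # Trapezoidal rule over consecutive points.
    return sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2 for i in range(2))


# Made-up labels and predictions, for illustration only (TPR = 0.75, FPR = 0.25).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(auc_from_hard_predictions(y_true, y_pred))  # 0.75
```

Note that this value equals (TPR + TNR) / 2, i.e. the balanced accuracy, so it measures one operating point rather than the ranking quality that a score-based AUC captures.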

Leff answered 28/2 at 12:45 Comment(0)
