ValueError: Data is not binary and pos_label is not specified

I am trying to calculate roc_auc_score, but I am getting the following error:

"ValueError: Data is not binary and pos_label is not specified"

My code snippet is as follows:

import numpy as np
from sklearn.metrics import roc_auc_score
y_scores=np.array([ 0.63, 0.53, 0.36, 0.02, 0.70 ,1 , 0.48, 0.46, 0.57])
y_true=np.array(['0', '1', '0', '0', '1', '1', '1', '1', '1'])
roc_auc_score(y_true, y_scores)

Please tell me what is wrong with it.

Travertine answered 23/8, 2013 at 10:55 Comment(0)

You only need to change y_true so that it holds integers instead of strings:

y_true=np.array([0, 1, 0, 0, 1, 1, 1, 1, 1])

Explanation: if you look at what the roc_auc_score function does in https://github.com/scikit-learn/scikit-learn/blob/0.15.X/sklearn/metrics/metrics.py you will see that y_true is checked as follows:

classes = np.unique(y_true)
if (pos_label is None and not (np.all(classes == [0, 1]) or
 np.all(classes == [-1, 1]) or
 np.all(classes == [0]) or
 np.all(classes == [-1]) or
 np.all(classes == [1]))):
    raise ValueError("Data is not binary and pos_label is not specified")

When the function runs, pos_label is None. Because you are defining y_true as an array of strings, np.unique(y_true) returns ['0', '1'], which matches none of the integer label sets, so every np.all(...) is False. The negated condition is therefore True and the exception is raised.
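
To make the check above concrete, here is a minimal sketch that uses only NumPy. The helper looks_binary and the constant ACCEPTED_LABEL_SETS are hypothetical names introduced for illustration; they are a simplified stand-in for the quoted check, not scikit-learn's actual code. It shows that string labels fail the check and that the same labels pass once they are converted with astype(int).

```python
import numpy as np

# Hypothetical, simplified stand-in for the check quoted above
# (not scikit-learn's real implementation).
ACCEPTED_LABEL_SETS = ([0, 1], [-1, 1], [0], [-1], [1])


def looks_binary(y_true):
    """Return True if the unique labels match one of the accepted integer sets."""
    classes = np.unique(y_true)
    return any(np.array_equal(classes, labels) for labels in ACCEPTED_LABEL_SETS)


y_str = np.array(['0', '1', '0', '0', '1', '1', '1', '1', '1'])
y_int = y_str.astype(int)  # parse each string label into an integer

str_ok = looks_binary(y_str)  # string labels never equal integer labels
int_ok = looks_binary(y_int)  # integer labels match the set [0, 1]

print(np.unique(y_str).dtype.kind, str_ok)
print(np.unique(y_int).dtype.kind, int_ok)
```

Converting with astype(int) is handy when the labels come from a file or a database as text and cannot simply be retyped by hand.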

Powe answered 24/8, 2013 at 10:32 Comment(1)
It seems the file was deleted a long time ago and the link no longer works for the current version, so I have updated it to point to an older version. - Powe

The problem is in y_true=np.array(['0', '1', '0', '0', '1', '1', '1', '1', '1']): the labels are strings. Convert the values of y_true to Boolean:

y_true= '1' <= y_true
print(y_true) # [False  True False False  True  True  True  True  True]
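
As a rough sketch of how this could be used end to end, the block below computes the AUC in two ways, assuming a scikit-learn installation that provides roc_auc_score, roc_curve and auc in sklearn.metrics. The first way passes Boolean labels; the second keeps the string labels and names the positive class through the pos_label parameter of roc_curve, which is what the error message hints at. The variable names auc_from_bool and auc_from_curve are only illustrative.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

y_scores = np.array([0.63, 0.53, 0.36, 0.02, 0.70, 1, 0.48, 0.46, 0.57])
y_true = np.array(['0', '1', '0', '0', '1', '1', '1', '1', '1'])

# Option 1: turn the string labels into booleans, then score as usual.
y_bool = (y_true == '1')
auc_from_bool = roc_auc_score(y_bool, y_scores)

# Option 2: keep the string labels and say explicitly which one is positive.
fpr, tpr, _ = roc_curve(y_true, y_scores, pos_label='1')
auc_from_curve = auc(fpr, tpr)

print(auc_from_bool, auc_from_curve)
```

Both options should give the same value, since they describe the same positive class.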
Sanitize answered 25/9, 2019 at 19:24 Comment(0)
