How to get a classifier's confidence score for a prediction in sklearn?

I would like to get a confidence score for each of the predictions that my classifier makes, showing how sure the classifier is that its prediction is correct.

I want something like this:

How sure is the classifier on its prediction?

Class 1: 81% that this is class 1
Class 2: 10%
Class 3: 6%
Class 4: 3%

Samples of my code:

from time import time
from sklearn import cross_validation
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

features_train, features_test, labels_train, labels_test = cross_validation.train_test_split(main, target, test_size = 0.4)

# Determine amount of time to train
t0 = time()
model = SVC()
#model = SVC(kernel='poly')
#model = GaussianNB()

model.fit(features_train, labels_train)

print 'training time: ', round(time()-t0, 3), 's'

# Determine amount of time to predict
t1 = time()
pred = model.predict(features_test)

print 'predicting time: ', round(time()-t1, 3), 's'

accuracy = accuracy_score(labels_test, pred)

print 'Confusion Matrix: '
print confusion_matrix(labels_test, pred)

# Accuracy in the 0.9333, 0.9667, 1.0 range
print accuracy



model.predict(sub_main)

# Determine amount of time to predict
t1 = time()
pred = model.predict(sub_main)

print 'predicting time: ', round(time()-t1, 3), 's'

print ''
print 'Prediction: '
print pred

I suspect that I should use the score() function, but I can't seem to implement it correctly. I don't know whether that's the right function, but how would one get the confidence percentage of a classifier's prediction?

Implicatory answered 30/6, 2015 at 4:30 Comment(2)
Really helpful question. Is there a way to associate the class names with the probabilities as well? For example, if I get the following list of probabilities for an input, [0.33 0.25 0.75], I know that the third one will be picked, but which class does the third one refer to?Crooked
The probabilities correspond to classifier.classes_. But they are nonsense if the dataset is small :-( . Moreover, they are also not guaranteed to match up with classifier.predict() :'( . link to docs pageSoftcover

Per the SVC documentation, it looks like you need to change how you construct the SVC:

model = SVC(probability=True)

and then use the predict_proba method:

class_probabilities = model.predict_proba(sub_main)
Johan answered 30/6, 2015 at 4:58 Comment(4)
Ah okay, thanks! And how would you translate class_probabilities into percentage form? For example, I got [[1.614297e-03 3.99785477e-04 5.44054423e-02 9.9254921e-01]] as the output, but I don't know how to interpret these values, let alone convert them myself. What exactly do these values mean?Implicatory
@Implicatory How did you interpret the values?Cyanate
Is the probability the same as confidence? While predict_proba returns the probability/likelihood of that observation belonging to that particular class, how can we find the confidence with which the likelihood is determined?Estevez
If you have time, can help with this related question. - stats.stackexchange.com/questions/560774/…Estevez
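To make the mapping the comments ask about concrete, here is a small runnable sketch (synthetic data and hypothetical variable names, not code from the answer): predict_proba returns one column per entry of model.classes_, in that same order, so zipping the two gives per-class percentages.

```python
import numpy as np
from sklearn.svm import SVC

# Three well-separated synthetic clusters labelled 'a', 'b', 'c'
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in (0.0, 2.0, 4.0)])
y = np.repeat(['a', 'b', 'c'], 20)

model = SVC(probability=True)   # probability=True enables predict_proba
model.fit(X, y)

probs = model.predict_proba([[2.0, 2.0]])[0]
# Column i of predict_proba corresponds to model.classes_[i]
for label, p in zip(model.classes_, probs):
    print('%s: %.1f%%' % (label, 100 * p))
```

Note that, as one comment warns, these Platt-scaled probabilities are calibrated via internal cross-validation and their argmax is not guaranteed to agree with model.predict().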

For those estimators implementing the predict_proba() method, like Justin Peel suggested, you can just use predict_proba() to produce a probability for your prediction.

For those estimators which do not implement the predict_proba() method, you can construct a confidence interval yourself using the bootstrap (repeatedly recomputing your point estimate on many sub-samples).

Let me know if you need any detailed examples to demonstrate either of these two cases.

Honourable answered 30/6, 2015 at 7:8 Comment(7)
Ah okay, thanks! And how would you translate class_probabilities into percentage form? For example, I got [[1.614297e-03 3.99785477e-04 5.44054423e-02 9.9254921e-01]] as the output, but I don't know how to interpret these values, let alone convert them myself. What exactly do these values mean?Implicatory
@Implicatory They are already probabilities; multiply by 100 for percentages. :) The sum of each row should equal exactly 1. The last element is actually 0.992, which means the algorithm predicts it belongs to this class with probability 99.2%. Note e-03 is just scientific notation.Honourable
Ah I see now, thank you! :) I would have accepted your answer, but since Justin Peel commented first with the example that worked for me, I decided to give it to him, sorry about that but thanks for the advice!Implicatory
No problem at all. :) Glad that we both could help.Honourable
Is there a way to associate the class names with the probabilities as well? For example, if I get the following list of probabilities for an input, [0.33 0.25 0.75], I know that the third one will be picked, but which class does the third one refer to?Crooked
@JianxunLi - Could you please elaborate on the second case, where the predict_proba() method is not provided?Harrow
Is the probability the same as confidence? While predict_proba returns the probability/likelihood of that observation belonging to that particular class, how can we find the confidence with which the likelihood is determined?Estevez
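For the second case mentioned in the answer, here is a minimal sketch of the bootstrap idea (synthetic data, hypothetical names; not code from the answer): refit the model on resampled training sets and use the agreement among the refitted models as a rough confidence for each prediction.

```python
import numpy as np
from sklearn.svm import LinearSVC  # an estimator with no predict_proba

# Two well-separated synthetic clusters, classes 0 and 1
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in (0.0, 3.0)])
y = np.repeat([0, 1], 30)
X_new = np.array([[0.2, 0.1], [2.9, 3.1]])

n_boot = 100
votes = np.zeros((n_boot, len(X_new)), dtype=int)
for b in range(n_boot):
    idx = rng.randint(0, len(X), size=len(X))  # sample rows with replacement
    clf = LinearSVC().fit(X[idx], y[idx])
    votes[b] = clf.predict(X_new)

# Fraction of bootstrap models voting for class 1, per new point
confidence = votes.mean(axis=0)
print(confidence)
```

The vote fraction is only a heuristic confidence, not a calibrated probability; for proper calibration sklearn's CalibratedClassifierCV is the usual tool.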

Using the code below you will get the top 4 class names with the predicted probability for each sample. You can change no_of_class to as many as you need.

import numpy as np

probas1 = model.predict_proba(sub_main)
no_of_class = 4

# Indices of the top classes per sample, highest probability first
top_classes1 = np.argsort(-probas1, axis=1)[:, :no_of_class]

# Map the column indices back to class labels
class_labels1 = [model.classes_[top_classes1[i]] for i in range(len(top_classes1))]

top_confidence1 = [probas1[i][top_classes1[i]] for i in range(len(top_classes1))]

for i in range(len(class_labels1)):

    for j in range(no_of_class):

        print(f"Sample {i}: {class_labels1[i][j]} :: {top_confidence1[i][j]}")

NOTE: you can also simply convert this into a dataframe, with one column for the predicted class and another column for its predicted probability.
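A sketch of that NOTE (hypothetical names, hard-coded probabilities standing in for a real predict_proba output): one DataFrame row per sample, with the top class and its probability.

```python
import numpy as np
import pandas as pd

classes = np.array(['a', 'b', 'c', 'd'])     # stands in for model.classes_
probas = np.array([[0.1, 0.6, 0.2, 0.1],     # stands in for predict_proba output
                   [0.7, 0.1, 0.1, 0.1]])

order = np.argsort(-probas, axis=1)          # columns sorted by probability, descending
df = pd.DataFrame({
    'predicted_class': classes[order[:, 0]],                       # top class per sample
    'predicted_proba': probas[np.arange(len(probas)), order[:, 0]],  # its probability
})
print(df)
```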

Smidgen answered 16/5, 2023 at 7:42 Comment(0)
