How to get a classifier's confidence score for a prediction in sklearn?

I would like to get a confidence score for each of the predictions that my classifier makes, showing how sure the classifier is that its prediction is correct.

I want something like this:

How sure is the classifier on its prediction?

Class 1: 81% that this is class 1
Class 2: 10%
Class 3: 6%
Class 4: 3%

Samples of my code:

from time import time
from sklearn import cross_validation
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

features_train, features_test, labels_train, labels_test = cross_validation.train_test_split(main, target, test_size = 0.4)

# Determine amount of time to train
t0 = time()
model = SVC()
#model = SVC(kernel='poly')
#model = GaussianNB()

model.fit(features_train, labels_train)

print 'training time: ', round(time()-t0, 3), 's'

# Determine amount of time to predict
t1 = time()
pred = model.predict(features_test)

print 'predicting time: ', round(time()-t1, 3), 's'

accuracy = accuracy_score(labels_test, pred)

print 'Confusion Matrix: '
print confusion_matrix(labels_test, pred)

# Accuracy in the 0.9333, 0.9667, 1.0 range
print accuracy



model.predict(sub_main)

# Determine amount of time to predict
t1 = time()
pred = model.predict(sub_main)

print 'predicting time: ', round(time()-t1, 3), 's'

print ''
print 'Prediction: '
print pred

I suspect that I should use the score() function, but I can't seem to implement it correctly. I don't know whether that's the right function, but how would one get the confidence percentage of a classifier's prediction?

Implicatory answered 30/6, 2015 at 4:30 Comment(2)
Really helpful question. Is there a way to associate the class names with the probabilities as well? For example, if I get the following list of probabilities for an input, [0.33 0.25 0.75], I know that the third one will be picked, but which class does the third one refer to?Crooked
The probabilities correspond to classifier.classes_. But they are nonsense if the dataset is small :-( . Moreover, they are also not guaranteed to match up with classifier.predict() :'( . link to docs pageSoftcover

Per the SVC documentation, it looks like you need to change how you construct the SVC:

model = SVC(probability=True)

and then use the predict_proba method:

class_probabilities = model.predict_proba(sub_main)
Johan answered 30/6, 2015 at 4:58 Comment(4)
Ah okay, thanks! And how would you translate class_probabilities into percentage form? For example, I got [[1.614297e-03 3.99785477e-04 5.44054423e-02 9.9254921e-01]] as the output, but I don't know how to interpret these values, let alone convert them myself. What exactly do these values mean?Implicatory
@Implicatory How did you interpret the values?Cyanate
Is the probability the same as confidence? While predict_proba returns the probability/likelihood of that observation belonging to that particular class, how can we find the confidence with which the likelihood is determined?Estevez
If you have time, can help with this related question. - stats.stackexchange.com/questions/560774/…Estevez
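To make the mapping the comments ask about concrete, here is a small runnable sketch (synthetic data and hypothetical variable names, not code from the answer): predict_proba returns one column per entry of model.classes_, in that same order, so zipping the two gives per-class percentages.

```python
import numpy as np
from sklearn.svm import SVC

# Three well-separated synthetic clusters labelled 'a', 'b', 'c'
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in (0.0, 2.0, 4.0)])
y = np.repeat(['a', 'b', 'c'], 20)

model = SVC(probability=True)   # probability=True enables predict_proba
model.fit(X, y)

probs = model.predict_proba([[2.0, 2.0]])[0]
# Column i of predict_proba corresponds to model.classes_[i]
for label, p in zip(model.classes_, probs):
    print('%s: %.1f%%' % (label, 100 * p))
```

Note that, as one comment warns, these Platt-scaled probabilities are calibrated via internal cross-validation and their argmax is not guaranteed to agree with model.predict().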

For those estimators implementing the predict_proba() method, like Justin Peel suggested, you can just use predict_proba() to produce a probability for your prediction.

For those estimators which do not implement the predict_proba() method, you can construct a confidence interval yourself using the bootstrap (repeatedly recomputing your point estimate on many sub-samples).

Let me know if you need any detailed examples to demonstrate either of these two cases.

Honourable answered 30/6, 2015 at 7:8 Comment(7)
Ah okay, thanks! And how would you translate class_probabilities into percentage form? For example, I got [[1.614297e-03 3.99785477e-04 5.44054423e-02 9.9254921e-01]] as the output, but I don't know how to interpret these values, let alone convert them myself. What exactly do these values mean?Implicatory
@Implicatory They are already probabilities; multiply by 100 for percentages. :) The sum of each row should equal exactly 1. The last element is actually 0.992, which means the algorithm predicts it belongs to this class with probability 99.2%. Note e-03 is just scientific notation.Honourable
Ah I see now, thank you! :) I would have accepted your answer, but since Justin Peel commented first with the example that worked for me, I decided to give it to him, sorry about that but thanks for the advice!Implicatory
No problem at all. :) Glad that we both could help.Honourable
Is there a way to associate the class names with the probabilities as well? For example, if I get the following list of probabilities for an input, [0.33 0.25 0.75], I know that the third one will be picked, but which class does the third one refer to?Crooked
@JianxunLi - Could you please elaborate on the second case, where the predict_proba() method is not provided?Harrow
Is the probability the same as confidence? While predict_proba returns the probability/likelihood of that observation belonging to that particular class, how can we find the confidence with which the likelihood is determined?Estevez
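For the second case mentioned in the answer, here is a minimal sketch of the bootstrap idea (synthetic data, hypothetical names; not code from the answer): refit the model on resampled training sets and use the agreement among the refitted models as a rough confidence for each prediction.

```python
import numpy as np
from sklearn.svm import LinearSVC  # an estimator with no predict_proba

# Two well-separated synthetic clusters, classes 0 and 1
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in (0.0, 3.0)])
y = np.repeat([0, 1], 30)
X_new = np.array([[0.2, 0.1], [2.9, 3.1]])

n_boot = 100
votes = np.zeros((n_boot, len(X_new)), dtype=int)
for b in range(n_boot):
    idx = rng.randint(0, len(X), size=len(X))  # sample rows with replacement
    clf = LinearSVC().fit(X[idx], y[idx])
    votes[b] = clf.predict(X_new)

# Fraction of bootstrap models voting for class 1, per new point
confidence = votes.mean(axis=0)
print(confidence)
```

The vote fraction is only a heuristic confidence, not a calibrated probability; for proper calibration sklearn's CalibratedClassifierCV is the usual tool.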

Using the code below you will get the top 4 class names with the predicted probability for each sample. You can change no_of_class to as many as you need.

import numpy as np

probas1 = model.predict_proba(sub_main)
no_of_class = 4

# Indices of the top classes per sample, highest probability first
top_classes1 = np.argsort(-probas1, axis=1)[:, :no_of_class]

# Map the column indices back to class labels
class_labels1 = [model.classes_[top_classes1[i]] for i in range(len(top_classes1))]

top_confidence1 = [probas1[i][top_classes1[i]] for i in range(len(top_classes1))]

for i in range(len(class_labels1)):

    for j in range(no_of_class):

        print(f"Sample {i}: {class_labels1[i][j]} :: {top_confidence1[i][j]}")

NOTE: you can also simply convert this into a dataframe, with one column for the predicted class and another column for its predicted probability.
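A sketch of that NOTE (hypothetical names, hard-coded probabilities standing in for a real predict_proba output): one DataFrame row per sample, with the top class and its probability.

```python
import numpy as np
import pandas as pd

classes = np.array(['a', 'b', 'c', 'd'])     # stands in for model.classes_
probas = np.array([[0.1, 0.6, 0.2, 0.1],     # stands in for predict_proba output
                   [0.7, 0.1, 0.1, 0.1]])

order = np.argsort(-probas, axis=1)          # columns sorted by probability, descending
df = pd.DataFrame({
    'predicted_class': classes[order[:, 0]],                       # top class per sample
    'predicted_proba': probas[np.arange(len(probas)), order[:, 0]],  # its probability
})
print(df)
```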

Smidgen answered 16/5, 2023 at 7:42 Comment(0)
