Probability prediction method of KNeighborsClassifier returns only 0 and 1 - McMap

Asked 7/5, 2016 at 13:30 Answered 7/5, 2016 at 13:35

Solved machine-learning scikit-learn probability nearest-neighbor

Can anyone tell me what is wrong with my code? Why can I predict probabilities on the iris dataset with LogisticRegression, while KNeighborsClassifier gives me only 0 or 1? I expected it to give results like the ones LogisticRegression yields.

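from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn import metrics

iris = load_iris()
X = iris.data
y = iris.target

# Assumption: skf was not defined in the original post. 10 stratified folds
# gives 15-sample test sets, consistent with the output shown below.
skf = StratifiedKFold(n_splits=10)

# Each iteration overwrites the split, so the models below use the last fold.
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]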

from sklearn.linear_model import LogisticRegression
ln = LogisticRegression()
ln.fit(X_train,y_train)

ln.predict_proba(X_test)[:,1]

array([ 0.18075722, 0.08906078, 0.14693156, 0.10467766, 0.14823032, 0.70361962, 0.65733216, 0.77864636, 0.67203114, 0.68655163, 0.25219798, 0.3863194 , 0.30735105, 0.13963637, 0.28017798])

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5, algorithm='ball_tree', metric='euclidean')
knn.fit(X_train, y_train)

knn.predict_proba(X_test)[0:10,1]

array([ 0., 0., 0., 0., 0., 1., 1., 1., 1., 1.])

Cassandracassandre asked 7/5, 2016 at 13:30 Comment(1)

Regression != Classification. Not all classifiers support the concept of probability! – Corundum 10/5, 2016 at 17:1

Because KNN has a very limited concept of probability. Its estimate is simply the fraction of votes among the nearest neighbours, so with 5 neighbours the only possible values are 0, 0.2, 0.4, 0.6, 0.8 and 1. Increase the number of neighbours to 15 or 100, or query a point near the decision boundary, and you will see more diverse results. Currently, all 5 nearest neighbours of each of your test points share the same label (hence probability 0 or 1).
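To make the "fraction of votes" point concrete, here is a minimal sketch that recomputes the probabilities by hand from the neighbours' labels and compares them with predict_proba. It fits on the whole iris dataset purely for illustration (not the asker's cross-validation split), and the names manual_proba, knn15 and proba15 are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Same neighbour search settings as in the question (uniform weights by default).
knn = KNeighborsClassifier(n_neighbors=5, algorithm="ball_tree", metric="euclidean")
knn.fit(X, y)

# Look up the 5 nearest training points of every sample and read off their labels.
_, neighbor_idx = knn.kneighbors(X)      # shape (150, 5)
neighbor_labels = y[neighbor_idx]        # shape (150, 5)

# Probability of class c = share of the 5 neighbours that carry label c.
manual_proba = np.stack(
    [(neighbor_labels == c).mean(axis=1) for c in knn.classes_], axis=1
)

proba = knn.predict_proba(X)
print(np.unique(proba))                  # only multiples of 1/5 can appear

# With more neighbours the grid of possible values gets finer (multiples of 1/15).
knn15 = KNeighborsClassifier(n_neighbors=15, algorithm="ball_tree").fit(X, y)
proba15 = knn15.predict_proba(X)
print(np.unique(proba15))
```

Whether values strictly between 0 and 1 actually show up depends on the data: a point whose neighbours all agree still gets exactly 0 or 1, whatever the value of K.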

Norther answered 7/5, 2016 at 13:35 Comment(3)

But then my accuracy decreases because I'll move away from the optimal K. How come in Weka, with the same K, we get a smoother ROC curve, while here (scikit-learn) the ROC curve is very sharp? – Cassandracassandre 7/5, 2016 at 13:46

KNN is a heuristic and has a lot of parameters. It is very probable that your results will differ. You have to look up the default values of the metrics and algorithms used. And maybe even the ROC curve evaluation is done differently! There is also randomness involved (in KNN)! – Corundum 10/5, 2016 at 17:3

The probability output would be more precise with the option weights='distance'. – Concealment 9/4, 2019 at 16:21
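As a rough sketch of that last suggestion: in scikit-learn the parameter is weights, and weights='distance' makes each neighbour's vote count in proportion to the inverse of its distance. The train/test split below (test_size=0.3, random_state=0) is an arbitrary choice for illustration, not the asker's setup.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical split, chosen only so the example is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Uniform weights: every neighbour counts as one vote.
uniform = KNeighborsClassifier(n_neighbors=5, weights="uniform").fit(X_train, y_train)

# Distance weights: closer neighbours get a larger share of the vote.
weighted = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X_train, y_train)

p_uniform = uniform.predict_proba(X_test)
p_weighted = weighted.predict_proba(X_test)

print(np.unique(p_uniform))        # restricted to multiples of 1/5
print(np.unique(p_weighted).size)  # number of distinct values with distance weights
```

Distance weighting only changes the picture for points whose neighbours disagree. If all 5 neighbours carry the same label, the weighted probability is still exactly 0 or 1.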
