Generate 'K' Nearest Neighbours to a datapoint

Asked 21/12, 2018 at 13:14 Answered 21/12, 2018 at 13:21

Solved python pandas scikit-learn knn nearest-neighbor

I need to generate the K nearest neighbours of a given datapoint. I read up on the sklearn.neighbors module of scikit-learn, but it seems to generate neighbours between two sets of data. What I want is probably a list of the 100 datapoints closest to the datapoint passed in.

Any KNN algorithm must be finding these K datapoints under the hood anyway. Is there a way to have these K points returned as output?

Here is my sample notebook.

Humeral answered 21/12, 2018 at 13:14 Comment(1)

Have a look at docs.scipy.org/doc/scipy/reference/generated/… – Looselimbed 21/12, 2018 at 13:17

from sklearn.neighbors import NearestNeighbors

This gives you the indices of the k nearest neighbors in your dataset. Use kneighbors: it returns a tuple whose first element holds the distances and whose second element holds the indices of the neighbors. From the documentation:

>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=1)
>>> neigh.fit(samples) 
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> print(neigh.kneighbors([[1., 1., 1.]])) 
(array([[0.5]]), array([[2]]))
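
For the case in the question (the 100 closest datapoints to a single query point), the same call can be used with a larger n_neighbors, and the returned indices can then be used to pull the actual rows out of the dataset. A minimal sketch, assuming a NumPy array as the dataset; the random data, the query point, and the variable names are made up for illustration:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical dataset: 1000 random points in 3 dimensions.
rng = np.random.default_rng(0)
data = rng.random((1000, 3))

# Hypothetical query point; kneighbors expects a 2-D array (one row per query).
query = np.array([[0.5, 0.5, 0.5]])

k = 100
nn = NearestNeighbors(n_neighbors=k).fit(data)

# distances and indices both have shape (1, k), sorted by increasing distance.
distances, indices = nn.kneighbors(query)

# Use the indices to recover the neighbouring datapoints themselves.
nearest_points = data[indices[0]]
```

Here nearest_points has one row per neighbour, in the same order as distances[0].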

Inept answered 21/12, 2018 at 13:21 Comment(0)

You don't need to look under the hood.

Use a k-d tree for nearest-neighbor lookup. Once you have built the tree (the index) over your dataset, you query it for the k nearest neighbours.

Ref example:

>>> import numpy as np
>>> from scipy import spatial
>>> x, y = np.mgrid[0:5, 2:8]
>>> tree = spatial.KDTree(list(zip(x.ravel(), y.ravel())))
>>> pts = np.array([[0, 0], [2.1, 2.9]])
>>> tree.query(pts)
(array([ 2.        ,  0.14142136]), array([ 0, 13]))
>>> tree.query(pts[0])
(2.0, 0)
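
The reference example only returns the single nearest point. To get several neighbours, query accepts a k argument. A small sketch on the same grid, assuming you also want the neighbouring points themselves rather than just their indices; the names grid and neighbours are made up for illustration:

```python
import numpy as np
from scipy import spatial

# Same grid of points as in the reference example above.
x, y = np.mgrid[0:5, 2:8]
grid = np.column_stack([x.ravel(), y.ravel()])

tree = spatial.KDTree(grid)

# Ask for the 3 nearest neighbours of a single query point.
# dist and idx are 1-D arrays of length 3, ordered by increasing distance.
dist, idx = tree.query([2.1, 2.9], k=3)

# Index back into the original array to get the neighbouring points.
neighbours = grid[idx]
```

For the 100 neighbours asked about in the question, the same query would be made with k=100 on a tree built from the full dataset.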

Rescissory answered 21/12, 2018 at 13:18 Comment(0)
