Use sklearn.neighbors
from sklearn.neighbors import NearestNeighbors
#example dataset
coords_vect = np.vstack([np.sin(range(10)), np.cos(range(10))]).T
knn = NearestNeighbors(n_neighbors=3)
knn.fit(coords_vect)
distance_mat, neighbours_mat = knn.kneighbors(coords_vect)
The code above finds nearest neighbors in a simple example dataset of 10 points which are located on a unit circle. The following explains the nearest neighbor algorithm results for this dataset.
Results explained:
neighbours_mat = array([[0, 6, 7],
[1, 7, 8],
[2, 8, 9],
[3, 9, 4],
[4, 3, 5],
[5, 6, 4],
[6, 0, 5],
[7, 1, 0],
[8, 2, 1],
[9, 3, 2]], dtype=int64)
The values of the result matrix, neighbours_mat
, are indices of the elements (rows) in the input vector, coords_vect
. In our example, reading the first row of the result neighbours_mat
, the point in index 0 of coords_vect
is closest to itself (index 0) then to the point in the index 6 of coords_vect
, and then to the point in index 7 -> this can be verified with "The coordinates of the input vector coords_vect
" plot below. The second raw in the result neighbours_mat
indicates that the point in index 1 is closest to itself, then to the point in index 7, then to the point in index 8, and so on.
Notes: The first column in the result neighbours_mat
is the node we measure distances from, the second column is its nearest neighbor the third column is the second nearest neighbor. You can get more neighbours by increasing n_neighbors
@ NearestNeighbors(n_neighbors=3)
initialization. The distance_mat
are the distances of each node from its neighbors, notice that every node has distance of 0 to itself, therefore the first column is always zeros:
distance_mat = array([[0. , 0.28224002, 0.70156646],
[0. , 0.28224002, 0.70156646],
[0. , 0.28224002, 0.70156646],
[0. , 0.28224002, 0.95885108],
[0. , 0.95885108, 0.95885108],
[0. , 0.95885108, 0.95885108],
[0. , 0.28224002, 0.95885108],
[0. , 0.28224002, 0.70156646],
[0. , 0.28224002, 0.70156646],
[0. , 0.28224002, 0.70156646]])
Plotting the points:
x_vect, y_vect = np.sin(range(10)), np.cos(range(10))
plt.figure()
plt.scatter(x_vect, y_vect)
for x,y, label in zip (x_vect, y_vect, range(number_of_points)):
plt.text(x,y,label)
The coordinates of the input vector coords_vect
:
scipy.spatial
module – Commissariatscikit-learn NearestNeighbors
, then you can choose amongkd_tree
,ball_tree
, andbrute force
. The default is auto: attempt to decide the most appropriate algorithm based on the values passed to fit method. See my example below. – Koestler