Find all nearest neighbors within a specific distance
Asked Answered
H

1

20

I have a large list of x and y coordinates, stored in an numpy array.

Coordinates = [[ 60037633 289492298]
 [ 60782468 289401668]
 [ 60057234 289419794]]
...
...

What I want is to find all nearest neighbors within a specific distance (lets say 3 meters) and store the result so that I later can do some further analysis on the result.

For most packages I found it is necessary to decided how many NNs should be found but I just want all within the set distance.

How can I achieve something like that and what is the fastest and best way to achieve something like that for a large dataset (some million points)?

Homans answered 6/9, 2015 at 14:27 Comment(2)
Have you tried to do this yourself yet? What does your code look like right now? Can you give an example of what you are trying to compute (i.e. what does 3 meters mean)? Are these GPS coordinates?Layby
from scipy import spatial myTreeName=spatial.cKDTree(Coordinates,leafsize=100) for item in Coordinates: TheResult=myTreeName.query(item,k=20,distance_upper_bound=3) Is what i tried before but here i have to specify how many nearest neighbors I want to find. Yes those are GPS coordinates(X,Y) and I want to find all NNs within a radius of 3 meters for every point in the dataset.Homans
R
30

You could use a scipy.spatial.cKDTree:

import numpy as np
import scipy.spatial as spatial
points = np.array([(1, 2), (3, 4), (4, 5)])
point_tree = spatial.cKDTree(points)
# This finds the index of all points within distance 1 of [1.5,2.5].
print(point_tree.query_ball_point([1.5, 2.5], 1))
# [0]

# This gives the point in the KDTree which is within 1 unit of [1.5, 2.5]
print(point_tree.data[point_tree.query_ball_point([1.5, 2.5], 1)])
# [[1 2]]

# More than one point is within 3 units of [1.5, 1.6].
print(point_tree.data[point_tree.query_ball_point([1.5, 1.6], 3)])
# [[1 2]
#  [3 4]]

Here is an example showing how you can find all the nearest neighbors to an array of points, with one call to point_tree.query_ball_point:

import numpy as np
import scipy.spatial as spatial
import matplotlib.pyplot as plt
np.random.seed(2015)

centers = [(1, 2), (3, 4), (4, 5)]
points = np.concatenate([pt+np.random.random((10, 2))*0.5 
                         for pt in centers])
point_tree = spatial.cKDTree(points)

cmap = plt.get_cmap('copper')
colors = cmap(np.linspace(0, 1, len(centers)))
for center, group, color  in zip(centers, point_tree.query_ball_point(centers, 0.5), colors):
   cluster = point_tree.data[group]
   x, y = cluster[:, 0], cluster[:, 1]
   plt.scatter(x, y, c=color, s=200)

plt.show()

enter image description here

Rangefinder answered 6/9, 2015 at 14:33 Comment(4)
I believe it's recommended to use spatial.cKDTree instead. (The only difference, I believe, is implementation... the behavior and interface is the same.)Letterperfect
Thanks for the correction, @askewchan. cKDTree should be faster.Rangefinder
O.k now if I want to make your query for a lot or points how could I store the found nearest points with there query point? So in your example something like: (1.5 : 1 2) (1.6: 3 4) Like having an Index, dictionaries or tuple or something like that?Homans
I added an example showing how to perform the query for an array of points.Rangefinder

© 2022 - 2024 — McMap. All rights reserved.