How could I utilize algorithms for nearest neighbor search to do fixed radius search?
Asked Answered
B

4

6

There are lots of works for the nearest neighbor search problem, so I wonder if I want to do fixed radius range search, could I utilize those algorithms for nearest neighbor search?

perhaps I could do kth-nearest-neighbor search over and over again until I find a point beyond the radius range, but I suppose it might cause a lot of waste.

Boatload answered 23/4, 2015 at 3:30 Comment(3)
3 answers, long silence from Alaya!Discontent
@G.Samaras, actually I find your answer is good and am reading some papers about LSH(and wondering whether to use it for just 3D problem)...Boatload
@G.Samaras and sorry for not replying...Boatload
D
4

Not only "There are lots of works for the nearest neighbor search problem," but there are many questions to be asked for your problem. The most important is the number of dimensions.

Make sure you check my answer if you are not sure why the dimension is important.


High dimensional space

Assuming your points lie in a high dimensional space, you should go for Locality-Sensitive Hashing (LSH) . A nice implementation of this algorithm is E²LSH. They also provide slides, if you want to implement it yourself or get a better understanding of what happens. Note that E²LSH solves a randomized version of the R-near neighbor problem, which they call (R, 1 − δ)-near neighbor problem, where δ has to do with the approximation, as they mention in the manual.

You can also check my answer regarding LSH here. I have used it in C++ and I strongly recommend it for the fixed radius search you want to perform!


Low dimensional space

Use CGAL's spatial searching. I have used it many times for this case in C++. Again, if you want to implement yourselves, you can find plenty information at the nice documentation they have and do relative google queries.


Nice answer by the way, so you got my +1. :)

Discontent answered 23/4, 2015 at 14:59 Comment(0)
W
2

If you have only one query, then this problem is fixed O(n), where n is number of points no matter what.

If you have multiple queries, than this problem is well explored problem, however it's solution is more complex than just nearest neighbor search. Refer to this article: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.95.3191&rep=rep1&type=pdf

Wantage answered 23/4, 2015 at 9:31 Comment(5)
In that case, refer to that article, which has extensive analysis of possible algorithms.Wantage
If you have a single query. You will have to go over all points, at least to see what their values are.Wantage
That's brute force, you can do better than that. Riko next time, when you reply to a comment, you may want to tag the other guy, so that he receives a notification. Use @otherguyDiscontent
@G.Samaras no, for one query you can't do better than that, since you need to iterate over all points once, just to get their variablesWantage
That's why building an index helps. See my answer.Discontent
A
2

Taking d as the number of dimensions and r as the radius, I know at least two different approaches you can use:

Using a spatial hash:

Imagine the problem space divided in hypercubes by a grid of size s such that s = r / sqrt(d).

The interesting thing about s = r / sqrt(d) is that any two points inside a hypercube of that size are guaranteed to be at a distance equal or less than r.

You can numerate the grid divisions so that every hypercube can be identified by the tuple formed by the indexes of one of its corners. These index tuples can be used as the keys into a hash structure (the spatial hash).

Now, and this is the difficult part, you have to notice that for any hypercube of the grid A there is a set of neighbor hypercubes N = (N1, N2, ...) whose minimal distance to the given hypercube is equal or less than the given radius, geometrically they look like an hypersphere. You can represent the elements of the set N as the deltas of the grid indexes relatives to A. Note that these index deltas depend on d only.

For instance, in the bi-dimensional case, you have the following geometrical structure:

   NNN    index deltas:    (-1,  2) ( 0,  2) ( 1,  2)
  NNNNN           (-2,  1) (-1,  1) ( 0,  1) ( 1,  1) ( 2,  1)
  NNANN           (-2,  0) (-1,  0) ( 0,  0) ( 1,  0) ( 2,  0)
  NNNNN           (-2, -1) (-1, -1) ( 0, -1) ( 1, -1) ( 2, -1)
   NNN                     (-1, -2) ( 0, -2) ( 1, -2)

So, you already have all you need to implement an efficient search for the points inside a hypersphere:

  1. Push the input points into the spatial hash using the index tuple of the hypercube containing them as the key.

  2. Given a point p the way to search is to determine the hypercube where it lays A, and then using the set of index-deltas, get the set of neighbor hypercubes N that can potentially contain points nearer than r.

  3. Retrieve from the spatial hash the points belonging to the hypercubes N, and check which ones are near enough to p.

There are some extra optimizations that can be carried out, as not checking the points in A as they are guaranteed to be all near enough. A pre-filtering of N can be performed based on the position of p relative to A.

Note that selecting s = r / sqrt(d) provides a nice compromise between having small hypercubes and a not too big set N.

Using a k-d tree or quad/octo/...-tree:

Every time you go down one level on the tree, you check if the space it represents intersects with the query hyper-sphere. If it does, you keep going down on it, otherwise you discard it completelly from the search.

Anking answered 23/4, 2015 at 11:8 Comment(4)
seems that using grid is the fastest way, however, I doubt that it would require huge memory usage when search radius is relatively small or when dimension is high.Boatload
Worst case memory usage happens when every point lays in a different hypercube and you end with a spatial hash with as many entries as points. So, no huge memory usage. On the other hand, the size of the set N grows exponentially with the dimension, resulting in a computational cost that is O(f(M)*exp(d)), where M is the number of points and f some undetermined function.Anking
Shouldn't the grid size be set to r instead of r/sqrt(d)? Because with r you only have to search in the 8 adjacent hypercubes plus in the query hypercube, that is in a total of 9 hypercubes, while with r/sqrt(d) you have to search in the 8 adjacent hypercubes plus 12 other hypercubes that are adjacent to them, that is in a total of 20 hypercubes. The only advantage of r/sqrt(d) is not having to search inside the query hypercube, but the drawbacks far outweighs it (20 vs 9 hypercubes).Meldameldoh
@Maggyero: what you actually need to consider is the hypervolumen covered by the set of hypercubes. In the 2D case: 20 * (r / sqrt(2))**2 = 10 * r**2 against 9 * r**2, so it is not that bad. For bigger dimensions, the balance gets better for the r/sqrt(d) case.Anking
M
0

A simple solution is using query_radius function in sklearn.neighbors.KDTree:

https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree.query_radius

The format is:

query_radius(X, r, return_distance=False, count_only=False, sort_results=False)
Madriene answered 27/5, 2020 at 17:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.