All k nearest neighbors in 2D, C++

Asked 13/11, 2010 at 11:50 Answered 16/11, 2010 at 11:13

c++algorithm nearest-neighbor large-data

I need to find for each point of the data set all its nearest neighbors. The data set contains approx. 10 million 2D points. The data are close to the grid, but do not form a precise grid...

This option excludes (in my opinion) the use of KD Trees, where the basic assumption is no points have same x coordinate and y coordinate.

I need a fast algorithm O(n) or better (but not too difficult for implementation :-)) ) to solve this problem ... Due to the fact that boost is not standardized, I do not want to use it ...

Thanks for your answers or code samples...

Geranial answered 13/11, 2010 at 11:50 Comment(4)

Could you provide an example for what you looking for ? – Muscovado 13/11, 2010 at 12:3

possible duplicate of Suitable choice of data structure and algorithm for fast k-Nearest Neighbor search in 2D – Bicuspid 13/11, 2010 at 12:42

I don't quite follow why you can't use kd-trees. I'll summarize what I think you are saying: let me know where I am going wrong. You have a set of 10M distinct points. They don't lie on an integer grid, but are close, e.g., there is a point (2.01, 1.05) and another (1.99,1.03). Could you not scale the points so that they all lay on an integer grid, and then use kd-trees? e.g., the 2 points above could be (201,105) and (199,103). – Lifesaver 13/11, 2010 at 12:52

Some points are on the grid, some not and rest of them is close to grid... – Geranial 13/11, 2010 at 13:6

I would do the following:

Create a larger grid on top of the points.
Go through the points linearly, and for each one of them, figure out which large "cell" it belongs to (and add the points to a list associated with that cell).

(This can be done in constant time for each point, just do an integer division of the coordinates of the points.)
Now go through the points linearly again. To find the 10 nearest neighbors you only need to look at the points in the adjacent, larger, cells.

Since your points are fairly evenly scattered, you can do this in time proportional to the number of points in each (large) cell.

Here is an (ugly) pic describing the situation:

enter image description here

The cells must be large enough for (the center) and the adjacent cells to contain the closest 10 points, but small enough to speed up the computation. You could see it as a "hash-function" where you'll find the closest points in the same bucket.

(Note that strictly speaking it's not O(n) but by tweaking the size of the larger cells, you should get close enough. :-)

Correctitude answered 13/11, 2010 at 11:57 Comment(5)

Not just adjacent, unfortunately (consider that points in the cell two to the east may be closer than points in the cell directly north-east, for instance; this problem gets much worse in higher dimensions). Also, what if the neighbouring cells happen to have less than 10 points in them? In practice, you will need to "spiral out". – Jewelry 13/11, 2010 at 12:7

Not in this particular case: The data are close to the grid, but do not form a precise grid.... By choosing large enough cells, you can solve it like this. – Correctitude 13/11, 2010 at 12:8

@Ian, sounds similar to this approach. – Correctitude 13/11, 2010 at 13:36

LHS, geometric hashing, quadtree, kd-tree, k-means tree, R-Tree are faster than your algorithm. – Epistaxis 14/1, 2011 at 13:39

So you down-voted my answer, because it is not the best possible? – Correctitude 14/1, 2011 at 13:41

I have used a library called ANN (Approximate Nearest Neighbour) with great success. It does use a Kd-tree approach, although there was more than one algorithm to try. I used it for point location on a triangulated surface. You might have some luck with it. It is minimal and was easy to include in my library just by dropping in its source.

Good luck with this interesting task!

Binford answered 16/11, 2010 at 11:13 Comment(0)

Recommended topics

Hot tags