Grouping set of points to nearest pairs

Asked 27/5, 2015 at 17:7 Answered 28/5, 2015 at 15:11

algorithm computational-geometry nearest-neighbor

I need an algorithm for the following problem:

I'm given a set of 2D points P = { (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) } on a plane. I need to group them in pairs in the following manner:

Find two closest points (x_a, y_a) and (x_b, y_b) in P.
Add the pair <(x_a, y_a), (x_b, y_b)> to the set of results R.
Remove <(x_a, y_a), (x_b, y_b)> from P.
If initial set P is not empty, go to the step one.
Return set of pairs R.

That naive algorithm is O(n^3), using faster algorithm for searching nearest neighbors it can be improved to O(n^2 logn). Could it be made any better?

And what if the points are not in the euclidean space?

An example (resulting groups are circled by red loops):

enter image description here

Objectify answered 27/5, 2015 at 17:7 Comment(11)

Could you elaborate a bit more on what exactly the output should be, i.e. can you give a sample input-output case? – Belaud 27/5, 2015 at 17:13

PriorityQueue<Distance> ? – Nonconformity 27/5, 2015 at 17:24

@Nonconformity Could you elaborate it in more detail? – Objectify 27/5, 2015 at 17:26

Actually, it's still N^2 log N, but you can basically dump all the distances into a priority queue, and then read out the closest pairs. – Nonconformity 27/5, 2015 at 17:27

@Nonconformity But I may get some points repeated in different pairs, and my requirement is that every point should be exactly in one pair. – Objectify 27/5, 2015 at 17:29

Use a Set<Point> to keep track of seen items. Set<T> has O(1) lookup time. Or you could just use a bitset or an array or something (since you know how many points you're working with), which won't require amortization. – Nonconformity 27/5, 2015 at 17:30

related: #22483188 – Watersoak 27/5, 2015 at 17:34

@Nonconformity Ok, this approach will work but I am curious can we be better than that : ). – Objectify 27/5, 2015 at 17:34

This seems worth looking into as well: #1602664 – Nonconformity 27/5, 2015 at 17:42

@Nonconformity I've just mentioned that algorithm in my question, but it still O(n^2 log n)... – Objectify 27/5, 2015 at 17:46

Yeah, I just read through it, and I see it's actually just for a single pair of points. I'll have to research this question some more. Definitely interesting. – Nonconformity 27/5, 2015 at 17:50

Put all of the points into an http://en.wikipedia.org/wiki/R-tree (time O(n log(n))) then for each point calculate the distance to its nearest neighbor. Put points and initial distances into a priority queue. Initialize an empty set of removed points, and an empty set of pairs. Then do the following pseudocode:

while priority_queue is not empty:
    (distance, point) = priority_queue.get();
    if point in removed_set:
        continue
    neighbor = rtree.find_nearest_neighbor(point)
    if distance < distance_between(point, neighbor):
        # The previous neighbor was removed, find the next.
        priority_queue.add((distance_between(point, neighbor), point)
    else:
        # This is the closest pair.
        found_pairs.add(point, neighbor)
        removed_set.add(point)
        removed_set.add(neighbor)
        rtree.remove(point)
        rtree.remove(neighbor)

The slowest part of this is the nearest neighbor searches. An R-tree does not guarantee that those nearest neighbor searches will be O(log(n)). But they tend to be. Furthermore you are not guaranteed that you will do O(1) neighbor searches per point. But typically you will. So average performance should be O(n log(n)). (I might be missing a log factor.)

Religious answered 27/5, 2015 at 18:18 Comment(1)

Awesome. I need to look this up. – Nonconformity 27/5, 2015 at 18:30

This problem calls for a dynamic Voronoi diagram I guess.

When the Voronoi diagram of a point set is known, the nearest neighbor pair can be found in linear time.

Then deleting these two points can be done in linear or sublinear time (I didn't find precise info on that).

So globally you can expect an O(N²) solution.

Nickey answered 28/5, 2015 at 8:26 Comment(0)

If your distances are arbitrary and you can't embed your points into Euclidean space (and/or the dimension of the space would be really high), then there's basically no way around at least a quadratic time algorithm because you don't know what the closest pair is until you check all the pairs. It is easy to get very close to this, basically by sorting all pairs according to distance and then maintaining a boolean look up table indicating which points in your list have already been taken, and then going through the list of sorted pairs in order and adding a pair of points to your "nearest neighbors" if neither point in the pair is in the look up table of taken points, and then adding both points in the pair to the look up table if so. Complexity O(n^2 log n), with O(n^2) extra space.

Pasco answered 27/5, 2015 at 23:13 Comment(0)

You can find the closest pair with this divide and conquer algorithm that runs in O(nlogn) time, you may repeat this n times and you will get O(n^2 logn) which is not better than what you got.

Nevertheless, you can exploit the recursive structure of the divide and conquer algorithm. Think about this, if the pair of points you removed were on the right side of the partition, then everything will behave the same on the left side, nothing changed there, so you just have to redo the O(logn) merge steps bottom up. But consider that the first new merge step will be to merge 2 elements, the second merges 4 elements then 8, and then 16,.., n/4, n/2, n, so the total number of operations on these merge steps are O(n), so you get the second closest pair in just O(n) time. So you repeat this n/2 times by removing the previously found pair and get a total O(n^2) runtime with O(nlogn) extra space to keep track of the recursive steps, which is a little better.

But you can do even better, there is a randomized data structure that let you do updates in your point set and get an expected O(logn) query and update time. I'm not very familiar with that particular data structure but you can find it in this paper. That will make your algorithm O(nlogn) expected time, I'm not sure if there is a deterministic version with similar runtimes, but those tend to be way more cumbersome.

Orbital answered 28/5, 2015 at 15:11 Comment(0)

Recommended topics

Hot tags