Closest group of 3 points
Asked Answered
B

5

24

Is there a known, efficient algorithm for finding the closest group of three points in a cloud?

This is similar to the closest pair of points problem but I am looking for three points instead of two.


Edit
The definition of "closest" will affect the complexity of the algorithm. As Jack pointed out, finding the minimum area triangle is 3sum-hard and in any case not well suited to my application.

I am hoping there is a more efficient algorithm for finding the minimum perimeter (i.e. |AB|+|AC|+|BC|) triangle or something similar (e.g. minimum |AB|²+|AC|²+|BC|².) I know of no reason why this should be 3sum-hard as the existence of 3 colinear points elsewhere would not affect the result.


Note: my points have eight dimensions, so any algorithm that is restricted to fewer dimensions is not suitable.

Bostick answered 24/9, 2011 at 14:28 Comment(2)
By "efficient" I assume you mean something stronger than the polynomial-time (O(n^3) in the worst case) brute-force algorithm which checks all triples, right? Any algorithm will be Omega(n) in the best case, so there is some room to play around.Firehouse
@Patrick87, "efficient" was deliberately ambiguous, but now that Jack has posted his answer, my goal is to find something better than O(n² log n).Bostick
C
4

There is an obvious algorithm which works in O(n^2).

1) perform Delaunay triangluation - O(n log n), so we get a planar graph.

As it is Delaunay triangluation, which maximises the minimum angle, it is clear that the closest 3 points must be connected by at least 2 edges (but they don't necessarily need to form the triangle!). Otherwise there would be more closer points or more closed angles.

2) for each point (vertex of the graph), we examine each couple of adjacent edges and look the 3 vertices defined by these 2 edges.

How much time will the step 2) will take? Since the graph is planar, the number of edges is <= 3n - 6, i.e. O(n). So this step will take O(n^2) in the worst case (O(n) in the typical case, where the degree of each vertex is limited).

So the whole algorithm is O(n^2). Note that the step 2) is somehow naive (brute force) solution, so I think there is a space for improvement (also, the steps 1 and 2 could probably be merged somehow).

Commonable answered 30/9, 2011 at 14:51 Comment(3)
That's the kind of thing I was looking for. Thanks.Bostick
@finnw, you are welcome. I would however advise to test the theorem "closest 3 points must be connected by at least 2 edges" by comparing the with the naive O(n^3) solution on some random generated datasets. It looks obvious and intuitive, however I didn't prove it mathematically! I think it should be valid, if it is not, it will be at least very close approximation.Commonable
I assume your solution is based on "minimum perimeter". Could you please confirm this?Crusado
P
9

This problem is similar to the classical problem of finding the closest pair in a set of points. You can adapt the worst-case O(n log n) algorithms that solve the closest-pair problem to solve this one.

The specific problem was featured in Google's Code Jam competition. Here is a resume of the analysis:

The number of points can be pretty large so we need an efficient algorithm. We can solve the problem in O(n log n) time using divide and conquer. We will split the set of points by a vertical line into two sets of equal size. There are now three cases for the minimum-perimeter triangle: its three points can either be entirely in the left set, entirely in the right set, or it can use points from each half.

Further:

"To find the minimum perimeter for the third case (if it is less than p)":

  1. We select the points that are within a distance of p/2 from the vertical separation line.
  2. Then we iterate through those points from top to bottom, and maintain a list of all the points in a box of size p x p/2, with the bottom edge of the box at the most recently considered point.
  3. As we add each point to the box, we compute the perimeter of all triangles using that point and each pair of points in the box. (We could exclude triangles entirely to the left or right of the dividing line, since those have already been considered.)

We can prove that the number of points in the box is at most 16, so we only need to consider at most a small constant number of triangles for each point.

It seems to me you could even adapt the method to work in the |AB|²+|AC|²+|BC|² case.

Protractile answered 30/9, 2011 at 9:28 Comment(5)
If you read the 2d divide and conquer closest point algorithm, you should notice that it easily works for three points as well. Wikipedia then describes how it works on nlogn for any number of dimensions.Protractile
I do not see how it "easily" works for three points. The step where you look for 2 closer points on opposite sides of the dividing plane does not (trivially) generalise to 3 points.Bostick
It was in google code jam in 2009: code.google.com/codejam/contest/dashboard?c=311101#s=a&a=1Protractile
@finnw: I've tried to put more of the analysis in the actual answer.Protractile
This is both easier to implement and more efficient than the accepted answer.Deceptive
C
4

There is an obvious algorithm which works in O(n^2).

1) perform Delaunay triangluation - O(n log n), so we get a planar graph.

As it is Delaunay triangluation, which maximises the minimum angle, it is clear that the closest 3 points must be connected by at least 2 edges (but they don't necessarily need to form the triangle!). Otherwise there would be more closer points or more closed angles.

2) for each point (vertex of the graph), we examine each couple of adjacent edges and look the 3 vertices defined by these 2 edges.

How much time will the step 2) will take? Since the graph is planar, the number of edges is <= 3n - 6, i.e. O(n). So this step will take O(n^2) in the worst case (O(n) in the typical case, where the degree of each vertex is limited).

So the whole algorithm is O(n^2). Note that the step 2) is somehow naive (brute force) solution, so I think there is a space for improvement (also, the steps 1 and 2 could probably be merged somehow).

Commonable answered 30/9, 2011 at 14:51 Comment(3)
That's the kind of thing I was looking for. Thanks.Bostick
@finnw, you are welcome. I would however advise to test the theorem "closest 3 points must be connected by at least 2 edges" by comparing the with the naive O(n^3) solution on some random generated datasets. It looks obvious and intuitive, however I didn't prove it mathematically! I think it should be valid, if it is not, it will be at least very close approximation.Commonable
I assume your solution is based on "minimum perimeter". Could you please confirm this?Crusado
E
1

The problem you mentioned is variation of 3sum hard problem. Have a look at http://www.cs.mcgill.ca/~jking/papers/3sumhard.pdf for details.

This problem can be also expressed as finding smallest triangle from given points.

EDIT:

Essentially, 3sum hard problem means that lower bound is O(n^2). There might be small improvement here and there but nothing much can be done.

For this specific problem (smallest triangle), see chapter 3 of http://www.cs.princeton.edu/~chazelle/pubs/PowerDuality.pdf.

Enrollment answered 24/9, 2011 at 16:13 Comment(2)
I think smallest triangle will have smallest |AB| + |AC| + |BC|. Wouldn't it?Enrollment
Quote from the second paper: the vertices of the smallest triangle may be arbitrarily distant from one another, if they happen to be colinear. This is true for the minimum-area triangle, but not for the minimum-perimeter triangle, so there may be a more efficient algorithm for finding the minimum-perimeter triangle (because you can reject any pair of points whose distance is at least half the perimeter of any other triangle.)Bostick
M
1

This a specific form of the k-nearest neighbor problem, namely where k=3. The cited page does not specify algorithmic complexity, but it's fairly obvious to see that a naive approach of computing the distance from the selected point to all other points, then computing the distance from that point to all other points, as well as the distance from our original point to the newly selected point is O(nk-1). Consider the pseudocode:

for pointB in searchSpace do:
    distanceAB := calculateDistance(pointA, pointB)
    for pointC in {searchSpace} - {pointB} do:
        distanceBC := calculateDistance(pointB, pointC)
        distanceCA := calculateDistance(pointC, pointA)
        if (distanceAB + distanceBC + distanceCA) < currentMin then:
            currentMin := distanceAB + distanceBC + distanceCA
            closestPoints := {pointA, pointB, pointC}

Note that we assume that pointA has already been removed from searchSpace. This is an O(n2) algorithm. Assuming dimension is relatively small, or that the function calculateDistance grows linearly or less with the dimension, this gives the solution in a decent time constraint. Optimizations could certainly be had, but they may not be required.

If you want to see some real code, wikipedia has many links to implementations.

Migrate answered 29/9, 2011 at 18:55 Comment(1)
There is no "selected point." You must also choose pointA, so the naive approach would be O(n³). N is about 1000.Bostick
R
0

Thomas Ahle's Answer does a great job explaining the logic however it does not have the code for the same.

Here's a Java Implementation for the same

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.*;

public class MainClass {
    static class Point{
        double x;
        double y;
        Point(double x, double y){
            this.x = x;
            this.y =y;
        }
    }
    static class ComparatorByXCoordinate implements Comparator<Point>{
        @Override
        public int compare(Point P1, Point P2) {
            if (P1.x > P2.x) return  1;
            if (P1.x < P2.x) return -1;
            return Double.compare(P1.y, P2.y);
        }
    }

    static class ComparatorByYCoordinate implements Comparator<Point>{
        @Override
        public int compare(Point P1, Point P2) {
            return Double.compare(P1.y, P2.y);
        }
    }
    static double DistanceBetweenPoints(Point P1, Point P2){
        return Math.sqrt((P1.x-P2.x)*(P1.x-P2.x) + (P1.y-P2.y)*(P1.y-P2.y));
    }
    static double Perimeter(Point A, Point B, Point C){
        return DistanceBetweenPoints(A, B) + DistanceBetweenPoints(B, C) + DistanceBetweenPoints(C, A);
    }
    public static void main(String[] args) throws IOException {
        InputStreamReader r = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(r);
        int n = Integer.parseInt(br.readLine());
        Point[] PointsX = new Point[n];
        Point[] PointsY = new Point[n];
        for (int i = 0; i < n; i++){
            String[] st = br.readLine().split(" ");
            double xCoordinate = Double.parseDouble(st[0]);
            double yCoordinate = Double.parseDouble(st[1]);
            Point p = new Point(xCoordinate, yCoordinate);
            PointsX[i] = p;
            PointsY[i] = p;
        }
        Arrays.sort(PointsX, new ComparatorByXCoordinate());
        Arrays.sort(PointsY, new ComparatorByYCoordinate());
        System.out.printf("%.12f", solveNoCross(PointsX, PointsY, 0, n));
    }
    static double solveNoCross(Point[] PointsX, Point[] PointY, int low, int high){
        if (high - low < 3){
            return Double.MAX_VALUE;
        }
        else if (high - low == 3){
            return  Perimeter(PointsX[low], PointsX[low+1], PointsX[low+2]);
        }
        int mid = low + (high-low)/2; //Overflow Issues :(
        Point midPoint = PointsX[mid];
        Point[] firstHalf = new Point[mid - low];
        Point[] secondHalf = new Point[high - mid];
        int firstHalfPointer = 0;
        int secondHalfPointer = 0;
        for (Point point: PointY){
            double pointX = point.x;
            double pointY = point.y;
            double midX = midPoint.x;
            double midY = midPoint.y;
            if (pointX < midX || (pointX == midX && pointY < midY)){
                firstHalf[firstHalfPointer++] = point;
            }
            else{
                secondHalf[secondHalfPointer++] = point;
            }
        }
        double min = Math.min(solveNoCross(PointsX, firstHalf, low, mid),
                solveNoCross(PointsX, secondHalf, mid, high));
        return Math.min(min, solveCross(PointY, midPoint, min, PointY.length));
    }

    private static double solveCross(Point[] pointY, Point midPoint, double min, int pointYLen) {
        // For Solving When There Are Cross Triangles Such That Some Vertices Are in One Half and Some in Other
        double midX = midPoint.x;
        double midY = midPoint.y;
        double MCP = Double.MAX_VALUE; // Minimum Cross Perimeter
        int boundingFactor = 0;
        double periRange;
        ArrayList<Point> pointsInPeriRange = new ArrayList<>();
        if (min == Double.MAX_VALUE){
            periRange = 2 * 1E9;
        }
        else{
            periRange = min/2;
        }
        for (int FirstPointIterator = 0; FirstPointIterator < pointYLen; FirstPointIterator++){
            Point Firstpoint = pointY[FirstPointIterator];
            double FirstPointX = Firstpoint.x;
            double FirstPointY = Firstpoint.y;
            if(Math.abs(FirstPointX - midX) > periRange) continue;
            while(boundingFactor < pointsInPeriRange.size() && FirstPointY - pointsInPeriRange.get(boundingFactor).y > periRange) {
                boundingFactor += 1;
            }
            for (int SecondPointIterator = boundingFactor; SecondPointIterator < pointsInPeriRange.size(); SecondPointIterator++){
                for (int ThirdPointIterator = SecondPointIterator + 1; ThirdPointIterator < pointsInPeriRange.size(); ThirdPointIterator++){
                    Point SecondPoint = pointsInPeriRange.get(SecondPointIterator);
                    Point ThirdPoint = pointsInPeriRange.get(ThirdPointIterator);
                    MCP = Math.min(MCP, Perimeter(Firstpoint, SecondPoint, ThirdPoint));
                }
            }
            pointsInPeriRange.add(Firstpoint);
        }
        return MCP;
    }
}

The input format is an integer denoting the number of points followed by the points!

Sample Run -

Input:

4
0 0
0 3
3 0
1 1

Output:

6.650281539873
Raybin answered 15/8, 2022 at 19:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.