What is the most efficient algorithm to find a straight line that goes through most points?
Asked Answered
R

9

49

The problem:

N points are given on a 2-dimensional plane. What is the maximum number of points on the same straight line?

The problem has O(N2) solution: go through each point and find the number of points which have the same dx / dy with relation to the current point. Store dx / dy relations in a hash map for efficiency.

Is there a better solution to this problem than O(N2)?

Resistencia answered 14/11, 2010 at 20:40 Comment(4)
Don't think it's possible. It would be possible if there would exist such a transformation on a single point that would help, but unfortunately any transformation I can think of requires 2 points. A probabilistic approach (like Monte Carlo) could be faster, but there would be no garantee that it found the maximum.Westhead
If you substitude point coordinates to line equation, k*x[i] + b = y[i], you'll get equation about k and x. In {k,x}-space it will be a line. So there it become a problem of maximum lines going through one point. It may have a solution.Cyclamen
Note that problem makes sense only for integer coodonates, so k and b have to be rational numbers, such that b == y - k*x , where y and x are integers. Maybe by transforming the problem in the form "find such rational numbers b and k that satisfies most equations" will help.Westhead
@leonid, do you think dx/dy is enough? I thought dx/dy is only the slope, but how about y-intercept?Leuco
K
45

There is likely no solution to this problem that is significantly better than O(n^2) in a standard model of computation.

The problem of finding three collinear points reduces to the problem of finding the line that goes through the most points, and finding three collinear points is 3SUM-hard, meaning that solving it in less than O(n^2) time would be a major theoretical result.

See the previous question on finding three collinear points.

For your reference (using the known proof), suppose we want to answer a 3SUM problem such as finding x, y, z in list X such that x + y + z = 0. If we had a fast algorithm for the collinear point problem, we could use that algorithm to solve the 3SUM problem as follows.

For each x in X, create the point (x, x^3) (for now we assume the elements of X are distinct). Next, check whether there exists three collinear points from among the created points.

To see that this works, note that if x + y + z = 0 then the slope of the line from x to y is

(y^3 - x^3) / (y - x) = y^2 + yx + x^2

and the slope of the line from x to z is

(z^3 - x^3) / (z - x) = z^2 + zx + x^2 = (-(x + y))^2 - (x + y)x + x^2 = x^2 + 2xy + y^2 - x^2 - xy + x^2 = y^2 + yx + x^2

Conversely, if the slope from x to y equals the slope from x to z then

y^2 + yx + x^2 = z^2 + zx + x^2,

which implies that

(y - z) (x + y + z) = 0,

so either y = z or z = -x - y as suffices to prove that the reduction is valid.

If there are duplicates in X, you first check whether x + 2y = 0 for any x and duplicate element y (in linear time using hashing or O(n lg n) time using sorting), and then remove the duplicates before reducing to the collinear point-finding problem.

Karnes answered 15/11, 2010 at 4:0 Comment(2)
What's so special about assuming points as (x, x^3)? Why doesn't the math work out for say, (x, x^2)?Delvecchio
I was about to ask about what guarantees that the y-intercept is also the same for these 3 lines (i.e., the answer only shows that the slope will be the same), but then it clicked: if two lines have the same slope and go through some common point, then they are necessarily the same line -- and that's the case with all 3 possible lines here (xy and xz have the same slope and share x, and yz has the same slope and shares z with xz).Diastyle
B
5

If you limit the problem to lines passing through the origin, you can convert the points to polar coordinates (angle, distance from origin) and sort them by angle. All points with the same angle lie on the same line. O(n logn)

I don't think there is a faster solution in the general case.

Brindisi answered 14/11, 2010 at 21:19 Comment(1)
Actually, if you limit the problem to lines passing through the origin (0,0), it can be solved in O(n) time, using a hash.Jovial
F
4

The Hough Transform can give you an approximate solution. It is approximate because the binning technique has a limited resolution in parameter space, so the maximum bin will give you some limited range of possible lines.

Frigid answered 15/11, 2010 at 5:42 Comment(0)
B
1

Again an O(n^2) solution with pseudo code. Idea is create a hash table with line itself as the key. Line is defined by slope between the two points, point where line cuts x-axis and point where line cuts y-axis.

Solution assumes languages like Java, C# where equals method and hashcode methods of the object are used for hashing function.

Create an Object (call SlopeObject) with 3 fields

  1. Slope // Can be Infinity
  2. Point of intercept with x-axis -- poix // Will be (Infinity, some y value) or (x value, 0)
  3. Count

poix will be a point (x, y) pair. If line crosses x-axis the poix will (some number, 0). If line is parallel to x axis then poix = (Infinity, some number) where y value is where line crosses y axis. Override equals method where 2 objects are equal if Slope and poix are equal.

Hashcode is overridden with a function which provides hashcode based on combination of values of Slope and poix. Some pseudo code below

Hashmap map;
foreach(point in the array a) {
    foeach(every other point b) {
        slope = calculateSlope(a, b);
        poix = calculateXInterception(a, b);
        SlopeObject so = new SlopeObject(slope, poix, 1); // Slope, poix and intial count 1.
        SlopeObject inMapSlopeObj = map.get(so);
        if(inMapSlopeObj == null) {
            inMapSlopeObj.put(so);
        } else {
            inMapSlopeObj.setCount(inMapSlopeObj.getCount() + 1);
        }
    }
}
SlopeObject maxCounted = getObjectWithMaxCount(map);
print("line is through " + maxCounted.poix + " with slope " + maxCounted.slope);
Berkow answered 24/12, 2014 at 8:32 Comment(0)
D
1

Move to the dual plane using the point-line duality transform for p=(a,b) p*:y=a*x + b. Now using a line sweep algorithm find all intersection points in NlogN time. (If you have points which are one above the other just rotate the points to some small angle). The intersection points corresponds in the dual plane to lines in the primer plane.

Damson answered 28/1, 2021 at 14:54 Comment(2)
"intersection points" of what with what, please?Gadgeteer
Of the lines in the dual plane. See page 9 for example in the link. graphics.stanford.edu/courses/cs268-16-fall/Notes/…Damson
D
1

Whoever said that since 3SUM have a reduction to this problem and thus the complexity is O(n^2). Please note that the complexity of 3SUM is less than that. Please check https://en.wikipedia.org/wiki/3SUM and also read https://tmc.web.engr.illinois.edu/reduce3sum_sosa.pdf

Damson answered 4/2, 2021 at 12:23 Comment(1)
It's unclear what you are trying to say. Especially the second sentence is uncomplete.Coincidentally
S
0

As already mentioned, there probably isn't a way to solve the general case of this problem better than O(n^2). However, if you assume a large number of points lie on the same line (say the probability that a random point in the set of points lie on the line with the maximum number of points is p) and don't need an exact algorithm, a randomized algorithm is more efficient.

maxPoints = 0
Repeat for k iterations:
    1. Pick 2 random, distinct points uniformly at random
    2. maxPoints = max(maxPoints, number of points that lies on the 
       line defined by the 2 points chosen in step 1)

Note that in the first step, if you picked 2 points which lies on the line with the maximum number of points, you'll get the optimal solution. Assuming n is very large (i.e. we can treat the probability of finding 2 desirable points as sampling with replacement), the probability of this happening is p^2. Therefore the probability of finding a suboptimal solution after k iterations is (1 - p^2)^k.

Suppose you can tolerate a false negative rate rate = err. Then this algorithm runs in O(nk) = O(n * log(err) / log(1 - p^2)). If both n and p are large enough, this is significantly more efficient than O(n^2). (i.e. Supposed n = 1,000,000 and you know there are at least 10,000 points that lie on the same line. Then n^2 would required on the magnitude of 10^12 operations, while randomized algorithm would require on the magnitude of 10^9 operations to get a error rate of less than 5*10^-5.)

Steading answered 18/6, 2016 at 18:17 Comment(0)
K
-1

It is unlikely for a $o(n^2)$ algorithm to exist, since the problem (of even checking if 3 points in R^2 are collinear) is 3Sum-hard (http://en.wikipedia.org/wiki/3SUM)

Krute answered 3/2, 2014 at 19:3 Comment(0)
J
-3

This is not a solution better than O(n^2), but you can do the following,

  1. For each point convert first convert it as if it where in the (0,0) coordinate, and then do the equivalent translation for all the other points by moving them the same x,y distance you needed to move the original choosen point.

2.Translate this new set of translated points to the angle with respect to the new (0,0).

3.Keep stored the maximum number (MSN) of points that are in each angle.

4.Choose the maximum stored number (MSN), and that will be the solution

Jourdan answered 14/11, 2010 at 20:40 Comment(5)
But that still O(n^2), and is more complicated to implement than the OP's method.Destefano
you go over each point once, that is one n, then you have to sort and count the same angle points, is that considered another n? I am not sure of this please excuse me.Jourdan
in 1. "for each point" "for all the other points", means applying an operator on n-1 points n times. That is O(n^2).Stalker
Now I understood it clearly Matthieu, thanks a lot for the explanationJourdan
This answer should explain that it is not an improvement over OP's answer. It is misleading.Jovial

© 2022 - 2024 — McMap. All rights reserved.