Finding largest subset of points forming a convex polygon
Asked Answered
D

4

16

I'm looking for an algorithm for finding largest subset of points (by largest i mean in number) that form a convex polygon from the given set of point. I think this might be solvable using DP but i'm not sure. Is it possible to do this in O(n^3) ? Actually i just need the size of the largest subset, so it doesn't need to have unique solution

Edit :

just to keep this simple,

Given input : a set of points in 2D

Desired output : maximum number of points that form a convex polygon, like in the example the output is 5 (ABHCD is one of the possible convex polygon)

an example

There's a similar problem spoj.com/problems/MPOLY which is solvable using DP in O(N^3), my question is about the generalization of that problem which doesn't have to contain (0,0)

Dandle answered 14/2, 2014 at 11:44 Comment(7)
Wouldn't that be a circle? Or is that, the largest subset of points given a set of points?Elater
I edited my question a little, maybe it will help understand my questionDandle
Is finding all possible convex polygons an option?Elater
As in finding all possible combinations, and then searching your list for the largest.Elater
I'm looking for a polynomial solution since N could be up to 250, thanks for your idea thoughDandle
with your graphic, is the polynomial A-D-F-G-A allowed?Un
If you are talking about polygon ADFG yes, but it's only 4 in size then it's not the maximum solutionDandle
S
4

This problem has been already published in these competitions:

And its solution (O(N3) algorithm) could be found on this page: "USACO DEC08 Problem 'fence' Analysis" by Bruce Merry and Jacob Steinhardt.

The following is an attempt to explain this algorithm. Also here is my implementation of this algorithm in C++11. This code works for N<=250 and integer coordinates in range 0 .. 1023. No three points should be on the same line. Here is Python script that generates test data satisfying these requirements.

O(N2) algorithm for simplified problem

Simplified problem: find largest subset of points that form a convex polygon and contain the leftmost point of given set (or if there are several leftmost points, this polygon should contain topmost of them).

To solve this problem we could connect each pair of points by directed line segment, then (implicitly) treat each segment as a graph node as shown in diagram:

representing point set as a graph

Here parent node and corresponding segment have blue color. Line segment corresponding to its left child (red color) starts at the end of "parent" segment and it is the line segment making least possible left turn relative to the direction of "parent" segment. Line segment corresponding to its right child (green color) starts at the start of "parent" segment and it is also the line segment making least possible left turn relative to the direction of "parent" segment.

Any segment ending at leftmost point corresponds to a "leaf" node, i.e. it has no child nodes. Such construction of the graph guarantees that there are no cycles, in other words this graph is a DAG.

Now to find largest convex subset of points it is enough to find the longest path in this graph. And the longest path in DAG could be found with dynamic programming algorithm in time O(E), where E is number of edges in the graph. Since each node has only 2 outgoing edges, each corresponding to a pair of points, this problem could be solved in O(N2) time.

All this is possible to implement if we preprocess input data, sorting directed line segments starting at the same point by angle. This allows to perform depth-first traversal in the graph. We should memorize the longest path starting from each processed node, so that each graph edge is visited at most once, and we have O(E) = O(N2) time algorithm (ignoring preprocessing time for now). Space requirements are also O(N2) because we need to store sorted directions for each pair of points and memorization uses one value per node (which is also a pair of points).

Here is C++ implementation of this dynamic programming algorithm:

unsigned dpStep(ind_t i_left, ind_t from, ind_t to_ind)
{
    ind_t child = sorted[from][to_ind];

    while (child < i_left)
        child = sorted[from][++to_ind];

    if (child == i_left)
        return 1;

    if (auto v = memorize[from][child])
        return v;

    const auto x = max(dpStep(i_left, child, lefts[from][child]) + 1,
                       dpStep(i_left, from, static_cast<ind_t>(to_ind + 1)));

    memorize[from][child] = static_cast<ind_t>(x);
    return x;
}

Input parameters are the leftmost point and a pair of points forming current line segment (where starting point of the segment is given directly but ending point is given as an index in sorted by angle array of points). The word "left" in this code fragment is slightly overused: it means both the leftmost point (i_left) and preprocessed array containing left childs for the graph (lefts[][]).

O(N3) algorithm

General problem (where the leftmost point is not fixed) could be solved this way:

  1. Sort the points in left-right direction
  2. Preprocess the points to get two arrays: (a) each point sorted by direction relative to each other point and (b) position in first array of the end of line segment making least possible left turn relative to the direction of "parent" segment.
  3. Use each point as the leftmost point and find the longest convex polygon with "simplified" algorithm.
  4. Periodically prune all points to the left of the "leftmost" point from preprocessed arrays.
  5. Take the longest of paths found on step 3.

Step 4 is optional. It does not improve O(N3) time complexity. But it slightly improves speed of DP algorithm by excluding unneeded points. In my tests this gives 33% speed boost.

There are several alternatives for preprocessing. We could compute every angle (with atan2) and sort them, as it is done in sample code by Bruce Merry and Jacob Steinhardt. This is a simplest way but atan2 may be too expensive, especially for smaller point sets.

Or we could exclude atan2 and sort tangents instead of angles, as it is done in my implementation. This is a little bit more complicated but faster.

Both these alternatives need O(N2 log N) time (unless we use some distribution sorting). Third alternative is to use well known computational geometry methods (arrangements and duality) to get O(N2) preprocessing. But I don't think it is useful for our problem: it is probably too slow for smaller point sets because of large constant factor, as for larger point sets, it might give some speed improvement, but too insignificant because step 3 will greatly outweigh step 2. Also it is much more difficult to implement.

Some other results: I tried to implement iterative DP instead of recursion; this did not noticeably change the speed. Also I tried to perform two DP searches at once, interleaving steps of each search (something similar to fibers or HyperThreading implemented in software); this might help because DP consists mostly of memory accesses having high latency and preventing full utilization of CPU throughput; the results are not very impressive, only about 11% speed boost, most likely with real HyperThreading it would be much faster.

Sheasheaf answered 13/5, 2014 at 19:27 Comment(6)
I am sorry to bother but I am unable to comprehend one thing on the given URL cerberus.delos.com:790/TESTDATA/DEC08.fence.htm It's written as 'the first and last two points in the hull are sufficient to check that the next point doesn't break the convexity condition'. Can you please explain it? Why all the points are not required for that? Are the points arranged in particular order?Thug
@Naman: It looks clear enough. I could only try to explain it in different words. On each DP step there are 3 groups of points: (a) 4 points mentioned (first/last pairs), (b) points already in the hull (they are already chosen by DP algorithm, so they satisfy convexity condition and they are optimal), (c) all other points. On each step algorithm tries every point from group (c) independently and checks convexity condition for this point relative to points of group (a). If the points fits, there is no need to check its convexity relative to points of group (b), it is guaranteed to be satisfied.Sheasheaf
Somehow I am still unable to understand as why is it guaranteed. Can I please explain a case. Consider first pair (2,2), (3,1), last pair (8,2), (7,1) and points already in hull (6,6),(7,5). Now a new point (5,3) comes. It will succeed convexity condition from first and last pair but not when we consider it against all the points including in group (b). I know I am making some trivial mistake in understanding but I would really appreciate if you can correct me.Thug
@Naman: Now I understand the last part of your question. Yes, points are arranged. In your example first point in the pair (and first in sequence) is (3,1) and the last point is (7,1). So (5,3) cannot be inserted after (7,1). If you do this you'll get {(8,2), (7,1), (5,3), (3,1), (2,2)} which is not convex.Sheasheaf
So you mean before the given algorithm I need to sort/order the given points? If so order how(by x,y or clockwise?). I am sorry if I am asking silly questions. If you can help me with some link that has a detailed explanation, that would also be really helpful.Thug
@Naman: No, the points need not be presorted for this algorithm. But they are chosen by the algorithm sequentially. This determines their order. And this order may be different for different parts of DP array. I don't know where to find more detailed explanation. Probably you should just get (any) algorithms book to understand DP principle and (any) computational geometry book to read about convexity...Sheasheaf
E
4

I think I came up with an algorithm for it by extending Andrew's algorithm for convex hulls.

Originally, I came up with a O(N^4) algorithm, but noticed was way over-complicating it and brought it down to O(N^2) algorithm. It seems like there maybe a bug in the code somewhere that causes issues in at least the case of a vertical line of points. It might be a special case (requiring changing a few lines of code), or a sign of a larger flaw in the algorithm. If it's the latter, then I'm inclined to say it's NP-hard, and offer the algorithm as a heuristic. I've spent all the time I care to coding and debugging it. In any case it seems to work fine in other cases. However, it's difficult to test for correctness when the correct algorithms seem to be O(2^N).

def maximal_convex_hull(points):
    # Expects a list of 2D points in the format (x,y)


    points = sorted(set(points)) # Remove duplicates and sort
    if len(points) <= 1: # End early
        return points

    def cross(o, a, b): # Cross product
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


    # Use a queue to extend Andrew's algorithm in the following ways:
    # 1. Start a new convex hull for each successive point
    # 2. Keep track of the largest convex hull along the way
    Q = []
    size = 0
    maximal_hull = None
    for i,p in enumerate(points):
        remaining = len(points) - i + 1
        new_Q = []
        for lower, upper in Q: # Go through the queue
            # Build upper and lower hull similtanously, slightly less efficent
            while len(lower) >= 2 and cross(lower[-2], lower[-1], p) < 0:
                lower.pop()
            lower.append(p)
            while len(upper) >= 2 and cross(upper[-2], upper[-1], p) > 0:
                upper.pop()
            upper.append(p)

            new_size = len(lower) + len(upper)
            if new_size > size: # Check for a new highest maximal convex hull
                size = new_size
                maximal_hull = (lower, upper)


        # There is an odd bug? that when one or both if statements are removed
        #  xqg237.tsp (TSPLIB) gives slightly different solutions and is
        #   missing a number of points it should have.
        #  Seems like if the opposite should be true if anything since they
        #   were intended to be easy optimizations not required code.
            if remaining + new_size >= size: # Don't bother if not enough
                new_Q.append((lower, upper)) # Add an updated convex hulls
        if remaining > size: # Don't bother if not enough
            new_Q.append(([p], [p])) # Add a new convex hull
        Q = new_Q # Update to the new queue

    # Extract and merge the lower and upper to a maximium convex hull
    # Only one of the last points is ommited because it is repeated
    #    Asserts and related variables used for testing
    #    Could just have "return lower[:-1] + list(reversed(upper[:-1]))[:-1]"
    lower, upper = maximal_hull
    max_hull_set = set(lower) | set(lower) | set(upper)
    lower = lower
    upper = list(reversed(upper[:-1]))[:-1]
    max_convex_hull = lower + upper
    assert len(lower) + len(upper) == len(max_hull_set)
    assert max_hull_set == set(max_convex_hull)
    return max_convex_hull

Edit: This algorithm isn't correct, but it served as inspiration for the correct algorithm and thus I'm leaving it here.

Ernaldus answered 17/2, 2014 at 17:37 Comment(10)
Thanks! i for the idea, by extending Andrew's algorithm, i get a O(N^4) algorithm using Dynamic Programming that i think doesn't have any flaw, I'm still have some doubt in your O(N^2) but i will check it out again in the weekend :)Dandle
Andrew's algorithm is O(N) (but requires a sort, making it O(N log N)), this algorithm runs O(N) versions of andrew's algorithm. Making for O(N^2) runtime. Although, the algorithm may not go far enough.Ernaldus
Yes, but i have some doubt in correctness of that algorithm, can you explain a little how the algorithm work since i'm not really familiar with phythonDandle
like in this picture imgur.com/DsJhF71, wasn't EFG pop-ed when C inserted, although the optimal upper hull is AEFGDDandle
What's the point set for that example? You might have a point that it should be considering the upper and lower hulls separately. Although that would still result in a O(N^2) algorithm, just with a larger hidden coefficients.Ernaldus
(3,3),(4,5),(4,4),(5,4),(6,4),(9,3),(7,6)Dandle
Alright, i guess i will take a look at your code more now, cause i think i have some trouble understanding how this worksDandle
I think i understand your code, but wasn't your code return 4 when ran ?Dandle
It does, and quick fixes didn't resolve the issue. This means that dynamic programming would be required for a solution if it's not NP-hard, since it requires not adding B and C to the convex hull. Do post your own solution if possible. I'm curious.Ernaldus
thanks for your help!, I posted my solution, please tell me if there's any flaw on itDandle
S
4

This problem has been already published in these competitions:

And its solution (O(N3) algorithm) could be found on this page: "USACO DEC08 Problem 'fence' Analysis" by Bruce Merry and Jacob Steinhardt.

The following is an attempt to explain this algorithm. Also here is my implementation of this algorithm in C++11. This code works for N<=250 and integer coordinates in range 0 .. 1023. No three points should be on the same line. Here is Python script that generates test data satisfying these requirements.

O(N2) algorithm for simplified problem

Simplified problem: find largest subset of points that form a convex polygon and contain the leftmost point of given set (or if there are several leftmost points, this polygon should contain topmost of them).

To solve this problem we could connect each pair of points by directed line segment, then (implicitly) treat each segment as a graph node as shown in diagram:

representing point set as a graph

Here parent node and corresponding segment have blue color. Line segment corresponding to its left child (red color) starts at the end of "parent" segment and it is the line segment making least possible left turn relative to the direction of "parent" segment. Line segment corresponding to its right child (green color) starts at the start of "parent" segment and it is also the line segment making least possible left turn relative to the direction of "parent" segment.

Any segment ending at leftmost point corresponds to a "leaf" node, i.e. it has no child nodes. Such construction of the graph guarantees that there are no cycles, in other words this graph is a DAG.

Now to find largest convex subset of points it is enough to find the longest path in this graph. And the longest path in DAG could be found with dynamic programming algorithm in time O(E), where E is number of edges in the graph. Since each node has only 2 outgoing edges, each corresponding to a pair of points, this problem could be solved in O(N2) time.

All this is possible to implement if we preprocess input data, sorting directed line segments starting at the same point by angle. This allows to perform depth-first traversal in the graph. We should memorize the longest path starting from each processed node, so that each graph edge is visited at most once, and we have O(E) = O(N2) time algorithm (ignoring preprocessing time for now). Space requirements are also O(N2) because we need to store sorted directions for each pair of points and memorization uses one value per node (which is also a pair of points).

Here is C++ implementation of this dynamic programming algorithm:

unsigned dpStep(ind_t i_left, ind_t from, ind_t to_ind)
{
    ind_t child = sorted[from][to_ind];

    while (child < i_left)
        child = sorted[from][++to_ind];

    if (child == i_left)
        return 1;

    if (auto v = memorize[from][child])
        return v;

    const auto x = max(dpStep(i_left, child, lefts[from][child]) + 1,
                       dpStep(i_left, from, static_cast<ind_t>(to_ind + 1)));

    memorize[from][child] = static_cast<ind_t>(x);
    return x;
}

Input parameters are the leftmost point and a pair of points forming current line segment (where starting point of the segment is given directly but ending point is given as an index in sorted by angle array of points). The word "left" in this code fragment is slightly overused: it means both the leftmost point (i_left) and preprocessed array containing left childs for the graph (lefts[][]).

O(N3) algorithm

General problem (where the leftmost point is not fixed) could be solved this way:

  1. Sort the points in left-right direction
  2. Preprocess the points to get two arrays: (a) each point sorted by direction relative to each other point and (b) position in first array of the end of line segment making least possible left turn relative to the direction of "parent" segment.
  3. Use each point as the leftmost point and find the longest convex polygon with "simplified" algorithm.
  4. Periodically prune all points to the left of the "leftmost" point from preprocessed arrays.
  5. Take the longest of paths found on step 3.

Step 4 is optional. It does not improve O(N3) time complexity. But it slightly improves speed of DP algorithm by excluding unneeded points. In my tests this gives 33% speed boost.

There are several alternatives for preprocessing. We could compute every angle (with atan2) and sort them, as it is done in sample code by Bruce Merry and Jacob Steinhardt. This is a simplest way but atan2 may be too expensive, especially for smaller point sets.

Or we could exclude atan2 and sort tangents instead of angles, as it is done in my implementation. This is a little bit more complicated but faster.

Both these alternatives need O(N2 log N) time (unless we use some distribution sorting). Third alternative is to use well known computational geometry methods (arrangements and duality) to get O(N2) preprocessing. But I don't think it is useful for our problem: it is probably too slow for smaller point sets because of large constant factor, as for larger point sets, it might give some speed improvement, but too insignificant because step 3 will greatly outweigh step 2. Also it is much more difficult to implement.

Some other results: I tried to implement iterative DP instead of recursion; this did not noticeably change the speed. Also I tried to perform two DP searches at once, interleaving steps of each search (something similar to fibers or HyperThreading implemented in software); this might help because DP consists mostly of memory accesses having high latency and preventing full utilization of CPU throughput; the results are not very impressive, only about 11% speed boost, most likely with real HyperThreading it would be much faster.

Sheasheaf answered 13/5, 2014 at 19:27 Comment(6)
I am sorry to bother but I am unable to comprehend one thing on the given URL cerberus.delos.com:790/TESTDATA/DEC08.fence.htm It's written as 'the first and last two points in the hull are sufficient to check that the next point doesn't break the convexity condition'. Can you please explain it? Why all the points are not required for that? Are the points arranged in particular order?Thug
@Naman: It looks clear enough. I could only try to explain it in different words. On each DP step there are 3 groups of points: (a) 4 points mentioned (first/last pairs), (b) points already in the hull (they are already chosen by DP algorithm, so they satisfy convexity condition and they are optimal), (c) all other points. On each step algorithm tries every point from group (c) independently and checks convexity condition for this point relative to points of group (a). If the points fits, there is no need to check its convexity relative to points of group (b), it is guaranteed to be satisfied.Sheasheaf
Somehow I am still unable to understand as why is it guaranteed. Can I please explain a case. Consider first pair (2,2), (3,1), last pair (8,2), (7,1) and points already in hull (6,6),(7,5). Now a new point (5,3) comes. It will succeed convexity condition from first and last pair but not when we consider it against all the points including in group (b). I know I am making some trivial mistake in understanding but I would really appreciate if you can correct me.Thug
@Naman: Now I understand the last part of your question. Yes, points are arranged. In your example first point in the pair (and first in sequence) is (3,1) and the last point is (7,1). So (5,3) cannot be inserted after (7,1). If you do this you'll get {(8,2), (7,1), (5,3), (3,1), (2,2)} which is not convex.Sheasheaf
So you mean before the given algorithm I need to sort/order the given points? If so order how(by x,y or clockwise?). I am sorry if I am asking silly questions. If you can help me with some link that has a detailed explanation, that would also be really helpful.Thug
@Naman: No, the points need not be presorted for this algorithm. But they are chosen by the algorithm sequentially. This determines their order. And this order may be different for different parts of DP array. I don't know where to find more detailed explanation. Probably you should just get (any) algorithms book to understand DP principle and (any) computational geometry book to read about convexity...Sheasheaf
D
1

This is my a Dynamic Programming O(N^4) algorithm with idea from Andrew's Algorithm posted by Nuclearman, i'm still looking for a O(N^3) algorithm

upper_hull(most left point, previous point, current point)
{
    ret = 0
    if (current point != most left point)   /* at anytime we can decide to end the upper hull and build lower_hull from current point ending at most left point */
        ret = min(ret,lower_hull(most left point, -1, current point)) 
    for all point on the right of current point /* here we try all possible next point that still satisfy the condition of convex polygon */
        if (cross(previous point,current point,next point) >= 0) max(ret,1+upper_hull(most left point, current point, next point))
    return ret;
}

lower_hull(most left point, previous point, current point)
{
    if (current point == most left point) return 0;
    ret = -INF /* it might be impossible to build a convex hull here, so if it does it will return -infinity */
    for all point on the left of current point and not on the left of most left point
        if (cross(previous point,current point,next point) >= 0) max(ret,1+lower_hull(most left point, current point, next point))
    return ret;
}

First sort the point based on x axis then for tie sort by y axis, then try all point as most left point to run the upper_hull(p,-1,p) , please tell me if there's any flaw in this algorithm

Dandle answered 18/2, 2014 at 17:26 Comment(1)
One possibility is to precalculate all of the cross-product results (O(N^3)) and break them into two graphs based on if the result is positive or negative, then use depth first search starting on the left most point to find the longest of the shortest paths. It seems like it should be doable in O(E), which since edge is a triple (a,b),(b,c), is O(N^3). However, you then have to do this for O(N) points (for each left-most point), results in an overall time complexity of O(N^4). Therefore, I'm fairly sure now that you can't get better than O(N^4).Ernaldus
R
-1

You can use a delaunay triangulation and remove the longest edge and also the vertex. I use a similar algorithm to find the concave hull. You can find jan example based on population data @ http://www.phpdevpad.de/geofence. I have also wrote a php class concave-hull @ phpclasses.org.

Regenerate answered 16/2, 2014 at 13:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.