Why do we need to triangulate a convex polygon in order to sample uniformly from it?
Asked Answered
R

2

4

Suppose I want to uniformly sample points inside a convex polygon.

One of the most common approaches described here and on the internet in general consists in triangulation of the polygon and generate uniformly random points inside each triangles using different schemes.

The one I find most practical is to generate exponential distributions from uniform ones taking -log(U) for instance and normalizing the sum to one.

Within Matlab, we would have this code to sample uniformly inside a triangle:

vertex=[0 0;1 0;0.5 0.5]; %vertex coordinates in the 2D plane

mix_coeff=rand(10000,size(vertex,1)); %uniform generation of random coefficients
x=-log(x); %make the uniform distribution exponential
x=bsxfun(@rdivide,x,sum(x,2)); %normalize such that sum is equal to one
unif_samples=x*vertex; %calculate the 2D coordinates of each sample inside the triangle

And this works just fine:

enter image description here

However, using the exact same scheme for anything other than a triangle just fails. For instance for a quadrilateral, we get the following result:

enter image description here

Clearly, sampling is not uniform anymore and the more vertices you add, the more difficult it is to "reach" the corners.

If I triangulate the polygon first then uniform sampling in each triangle is easy and obviously gets the job done.

But why? Why is it necessary to triangulate first?

Which specific property have triangle (and simplexes in general since this behaviour seems to extend to n-dimensional constructions) that makes it work for them and not for the other polygons?

I would be grateful if someone could give me an intuitive explanation of the phenomena or just point to some reference that could help me understand what is going on.

Robotize answered 7/9, 2020 at 11:12 Comment(0)
I
3

I should point out that it's not strictly necessary to triangulate a polygon in order to sample uniformly from it. Another way to sample a shape is rejection sampling and proceeds as follows.

  1. Determine a bounding box that covers the entire shape. For a polygon, this is as simple as finding the highest and lowest x and y coordinates of the polygon.
  2. Choose a point uniformly at random in the bounding box.
  3. If the point lies inside the shape, return that point. (For a polygon, algorithms that determine this are collectively called point-in-polygon predicates.) Otherwise, go to step 2.

However, there are two things that affect the running time of this algorithm:

  1. The time complexity depends greatly on the shape in question. In general, the acceptance rate of this algorithm is the volume of the shape divided by the volume of the bounding box. (In particular, the acceptance rate is typically very low for high-dimensional shapes, in part because of the curse of dimensionality: typical shapes cover a much smaller volume than their bounding boxes.)
  2. Also, the algorithm's efficiency depends on how fast it is to determine whether a point lies in the shape in question. Because of this, it's often the case that complex shapes are made up of simpler shapes, such as triangles, circles, and rectangles, for which it's easy to determine whether a point lies in the complex shape or to determine that shape's bounding box.

Note that rejection sampling can be applied, in principle, to sample any shape of any dimension, not just convex 2-dimensional polygons. It thus works for circles, ellipses, and curved shapes, among others.

And indeed, a polygon could, in principle, be decomposed into a myriad of shapes other than triangles, one of those shapes sampled in proportion to its area, and a point in that shape sampled at random via rejection sampling.


Now, to explain a little about the phenomenon you give in your second image:

What you have there is not a 4-sided (2-dimensional) polygon, but rather a 3-dimensional simplex (namely a tetrahedron) that was projected to 2-dimensional space. (See also the previous answer.) This projection explains why points inside the "polygon" appear denser in the interior than in the corners. You can see why if you picture the "polygon" as a tetrahedron with its four corners at different depths. With higher dimensions of simplex, this phenomenon becomes more and more acute, again due partly to the curse of dimensionality.

Ives answered 9/9, 2020 at 20:49 Comment(4)
I think I get what's happening now and why it's not working the way I initially thought about. The sum of coefficients to one indeed makes my quadrilateral actually being a projection of a tetrahedron on the 2D space and it makes sense that the more vertex I have the worse the situation becomes. I must work a little bit more on the details but you gave me the insight that I was lacking to unblock me.Robotize
@Robotize wrt #2, it is rather an understatement that getting answer about point in/point out might be expensive. For complex non-convex polygon with potential holes algortihm would run in O(N), where N is number of edges. And often, indeed, people triangulate polygon anyway to get known code for point in/out or for ray intersection to work out. I mean, you have to choose where your complexity is, but you cannot avoid itLent
@Robotize I've updated my answer, please take a lookLent
@SeverinPappadeux: With much of my answer I wanted to make the point that sampling a convex polygon does not necessarily mean having to decompose that polygon into triangles in particular. The time complexity of sampling a polygon or any other shape is not the main point of my answer.Ives
L
3

Well, there are less expensive methods to sample uniform in the triangle. You're sampling Dirichlet distribution in the simplex d+1 and taking projection, computing exponents and such. I would refer you to the code sample and paper reference here, only square roots, a lot simpler algorithm.

Concerning uniform sampling in complex areas (quadrilateral in your case) general approach is quite simple:

  • Triangulate. You'll get two triangles with vertices (a,b,c)0 and (a,b,c)1
  • Compute triangle areas A0 and A1 using, f.e. Heron's formula
  • First step, randomly select one of the triangles based on area. if (random() < A0/(A0+A1)) select triangle 0 else select triangle 1. random() shall return float in the range [0...1]
  • Sample point in selected triangle using method mentioned above.

This approach could be easily extended to sample for any complex area with uniform density: N triangles, Categorical distribution sampling with probabilities proportional to areas will get you selected triangle, then sample point in the triangle.

UPDATE

We have to triangulate because we know good (fast, reliable, only 2 RNG calls, ...) algorithm to sample with uniform density in triangle. Then we could build on it, good software is all about reusability, and pick one triangle (at the cost of another rng call) and then back to sample from it, total three RNG calls to get uniform density sampling from ANY area, convex and concave alike. Pretty universal method, I would say. And triangulation is a solved problem, and basically you do it once (triangulate and build weights array Ai/Atotal) and sample till infinity.

Another part of the answer is that we (me, to be precise, but I've worked with random sampling ~20years) don't know good algorithm to sample precisely with uniform density from arbitrary convex more-than-three-vertices closed polygon. You proposed some algorithm based on hunch and it didn't work out. And it shouldn't work, because what you use is Dirichlet distribution in d+1 simplex and project it back to d hyperplane. It is not extendable even to quadrilateral, not talking to some arbitrary convex polygon. And I would state conjecture, that even such algorithm exist, n-vertices polygon would require n-1 calls to RNG, which means there is no triangulation setup, but each call to get a point would be rather expensive.

Few words on complexity of the sampling. Assuming you did triangulation, then with 3 calls to RNG you'll get one point sampled uniformly inside your polygon. But complexity of sampling wrt number of triangles N would be O(log(N)) at best. YOu basically would do binary search over partial sums of Ai/Atotal.

You could do a bit better, there is O(1) (constant time) sampling using Alias sampling of the triangle. The cost would be a bit more setup time, but it could be fused with triangulation. Also, it would require one more RNG calls. So for four RNG calls you would have constant point sampling time independent of complexity of your polygon, works for any shape

Lent answered 9/9, 2020 at 2:14 Comment(4)
Thanks for the references. However you do not answer the original question : why do we have to triangulate? Why cannot we directly sample a convex polygon without splitting it in several triangles at first ?Robotize
I'm not saying that triangulation is not the proper way to go. It's actually the opposite since pretty much all algorithm described on the web use triangulation and sampling over the identified simplexes. Thanks for confirming that's indeed the best practice to solve this kind of problem.Robotize
I am not statistician so I may ask something stupid but could you explain why it is ok to sample from Dirichlet distribution and project in the case of a triangle and not for a quadrilateral ?Robotize
@Robotize Dirichlet distribution in d+1 (with α=1) is uniform in simplex bound by condition X1+X2+...+Xd=1. Simplex is a triangle. In 3d simplex is triangle plane piece with corners at (1,0,0), (0,1,0),(0,0,1). When you project it to d dimensional plane, say, XY plane, you'll get uniform density sampling in triangle (0,0),(1,0),(0,1). This is what you're doing, then with known vertices you could fill any triangle on the plane. We DO NOT have similar to Dirichlet distribution for quadrilateral. I would say it is not possible to have such distribution so it would cover any quadrilateral.Lent
I
3

I should point out that it's not strictly necessary to triangulate a polygon in order to sample uniformly from it. Another way to sample a shape is rejection sampling and proceeds as follows.

  1. Determine a bounding box that covers the entire shape. For a polygon, this is as simple as finding the highest and lowest x and y coordinates of the polygon.
  2. Choose a point uniformly at random in the bounding box.
  3. If the point lies inside the shape, return that point. (For a polygon, algorithms that determine this are collectively called point-in-polygon predicates.) Otherwise, go to step 2.

However, there are two things that affect the running time of this algorithm:

  1. The time complexity depends greatly on the shape in question. In general, the acceptance rate of this algorithm is the volume of the shape divided by the volume of the bounding box. (In particular, the acceptance rate is typically very low for high-dimensional shapes, in part because of the curse of dimensionality: typical shapes cover a much smaller volume than their bounding boxes.)
  2. Also, the algorithm's efficiency depends on how fast it is to determine whether a point lies in the shape in question. Because of this, it's often the case that complex shapes are made up of simpler shapes, such as triangles, circles, and rectangles, for which it's easy to determine whether a point lies in the complex shape or to determine that shape's bounding box.

Note that rejection sampling can be applied, in principle, to sample any shape of any dimension, not just convex 2-dimensional polygons. It thus works for circles, ellipses, and curved shapes, among others.

And indeed, a polygon could, in principle, be decomposed into a myriad of shapes other than triangles, one of those shapes sampled in proportion to its area, and a point in that shape sampled at random via rejection sampling.


Now, to explain a little about the phenomenon you give in your second image:

What you have there is not a 4-sided (2-dimensional) polygon, but rather a 3-dimensional simplex (namely a tetrahedron) that was projected to 2-dimensional space. (See also the previous answer.) This projection explains why points inside the "polygon" appear denser in the interior than in the corners. You can see why if you picture the "polygon" as a tetrahedron with its four corners at different depths. With higher dimensions of simplex, this phenomenon becomes more and more acute, again due partly to the curse of dimensionality.

Ives answered 9/9, 2020 at 20:49 Comment(4)
I think I get what's happening now and why it's not working the way I initially thought about. The sum of coefficients to one indeed makes my quadrilateral actually being a projection of a tetrahedron on the 2D space and it makes sense that the more vertex I have the worse the situation becomes. I must work a little bit more on the details but you gave me the insight that I was lacking to unblock me.Robotize
@Robotize wrt #2, it is rather an understatement that getting answer about point in/point out might be expensive. For complex non-convex polygon with potential holes algortihm would run in O(N), where N is number of edges. And often, indeed, people triangulate polygon anyway to get known code for point in/out or for ray intersection to work out. I mean, you have to choose where your complexity is, but you cannot avoid itLent
@Robotize I've updated my answer, please take a lookLent
@SeverinPappadeux: With much of my answer I wanted to make the point that sampling a convex polygon does not necessarily mean having to decompose that polygon into triangles in particular. The time complexity of sampling a polygon or any other shape is not the main point of my answer.Ives

© 2022 - 2024 — McMap. All rights reserved.