Space partitioning algorithm
Asked Answered
L

8

13

I have a set of points which are contained within the rectangle. I'd like to split the rectangles into subrectangles based on point density (giving a number of subrectangles or desired density, whichever is easiest).

The partitioning doesn't have to be exact (almost any approximation better than regular grid would do), but the algorithm has to cope with the large number of points - approx. 200 millions. The desired number of subrectangles however is substantially lower (around 1000).

Does anyone know any algorithm which may help me with this particular task?

Logroll answered 2/6, 2010 at 16:21 Comment(7)
Do you want (approximately) the same number of points in each subrectangle ? I think you do, but it's not entirely clear.Architrave
Yes, I want the same number of points in each subrect.Logroll
If you do not impose restrictions on the form of the rectangles, the problem is one dimensional, and all you have to do is to cut a segment with the same number of points (x coordinate of the points). It seems to me that you are forgetting some "topology" restrictionTrustful
Yes belisarius, you are right. I want within the rectangle to be spatially close to one another, so slicing the initial rectangle into vertical or horizontal stripes is not what I'm looking for.Logroll
Do you want an uneven grid, or just a set of mutually-exclusive covering rectangles?Bauer
@Karol I understood the "slicing the initial rectangle into vertical or horizontal stripes is not what I'm looking for" part, but I find myself not able to translate the affirmative ("I want within the rectangle to be spatially close to one another") to a specification formally enough to try to design an algorithm. Perhaps you should try to formalize the requirement or include a sample graphic of the desired outputTrustful
Well, I don't need any specific heuristic - but the rectangles are part of the external spatial indexing engine which I need to populate with points. Therefore thin slices won't be very efficient during window querying (where a window has usually 4:3 ratio).Logroll
T
2

Just to understand the problem. The following is crude and perform badly, but I want to know if the result is what you want>

Assumption> Number of rectangles is even
Assumption> Point distribution is markedly 2D (no big accumulation in one line)

Procedure>
Bisect n/2 times in either axis, looping from one end to the other of each previously determined rectangle counting "passed" points and storing the number of passed points at each iteration. Once counted, bisect the rectangle selecting by the points counted in each loop.

Is that what you want to achieve?

Trustful answered 2/6, 2010 at 16:41 Comment(1)
Yes, exactly. That would be perfect. I was thinking about initially create a matrix (1000x1000 or so) and store number of points in each cell. Then I can use your bisecting algorithm.Logroll
S
2

R-tree

Smote answered 2/6, 2010 at 16:32 Comment(2)
-1: R-tree is not an algorithm, it's a data structure, and you don't propose an algorithm for populating the R-tree.Architrave
@HPM, perhaps if you look at the page I identified, and search for the heading Algorithm, you would be satisfied.Smote
T
2

Just to understand the problem. The following is crude and perform badly, but I want to know if the result is what you want>

Assumption> Number of rectangles is even
Assumption> Point distribution is markedly 2D (no big accumulation in one line)

Procedure>
Bisect n/2 times in either axis, looping from one end to the other of each previously determined rectangle counting "passed" points and storing the number of passed points at each iteration. Once counted, bisect the rectangle selecting by the points counted in each loop.

Is that what you want to achieve?

Trustful answered 2/6, 2010 at 16:41 Comment(1)
Yes, exactly. That would be perfect. I was thinking about initially create a matrix (1000x1000 or so) and store number of points in each cell. Then I can use your bisecting algorithm.Logroll
A
2

I think I'd start with the following, which is close to what @belisarius already proposed. If you have any additional requirements, such as preferring 'nearly square' rectangles to 'long and thin' ones you'll need to modify this naive approach. I'll assume, for the sake of simplicity, that the points are approximately randomly distributed.

  1. Split your initial rectangle in 2 with a line parallel to the short side of the rectangle and running exactly through the mid-point.
  2. Count the number of points in both half-rectangles. If they are equal (enough) then go to step 4. Otherwise, go to step 3.
  3. Based on the distribution of points between the half-rectangles, move the line to even things up again. So if, perchance, the first cut split the points 1/3, 2/3, move the line half-way into the heavy half of the rectangle. Go to step 2. (Be careful not to get trapped here, moving the line in ever decreasing steps first in one direction, then the other.)
  4. Now, pass each of the half-rectangles in to a recursive call to this function, at step 1.

I hope that outlines the proposal well enough. It has limitations: it will produce a number of rectangles equal to some power of 2, so adjust it if that's not good enough. I've phrased it recursively, but it's ideal for parallelisation. Each split creates two tasks, each of which splits a rectangle and creates two more tasks.

If you don't like that approach, perhaps you could start with a regular grid with some multiple (10 - 100 perhaps) of the number of rectangles you want. Count the number of points in each of these tiny rectangles. Then start gluing the tiny rectangles together until the less-tiny rectangle contains (approximately) the right number of points. Or, if it satisfies your requirements well enough, you could use this as a discretisation method and integrate it with my first approach, but only place the cutting lines along the boundaries of the tiny rectangles. This would probably be much quicker as you'd only have to count the points in each tiny rectangle once.

I haven't really thought about the running time of either of these; I have a preference for the former approach 'cos I do a fair amount of parallel programming and have oodles of processors.

Architrave answered 2/6, 2010 at 17:7 Comment(1)
Thinking again and summing up both answers, cutting thin slice rectangles is equivalent to one dimensional problem. So, if you have N points and r rectangles, just take in account the x coordinate of the points and sum up N/r points in each partitionTrustful
M
2

You're after a standard Kd-tree or binary space partitioning tree, I think. (You can look it up on Wikipedia.)

Since you have very many points, you may wish to only approximately partition the first few levels. In this case, you should take a random sample of your 200M points--maybe 200k of them--and split the full data set at the midpoint of the subsample (along whichever axis is longer). If you actually choose the points at random, the probability that you'll miss a huge cluster of points that need to be subdivided will be approximately zero.

Now you have two problems of about 100M points each. Divide each along the longer axis. Repeat until you stop taking subsamples and split along the whole data set. After ten breadth-first iterations you'll be done.

If you have a different problem--you must provide tick marks along the X and Y axis and fill in a grid along those as best you can, rather than having the irregular decomposition of a Kd-tree--take your subsample of points and find the 0/32, 1/32, ..., 32/32 percentiles along each axis. Draw your grid lines there, then fill the resulting 1024-element grid with your points.

Medical answered 2/6, 2010 at 18:40 Comment(3)
Maybe I'm missing the point of kd-trees, but they don't seem to create rectangles with points INSIDE the rectangles as the question states. The second comment says he's looking for equal number of points within a rectangle. The reference contained many other references that I thought might be useful.Saar
@Saar - It's a small implementation detail to switch to making the lines between points instead of on points.Medical
KD-Tree was indeed my first reflex too, I balked at first because of the huge set but the sampling approach alleviates the issue.Finzer
S
0

Good question.

I think the area you need to investigate is "computational geometry" and the "k-partitioning" problem. There's a link that might help get you started here

You might find that the problem itself is NP-hard which means a good approximation algorithm is the best you're going to get.

Saar answered 2/6, 2010 at 18:36 Comment(1)
The poster has asked for equipartitioning of points, not minimizing or maximizing interactions. The latter can be NP-hard (e.g. graph max cut); the former is O(n log n).Medical
T
0

Would K-means clustering or a Voronoi diagram be a good fit for the problem you are trying to solve?

Tacita answered 2/6, 2010 at 18:40 Comment(0)
H
0

That's looks like Cluster analysis.

Haitian answered 2/6, 2010 at 18:44 Comment(1)
In general you are right - it is a cluster analysis problem. But it is quite unusual problem so I don't think that ready-to-use methods from the cluster analysis toolbox will useful. I'm only interested in rectangles - not any clusters, I have lots of object and I don't care that much about precision.Logroll
N
0

Would a QuadTree work?

A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are most often used to partition a two dimensional space by recursively subdividing it into four quadrants or regions. The regions may be square or rectangular, or may have arbitrary shapes. This data structure was named a quadtree by Raphael Finkel and J.L. Bentley in 1974. A similar partitioning is also known as a Q-tree. All forms of Quadtrees share some common features:

  • They decompose space into adaptable cells
  • Each cell (or bucket) has a maximum capacity. When maximum capacity is reached, the bucket splits
  • The tree directory follows the spatial decomposition of the Quadtree
Nuri answered 2/6, 2010 at 19:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.