Spatial data structure for finding all points greater than or less than a value in each cartesian dimension

I am currently working on an optimization problem that requires finding all points greater than (or, in some cases, less than) a particular point in each Cartesian dimension. For example, in 2D, I might need to find all points that satisfy the condition:

x > x* and y < y* 
for an arbitrary point (x*,y*)

(e.g.- if the blue point in the plot below is (x*,y*), I need all the points in the box defined by the blue dashed lines).

Note: I need this to be an N-Dimensional structure/search as my actual optimization problem has more than 2 objectives which must be solved for. A typical search space would be on the order of 1000-5000 points, and would have 2 to 5 dimensions.

Is there any particular data structure well suited to this purpose? In the past I have used kd-trees to find nearest neighbors and all points within a radius, however in this case I need a directional search. It looks like some form of an R-Tree might do the trick, where my search rectangle would extend from (x*, y*) out to very large positive and negative values in the respective dimensions. Is there a better data structure specific to this kind of search?
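For concreteness, this is the brute-force baseline I am hoping to beat (a minimal pure-Python sketch; the names `dominated_box` and `signs` are made up for illustration, not part of my actual code):

```python
import operator

def dominated_box(points, q, signs):
    """Return all points p with p[k] signs[k] q[k] in every coordinate k,
    where signs[k] is '>' or '<'.  O(n*d) per query."""
    ops = {'>': operator.gt, '<': operator.lt}
    return [p for p in points
            if all(ops[s](p[k], q[k]) for k, s in enumerate(signs))]

pts = [(0.6, 0.2), (0.4, 0.3), (0.7, 0.8), (0.9, 0.1)]
# x > 0.5 and y < 0.5 (the dashed box around (x*, y*) = (0.5, 0.5)):
print(dominated_box(pts, (0.5, 0.5), ('>', '<')))
# -> [(0.6, 0.2), (0.9, 0.1)]
```

Run once per point, this is O(n^2 * d) per population, which is what I would like to prune down.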

[example plot: scatter of points with the query point in blue and the query region outlined by blue dashed lines]

Zima answered 11/3, 2014 at 23:14 Comment(7)
Does it need to be dynamic? That is, do you add and remove points in the process or do you just do different queries? Also, if the latter is the case, are the queries independent, so that an offline algorithm can process them all at once?Heyes
How big is your dimensionality compared to your number of points? Often with even medium-dimensional data, problems like this become very hard to solve faster than brute force.Nodal
@NiklasB. Being able to add/remove points quickly would be nice. At each step in solving the problem, I need to use the current population of points and their levels of dominance to create a new population (that may contain some and eventually most of the points in the original population). That being said, at early time steps when the system is evolving quickly, it may be best to just scrap the whole tree and rebuild with the new population.Zima
@RobNeuhaus. I have a set of 1000-5000 points, and need to calculate the dominance of each point against the rest of the set. I was hoping with a smart data structure I could greatly prune down my search space for each point.Zima
@Zima I suggest you to add the details of the algorithm to the answer, this will probably allow us to give you better advice, especially since 1000 points is not very much. You still haven't answered the question about whether the queries are independent and about the dimensionality of your data.Heyes
@NiklasB. Sorry about that. I added some details to the question- typically 1000-5000 points, with a 2-5 dimensions (90% of the time it will be 3D or 4D). As to independence- no, I do not remove the points while searching. I will generate a population of 1k-5k points, use the search to calculate dominance of each of those points, then create a new population of points, which will likely be put into a whole new structure/tree. (I think that answers both the questions)Zima
@NiklasB. Now that I re-read your question, I realized I misunderstood your question about independence. The queries are independent of each other (i.e.- I do not move the points around based on the past queries on that population). That being said, the queries are a blocking point- I can not continue with the rest of the optimization algorithm until all queries are complete on a population of points (I need the results of all queries to construct the next population).Zima

From what I've read from your comments, you are given a set of points P_1, ..., P_n and you want to find, for every point P_i = (x_1, ..., x_d), the number of points P_j = (y_1, ..., y_d) with x_k R_k y_k for all k, where each R_k is one of (<, >).

One observation is that you can sort the points by their first coordinate and add them to your data structure in the order in which they become visible with regard to that coordinate. This removes one dimension from your search problem, so in the d-dimensional case, your problem now reduces to a (d-1)-dimensional orthogonal range query problem. This optimization is totally independent of how you actually solve the orthogonal range queries and by itself might make a brute-force implementation more viable.
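The sweep idea above can be sketched for the 2-d case, where the remaining 1-d counting problem is handled by a Fenwick (binary indexed) tree over y-ranks. This is an illustrative sketch, not code from the answer: it counts, for each point, the points that are smaller in both coordinates, and it assumes distinct coordinate values (other sign patterns follow by negating coordinates):

```python
from bisect import bisect_left

class Fenwick:
    """Binary indexed tree counting how many points have been
    inserted at each y-rank."""
    def __init__(self, n):
        self.n = n
        self.t = [0] * (n + 1)

    def add(self, i):                 # insert one point at rank i
        i += 1
        while i <= self.n:
            self.t[i] += 1
            i += i & -i

    def prefix(self, i):              # inserted points with rank < i
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & -i
        return s

def dominance_counts(points):
    """For each point p, count points q with q.x < p.x and q.y < p.y.
    Sorting by x removes one dimension; the Fenwick tree answers the
    remaining 1-d query.  Assumes distinct coordinates."""
    ys = sorted(y for _, y in points)
    rank = {y: bisect_left(ys, y) for y in ys}
    order = sorted(range(len(points)), key=lambda i: points[i])
    ft = Fenwick(len(points))
    counts = [0] * len(points)
    for i in order:
        r = rank[points[i][1]]
        counts[i] = ft.prefix(r)      # points already swept in, with smaller y
        ft.add(r)
    return counts
```

This runs in O(n log n) total for d = 2; in higher dimensions the sweep stays the same and only the inner structure changes.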

Since your point set is more or less static (you can build a data structure on the complete set of points, initially set the counters of all nodes to zero and then "insert" points by enabling them), you can use a range tree to solve the orthogonal range queries. The trivial implementation gives you O(n log^(d-2) n) preprocessing and O(log^(d-1) n) query time; with a slightly enhanced version of the data structure (a layered range tree with fractional cascading) you can shave a log factor off the query, down to O(log^(d-2) n).
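A minimal static 2-d range tree of the layered ("merge-sort tree") flavor might look like the following sketch; the class name and the half-open query convention are my own assumptions, and it does counting rather than reporting:

```python
from bisect import bisect_left

class RangeTree2D:
    """Static layered 2-d range tree: points are sorted by x and every
    segment-tree node stores the y-values of its x-range in sorted
    order.  count() answers a half-open orthogonal range count in
    O(log^2 n); build is O(n log n)."""

    def __init__(self, points):
        self.pts = sorted(points)                   # sort by x
        self.xs = [x for x, _ in self.pts]
        n = self.n = len(self.pts)
        self.tree = [None] * (2 * n)
        for i, (_, y) in enumerate(self.pts):       # leaves hold one y each
            self.tree[n + i] = [y]
        for i in range(n - 1, 0, -1):               # merge children bottom-up
            self.tree[i] = sorted(self.tree[2 * i] + self.tree[2 * i + 1])

    def count(self, xlo, xhi, ylo, yhi):
        """Number of points with xlo <= x < xhi and ylo <= y < yhi."""
        l = self.n + bisect_left(self.xs, xlo)
        r = self.n + bisect_left(self.xs, xhi)
        total = 0
        while l < r:                                # standard iterative walk
            if l & 1:
                ys = self.tree[l]
                total += bisect_left(ys, yhi) - bisect_left(ys, ylo)
                l += 1
            if r & 1:
                r -= 1
                ys = self.tree[r]
                total += bisect_left(ys, yhi) - bisect_left(ys, ylo)
            l >>= 1
            r >>= 1
        return total
```

The dominance queries in the question are the special case where one side of each interval is +/- infinity, so the inner `bisect_left` pair collapses to a single binary search.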

So in total the cost for one phase would then be O(n log^(d-2) n). Since your n is so small, I'm not sure whether this would actually buy you a lot (it's probably not sensible for the 5-d case, maybe not even for 4-d).

In case you want to go that route, I can highly recommend Erik Demaine's data structures lectures, which cover orthogonal range queries (in L03 and L04). He also covers the conceptually slightly simpler half-open queries that you need, so maybe those can be implemented with a lower constant factor.

The original range tree data structure is unfortunately completely static, but there exist dynamized variants, so maybe you can even reuse trees between phases with many shared points and actually make use of variants with better query time. You can search for "dynamic orthogonal range query" for more information, there are a couple of papers about the subject. I doubt that this makes sense for your problem scale, though.

If your points are sufficiently randomly distributed, you can probably get away with partitioning the points into O(n^0.5) rectangular buckets. You can count all the points in a bucket in O(1) and scan the points within a partially covered bucket in O(n^0.5), and the implementation has a very low overhead compared to range trees. Combined with the sorting by one coordinate this should give you decent speedups. This is definitely what I would try first.
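A sketch of the bucket idea in 2-d, using an equal-width k x k grid (quantile-based cell boundaries would balance the buckets better; the class and method names are illustrative, and the code assumes a non-degenerate bounding box):

```python
class GridBuckets2D:
    """Partition points into a k*k grid of rectangular buckets
    (k ~ n**0.25 gives ~sqrt(n) buckets).  A dominance query counts
    fully covered buckets in O(1) each and scans only the buckets on
    the query boundary."""

    def __init__(self, points, k):
        self.k = k
        self.xmin = min(p[0] for p in points)
        self.xmax = max(p[0] for p in points)
        self.ymin = min(p[1] for p in points)
        self.ymax = max(p[1] for p in points)
        self.cells = [[[] for _ in range(k)] for _ in range(k)]
        for p in points:
            self.cells[self._cx(p[0])][self._cy(p[1])].append(p)

    def _cx(self, x):                 # column index, clamped to the grid
        i = int((x - self.xmin) / (self.xmax - self.xmin) * self.k)
        return min(self.k - 1, max(0, i))

    def _cy(self, y):                 # row index, clamped to the grid
        j = int((y - self.ymin) / (self.ymax - self.ymin) * self.k)
        return min(self.k - 1, max(0, j))

    def count_dominating(self, qx, qy):
        """Count points with x > qx and y < qy."""
        cx, cy = self._cx(qx), self._cy(qy)
        total = 0
        for i in range(cx, self.k):
            for j in range(cy + 1):
                cell = self.cells[i][j]
                if i > cx and j < cy:
                    total += len(cell)        # bucket fully inside region
                else:                         # boundary bucket: scan it
                    total += sum(1 for (x, y) in cell if x > qx and y < qy)
        return total
```

With balanced buckets, a query touches O(n^0.5) full buckets plus O(n^0.5) boundary points, and there is no tree machinery at all.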

Heyes answered 11/3, 2014 at 23:58 Comment(3)
Thanks Niklas. I am going to do some reading up on range trees, prototype something, and see how it goes. Lots of great info here!Zima
@Zima I would recommend against building it yourself. There should be implementations out there with fractional cascading. Also keep the bucket approach in mind, it could be much simpler and very effective :)Heyes
Oh, most certainly. By prototyping I meant replacing my current structure/search with a pre-implemented range tree. I will also look into the bucket approach. Most of the time I find simple is best! ;) (though reading through about range trees, I may want to implement one on my own time just for the sake of doing it/understanding them better)Zima
