Fast spatial data structure for nearest neighbor search amongst non-uniformly sized hyperspheres

Asked 4/6, 2012 at 9:7 Answered 11/6, 2012 at 20:33

Solved c data-structures spatial multidimensional-array

Given a k-dimensional continuous (euclidean) space filled with rather unpredictably moving/growing/shrinking hyperspheres I need to repeatedly find the hypersphere whose surface is nearest to a given coordinate. If some hyperspheres are of the same distance to my coordinate, then the biggest hypersphere wins. (The total count of hyperspheres is guaranteed to stay the same over time.)

My first thought was to use a KDTree but it won't take the hyperspheres' non-uniform volumes into account. So I looked further and found BVH (Bounding Volume Hierarchies) and BIH (Bounding Interval Hierarchies), which seem to do the trick. At least in 2-/3-dimensional space. However while finding quite a bit of info and visualizations on BVHs I could barely find anything on BIHs.

My basic requirement is a k-dimensional spatial data structure that takes volume into account and is either super fast to build (off-line) or dynamic with barely any unbalancing.

Given my requirements above, which data structure would you go with? Any other ones I didn't even mention?

Edit 1: Forgot to mention: hypershperes are allowed (actually highly expected) to overlap!

Edit 2: Looks like instead of "distance" (and "negative distance" in particular) my described metric matches the power of a point much better.

Soulless answered 4/6, 2012 at 9:7 Comment(9)

Since computing the distance of a point to a hypersphere is plain trivial (even if O(k), but that way's everything in k-space), how many hyperspheres do you have in general? Of course the datastructure should pay, compared to a simple linear list of hyperspheres. Nice question, though. – Tiliaceous 4/6, 2012 at 10:44

The number of hyperspheres could be 8 (in which case I'd probably just go with brute-force, or in the thousands, or even hundred thousands, depending on the data set's size, which at this point I cannot foresee. I'm currently doing brute-force and it is painstackingly slow. – Soulless 4/6, 2012 at 13:8

Is the number of hyperspheres changing as your program executes ? And, for clarification, when you write 'a given coordinate' I expect that you mean that you want a function which, for any given coordinate, returns the nearest hypersphere. That is, the given coordinate is not fixed for the duration of the program ? – Nary 7/6, 2012 at 12:52

The number of 'haystack' hyperspheres is guaranteed to stay the same. The 'needle' coordinate is (unpredictably) different on every search. Every search results in (unpredictable) position and size changes of some of the hyperspheres. Yeah, it's that bad. ;) – Soulless 7/6, 2012 at 12:57

@HighPerformanceMark: However I should note that I do know the bounding hyperrect in which movement/scaling occured after each search, which tend to be quite big (as in almost all hyperspheres affected) at the beginning of the program run and tend to reduce down to just a single hypersphere being affected as the program progresses. – Soulless 7/6, 2012 at 13:2

How are the hyperspheres distributed in the volume? Are there "hotspots" where many or even most of your spheres (i.e. tens of thousands) occupy nearly the same volume and therefore would compete for the same node or slot in a spatial data structure? – Rosaceous 12/6, 2012 at 8:18

@slartibartfast: Yes. It starty with a high distribution entropy but the local density is expected to increase during each processing evolution. – Soulless 12/6, 2012 at 10:43

So the question which one of many spheres containing the 'needle' is the largest one does become significant in the course? Is there any hierarchical information exploitable about spheres containing spheres etc.? – Rosaceous 12/6, 2012 at 12:48

@slartibartfast: I wish there was. All I know is that each sample iteration causes a couple of spheres to get pushed towards a certain sphere (the one closest to point x) and the latter to increase its diameter. Which leads to some kind of non-hierarchical clustering. – Soulless 12/6, 2012 at 14:26

I'd expect a QuadTree/Octree/generalized to 2^K-tree for your dimensionality of K would do the trick; these recursively partition space, and presumably you can stop when a K-subcube (or K-rectangular brick if the splits aren't even) does not contain a hypersphere, or contains one or more hyperspheres such that partitioning doesn't separate any, or alternatively contains the center of just a single hypersphere (probably easier).

Inserting and deleting entities in such trees is fast, so a hypersphere changing size just causes a delete/insert pair of operations. (I suspect you can optimize this if your sphere size changes by local additional recursive partition if the sphere gets smaller, or local K-block merging if it grows).

I haven't worked with them, but you might also consider binary space partitions. These let you use binary trees instead of k-trees to partition your space. I understand that KDTrees are a special case of this.

But in any case I thought the insertion/deletion algorithms for 2^K trees and/or BSP/KDTrees was well understood and fast. So hypersphere size changes cause deletion/insertion operations but those are fast. So I don't understand your objection to KD-trees.

I think the performance of all these are asymptotically the same.

Unexacting answered 7/6, 2012 at 12:38 Comment(3)

How would space partition and/or 2^k-trees work with the fact that hyperspheres are expected to overlap (and this frequently)? Sorry if this wasn't clear. Edited my question to better reflect that. – Soulless 7/6, 2012 at 13:4

Why does their overlap matter? All you want to do is to partition space such that a small set of spheres is in any particular chunk, so that you can decide which one is best for the coordinate in that chunk. And if the overlap bothers you, then partition until a K-block contains only the centerpoint of one sphere. Then find the partition containing your point of interest and enumerate all the sub-blocks that contain spheres/spherecenterpoints. – Unexacting 7/6, 2012 at 13:18

would that be turbo-charged or just 200 galllons? – Mcculloch 13/6, 2012 at 12:2

I would use the R*Tree extension for SQLite. A table would normally have 1 or 2 dimensional data. SQL queries can combine multiple tables to search in higher dimensions.

The formulation with negative distance is a little weird. Distance is positive in geometry, so there may not be much helpful theory to use.

A different formulation that uses only positive distances may be helpful. Read about hyperbolic spaces. This might help to provide ideas for other ways to describe distance.

Pumice answered 11/6, 2012 at 20:33 Comment(0)

Recommended topics

Hot tags