Balancing KD-Tree: Which approach is more efficient?

Asked 10/6, 2013 at 10:7 Answered 24/10, 2013 at 20:56

Solved performance algorithm sorting tree kdtree

I'm trying to balance a set of (Million +) 3D points using a KD-tree and I have two ways of doing it.

Way 1:

Use an O(n) algorithm to find the arraysize/2-th largest element along a given axis and store it at the current node
Iterate over all the elements in the vector and for each, compare them to the element I just found and put those smaller in newArray1, and those larger in newArray2
Recurse

Way 2:

Use quicksort O(nlogn) to sort all the elements in the array along a given axis, take the element at position arraysize/2 and store it in the current node.
Then put all the elements from index 0 to arraysize/2-1 in newArray1, and those from arraysize/2 to arraysize-1 in newArray2
Recurse

Way 2 seems more "elegant" but way 1 seems faster since the median search and the iterating are both O(n) so I get O(2n) which just reduces to O(n). But then at the same time, even though way 2 is O(nlogn) time to sort, splitting up the array into 2 can be done in constant time, but does it make up for the O(nlogn) time for sorting?

What should I do? Or is there an even better way to do this that I'm not even seeing?

Laster answered 10/6, 2013 at 10:7 Comment(3)

The arraysize/2-th largest element is called the median (in case you didn't know) – Buttercup 24/10, 2013 at 20:53

sorting array elements based on which axis ? every time we need to sort when we insert node in tree ? is these tree is balanced ? – Chape 27/8, 2015 at 9:8

If i have (12,21),(13,27),(19,5),(39,5),(49,63),(43,45),(41,22),(27,7),(20,12),(32,11),(24.56) these points then how can i make tree from your algoritham steps? – Chape 27/8, 2015 at 9:12

How about Way 3:

Use an O(n) algorithm such as QuickSelect to ensure that the element at position length/2 is the correct element, all elements before are less, and all afterwards are larger than it (without sorting them completely!) - this is probably the algorithm you used in your Way 1 step 1 anyway...
Recurse into each half (except middle element) and repeat with next axis.

Note that you actually do not need to make "node" objects. You can actually keep the tree in a large array. When searching, start at length/2 with the first axis.

I've seen this trick being used by ELKI. It uses very little memory and code, which makes the tree quite fast.

Terylene answered 10/6, 2013 at 13:40 Comment(6)

Can you elaborate what you mean? So are you saying that I should have a loop and repeatedly call QuickSelect on 0, 1,2, 3, etc? Wouldn't that take longer than a forloop? – Laster 10/6, 2013 at 19:17

No. If you run QuickSelect, it will actually pivotize your array, such that the median is in the middle, and the other elements are as desired before and after. – Terylene 10/6, 2013 at 21:3

But quickselect keeps breaking the array into smaller and smaller arrays and then just returns the kth-largest element. I don't see how it will give me just the two arrays that I want when it winds up creating dozens of them. – Laster 10/6, 2013 at 21:27

A clean QuickSelect should partially sort the array, not copy/break it. You probably did a naive implementation of QuickSelect yourself? – Terylene 10/6, 2013 at 21:32

yeah I implemented my own version where I break the array into two sub arrays and recursively call quickselect on each subarray - but I see what you are saying now that I shouldn't partition the array, just use the same array throughout – Laster 10/6, 2013 at 21:36

It's exactly like Quicksort, except that you just iterate into one half instead of recursing into both, btw. – Terylene 10/6, 2013 at 22:14

Another way:

Sort for each of the dimensions: O(K N log N). This will be performed only once, we will utilize the sorted list on the dimensions.

For the current dimension, find the median in O(1) time, split for the median in O(N) time, split also the sorted arrays for each of the dimensions in O(KN) time, and recurse for the next dimension.

In that way, you will perform sorts at the beginning. And perform (K+1) splits/filterings for each subtree, for a known value. For small K, this approach should be faster than the other approaches.

Note: The additional space needed for the algorithm can be decreased by the tricks pointed out by Anony-Mousse.

Regeniaregensburg answered 23/6, 2013 at 20:29 Comment(0)

Notice that if the query hyper-rectangle contains many points (all of them for example) it does not matter if the tree is balanced or not. A balanced tree is useful if the query hyper-rects are small.

Buttercup answered 24/10, 2013 at 20:56 Comment(0)

Recommended topics

Hot tags