I'm trying to sort d-dimensional data vectors by their Hilbert order, for bulk-loading a spatial index.
However, I do not want to compute the Hilbert value for each point explicitly, which in particular requires setting a particular precision. In high-dimensional data, this involves a precision such as 32*d
bits, which becomes quite messy to do efficiently. When the data is distributed unevenly, some of these calculations are unnecessary, and extra precision for parts of the data set are necessary.
Instead, I'm trying to do a partitioning approach. When you look at the 2D first order hilbert curve
1 4
| |
2---3
I'd split the data along the x-axis first, so that the first part (not necessarily containing half of the objects!) will consist of 1 and 2 (not yet sorted) and the second part will have objects from 3 and 4 only. Next, I'd split each half again, on the Y axis, but reverse the order in 3-4.
So essentially, I want to perform a divide-and-conquer strategy (closely related to QuickSort - on evenly distributed data this should even be optimal!), and only compute the necessary "bits" of the hilbert index as needed. So assuming there is a single object in "1", then there is no need to compute the full representation of it; and if the objects are evenly distributed, partition sizes will drop quickly.
I do know the usual textbook approach of converting to long, gray-coding, dimension interleaving. This is not what I'm looking for (there are plenty of examples of this available). I explicitly want a lazy divide-and-conquer sorting only. Plus, I need more than 2D.
Does anyone know of an article or hilbert-sorting algorithm that works this way? Or a key idea how to get the "rotations" right, which representation to choose for this? In particular in higher dimensionalities... in 2D it is trivial; 1 is rotated +y, +x, while 4 is -y,-x (rotated and flipped). But in higher dimensionalities this gets more tricky, I guess.
(The result should of course be the same as when sorting the objects by their hilbert order with a sufficiently large precision right away; I'm just trying to save the time computing the full representation when not needed, and having to manage it. Many people keep a hashmap "object to hilbert number" that is rather expensive.)
Similar approaches should be possible for Peano curves and Z-curve, and probably a bit easier to implement... I should probably try these first (Z-curve is already working - it indeed boils down to something closely resembling a QuickSort, using the appropriate mean/grid value as virtual pivot and cycling through dimensions for each iteration).
Edit: see below for how I solved it for Z and peano curves. It is also working for 2D Hilbert curves already. But I do not have the rotations and inversion right yet for Hilbert curves.