Mapping N-dimensional value to a point on Hilbert curve
Asked Answered
V

7

59

I have a huge set of N-dimensional points (tens of millions; N is close to 100).

I need to map these points to a single dimension while preserving spatial locality. I want to use Hilbert space-filling curve to do it.

For each point I want to pick the closest point on the curve. The Hilbert value of the point (curve length from the start of curve to the picked point) is the single dimension value I seek.

Computation does not have to be instant, but I expect it to be no more than several hours on decent modern home PC hardware.

Any suggestions on implementation? Are there any libraries that would help me? (Language does not matter much.)

Volumetric answered 31/1, 2009 at 16:59 Comment(6)
I've used a hilbert curve for multi dimensional mapping of OLAP data, I found that it in terms of performance it wasn't any better than simpler algorithms. But I was testing with smaller sets of dimensions than you. I'm not sure what your actual question is though.Hamill
Well, my question is: "How I do it?"Volumetric
I have spent weeks looking for an answer to the same question. Papers on the subject are either unintelligible, or readable but without any code given. When I do find code, I can't follow it, or it says the approach won't scale beyond ten dimensions. Nevertheless, the experts insist that this approach that you are pursuing is sound, so don't give up!Sortilege
@PaulChernoch The post below talks about locality problems. Maybe you have a reference from experts that would disprove that?Volumetric
The paper "An inventory of three-dimensional Hilbert space-filling curves" by Herman Haverkort, published last year and available for free on the internet, has useful things to say about locality measures. He did exhaustive investigation of tens of thousands of 3D Hilbert curves (there is more than one) and found that some are much better than others. My favorite is one he calls Neptunus. The paper is 25 pages long and has pretty pictures.Sortilege
Just curious why do you need it, what is real-world application?Scab
S
67

I finally broke down and shelled out some money. The AIP (American Institute of Physics) has a nice, short article with source code in C. "Programming the Hilbert curve" by John Skilling (from the AIP Conf. Proc. 707, 381 (2004)) has an appendix with code for mappings in both directions. It works for any number of dimensions > 1, is not recursive, does not use state-transition lookup tables that gobble up huge amounts of memory, and mostly uses bit operations. Thus it is reasonably fast and has a good memory footprint.

If you choose to purchase the article, I discovered an error in the source code.

The following line of code (found in the function TransposetoAxes) has the error:

for( i = n-1; i >= 0; i-- ) X[i] ^= X[i-1];

The correction is to change the greater than or equal (>=) to a greater than (>). Without this correction, the X array is accessed using a negative index when the variable “i” becomes zero, causing the program to fail.

I recommend reading the article (which is seven pages long, including the code), as it explains how the algorithm works, which is far from obvious.

I translated his code into C# for my own use. The code follows. Skilling performs the transformation in place, overwriting the vector that you pass in. I chose to make a clone of the input vector and return a new copy. Also, I implemented the methods as extension methods.

Skilling's code represents the Hilbert index as a transpose, stored as an array. I find it more convenient to interleave the bits and form a single BigInteger (more useful in Dictionaries, easier to iterate over in loops, etc), but I optimized that operation and its inverse with magic numbers, bit operations and the like, and the code is lengthy, so I have omitted it.

namespace HilbertExtensions
{
    /// <summary>
    /// Convert between Hilbert index and N-dimensional points.
    /// 
    /// The Hilbert index is expressed as an array of transposed bits. 
    /// 
    /// Example: 5 bits for each of n=3 coordinates.
    /// 15-bit Hilbert integer = A B C D E F G H I J K L M N O is stored
    /// as its Transpose                        ^
    /// X[0] = A D G J M                    X[2]|  7
    /// X[1] = B E H K N        <------->       | /X[1]
    /// X[2] = C F I L O                   axes |/
    ///        high low                         0------> X[0]
    ///        
    /// NOTE: This algorithm is derived from work done by John Skilling and published in "Programming the Hilbert curve".
    /// (c) 2004 American Institute of Physics.
    /// 
    /// </summary>
    public static class HilbertCurveTransform
    {
        /// <summary>
        /// Convert the Hilbert index into an N-dimensional point expressed as a vector of uints.
        ///
        /// Note: In Skilling's paper, this function is named TransposetoAxes.
        /// </summary>
        /// <param name="transposedIndex">The Hilbert index stored in transposed form.</param>
        /// <param name="bits">Number of bits per coordinate.</param>
        /// <returns>Coordinate vector.</returns>
        public static uint[] HilbertAxes(this uint[] transposedIndex, int bits)
        {
            var X = (uint[])transposedIndex.Clone();
            int n = X.Length; // n: Number of dimensions
            uint N = 2U << (bits - 1), P, Q, t;
            int i;
            // Gray decode by H ^ (H/2)
            t = X[n - 1] >> 1;
            // Corrected error in Skilling's paper on the following line. The appendix had i >= 0 leading to negative array index.
            for (i = n - 1; i > 0; i--) 
                X[i] ^= X[i - 1];
            X[0] ^= t;
            // Undo excess work
            for (Q = 2; Q != N; Q <<= 1)
            {
                P = Q - 1;
                for (i = n - 1; i >= 0; i--)
                    if ((X[i] & Q) != 0U)
                        X[0] ^= P; // invert
                    else
                    {
                        t = (X[0] ^ X[i]) & P;
                        X[0] ^= t;
                        X[i] ^= t;
                    }
            } // exchange
            return X;
        }

        /// <summary>
        /// Given the axes (coordinates) of a point in N-Dimensional space, find the distance to that point along the Hilbert curve.
        /// That distance will be transposed; broken into pieces and distributed into an array.
        /// 
        /// The number of dimensions is the length of the hilbertAxes array.
        ///
        /// Note: In Skilling's paper, this function is called AxestoTranspose.
        /// </summary>
        /// <param name="hilbertAxes">Point in N-space.</param>
        /// <param name="bits">Depth of the Hilbert curve. If bits is one, this is the top-level Hilbert curve.</param>
        /// <returns>The Hilbert distance (or index) as a transposed Hilbert index.</returns>
        public static uint[] HilbertIndexTransposed(this uint[] hilbertAxes, int bits)
        {
            var X = (uint[])hilbertAxes.Clone();
            var n = hilbertAxes.Length; // n: Number of dimensions
            uint M = 1U << (bits - 1), P, Q, t;
            int i;
            // Inverse undo
            for (Q = M; Q > 1; Q >>= 1)
            {
                P = Q - 1;
                for (i = 0; i < n; i++)
                    if ((X[i] & Q) != 0)
                        X[0] ^= P; // invert
                    else
                    {
                        t = (X[0] ^ X[i]) & P;
                        X[0] ^= t;
                        X[i] ^= t;
                    }
            } // exchange
            // Gray encode
            for (i = 1; i < n; i++)
                X[i] ^= X[i - 1];
            t = 0;
            for (Q = M; Q > 1; Q >>= 1)
                if ((X[n - 1] & Q)!=0)
                    t ^= Q - 1;
            for (i = 0; i < n; i++)
                X[i] ^= t;

            return X;
        }

    }
}

I have posted working code in C# to github.

See https://github.com/paulchernoch/HilbertTransformation

UPDATE: I just published (Fall 2019) a Rust crate on crates.io called "hilbert". It also uses Skilling's algorithm. See https://crates.io/crates/hilbert

Sortilege answered 30/4, 2012 at 12:57 Comment(13)
Thanks for posting this. Any clarification on the license of both the original and your code?Costumer
You may freely use the C# code above as far as I am concerned. All I did was translate from C to C# and fix a typo in the Journal article. Please give credit to Skilling.Sortilege
The source code is also posted here: tddft.org/svn/octopus/trunk/src/grid/hilbert.c (with John Skilling name quoted inside).Vaughnvaught
Would you mind elaborating on your paragraph right before the code block? Sounds neat.Louise
Certainly. Interleaving means take one bit from the first matrix element, one bit from the next, etc, then take the second bit from the first matrix element, second bit from the second, all the way to the last bit of the last element. Combine those bits in that order into a single BigInteger, which can have as many bits as necessary. This converts the array into a single number.Sortilege
@PaulChernoch I have tried using the C version of the same code and can't get it running. Could you please give me some advice how to run the original C routines. I posted a question about this here #35847889Appointee
this is how i interleave bit, if someone needs it public struct vector { public uint X; public uint Y; public ulong interleave() { ulong z = 0; for (int i = 0; i < sizeof(uint); i++) z |= (X & 1U << i) << i | (Y & 1U << i) << (i + 1); return z; } }Dehydrogenate
@PaulChernoch Can you formalize the C# sourcecode into a project and push it into Github (and put your attribution into the README)? It'd be awesome if anyone who needed it could simply 1) Google-search it and end-up there, and 2) use an existing project rather than everyone who happens to find this SO post having to create their own.Cougar
@DustinOprea - I am hoping to do so soon. My code has bloated and I need to prune it before publishing it. Also, some optimizations I made broke it last summer, and I just figured out the fix a week ago.Sortilege
@PaulChernoch Put it online and you can both start building commit history as well as get help from the community. The same would've also applied in the beginning. It doesn't all have to be on your shoulders. Don't be a hero :) . Besides.. just commit what's above. These are just ideas; commit whenever you want.Cougar
At your prodding, I have cleaned up the code a little and posted it on github. See github.com/paulchernoch/HilbertTransformationSortilege
@PaulChernoch That's great! Thanks.Cougar
See my answer below for cleaned up Hilbert code and inverse transform, plus several other space filling curves.Including
N
8

Algorithm for mapping from n->1 and 1->n given here "Calculation of Mappings Between One and n-dimensional Values Using the Hilbert Space-filling Curve" J K Lawder

If you Google for "SFC module and Kademlia overlay" youl find a group who claim to use it as part of their system. If you view the source you can probably extract the relevant function.

Naxos answered 14/8, 2009 at 22:40 Comment(2)
I went thorough the article you pointed, but, just looking at Table 3, I do not understand the algorithm they suggest to go from n to 1. Is it iterative? They start with w, which depends on tau_tilde which is computed at the end.Tuna
If yes, how should I initialise?Tuna
G
8

I spent a little time translating Paul Chernoch's code to Java and cleaning it up. It's possible that there is a bug in my code, especially because I don't have access to the paper that it's originally from. However, it passes what unit tests I was able to write. It is below.

Note that I have evaluated both Z-Order and Hilbert curves for spatial indexing on largish data sets. I have to say that Z-Order provides far better quality. But feel free to try for yourself.

    /**
     * Convert the Hilbert index into an N-dimensional point expressed as a vector of uints.
     *
     * Note: In Skilling's paper, this function is named TransposetoAxes.
     * @param transposedIndex The Hilbert index stored in transposed form.
     * @param bits Number of bits per coordinate.
     * @return Point in N-space.
     */
    static long[] HilbertAxes(final long[] transposedIndex, final int bits) {
        final long[] result = transposedIndex.clone();
        final int dims = result.length;
        grayDecode(result, dims);
        undoExcessWork(result, dims, bits);
        return result;
    }

    static void grayDecode(final long[] result, final int dims) {
        final long swap = result[dims - 1] >>> 1;
        // Corrected error in Skilling's paper on the following line. The appendix had i >= 0 leading to negative array index.
        for (int i = dims - 1; i > 0; i--)
            result[i] ^= result[i - 1];
        result[0] ^= swap;
    }

    static void undoExcessWork(final long[] result, final int dims, final int bits) {
        for (long bit = 2, n = 1; n != bits; bit <<= 1, ++n) {
            final long mask = bit - 1;
            for (int i = dims - 1; i >= 0; i--)
                if ((result[i] & bit) != 0)
                    result[0] ^= mask; // invert
                else
                    swapBits(result, mask, i);
        }
    }

    /**
     * Given the axes (coordinates) of a point in N-Dimensional space, find the distance to that point along the Hilbert curve.
     * That distance will be transposed; broken into pieces and distributed into an array.
     *
     * The number of dimensions is the length of the hilbertAxes array.
     *
     * Note: In Skilling's paper, this function is called AxestoTranspose.
     * @param hilbertAxes Point in N-space.
     * @param bits Depth of the Hilbert curve. If bits is one, this is the top-level Hilbert curve.
     * @return The Hilbert distance (or index) as a transposed Hilbert index.
     */
    static long[] HilbertIndexTransposed(final long[] hilbertAxes, final int bits) {
        final long[] result = hilbertAxes.clone();
        final int dims = hilbertAxes.length;
        final long maxBit = 1L << (bits - 1);
        inverseUndo(result, dims, maxBit);
        grayEncode(result, dims, maxBit);
        return result;
    }

    static void inverseUndo(final long[] result, final int dims, final long maxBit) {
        for (long bit = maxBit; bit != 0; bit >>>= 1) {
            final long mask = bit - 1;
            for (int i = 0; i < dims; i++)
                if ((result[i] & bit) != 0)
                    result[0] ^= mask; // invert
                else
                    swapBits(result, mask, i);
        } // exchange
    }

    static void grayEncode(final long[] result, final int dims, final long maxBit) {
        for (int i = 1; i < dims; i++)
            result[i] ^= result[i - 1];
        long mask = 0;
        for (long bit = maxBit; bit != 0; bit >>>= 1)
            if ((result[dims - 1] & bit) != 0)
                mask ^= bit - 1;
        for (int i = 0; i < dims; i++)
            result[i] ^= mask;
    }

    static void swapBits(final long[] array, final long mask, final int index) {
        final long swap = (array[0] ^ array[index]) & mask;
        array[0] ^= swap;
        array[index] ^= swap;
    }
Garek answered 12/7, 2016 at 22:13 Comment(2)
did you also perhaps do the conversion from long[] to BigInteger? (untranspose) I'm interested in range queries.Cavitation
What does "quality" mean here? Hilbert curves are probably better at preserving locality.Awash
D
6

It's not clear to me how this will do what you want. Consider this trival 3D case:

001 ------ 101
 |\         |\
 | \        | \
 |  011 ------ 111
 |   |      |   |
 |   |      |   |
000 -|---- 100  |
  \  |       \  |
   \ |        \ |
    010 ------ 110

which can be "Hilbertized" by the following path:

001 -----> 101
  \          \
   \          \
    011        111
     ^          |
     |          |
000  |     100  |
  \  |       \  |
   \ |        \ V
    010        110

into the 1D order:

000 -> 010 -> 011 -> 001 -> 101 -> 111 -> 110 -> 100

Here's the nasty bit. Consider the list of pairs and 1D distances below:

000 : 100 -> 7
010 : 110 -> 5
011 : 111 -> 3
001 : 101 -> 1

In all cases, the left- and right-hand values are the same 3D distance from each other (+/- 1 in the first position), which seem to imply similar "spatial locality". But linearizing by any choice of dimensional ordering (y, then z, then z, in the above example) breaks that locality.

Another way of saying this is that taking a starting point and ordering the remaining points by their distance from that starting point will provide significantly different results. Taking 000 as the start, for example:

1D ordering : distance    3D ordering : distance
----------------------    ----------------------
        010 : 1           001,010,100 : 1
                          011,101,110 : sqrt(2)
                              111     : sqrt(3)
        011 : 2
        001 : 3
        101 : 4
        111 : 5
        110 : 6
        100 : 7

This effect grows exponentially with the number of dimensions (assuming that each dimension has the same "size").

Dialecticism answered 31/1, 2009 at 19:5 Comment(4)
Well, if I get it right, locality is still preserved -- all points within the cube would belong to edges of that cube and nowhere else. The less cube volume is, the better locality is preserved. I think results should be close enough for me.Volumetric
See also article "Analysis of the Clustering Properties of the Hilbert Space-Filling Curve": cs.cmu.edu/afs/cs.cmu.edu/user/christos/www/PUBLICATIONS/…Volumetric
Yes but as your "cubes" get smaller, the length of your Hilbert curve increases exponentially. I suspect that it will be very long before you get a good ordering on your points, because of complexity of the problem you are trying to solve.Musaceous
@Alexander: The size of the cube isn't the issue. The orthogonal neighbors of a point still end up having radically different 1D distances. And my example was truly trivial. It's not the general case that all points are on the edge of the cube.Dialecticism
I
4

Here is John Skilling's original C code for encode/decode of Hilbert coordinates in arbitrary dimensions. This is from the paper cited by Paul Chernoch above, Programming the Hilbert curve" by John Skilling (from the AIP Conf. Proc. 707, 381 (2004)).

This code has the bug fix applied.

I also extended main() to show both encoding and decoding. I also added functions interleaveBits() and uninterleaveBits() that convert the Hilbert-transposed coordinates into a single code and back, which is what most people are interested in.

Skilling's code works for arbitrary dimensions, but my interleaveBits() functions are specific to three dimensions. Easy to extend, though.

//+++++++++++++++++++++++++++ PUBLIC-DOMAIN SOFTWARE ++++++++++++++++++++++++++
// Functions: TransposetoAxes AxestoTranspose
// Purpose: Transform in-place between Hilbert transpose and geometrical axes
// Example: b=5 bits for each of n=3 coordinates.
// 15-bit Hilbert integer = A B C D E F G H I J K L M N O is stored
// as its Transpose
//        X[0] = A D G J M             X[2]|
//        X[1] = B E H K N <------->       | /X[1]
//        X[2] = C F I L O            axes |/
//               high  low                 0------ X[0]
// Axes are stored conventially as b-bit integers.
// Author: John Skilling 20 Apr 2001 to 11 Oct 2003
//-----------------------------------------------------------------------------

#include <cstdio>

typedef unsigned int coord_t; // char,short,int for up to 8,16,32 bits per word

void TransposetoAxes(coord_t* X, int b, int n) // Position, #bits, dimension
{
    coord_t N = 2 << (b - 1), P, Q, t;

    // Gray decode by H ^ (H/2)
    t = X[n - 1] >> 1;
    // Corrected error in Skilling's paper on the following line. The appendix had i >= 0 leading to negative array index.
    for (int i = n - 1; i > 0; i--) X[i] ^= X[i - 1];
    X[0] ^= t;

    // Undo excess work
    for (Q = 2; Q != N; Q <<= 1) {
        P = Q - 1;
        for (int i = n - 1; i >= 0; i--)
            if (X[i] & Q) // Invert
                X[0] ^= P;
            else { // Exchange
                t = (X[0] ^ X[i]) & P;
                X[0] ^= t;
                X[i] ^= t;
            }
    }
}

void AxestoTranspose(coord_t* X, int b, int n) // Position, #bits, dimension
{
    coord_t M = 1 << (b - 1), P, Q, t;

    // Inverse undo
    for (Q = M; Q > 1; Q >>= 1) {
        P = Q - 1;
        for (int i = 0; i < n; i++)
            if (X[i] & Q) // Invert
                X[0] ^= P;
            else { // Exchange
                t = (X[0] ^ X[i]) & P;
                X[0] ^= t;
                X[i] ^= t;
            }
    }

    // Gray encode
    for (int i = 1; i < n; i++) X[i] ^= X[i - 1];
    t = 0;
    for (Q = M; Q > 1; Q >>= 1)
        if (X[n - 1] & Q) t ^= Q - 1;
    for (int i = 0; i < n; i++) X[i] ^= t;
}

int interleaveBits(coord_t* X, int b, int n) // Position, #bits, dimension
{
    unsigned int codex = 0, codey = 0, codez = 0;

    const int nbits2 = 2 * b;

    for (int i = 0, andbit = 1; i < nbits2; i += 2, andbit <<= 1) {
        codex |= (unsigned int)(X[0] & andbit) << i;
        codey |= (unsigned int)(X[1] & andbit) << i;
        codez |= (unsigned int)(X[2] & andbit) << i;
    }

    return (codex << 2) | (codey << 1) | codez;
}

// From https://github.com/Forceflow/libmorton/blob/main/include/libmorton/morton3D.h
void uninterleaveBits(coord_t* X, int b, int n, unsigned int code) // Position, #bits, dimension
{
    X[0] = X[1] = X[2] = 0;

    for (unsigned int i = 0; i <= b; ++i) {
        unsigned int selector = 1;
        unsigned int shift_selector = 3 * i;
        unsigned int shiftback = 2 * i;
        X[2] |= (code & (selector << shift_selector)) >> (shiftback);
        X[1] |= (code & (selector << (shift_selector + 1))) >> (shiftback + 1);
        X[0] |= (code & (selector << (shift_selector + 2))) >> (shiftback + 2);
    }
}

int main()
{
    coord_t X[3] = {5, 10, 20}; // Any position in 32x32x32 cube
    printf("Input coords = %d,%d,%d\n", X[0], X[1], X[2]);

    AxestoTranspose(X, 5, 3); // Hilbert transpose for 5 bits and 3 dimensions
    printf("Hilbert coords = %d,%d,%d\n", X[0], X[1], X[2]);

    unsigned int code = interleaveBits(X, 5, 3);

    printf("Hilbert integer = %d = %d%d%d %d%d%d %d%d%d %d%d%d %d%d%d = 7865 check\n", code,
           X[0] >> 4 & 1, X[1] >> 4 & 1, X[2] >> 4 & 1, 
           X[0] >> 3 & 1, X[1] >> 3 & 1, X[2] >> 3 & 1, 
           X[0] >> 2 & 1, X[1] >> 2 & 1, X[2] >> 2 & 1, 
           X[0] >> 1 & 1, X[1] >> 1 & 1, X[2] >> 1 & 1, 
           X[0] >> 0 & 1, X[1] >> 0 & 1, X[2] >> 0 & 1);

    uninterleaveBits(X, 5, 3, code);
    printf("Reconstructed Hilbert coords = %d,%d,%d\n", X[0], X[1], X[2]);

    TransposetoAxes(X, 5, 3);
    printf("Orig coords = %d,%d,%d\n", X[0], X[1], X[2]);

    return 0;
}

EDIT: I took this code and paired it with similar code for other space-filling curves (Morton, Raster, Boustrophedonic, and Tiled) and made them all available on GitHub. I include both the forward and inverse transforms for all curves, and some code that measures them against each other for various quality metrics. See https://github.com/davemc0/DMcTools/blob/main/Math/SpaceFillCurve.h and for the testing code https://github.com/davemc0/DMcTools/blob/main/Test/SpaceFillCurveTest.cpp .

Including answered 30/3, 2022 at 20:41 Comment(0)
S
2

Another possibility would be to build a kd-tree on your data, and then to an in-order traversal of the tree to get the ordering. Constructing the kd-tree only requires you to have a good median-finding algorithm, of which there are many.

Sprage answered 31/1, 2009 at 17:48 Comment(2)
For dimensionality around 100 kd-tree is not a suitable solution. Its complexity grows exponentially with the number of dimensions.Toscana
Ball trees could work, I've used them with a 768 dimensional dataset.Kesley
M
0

I don't see how you can use a Hilbert curve in one dimension.

If you are interested in mapping points to a lower dimension while preserving distances (with minimum error) then you can look into "Multidimensional Scaling" algorithms.

Simulated annealing is one approach.

Edit: Thanks for the comment. I see what you meant by the Hilbert Curve approach now. However, this is a hard problem, and given N=100 and 10 million data points I don't think any approach will preserve locality well and run in a reasonable amount of time. I don't think kd-trees will work here.

If finding a total ordering is not important to you, then you can look into locality-based hashing and other approximate nearest neighbor schemes. Hierarchical multidimensional scaling with buckets of points to reduce the input size might give you a good ordering, but again it's doubtful in such a high dimension.

Musaceous answered 31/1, 2009 at 17:35 Comment(1)
I intend to use Hilbert curve in N-dimensional space, not in one dimension.Volumetric

© 2022 - 2024 — McMap. All rights reserved.