Heuristics to sort an array of 2D/3D points according to their mutual distance

Asked 17/5, 2016 at 8:52 Answered 18/5, 2016 at 19:19

Solved sorting caching distance computational-geometry nearest-neighbor

Consider an array of points in 2D, 3D, (4D, ...) space (e.g. nodes of an unstructured mesh). Initially, the index of a point in the array is not related to its position in space. In the simple case, assume I already know some nearest-neighbor connectivity graph.

I would like some heuristics which increase the probability that two points which are close to each other in space have similar indices (i.e. are close in the array).

I understand that an exact solution is very hard (perhaps similar to the travelling salesman problem), but I don't need an exact solution, just something which increases the probability.

My ideas on solution:

A naive solution would be something like:

1. for each point "i" compute fitness E_i given by sum of distances in array (i.e. index-wise) from its spatial neighbors (i.e. space-wise)
   E_i = -Sum_k ( abs( index(i)-index(k) ) ) 
   where "k" are spatial nearest neighbors of "i" 
2. for pairs of points (i,j) which have low fitness (E_i,E_j) 
   try to swap them, 
   if fitness improves, accept

but the detailed implementation and its performance optimization are not so clear.
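The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `SwapReorder` and its method names are hypothetical, `pos[i]` is the array slot currently assigned to point `i`, the neighbor lists are assumed to be symmetric (if `k` is a neighbor of `i`, then `i` is a neighbor of `k`), and candidate pairs are picked at random rather than by low fitness, which is a simplification.

```java
import java.util.Random;

/** Hypothetical sketch of the swap heuristic; not a tuned implementation. */
final class SwapReorder {

    /** Cost of point i: sum of index distances to its spatial neighbors (this is -E_i). */
    static long cost(int i, int[] pos, int[][] neighbors) {
        long sum = 0;
        for (int k : neighbors[i]) {
            sum += Math.abs(pos[i] - pos[k]);
        }
        return sum;
    }

    /** Sum of the costs of all points. */
    static long totalCost(int[] pos, int[][] neighbors) {
        long sum = 0;
        for (int i = 0; i < pos.length; i++) {
            sum += cost(i, pos, neighbors);
        }
        return sum;
    }

    /** Tries random pair swaps and keeps only those that lower the cost of the pair. */
    static void improve(int[] pos, int[][] neighbors, int trials, long seed) {
        Random rnd = new Random(seed);
        int n = pos.length;
        for (int t = 0; t < trials; t++) {
            int i = rnd.nextInt(n);
            int j = rnd.nextInt(n);
            if (i == j) {
                continue;
            }
            long before = cost(i, pos, neighbors) + cost(j, pos, neighbors);
            swap(pos, i, j);
            long after = cost(i, pos, neighbors) + cost(j, pos, neighbors);
            if (after >= before) {
                swap(pos, i, j); // no improvement: undo the swap
            }
        }
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

With symmetric neighbor lists, only terms involving `i` or `j` change in a swap, so an accepted swap never increases the total cost. Each trial costs time proportional to the number of neighbors of the two points.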

Another solution, which does not need precomputed nearest neighbors, would be based on some form of locality-sensitive hashing.

I think this could be quite a common problem, and good solutions may already exist; I do not want to reinvent the wheel.

Application:

improve cache locality, considering that memory access is often the bottleneck of graph traversal
it could accelerate interpolation on an unstructured grid, more specifically the search for nodes which are near the sample (e.g. centers of radial basis functions).

Overwrought answered 17/5, 2016 at 8:52 Comment(5)

I don't even get what you are trying to say in the "naive solution". What is your metric for computing if two points are close or not? – Adulate 17/5, 2016 at 19:52

Some metric, e.g. Euclidean. Why? Does it matter which metric I use? Nearest neighbors could also have multiple definitions, but a natural definition would be the N points with the smallest distance. I did not want to specify these details since it would reduce the generality of the question. – Overwrought 18/5, 2016 at 8:16

gsamaras > aha, the source of confusion was that I messed up the formula for computing the fitness (I swapped k and j). Now I corrected it to E_i = -Sum_k ( abs( index(i)-index(k) ) ) ... hope it is more clear now – Overwrought 18/5, 2016 at 8:20

What is wrong with the LSH you mentioned? This just seems to fit perfectly to your task. – Electroform 18/5, 2016 at 8:27

I doubt it @Regenschein, check my answer. – Adulate 18/5, 2016 at 19:19

I'd say space-filling curves (SFC) are the standard solution to map proximity in space to a linear ordering. The most common ones are Hilbert curves and z-curves (Morton order).

Hilbert curves have the best proximity mapping, but they are somewhat expensive to calculate. Z-ordering still has a good proximity mapping but is very easy to calculate. For z-ordering, it is sufficient to interleave the bits of each dimension. Assuming integer values, if you have a 64-bit 3D point (x,y,z), the z-value is $x_0, y_0, z_0, x_1, y_1, z_1, \ldots, x_{63}, y_{63}, z_{63}$, i.e. a 192-bit value consisting of the first bit of every dimension, followed by the second bit of every dimension, and so on. If your array is ordered according to that z-value, points that are close in space are usually also close in the array.

Here are example functions that interleave (merge) values into a z-value (nBitsPerValue is usually 32 or 64):

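public static long[] mergeLong(final int nBitsPerValue, long[] src) {
    final int DIM = src.length;
    // Number of 64-bit words needed to hold DIM * nBitsPerValue bits (rounded up)
    int intArrayLen = (src.length*nBitsPerValue+63) >>> 6;
    long[] trg = new long[intArrayLen];

    // Source mask starts at the most significant of the nBitsPerValue bits
    long maskSrc = 1L << (nBitsPerValue-1);
    // Target mask starts at the top bit of trg[0]
    long maskTrg = 0x8000000000000000L;
    int srcPos = 0;  // dimension currently being read
    int trgPos = 0;  // target word currently being written
    for (int j = 0; j < nBitsPerValue*DIM; j++) {
        // Copy the current bit of the current dimension into the target
        if ((src[srcPos] & maskSrc) != 0) {
            trg[trgPos] |= maskTrg;
        } else {
            trg[trgPos] &= ~maskTrg;
        }
        // Advance to the next target bit; move on to the next word when this one is full
        maskTrg >>>= 1;
        if (maskTrg == 0) {
            maskTrg = 0x8000000000000000L;
            trgPos++;
        }
        // Advance to the next dimension; after the last one, step down to the next lower bit
        if (++srcPos == DIM) {
            srcPos = 0;
            maskSrc >>>= 1;
        }
    }
    return trg;
}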
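To show how such a z-value is then used for ordering, here is a minimal sketch for the common 3D case. It is a simplified variant, not the function above: it assumes non-negative integer coordinates that fit into 21 bits each, so that the whole z-value fits into a single `long` and can be compared directly. The class name `MortonSort` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: order 3D integer points along the z-curve. */
final class MortonSort {

    static final int BITS = 21; // 3 * 21 = 63 bits, so the code stays non-negative

    /** Interleaves the low 21 bits of x, y, z; x is the most significant bit of each triple. */
    static long morton3(int x, int y, int z) {
        long code = 0;
        for (int b = BITS - 1; b >= 0; b--) {
            code = (code << 3)
                    | (((x >>> b) & 1L) << 2)
                    | (((y >>> b) & 1L) << 1)
                    | ((z >>> b) & 1L);
        }
        return code;
    }

    /** Sorts points given as {x, y, z} in place by their z-value. */
    static void sort(int[][] points) {
        Arrays.sort(points, Comparator.comparingLong((int[] p) -> morton3(p[0], p[1], p[2])));
    }
}
```

For large arrays it is probably better to compute each z-value once and sort by the stored keys, since the comparator above recomputes the interleaving on every comparison.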

You can also interleave the bits of floating point values (if encoded with IEEE 754, as they usually are in standard computers), but this results in non-euclidean distance properties. You may have to transform negative values first, see here, section 2.3.

EDIT To answer the questions from the comments:

1) I understand how to make space filling curve for regular rectangular grid. However, if I have randomly positioned floating points, several points can map into one box. Would that algorithm work in that case?

There are several ways to use floating point (FP) values. The simplest is to convert them to integer values by multiplying them by a large constant. For example multiply everything by 10^6 to preserve 6 digit precision.
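As a minimal sketch of this scaling approach (the class `GridScale`, the method `toGrid` and the `min` offset are hypothetical additions; the offset is only there to keep the resulting integers non-negative):

```java
/** Hypothetical sketch: turn a coordinate into an integer grid index by scaling. */
final class GridScale {

    /**
     * Shifts the value by min and multiplies it by scale (e.g. 1e6 for six decimal digits).
     * Points that fall into the same grid cell simply get the same integer.
     */
    static long toGrid(double value, double min, double scale) {
        return Math.round((value - min) * scale);
    }
}
```

Several points mapping into one cell is not a problem for ordering: they get identical z-values and therefore end up next to each other in the sorted array.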

Another way is to use the bit-level representation of the FP value to turn it into an integer. This has the advantage that no precision is lost and you don't have to determine a multiplication constant. The disadvantage is that the euclidean distance metric does not work anymore.

It works as follows: The trick is that floating point values do not have infinite precision, but are limited to 64 bits. Hence they automatically form a grid. The difference to integer values is that floating point values do not form a uniform square grid but a rectangular grid where the rectangles get bigger with growing distance from (0,0). The grid size is determined by how much precision is available at a given point. For 64-bit doubles, the spacing (= grid size) close to (1,1) is about 10^-16, and it becomes many orders of magnitude finer towards (0,0) (down to about 10^-324 for the smallest values), see here. This distorted grid still has the proximity mapping, but distances are not euclidean anymore.
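The local grid size can be inspected directly in Java with `Math.ulp`, which returns the distance from a double to the next larger representable magnitude. The wrapper class `FloatGrid` below is hypothetical and only there for illustration:

```java
/** Hypothetical sketch: spacing of the implicit floating point grid at a coordinate. */
final class FloatGrid {

    /** Size of the grid cell of 64-bit doubles at the given coordinate. */
    static double spacing(double x) {
        return Math.ulp(x);
    }
}
```

For example, `FloatGrid.spacing(1.0)` is 2^-52 (about 2.2e-16), `FloatGrid.spacing(1024.0)` is 2^-42, and `FloatGrid.spacing(0.0)` is `Double.MIN_VALUE`, so the cells grow by a factor of two every time the coordinate doubles.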

Here is the code to do the transformation (Java, taken from here; in C++ you can reinterpret the bits of the double as a 64-bit integer, e.g. with std::memcpy, and then apply the same sign handling):

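public static long toSortableLong(double value) {
    long r = Double.doubleToRawLongBits(value);
    // Non-negative doubles already sort correctly as longs;
    // for negative doubles, flip all bits except the sign to reverse their order
    return (r >= 0) ? r : r ^ 0x7FFFFFFFFFFFFFFFL;
}

public static double toDouble(long value) {
    // Inverse of toSortableLong: undo the bit flip for negative values
    return Double.longBitsToDouble(value >= 0 ? value : value ^ 0x7FFFFFFFFFFFFFFFL);
}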

These conversions preserve the ordering of the converted values, i.e. for every two FP values the resulting integers have the same ordering with respect to <, >, =. The non-euclidean behaviour is caused by the exponent, which is encoded in the bit string. As mentioned above, this is also discussed here, section 2.3; however, the code there is slightly less optimized.
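Both properties, preserved ordering and non-euclidean spacing, can be checked with a small sketch. The class `SortableDoubleDemo` and the helper `sameOrder` are hypothetical; `toSortableLong` is repeated from above only so that the sketch compiles on its own.

```java
import java.util.Arrays;

/** Hypothetical sketch: check that the integer keys order values like the doubles do. */
final class SortableDoubleDemo {

    // Same conversion as above, repeated to keep the sketch self-contained.
    static long toSortableLong(double value) {
        long r = Double.doubleToRawLongBits(value);
        return (r >= 0) ? r : r ^ 0x7FFFFFFFFFFFFFFFL;
    }

    /** Returns true if sorting by double value also yields non-decreasing integer keys. */
    static boolean sameOrder(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        for (int i = 1; i < sorted.length; i++) {
            if (toSortableLong(sorted[i - 1]) > toSortableLong(sorted[i])) {
                return false;
            }
        }
        return true;
    }
}
```

The distortion shows up in the key differences: `toSortableLong(2.0) - toSortableLong(1.0)` and `toSortableLong(4.0) - toSortableLong(2.0)` are both 2^52, although the second interval is twice as long in space.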

2) Is there an algorithm for iteratively updating such a space-filling curve if my points move in space? (i.e. without reordering the whole array each time)

The space-filling curve imposes a specific ordering, so for every set of points there is only one valid ordering. If a point is moved, it has to be reinserted at the new position determined by its z-value.

The good news is that a small movement will likely leave a point in the same 'area' of your array. So if you really use a fixed array, you only have to shift small parts of it.
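A minimal sketch of such a local update, assuming the z-values are kept in a sorted `long[]` and exactly one entry has changed. The class `ZOrderUpdate` and the method `reinsert` are hypothetical names, and a real implementation would have to move the point data in parallel with the keys.

```java
import java.util.Arrays;

/** Hypothetical sketch: reinsert one moved point into a sorted array of z-values. */
final class ZOrderUpdate {

    /**
     * keys is sorted ascending except that the entry at index i now has the value newKey.
     * Shifts only the elements between the old and the new position; returns the new index.
     */
    static int reinsert(long[] keys, int i, long newKey) {
        int n = keys.length;
        int j;
        if (i + 1 < n && newKey > keys[i + 1]) {
            // Point moves towards the end: search the sorted part to the right of i
            int p = Arrays.binarySearch(keys, i + 1, n, newKey);
            int ins = (p >= 0) ? p : -p - 1;
            j = ins - 1;
            System.arraycopy(keys, i + 1, keys, i, j - i); // shift the block left by one
        } else if (i > 0 && newKey < keys[i - 1]) {
            // Point moves towards the start: search the sorted part to the left of i
            int p = Arrays.binarySearch(keys, 0, i, newKey);
            j = (p >= 0) ? p : -p - 1;
            System.arraycopy(keys, j, keys, j + 1, i - j); // shift the block right by one
        } else {
            j = i; // still in the right place
        }
        keys[j] = newKey;
        return j;
    }
}
```

The cost is proportional to the distance between the old and the new index, so a point that moves only slightly usually causes only a short shift.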

If you have a lot of moving objects and the array is too cumbersome, you may want to look into 'moving objects indexes' (MX-CIF-quadtree, etc). I personally can recommend my very own PH-Tree. It is a kind of bitwise radix-quadtree that uses a z-curve for internal ordering. It is quite efficient for updates (and other operations). However, I usually recommend it only for larger datasets; for small datasets a simple quadtree is usually good enough.

Contredanse answered 18/5, 2016 at 8:49 Comment(3)

Nice answer. I posted one too, with supplementary material. – Adulate 18/5, 2016 at 19:19

Thanks, I think this fits my case best. I'm not sure about two aspects: 1) I understand how to make space filling curve for regular rectangular grid. However, if I have randomly positioned floating points, several points can map into one box. Would that algorithm work in that case? 2) Is there an algorithm for iteratively updating such a space-filling curve if my points move in space? (i.e. without reordering the whole array each time) – Overwrought 19/5, 2016 at 7:31

aha, I see this page from CGAL supplements well your answer doc.cgal.org/latest/Spatial_sorting/… – Overwrought 19/5, 2016 at 7:41

The problem you are trying to solve has meaning iff, given a point p and its NN q, then it is true that the NN of q is p.

That is not trivial. For example, the two points can represent positions in a landscape where one point is high up on a mountain, so going from the bottom up to the mountain costs more than the other way around (from the mountain to the bottom). So make sure that is not your case.

Since TilmannZ already proposed a solution, I would like to comment on the LSH you mentioned. I would not choose it: your points lie in a really low-dimensional space (not even 100 dimensions), so why use LSH?

I would go for CGAL's algorithms in that case, such as 2D NNS, or even a simple kd-tree. And if speed is critical but space is not, then why not go for a quadtree (octree in 3D)? I built one once; it could not go beyond 10 dimensions with 8 GB of RAM.

If however, you feel that your data may belong in a higher dimensional space in the future, then I would suggest using:

LSH from Andoni, really cool guy.
FLANN, which offers another approach.
kd-GeRaF, which is developed by me.

Adulate answered 18/5, 2016 at 19:19 Comment(2)

Thanks, the answer of TilmannZ is more to the point, but your comments are also very useful. The CGAL library could be very useful, but I try to keep my codebase small and simple, without many dependencies. But I will probably look into CGAL's code to copy some algorithms. – Overwrought 19/5, 2016 at 7:39

You are welcome @ProkopHapala. Sure it is! Hmm, that would be a bit hard, since cgal is a bit grumpy at first! Maybe it will be better to implement the algorithms yourself, but you said you don't want to reinvent the wheel, but I see the trade off you are mentioning here. Good luck! Nice question BTW! – Adulate 19/5, 2016 at 7:53
