Benefits of nearest neighbor search with Morton-order?
Asked Answered
B

2

7

While working on the simulation of particle interactions, I stumbled across grid indexing in Morton-order (Z-order)(Wikipedia link) which is regarded to provide an efficient nearest neighbor cell search. The main reason that I've read is the almost sequential ordering of spatially close cells in memory.

Being in the middle of a first implementation, I can not wrap my head around how to efficiently implement the algorithm for the nearest neighbors, especially in comparison to a basic uniform grid.

  1. Given a cell (x,y) it is trivial to obtain the 8 neighbor cell indices and compute the respective z-index. Although this provides constant access time to the elements, the z-index has either to be calculated or looked up in predefined tables (separate for each axis and OR'ing). How can this possibly be more efficient? Is it true, that accessing elements in an array A in an order say A[0] -> A1 -> A[3] -> A[4] -> ... is more efficient than in an order A[1023] -> A[12] -> A[456] -> A[56] -> ...?

  2. I've expected that there exists a simpler algorithm to find the nearest neighbors in z-order. Something along the lines: find first cell of neighbors, iterate. But this can't be true, as this works nicely only within 2^4 sized blocks. There are two problems however: When the cell is not on the boundary, one can easily determine the first cell of the block and iterate through the cells in the block, but one has to check whether the cell is a nearest neighbor. Worse is the case when the cell lies on the boundary, than one has to take into account 2^5 cells. What am I missing here? Is there a comparatively simple and efficient algorithm that will do what I need?

The question in point 1. is easily testable, but I'm not very familiar with the underlying instructions that the described access pattern generates and would really like to understand what is going on behind the scenes.

Thanks in advance for any help, references, etc...


EDIT:
Thank you for clarifying point 1! So, with Z-ordering, the cache hit rate is increased on average for neighbor cells, interesting. Is there a way to profile cache hit/miss rates?

Regarding point 2: I should add that I understand how to build the Morton-ordered array for a point cloud in R^d where the index i = f(x1, x2, ..., xd) is obtained from bitwise interlacing etc. What I try to understand is whether there is a better way than the following naive ansatz to get the nearest neighbors (here in d=2, "pseudo code"):

// Get the z-indices of cells adjacent to the cell containing (x, y) 
// Accessing the contents of the cells is irrelevant here
(x, y) \elem R^2    
point = (x, y)
zindex = f(x, y)     
(zx, zy) = f^(-1)(zindex)          // grid coordinates 
nc = [(zx - 1, zy - 1), (zx - 1, zy), (zx - 1, zy + 1),  // neighbor grid 
      (zx    , zy - 1),               (zx,     zy + 1),  // coordinates
      (zx + 1, zy - 1), (zx + 1, zy), (zx + 1, zy + 1)]

ni= [f(x[0], x[1]) for x in nc]    // neighbor indices
Beberg answered 23/11, 2010 at 19:32 Comment(3)
Here you have a Morton Order implementation in 3D dmytry.pandromeda.com/texts/…Glottology
Here you have the detailed math, algorithms and experimental results compgeom.com/~piyush/papers/tvcg_stann.pdfGlottology
I haven't seen your comments before editing. I will have a closer look at the references, greatly appreciated!Beberg
H
11

In modern multi-level cache-based computer systems, spacial locality is an important factor in optimising access-time to data elements.

Put simply, this means if you access a data element in memory, then accessing another data element in memory that is nearby (has an address that is close to the first) can be cheaper by several orders of magnitude that accessing a data element that is far away.

When 1-d data is accessed sequentially, as in simply image processing or sound processing, or iterating over data structures processing each element the same way, then arranging the data elements in memory in order tends to achieve spatial locality - i.e. since you access element N+1 just after accessing element N, the two elements should be placed next to each other in memory.

Standard c arrays (and many other data structures) have this property.

The point of Morton ordering is to support schemes where data is accessed two dimensionally instead of one dimensionally. In other words, after accessing element (x,y), you may go on to access (x+1,y) or (x,y+1) or similar.

The Morton ordering means that (x,y), (x+1,y) and (x,y+1) are near to each other in memory. In a standard c multidimensional array, this is not necessarily the case. For example, in the array myArray[10000][10000], (x,y) and (x,y+1) are 10000 elements apart - too far apart to take advantage of spatial locality.


In a Morton ordering, a standard c array can still be used as a store for the data, but the calculation to work out where (x,y) is is no longer as simple as store[x+y*rowsize].

To implement your application using Morton ordering, you need to work out how to transform a coordinate (x,y) into the address in the store. In other words, you need a function f(x,y) that can be used to access the store as in store[f(x,y)].

Looks like you need to do some more research - follow the links from the wikipedia page, particularly the ones on the BIGMIN function.

Hanafee answered 23/11, 2010 at 20:18 Comment(1)
Thank you for the explanation about proximity in the array. For the second part, please see my edit.Beberg
E
4

Yes, accessing array elements in order is indeed faster. The CPU loads memory from RAM into cache in chunks. If you access sequentially, the CPU can preload the next chunk easily, and you won't notice the load time. If you access randomly, it can't. This is called cache coherency, and what it means is that accessing memory near to memory you've already accessed is faster.

In your example, when loading A[1], A[2], A[3] and A[4], the processor probably loaded several of those indices at once, making them very trivial. Moreover, if you then go on to try to access A[5], it can pre-load that chunk while you operate on A[1] and such, making the load time effectively nothing.

However, if you load A[1023], the processor must load that chunk. Then it must load A[12]- which it hasn't already loaded and thus must load a new chunk. Et cetera, et cetera. I have no idea about the rest of your question, however.

Earflap answered 23/11, 2010 at 19:45 Comment(1)
Hmmm "Cache coherence" or "locality of reference"?Glottology

© 2022 - 2024 — McMap. All rights reserved.