Why store a tree as a contiguous chunk of memory?
I just discovered that there are some tree-based data structures that, when looking for high performance, are often stored as a contiguous chunk of memory; this is especially popular when using the so-called "policy-based data structures".

The problem is that I can't wrap my head around why one would want to do that: when you "linearize" a tree to store it as a vector/array, how do you make sure that you are re-arranging the branches and the leaves in a meaningful way that helps performance? Is this OK only for perfectly balanced trees?

In other words, I can't imagine the access pattern for a linear data structure that spans multiple levels and has multiple leaves; usually a tree adds one level of indirection for each node/leaf, and this simplifies things a lot for the user, but how should one organize such a "linear" tree?

Tenishatenn asked 13/6, 2014 at 17:26 Comment(17)
Contiguous storage often offers a pretty significant performance boost. The prefetcher can go retrieve memory that is actually needed.Insolvent
@Insolvent yes, but my focus is on how you stuff a tree into a contiguous chunk of memory and what patterns you use to access this kind of data structure. Honestly I can't imagine storing something that spans multiple levels and leaves in just one dimension and one block of memory.Tenishatenn
have you ever coded a min (or max) heap data structure? Binary trees stored in contiguous memory are commonly used for such things. See the family of heap-operations provided by the standard library for examples and storage requirements.Roadrunner
@WhozCraig, Ah, yeah. Good example. I remember doing one of those for a priority queue and then finding out that implementation (which went along fine with the assignment spec) wouldn't work at all and I'd have to use some slow method.Insolvent
Just reserve a memory block for x leaves (add more if needed). Tree in an ArrayQuicklime
@Insolvent that sucks. for true priority-queues they're the cats pajamas.Roadrunner
@Roadrunner no, but I have seen these kinds of techniques applied to ternary trees too.Tenishatenn
@Quicklime the problem is not the allocation, the problem is 2D vs 1D. A tree is still a 2D data structure in the simplest case; a vector is 1D. I'm not even considering leaves. If you just picture a tree on a piece of paper, how do you stuff that into a 1D array? Based on what pattern?Tenishatenn
You may not be able to get better performance in all cases, but you'll never get worse performance. In the simplest case, the entire data structure fits in cache, and you get an incredible boost. Even in the worst case, where each traversal to a child node causes a cache miss and requires that you read data in from main RAM, you'll have the same kind of performance that allocating nodes all over memory would give you.Weigela
@MaxLybbert I'm assuming that your second/worst case is about a diagonal walk inside the tree. In that case, assuming the "standard" pointer-based implementation that adds one level of indirection each time you descend a level of the tree, shouldn't the CPU branch-prediction unit offer better performance than a cache miss? Are there no cases where a cache miss is worse than a good prediction? Branch prediction is part of the internals of a CPU; a cache miss forces the CPU to go out to RAM/external memory, and at that point the scope is outside the CPU; this is usually a time-consuming op.Tenishatenn
@Tenishatenn Trees can be "serialized" by any pattern of tree traversal; visiting all nodes of a tree in a given order is exactly the same as fitting the tree into a contiguous block of memory. Balanced binary trees (as in a typical heap implementation) are effectively serialized breadth-first.Younger
OP: You have to think differently: 16 GB of memory is just a big 1D array, and we can store almost anything in it! It is the way we store the data and the way we arrange it that makes it 2D, 3D (or a tree).Quicklime
@Quicklime I'm trying :D , a 1D structure can also support an indefinite number of levels of indirection.Tenishatenn
@user2485710: The nodes on a standard pointer based tree are generally located at completely random memory locations. Going from one node to another will cause a cache miss. The nodes on an array-based tree might be close to each other, or they might be far away from each other. If they are far away from each other, the performance will be identical to a pointer-based structure. I don't see any case where navigating the standard pointer-based tree would be faster than navigating the standard array-based tree. But there are times when navigating an array-based tree will be faster.Weigela
@MaxLybbert yes, but I'm surprised to see that a data structure coded using pointers (the "usual" implementation for a tree) doesn't beat a cache miss, even considering how good modern branch predictors are. A branch predictor is all about pointers, after all, and a standard tree implementation is about pointers too; it surprises me that these two together do not offer an edge in performance comparable to the cost of a bad cache pipeline.Tenishatenn
Is it at-all conceivable there is a "correct" answer to this question?Roadrunner
@user2485710: All I can say is that measuring works wonders. I would expect that pointer-based containers (trees and lists) would have some actual use cases. However, the performance penalty for jumping around to random locations in memory is so bad that almost anything is better. At the very least, it takes a lot of work to come up with something worse.Weigela

You might find the short article here interesting.

Basically, the reasoning behind using a contiguous block of memory for such a structure is it greatly improves lookup and scan times when dealing with potentially large sets of data. If your memory is non-contiguous, you may have to employ costly traversal algorithms to retrieve data from the data structure.

Hopefully this addresses your interests.

Here are the two images from the article that illustrate this concept:

[figure: a balanced tree]

[figure: the tree as stored in contiguous memory]

Bombast answered 13/6, 2014 at 17:48 Comment(5)
well, this is efficient if you go from level N to level N+1, but if you traverse the leaves in a diagonal fashion (which is quite a popular lookup pattern with trees) this is a horrible arrangement.Tenishatenn
Not at all, as you can deterministically calculate the memory offsets of each node in the diagonal path using the math equations for the number of nodes in the tree at any given level, which allows you to look into the tree using direct memory addressing. It should be noted however that this is not going to be the best solution for ALL cases, but certainly helps for implementations where the size is bounded by some known value, or on memory constrained machines.Bombast
I can calculate what you are suggesting, but this doesn't mean that I'll find everything in the cache, which is the point about the performance. I can be sure about where to find node or leaf X, but this doesn't mean that my tree will magically fit in the pipeline. Plus, with this vector-based approach there is an exponential growth of the contiguous space required as I add more levels to the tree. But when you say "the size is bounded by some known value", what size are you considering?Tenishatenn
also, your picture shows a balanced tree, which is the trivial case. What I get from this is that this arrangement makes sense when you have a balanced tree (because an unbalanced one will offer a more unpredictable pattern) and also a relatively small tree, because if you linearize a big tree you can easily get a lot of cache misses in some cases.Tenishatenn
@user2485710: This arrangement works equally well with balanced and unbalanced trees, you just have to have some way to mark which "nodes" have no data. This arrangement actually works very well with small trees, but yes, it's terrible with huge trees.Bora

There are actually many such patterns, which serve two purposes: saving memory, and keeping nodes together, primarily for paging performance.

One very simple version is to allocate blocks of three nodes, a parent and its two children, where each block has four "child" blocks, two for each child. This cuts the number of allocations to a third. That isn't much of an optimization until you scale it up to blocks of 7, 15, 31, 63... nodes: if you can fit as many keys as possible onto a single memory-system page, you minimize the time spent waiting for the hard drive. If your keys are 32 bytes each and a page is 4K, you can store a complete 7-level subtree of 127 keys per page, which means you only have to load one page from the hard drive for every 7 rows of the tree. At that point, you load the "child" page and follow another 7 rows. A regular binary tree may have only one node per page, which means you spend 7x as much time simply waiting for the hard drive as you iterate the tree. Quite slow.

Rotations are a little tricky, as you have to actually swap the data, not the pointers as is common in tree implementations. Also, it can waste a LOT of space when the tree becomes large.

               ---------
               |   O   |
               |  / \  |
               | O   O |
              _---------_
           __/    \ /    \__
          /       | |       \
--------- --------- --------- ---------
|   O   | |   O   | |   O   | |   O   |
|  / \  | |  / \  | |  / \  | |  / \  |
| O   O | | O   O | | O   O | | O   O |
--------- --------- --------- ---------

Another, far more complex "pattern" is to cut the tree in half vertically, so that the top is one "subtree" with many child "subtrees", and to store each "subtree" linearly, repeating this recursively. It is a very strange pattern, but it ends up working vaguely like the one above, except that it's "cache-oblivious": it works with any page size or cache hierarchy. Quite cool, but such layouts are complex, and since virtually everything runs on one of three well-known architectures, they aren't popular. They're also extremely difficult to insert into and remove from.

Another very simple variant is to put the whole tree into an array accessed via indices, which saves total memory, but only the top is cache-friendly; the lower levels are worse, cache-wise, than a regular binary tree. Effectively, the root is at index 0, and for a node at index n, the left child is at (2n+1) and the right child at (2n+2). If we're at the node at index 24, its parent is at ((24-1)/2 = 11), and its left and right children are at 49 and 50 respectively.

This works great for small trees, because it requires no overhead for pointers whatsoever: the data is stored as one contiguous array of values, and the relationships are inferred from the indices. Also, adding and removing children always happens near the right end, and normal binary-tree insertion/rotation/erasure applies.

These trees also have an interesting mathematical novelty: if you convert the index plus one to binary, that corresponds to the location in the tree. If we're thinking about the node at index 24, 24+1 in binary is 11001. The first 1 always means the root, and from there each 1 means "go right" and each 0 means "go left", so to get to index 24 from the root you go right, left, left, right, and you're there. Additionally, since there are 5 binary digits, you know it's in the fifth row. Neither of these observations is particularly useful, other than that they sort of imply the root node is a right child, which is vaguely interesting. (If you expand to other bases, the root is always the rightmost child.) That being said, it's still often useful to implement the root as a left node if you work with bidirectional iterators.

     0
    / \
   /   \
  1     2
 / \   / \
3   4 5   6

[0][1][2][3][4][5][6]
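The index arithmetic above can be captured in a few helper functions; a minimal sketch (the function names are just for illustration):

```cpp
#include <cassert>
#include <cstddef>

// Implicit binary tree stored in an array: root at index 0,
// children of node i at 2i+1 and 2i+2, parent at (i-1)/2.
std::size_t left_child(std::size_t i)  { return 2 * i + 1; }
std::size_t right_child(std::size_t i) { return 2 * i + 2; }
std::size_t parent(std::size_t i)      { return (i - 1) / 2; } // undefined for the root
```

With the layout `[0][1][2][3][4][5][6]` above, `left_child(1)` is 3 and `parent(6)` is 2, with no pointers stored anywhere.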
Bora answered 13/6, 2014 at 18:15 Comment(0)

Storing data structures in contiguous memory is a technique used on memory-constrained systems, such as embedded systems. The technique may also be used on safety- and performance-critical systems.

Desktop systems usually have a lot of memory, and their applications are short-lived. Their dynamic memory allocation process is to find the next available block in the memory pool and return it. If no memory is available (for instance, due to fragmentation), the allocation fails. There is no control over how much memory can be consumed.

By having a contiguous allocation method, the number of nodes created can be restricted or limited. This means that on a system with 32k of memory, the tree won't use up all the memory and leave holes.

The allocation process is faster with a contiguous system: you know where the blocks are. Also, instead of storing pointers for the links, index values can be stored. This also allows storing the tree in a file and retrieving it easily.

You can model this by creating an array or vector of nodes. Change your node data structure to use array indices instead of pointers.
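A minimal sketch of such a node structure (the names `Node`, `Tree`, and `NIL` are illustrative, not from any library):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Nodes live in one contiguous vector and link by index, not pointer.
constexpr std::int32_t NIL = -1;   // "no child" marker

struct Node {
    int value;
    std::int32_t left  = NIL;   // index of left child in the pool
    std::int32_t right = NIL;   // index of right child in the pool
};

struct Tree {
    std::vector<Node> pool;     // contiguous node storage
    std::int32_t root = NIL;

    // Allocate a node and return its index. Indices stay valid even
    // when the vector reallocates, which raw pointers would not.
    std::int32_t make(int v) {
        pool.push_back(Node{v, NIL, NIL});
        return static_cast<std::int32_t>(pool.size()) - 1;
    }
};
```

Because links are indices into the pool, the whole tree can be written to a file or relocated in memory as one block.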

Remember, the only way to know about performance issues is to profile.

Muddler answered 13/6, 2014 at 17:36 Comment(1)
I'm interested in the pattern, in the internal "linearization" of a tree into a vector: a vector is a 1D data structure, a tree is 2D, plus a tree is not necessarily balanced. You are writing interesting things, but this is not what I'm interested in; I'm looking for the reasoning behind storing a tree as a vector, or vice versa.Tenishatenn

how do you make sure that you are re-arranging the branches and the leaves in a meaningful way that helps performance?

If you have the program running already (with a non-contiguous tree), you could always just instrument your program to report what its actual node-access patterns typically look like. Once you have a good idea of how the nodes are accessed, you could then customize your node-allocator to allocate the nodes in memory in that same order.
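As a rough sketch of the idea, here is a repack of a pointer-based tree into contiguous storage in breadth-first order; the BFS order is just a stand-in for whatever order your instrumentation actually reports, and the names are made up for illustration:

```cpp
#include <cassert>
#include <queue>
#include <vector>

// A conventional pointer-based node, as the program might already have.
struct PNode { int value; PNode* left; PNode* right; };

// Copy the node values into one contiguous vector in breadth-first
// order, so nodes that are visited together end up stored together.
std::vector<int> repack_bfs(PNode* root) {
    std::vector<int> out;               // contiguous copy, BFS order
    std::queue<PNode*> q;
    if (root) q.push(root);
    while (!q.empty()) {
        PNode* n = q.front(); q.pop();
        out.push_back(n->value);
        if (n->left)  q.push(n->left);
        if (n->right) q.push(n->right);
    }
    return out;
}
```

If the profile showed a depth-first access pattern instead, the same repacking would simply use a stack rather than a queue.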

Aquiculture answered 13/6, 2014 at 17:37 Comment(2)
what if the behaviour is non-deterministic? At that point, why use a tree at all? Couldn't I just use a vector if I had a constant pattern?Tenishatenn
If the access patterns are completely random, then I don't think making the data in contiguous memory can help you. But the behavior doesn't need to be completely deterministic; if it's regular enough that you can see a general pattern (e.g. queries recurse down from the root node, and at each node the query examines child nodes from first to last) then you can use that information to order the data in such a way that cache misses are minimized.Aquiculture

" ... try to "linearize" a tree to store it as a vector/array, how you make sure that you are re-arranging the branches and the leafs in a meaningful way that helps performance ..."

I believe you are thinking too hard.

In a normal tree, you use 'new' to request free space in which to create a node.

You use delete to return the no longer needed space to the heap.

Then connect the nodes using pointers.


For a 'tree-in-a-vector', you might simply re-implement your new and delete to find space in a vector.

I think instead of pointers (to the parent, or left or right node) you use indices (to the parent, or left or right node).

I believe the index of the nth item in a vector is unchanged both before and after a reallocation for growth (unlike a pointer into the vector).

Another challenge is the deletion of a node ... but this can be as simple as: every stored index greater than the erased node's index decreases by 1.
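That erase-and-renumber step might look like this (a hypothetical sketch; it assumes links *to* the erased node were already cleared by the caller):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Nodes stored in a vector, linked by index; NIL marks "no child".
constexpr std::int32_t NIL = -1;

struct Node {
    int value;
    std::int32_t left = NIL, right = NIL;
};

// Erase slot `dead`: the tail shifts down by one, so every stored
// index greater than `dead` must decrease by 1 to stay correct.
void erase_at(std::vector<Node>& pool, std::int32_t dead) {
    pool.erase(pool.begin() + dead);
    for (Node& n : pool) {
        if (n.left  > dead) --n.left;
        if (n.right > dead) --n.right;
    }
}
```

This keeps the storage dense but makes deletion O(n), which is why the trade only pays off for trees that change seldom.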


These choices can be a fair trade for a tree that changes seldom but must be captured as quickly as possible.

Does storing your tree really require the performance of a vector block save? Is the vector block save actually faster than a depth-first save of the same tree? You really need to measure.

Mccullum answered 13/6, 2014 at 18:8 Comment(1)
I worked on this issue in an embedded system. The 'vector' was actually in a fixed, battery backed, static ram. Once you got the data in there, it was already stored. It updated only when a human changed a configuration item.Mccullum

It is important to differentiate between a plain tree and an AVL tree. In your question you talk about balancing the tree, so your question is probably about representing an AVL tree in an array.

All the other answers talk about plain trees, not AVL trees. As far as I know, this kind of tree can be represented in an array but cannot be efficiently updated, since you have to reorder many elements of the array instead of just rewiring a few pointers.

That means you can represent a perfectly ordered, balanced tree in an array as long as the input elements are already sorted. Such a tree will be faster than a usual in-memory tree, but updating it will be "harder".

My conclusion would be :

  • read-only tree with lots of accesses => in an array
  • updatable tree with lots of changes => in memory
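A small sketch of the read-only case (all names hypothetical; it assumes the stored values never equal the `INT_MIN` sentinel used for empty slots):

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Build a read-only search tree from sorted input, laid out in an
// array (root at 0, children of i at 2i+1 / 2i+2). The median of each
// range becomes the subtree root, giving a balanced tree.
void fill(const std::vector<int>& sorted, std::vector<int>& tree,
          std::size_t lo, std::size_t hi, std::size_t node) {
    if (lo >= hi) return;
    std::size_t mid = lo + (hi - lo) / 2;
    tree[node] = sorted[mid];
    fill(sorted, tree, lo, mid, 2 * node + 1);
    fill(sorted, tree, mid + 1, hi, 2 * node + 2);
}

std::vector<int> build(const std::vector<int>& sorted) {
    std::size_t cap = 1;
    while (cap - 1 < sorted.size()) cap *= 2;   // next 2^k - 1 slots
    std::vector<int> tree(cap - 1, INT_MIN);    // INT_MIN marks "empty"
    fill(sorted, tree, 0, sorted.size(), 0);
    return tree;
}

bool contains(const std::vector<int>& tree, int key) {
    std::size_t i = 0;
    while (i < tree.size() && tree[i] != INT_MIN) {
        if (tree[i] == key) return true;
        i = (key < tree[i]) ? 2 * i + 1 : 2 * i + 2;
    }
    return false;   // fell off the stored part of the tree
}
```

Lookups follow the same 2i+1 / 2i+2 index arithmetic as a heap, so a search walks one contiguous array; inserting a new key, by contrast, would force elements to move, matching the conclusion above.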
Quicklime answered 13/6, 2014 at 19:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.