Efficient Array Storage for Binary Tree

Asked 20/4, 2010 at 14:7 Answered 9/1, 2021 at 20:11

Solved arrays algorithm data-structures binary-tree

We have to write the nodes of a binary tree to a file. What is the most space efficient way of writing a binary tree . We can store it in array format with parent in position i and its children in 2i, 2i+1. But this will waste lot of space in case of sparse binary trees.

Vickey answered 20/4, 2010 at 14:7 Comment(0)

One method which I like is to store the preorder traversal, but also include the 'null' nodes in there. Storing the 'null' nodes removes the need for also storing the inorder of the tree.

Some advantages of this method

You can do better storage than pre/post + inorder method in most practical cases.
Serialization just takes one traversal
Deserialization can be done in one pass.
The inorder traversal can be gotten in one pass without constructing the tree, which might be useful if the situation calls for it.

For example say you had a binary tree of 64 bit integers, you can store an extra bit after each node saying whether the next is a null node or not (the first node is always the root). Null nodes, you can represent by a single bit.

So if there are n nodes, the space usage would be 8n bytes + n-1 indicator bits + n+1 bits for null nodes = 66*n bits.

In the pre/post + inorder you will end up using 16n bytes= 128*n bits.

So you save a space of 62*n bits over this pre/post + inorder method.

Consider the tree

       100
      /   \
     /     \
    /       \
   10       200
  / \       /  \
 .   .     150  300
          / \    / \
         .   .   .  .

where the '.' are the null nodes.

You will serialize it as 100 10 . . 200 150 . . 300 . .

Now each (including subtrees) 'preorder traversal with null' has the property that number of null nodes = number of nodes + 1.

This allows you to create the tree, given the serialized version in one pass, as the first node is the root of the tree. Nodes that follow are the left subtree followed by right, which can be viewed to be like this:

100 (10 . .) (200 (150 . .) (300 . .))

To create the inorder traversal, you use a stack and push when you see a node and pop (onto a list) when you see a null. The resulting list is the inorder traversal (a detailed explanation for this can be found here: C++/C/Java: Anagrams - from original string to target;).

Great answered 20/4, 2010 at 16:26 Comment(7)

how can we get the inorder traversal as mentioned by u in the advantages ?? – Joachim 29/10, 2012 at 4:8

@Joachim the inorder traversal referred to here is the depth first traversal of the tree and can be found by iterating through the resulting array, skipping anything that's a null. – Saideman 27/3, 2013 at 11:32

Isn't the point of the question that you rebuild the tree the same as it was before. I don't understand how you can do this with your solution. You only have the inorder traversal at the end. Can you please explain? – Subarid 29/6, 2013 at 2:34

@user529734 at the end you will have in-order and pre-order traversal so you can rebuild the tree the same as it was before. Explained here geeksforgeeks.org/…. – Gretta 5/8, 2013 at 7:45

what is n-1 indicator bit? and also how do you handle 64 bit integers (used for value), and bit (used for null pointer), in code? – Giddy 26/9, 2013 at 14:16

what data structures do you propose to serialize and deserialize the tree ? – Mensal 25/12, 2013 at 23:36

There is an article to save extra bits for each node to indicate 'null' using Catalan numbers. – Philippa 14/5, 2019 at 3:45

The 2i, 2i+1 (Binary Heap) method is indeed the best way if you have a (nearly) complete tree.

Otherwise you won't escape storing a ParentId (parent Index) with each node.

Bullfight answered 20/4, 2010 at 14:12 Comment(0)

Think about XML. It's a kind of tree serialization. For example:

<node id="1">
    <node id="2">                                   1
    </node>                                       /   \
    <node id="3">                                2     3
        <node id="4">                                 / \
        </node>                                      4   5
        <node id="5">
        </node>
    </node>
</node>

Then, why the spaces and tags ? We can omit them, step by step:

<1>
   <2></>
   <3>
     <4></>
     <5></>
   </>
</>

Remove the spaces: <1><2></2><3><4></><5></></></>.

Remove the angle brackets: 12/34/5///

Now the problem is: what if a node has a empty left subtree and non-empty right subtree? Then we can use another special charactor, '#' to represent an empty left sub-tree.

For example:

This tree can be serialized as: 1#23///.

Chipboard answered 26/4, 2013 at 14:6 Comment(1)

This is also using 2n space, how is this any better than preoder/inorder ? – Mensal 25/12, 2013 at 23:59

You can save the in-order and pre/post-order traversal of the binary tree in the file and reconstruct the tree from these traversals.

Maxon answered 20/4, 2010 at 14:11 Comment(1)

this case i will always consume 2n space (n for inorder and n for post order). – Vickey 20/4, 2010 at 16:12

Since only 2ⁿ-1 items can be stored without any space wasted, you'll have to split the N elements into subtrees, stored sequentially, and keeping a “master index” of the values between trees to decide in which tree to search in. An obvious way to decide which will be the subtrees, is to take advantage of the binary format of a number, and here I'm talking about N, the count of items in the tree.

Let's assume you have a sorted sequence of 20 distinct items:

['aeu', 'bfz', 'cdi', 'dfc', 'eap', 'ggk', 'gsb', 'guj', 'idm', 'ieg',
 'izr', 'pba', 'plp', 'rap', 'rhp', 'tat', 'uho', 'uwb', 'wdf', 'yhp']

20₁₀ is 10100₂ and the 1-bits decide the splits.

The first entry in your master table will be item[15], 0 (15₁₀≡01111₂ one less than the largest power of 2 less than or equal to N, and 0 being the length of your output list so far aka the start index for the subtree ). The previous 15 (0–14) items will be appended to your output list, again by taking advantage of binary arithmetic: item[7₁₀≡0111₂] at 0, item[3₁₀≡0011₂] at 1, item[11₁₀≡1001₂] at 2 and so on, thus preserving the rule that item[i] < item[2*i] && item[i] > item[2*i+1].

After this first step:

master_table = [('tat', 0)]
output_list = ['guj', 'dfc', 'pba', 'bfz', 'ggk', 'ieg', 'rap', 'aeu', 'cdi', 'eap', 'gsb', 'idm', 'izr', 'plp', 'rhp']

If N == 2ⁿ then this python generator produces the desired order:

def heap_order(pow2):
    """Return the tree order of items
    pow2 MUST be a power of two"""

    bit= pow2>>1
    while bit:
        yield from range(bit - 1, pow2, bit*2)
        bit>>= 1

Delete (or ignore) the first 16 items from your list and repeat with the next 1-bit in the list length:

The remaining input list is ['uho', 'uwb', 'wdf', 'yhp'], and its length is 4₁₀==100₂. Largest 2ⁿ-1 is 3, so you append zero-based item 3 along with the current length of the output list to your master table, and then append the previous 3 items to your output list resulting in:

master_table = [('tat', 0), ('yhp', 15)]
# master table items: (key, subtree_base_index)
output_list = [
    'guj', 'dfc', 'pba', 'bfz', 'ggk',
    'ieg', 'rap', 'aeu', 'cdi', 'eap',
    'gsb', 'idm', 'izr', 'plp', 'rhp',

    'uwb', 'uho', 'wdf'
]

The whole result is equivalent to a binary tree where all left subtrees have either leaf nodes or tree nodes with both children.

To search for a value: scan the master table to find the key larger than your search key, and search your key in the corresponding subtree, always using index - subtree_base_index adjustments in your 2*index, 2*index-1 calculations. The maximum total (master table + subtree) count of comparisons needed for failed search is usually ⌈log₂N⌉, and sometimes —when the search key is larger than the first entry of the master table— less than that (no subtrees are created for 0-bits of N).

You should also store the level count of each subtree in the master table (so that you know after how many searches you should stop searching the subtree).

Typewritten answered 19/5, 2020 at 11:4 Comment(0)

You can store the level-order, aka DFS output, including any null children. When reconstructing the tree from the list of values and nulls, use a queue of nodes with the knowledge that the next two items in the list are always the left and right of the current node from the queue.

This still requires storing a total of n+1 "null signifiers", but deserializing is done iteratively as opposed to recursively in the case of preorder.

POC in JS:

class Node {
  constructor(val, left=null, right=null) {
    this.val = val;
    this.left = left;
    this.right = right;
  }
}

function dfs(root) {
  const queue = [root];
  const output = [];

  while (queue.length > 0) {
    const curr = queue.pop();

    // if curr is null, push null, else curr.val
    output.push(curr && curr.val);

    if (curr !== null) {
      queue.unshift(curr.left);
      queue.unshift(curr.right);
    }
  }

  return output;
}

function undfs(list) {
  const root = new Node(list.shift());
  const queue = [root];

  while (queue.length > 0) {
    const curr = queue.shift();
    const [leftVal, rightVal] = list.splice(0,2);

    if (leftVal) {
      curr.left = new Node(leftVal);
      queue.push(curr.left);
    }

    if (rightVal) {
      curr.right = new Node(rightVal);
      queue.push(curr.right);
    }
  }

  return root;
}

const root =
  new Node(100,
    new Node(10),
    new Node(200,
      new Node(150),
      new Node(300)
    )
  );

const dfsOutput = dfs(root);

console.log("List output:");
console.log(dfsOutput.map(a => a || "null").join(", "));

console.log("\nTree output:");
console.log(undfs(dfsOutput));

Loudermilk answered 9/1, 2021 at 20:11 Comment(1)

Nice suggestion. One nit: I believe level-order is the output of BFS (breadth first) not DFS (depth first) – Gd 23/8, 2021 at 11:36

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags