Finding last element of a binary heap
Asked Answered
B

6

18

quoting Wikipedia:

It is perfectly acceptable to use a traditional binary tree data structure to implement a binary heap. There is an issue with finding the adjacent element on the last level on the binary heap when adding an element which can be resolved algorithmically...

Any ideas on how such an algorithm might work?

I was not able to find any information about this issue, for most binary heaps are implemented using arrays.

Any help appreciated.


Recently, I have registered an OpenID account and am not able to edit my initial post nor comment answers. That's why I am responding via this answer. Sorry for this.


quoting Mitch Wheat:

@Yse: is your question "How do I find the last element of a binary heap"?

Yes, it is. Or to be more precise, my question is: "How do I find the last element of a non-array-based binary heap?".

quoting Suppressingfire:

Is there some context in which you're asking this question? (i.e., is there some concrete problem you're trying to solve?)

As stated above, I would like to know a good way to "find the last element of a non-array-based binary heap" which is necessary for insertion and deletion of nodes.

quoting Roy:

It seems most understandable to me to just use a normal binary tree structure (using a pRoot and Node defined as [data, pLeftChild, pRightChild]) and add two additional pointers (pInsertionNode and pLastNode). pInsertionNode and pLastNode will both be updated during the insertion and deletion subroutines to keep them current when the data within the structure changes. This gives O(1) access to both insertion point and last node of the structure.

Yes, this should work. If I am not mistaken, it could be a little bit tricky to find the insertion node and the last node, when their locations change to another subtree due to an deletion/insertion. But I'll give this a try.

quoting Zach Scrivena:

How about performing a depth-first search...

Yes, this would be a good approach. I'll try that out, too.

Still I am wondering, if there is a way to "calculate" the locations of the last node and the insertion point. The height of a binary heap with N nodes can be calculated by taking the log (of base 2) of the smallest power of two that is larger than N. Perhaps it is possible to calculate the number of nodes on the deepest level, too. Then it was maybe possible to determine how the heap has to be traversed to reach the insertion point or the node for deletion.

Buffoon answered 1/2, 2009 at 2:42 Comment(2)
@Yse: is your question "How do I find the last element of a binary heap"?Yellowstone
Use 2 heaps, or (worse because loses O(1) avarage insert) a binary heap and keep track of the largest and smallest: stackoverflow.com/questions/7878622/…Jeffiejeffrey
E
21

Basically, the statement quoted refers to the problem of resolving the location for insertion and deletion of data elements into and from the heap. In order to maintain "the shape property" of a binary heap, the lowest level of the heap must always be filled from left to right leaving no empty nodes. To maintain the average O(1) insertion and deletion times for the binary heap, you must be able to determine the location for the next insertion and the location of the last node on the lowest level to use for deletion of the root node, both in constant time.

For a binary heap stored in an array (with its implicit, compacted data structure as explained in the Wikipedia entry), this is easy. Just insert the newest data member at the end of the array and then "bubble" it into position (following the heap rules). Or replace the root with the last element in the array "bubbling down" for deletions. For heaps in array storage, the number of elements in the heap is an implicit pointer to where the next data element is to be inserted and where to find the last element to use for deletion.

For a binary heap stored in a tree structure, this information is not as obvious, but because it's a complete binary tree, it can be calculated. For example, in a complete binary tree with 4 elements, the point of insertion will always be the right child of the left child of the root node. The node to use for deletion will always be the left child of the left child of the root node. And for any given arbitrary tree size, the tree will always have a specific shape with well defined insertion and deletion points. Because the tree is a "complete binary tree" with a specific structure for any given size, it is very possible to calculate the location of insertion/deletion in O(1) time. However, the catch is that even when you know where it is structurally, you have no idea where the node will be in memory. So, you have to traverse the tree to get to the given node which is an O(log n) process making all inserts and deletions a minimum of O(log n), breaking the usually desired O(1) behavior. Any search ("depth-first", or some other) will be at least O(log n) as well because of the traversal issue noted and usually O(n) because of the random nature of the semi-sorted heap.

The trick is to be able to both calculate and reference those insertion/deletion points in constant time either by augmenting the data structure ("threading" the tree, as mention in the Wikipedia article) or using additional pointers.

The implementation which seems to me to be the easiest to understand, with low memory and extra coding overhead, is to just use a normal simple binary tree structure (using a pRoot and Node defined as [data, pParent, pLeftChild, pRightChild]) and add two additional pointers (pInsert and pLastNode). pInsert and pLastNode will both be updated during the insertion and deletion subroutines to keep them current when the data within the structure changes. This implementation gives O(1) access to both insertion point and last node of the structure and should allow preservation of overall O(1) behavior in both insertion and deletions. The cost of the implementation is two extra pointers and some minor extra code in the insertion/deletion subroutines (aka, minimal).

EDIT: added pseudocode for an O(1) insert()

Here is pseudo code for an insert subroutine which is O(1), on average:

define Node = [T data, *pParent, *pLeft, *pRight]

void insert(T data)
{
    do_insertion( data );   // do insertion, update count of data items in tree

    # assume: pInsert points node location of the tree that where insertion just took place
    #   (aka, either shuffle only data during the insertion or keep pInsert updated during the bubble process)

    int N = this->CountOfDataItems + 1;     # note: CountOfDataItems will always be > 0 (and pRoot != null) after an insertion

    p = new Node( <null>, null, null, null);        // new empty node for the next insertion

    # update pInsert (three cases to handle)
    if ( int(log2(N)) == log2(N) )
        {# #1 - N is an exact power of two
        # O(log2(N))
        # tree is currently a full complete binary tree ("perfect")
        # ... must start a new lower level
        # traverse from pRoot down tree thru each pLeft until empty pLeft is found for insertion
        pInsert = pRoot;
        while (pInsert->pLeft != null) { pInsert = pInsert->pLeft; }    # log2(N) iterations
        p->pParent = pInsert;
        pInsert->pLeft = p;
        }
    else if ( isEven(N) )
        {# #2 - N is even (and NOT a power of 2)
        # O(1)
        p->pParent = pInsert->pParent;
        pInsert->pParent->pRight = p;
        }
    else 
        {# #3 - N is odd
        # O(1)
        p->pParent = pInsert->pParent->pParent->pRight;
        pInsert->pParent->pParent->pRight->pLeft = p;
        }
    pInsert = p;

    // update pLastNode
    // ... [similar process]
}

So, insert(T) is O(1) on average: exactly O(1) in all cases except when the tree must be increased by one level when it is O(log N), which happens every log N insertions (assuming no deletions). The addition of another pointer (pLeftmostLeaf) could make insert() O(1) for all cases and avoids the possible pathologic case of alternating insertion & deletion in a full complete binary tree. (Adding pLeftmost is left as an exercise [it's fairly easy].)

Emigrant answered 1/2, 2009 at 2:42 Comment(5)
You understand the problem, but the O(1) argument you provide is incorrect. For example, how do you update pInsertionNode after an insertion?Own
@stefan.ciobaca: I agree that @Emigrant has no addressed that issue. However, it's easy with the addition of more pointers, say a couple in each node that "thread" across the levels of the tree. (In terms of the array, each array element points one forward and one back.)Comedo
Actually, given a known tree size for a complete binary tree, you can calculate where in the tree the next insert/delete nodes will be and move to them in O(1) average time [upper bound O(log n)] (using basically the same analysis as that determining the O(1) average insert+bubble time for heaps).Calysta
You do have to have a pointer to parent in the node structure in order to accomplish the O(1) traversal, which I inadvertently left out of the Node definition. I've now edited the post to include the revised definition. And I'll try to post some pseudocode for the O(1) traversal later today.Calysta
@roy: I think your solution doesn't produce O(1) time: Imagine you're trying to insert immediately after the rightmost child of the left subtree of a tree with height >2: you're going to have to do two traversals: one all the way up to the root, and another all the way back down to the leftmost child of the right sub-tree. You're never going to know exactly how many "pParent->pParent"s you'll need, I think. Which I think means the time has got to be avg(1, 2*log h)?Kimberli
D
9

My first time to participate in stack overflow.

Yes, the above answer by Zach Scrivena (god I don't know how to properly refer to other people, sorry) is right. What I want to add is a simplified way if we are given the count of nodes.

The basic idea is:

Given the count N of nodes in this full binary tree, do "N % 2" calculation and push the results into a stack. Continue the calculation until N == 1. Then pop the results out. The result being 1 means right, 0 means left. The sequence is the route from root to target position.

Example:

The tree now have 10 nodes, I want insert another node at position 11. How to route it?

11 % 2 = 1  --> right    (the quotient is 5, and push right into stack)
 5 % 2 = 1  --> right    (the quotient is 2, and push right into stack)
 2 % 2 = 0  --> left     (the quotient is 1, and push left into stack. End)

Then pop the stack: left -> right -> right. This is the path from the root.

Dumb answered 4/2, 2016 at 4:13 Comment(0)
C
7

You could use the binary representation of the size of the Binary Heap to find the location of the last node in O(log N). The size could be stored and incremented which would take O(1) time. The the fundamental concept behind this is the structure of the binary tree.

Suppose our heap size is 7. The binary representation of 7 is, "111". Now, remember to always omit the first bit. So, now we are left with "11". Read from left-to-right. The bit is '1', so, go to the right child of the root node. Then the string left is "1", the first bit is '1'. So, again go to the right child of the current node you are at. As you no longer have bits to process, this indicates that you have reached the last node. So, the raw working of the process is that, convert the size of the heap into bits. Omit the first bit. According to the leftmost bit, go to the right child of the current node if it is '1', and to the left child of the current node if it is '0'.

As you always to to the very end of the binary tree this operation always takes O(log N) time. This is a simple and accurate procedure to find the last node.

You may not understand it in the first reading. Try working this method on the paper for different values of Binary Heap, I'm sure you'll get the intuition behind it. I'm sure this knowledge is enough to solve your problem, if you want more explanation with figures, you can refer to my blog.

Hope my answer has helped you, if it did, let me know...! ☺

Calcine answered 8/2, 2015 at 11:31 Comment(5)
Welcome to Stackoverflow ! Thank you for your answer , Try to include the Answer as the name says Answer here , not in the external link which maybe unavailable in future !Melan
Yes sir... I know that.... If I have to explain it would be big and tedious, so I gave a link to my blog... Is there any simpler way to answer, sir..? (Given that I already have the content)...Calcine
I have put the main content in the answer sir.... Although I can assure you the link will remain, even if the link somehow fails, I believe the question is still answered sir.... Is this acceptable..??Calcine
Vamsi, how did you even come up with this solution, what were the steps you took to even think of first taking the binary and trying to follow a pattern from there or was this something that you learned from someone/professor.Tufted
I didn't come up with the solution at all...! It was taught to us in the classroom...! :)Calcine
S
1

How about performing a depth-first search, visiting the left child before the right child, to determine the height of the tree. Thereafter, the first leaf you encounter with a shorter depth, or a parent with a missing child would indicate where you should place the new node before "bubbling up".


The depth-first search (DFS) approach above doesn't assume that you know the total number of nodes in the tree. If this information is available, then we can "zoom-in" quickly to the desired place, by making use of the properties of complete binary trees:

Let N be the total number of nodes in the tree, and H be the height of the tree.

Some values of (N,H) are (1,0), (2,1), (3,1), (4,2), ..., (7,2), (8, 3). The general formula relating the two is H = ceil[log2(N+1)] - 1. Now, given only N, we want to traverse from the root to the position for the new node, in the least number of steps, i.e. without any "backtracking". We first compute the total number of nodes M in a perfect binary tree of height H = ceil[log2(N+1)] - 1, which is M = 2^(H+1) - 1.

If N == M, then our tree is perfect, and the new node should be added in a new level. This means that we can simply perform a DFS (left before right) until we hit the first leaf; the new node becomes the left child of this leaf. End of story.

However, if N < M, then there are still vacancies in the last level of our tree, and the new node should be added to the leftmost vacant spot. The number of nodes that are already at the last level of our tree is just (N - 2^H + 1). This means that the new node takes spot X = (N - 2^H + 2) from the left, at the last level.

Now, to get there from the root, you will need to make the correct turns (L vs R) at each level so that you end up at spot X at the last level. In practice, you would determine the turns with a little computation at each level. However, I think the following table shows the big picture and the relevant patterns without getting mired in the arithmetic (you may recognize this as a form of arithmetic coding for a uniform distribution):

0 0 0 0 0 X 0 0 <--- represents the last level in our tree, X marks the spot!
          ^
L L L L R R R R <--- at level 0, proceed to the R child
L L R R L L R R <--- at level 1, proceed to the L child
L R L R L R L R <--- at level 2, proceed to the R child 
          ^                      (which is the position of the new node)
          this column tells us
          if we should proceed to the L or R child at each level

EDIT: Added a description on how to get to the new node in the shortest number of steps assuming that we know the total number of nodes in the tree.

Snailpaced answered 1/2, 2009 at 5:1 Comment(0)
H
0

Solution in case you don't have reference to parent !!! To find the right place for next node you have 3 cases to handle

  • case (1) Tree level is complete Log2(N)
  • case (2) Tree node count is even
  • case (3) Tree node count is odd

Insert:

void Insert(Node root,Node n)
{
Node parent = findRequiredParentToInsertNewNode (root);
if(parent.left == null)
parent.left = n;
else
parent.right = n;
}

Find the parent of the node in order to insert it

void findRequiredParentToInsertNewNode(Node root){

Node last = findLastNode(root);

 //Case 1 
 if(2*Math.Pow(levelNumber) == NodeCount){
     while(root.left != null)
      root=root.left;
  return root;
 }
 //Case 2
 else if(Even(N)){
   Node n =findParentOfLastNode(root ,findParentOfLastNode(root ,last));
 return n.right;
 }
//Case 3
 else if(Odd(N)){
  Node n =findParentOfLastNode(root ,last);
 return n;
 }

}

To find the last node you need to perform a BFS (breadth first search) and get the last element in the queue

 Node findLastNode(Node root)
 {
     if (root.left == nil)
         return root

     Queue q = new Queue();

     q.enqueue(root);
     Node n = null;

     while(!q.isEmpty()){
      n = q.dequeue();
     if ( n.left != null )
         q.enqueue(n.left);
     if ( n.right != null )
         q.enqueue(n.right);
        }
   return n;
}

Find the parent of the last node in order to set the node to null in case replacing with the root in removal case

 Node findParentOfLastNode(Node root ,Node lastNode)
{
    if(root == null)
        return root;

    if( root.left == lastNode || root.right == lastNode )
        return root;

    Node n1= findParentOfLastNode(root.left,lastNode);
    Node n2= findParentOfLastNode(root.left,lastNode);

    return n1 != null ? n1 : n2;
}
Herculie answered 31/7, 2013 at 1:20 Comment(0)
B
0

I know this is an old thread but i was looking for a answer to the same question. But i could not afford to do an o(log n) solution as i had to find the last node thousands of times in a few seconds. I did have a O(log n) algorithm but my program was crawling because of the number of times it performed this operation. So after much thought I did finally find a fix for this. Not sure if anybody things this is interesting.

This solution is O(1) for search. For insertion it is definitely less than O(log n), although I cannot say it is O(1).

Just wanted to add that if there is interest, i can provide my solution as well. The solution is to add the nodes in the binary heap to a queue. Every queue node has front and back pointers.We keep adding nodes to the end of this queue from left to right until we reach the last node in the binary heap. At this point, the last node in the binary heap will be in the rear of the queue. Every time we need to find the last node, we dequeue from the rear,and the second-to-last now becomes the last node in the tree. When we want to insert, we search backwards from the rear for the first node where we can insert and put it there. It is not exactly O(1) but reduces the running time dramatically.

Beating answered 7/11, 2017 at 18:18 Comment(2)
The insertion into the queue has O(n) worst case running time and can be too slow in practice. Maybe it ran fast in your case because your data had a particular distribution (e.g. the new data is already sorted).Hygienic
@mefathy: Yes, the data was kinda sorted in my program. And it would have a worse case insertion of O(n) in the queue. But as N increases the chances of walking the entire queue also reduces because the node where the insertion needs to take place will be closer to the rear (or depths closer to bottom of the tree). I would think O(n) would be true if insertion is closer to root.Beating

© 2022 - 2024 — McMap. All rights reserved.