worst case in MAX-HEAPIFY: "the worst case occurs when the bottom level of the tree is exactly half full"
Asked Answered
I

2

19

In CLRS, third Edition, on page 155, it is given that in MAX-HEAPIFY,

"the worst case occurs when the bottom level of the tree is exactly half full"  

I guess the reason is that in this case, Max-Heapify has to "float down" through the left subtree.
But the thing I couldn't get is "why half full" ?
Max-Heapify can also float down if left subtree has only one leaf. So why not consider this as the worst case ?

Impede answered 28/7, 2011 at 13:12 Comment(0)
B
13

Read the entire context:

The children's subtrees each have size at most 2n/3 - the worst case occurs when the last row of the tree is exactly half full

Since the running time T(n) is analysed by the number of elements in the tree (n), and the recursion steps into one of the subtrees, we need to find an upper bound on the number of nodes in a subtree, relative to n, and that will yield that T(n) = T(max num. nodes in subtree) + O(1)

The worst case of number of nodes in a subtree is when the final row is as full as possible on one side, and as empty as possible on the other. This is called half full. And the left subtree size will be bounded by 2n/3.

If you're proposing a case with only a few nodes, then that's irrelevant, since all base cases can be considered O(1) and ignored.

Bergmann answered 28/7, 2011 at 13:29 Comment(5)
I'm learning about heaps and my brain almost exploded thinking why the answer was not n, as I thought the maximum nodes would be n if one side of the tree was empty. So I was thinking that n should've been the upper bound of number of nodes. If anyone else struggles with the same question, a heap is an almost complete binary tree. So any other level than the last level should have to be full.Vaca
Because we're interested in the recursion T(n) = T(s(n)) + O(1) we need to find the worst case for s(n) = subtree size as a function of n. It would be incorrect say that we're "maximizing the size of the subtree" (I've seen this in a couple other answers related to this question) - we are really maximizing the ratio L/R where L and R are the size of the left and right subtrees respectively.Victuals
The worst case of number of nodes in a subtree is when the final row is as full as possible on one side, and as empty as possible on the other. But why? I too have the exact doubt as OP, Max-Heapify can also float down if left subtree has only one leaf. So why not consider this as the worst case ? I am sorry it's not clear to me. A little clarification would be great help, if possible.Pursuit
@Pursuit because only single leaf does not guarantee that does it float down to that particular leaf, so for safe side and in worst case left subtree should be full at leaves compare to one level less in right subtree.Crush
I think it all boils down to the question what fraction of the total number of nodes can a child node have. In the case of a complete binary heap / tree, there are equal number of nodes in both left and right subtrees, let the number be k. Hence total number of nodes is 1 + k + k = 2k + 1. Hence the fraction of nodes is k/(2k + 1) which converges to 1/2 when k -> infinity. This fraction is smaller than 2/3. Hence the worst case is not in the case of complete binary heap but it happens in the case of half filled binary heap.Nickolai
C
10

Already there's an accepted answer but this answer is for those people who are still a bit confused (as I was), or something still doesn't click. So here's a little bit longer and more detailed explanation.

Though it might sound redundant, we have to be very clear about the exact definitions because through our attention to the details... chances are when you do that proving things becomes much easier.

From CLRS (section 6.1), a Binary Heap data structure is an array object that can be viewed as a nearly complete binary tree

From Wikipedia, In a complete binary tree, every level (except possibly the last level) is completely filled, and all the nodes in the last level are as far left as possible.

Again, from Wikipedia, A balanced binary tree is a binary tree structure in which the left and right sub-trees of every node differ in height by no more than 1.

Now that we are armed, let's dive in.

So, in comparison to the root, the height of the left and right sub-tree can differ by 1 at most.

Let's consider a tree T and let the height of the left sub-tree = h+1 and the height of the right sub-tree = h

What can be the worst-case in MAX_HEAPIFY? The worst-case happens when we end up doing maximum number of comparisons and swaps while trying to maintain the heap property.

When the MAX_HEAPIFY algorithm runs and if it recursively goes through the longest path then we can consider a possible worst-case because it will end up doing the maximum number of comparisons and swaps in the longest path.

Well, it seems all of the longest paths happen to be in the left sub-tree (as its height is h+1). But someone might as well ask: Why not the right sub-tree? Remember the above definition, all the nodes in the last level have to be as far left as possible.

Now because we have to cover every possibility that can lead to a worst-case, we need to get more number of longer paths, if any exist, and because of that, we ought to make the left sub-tree FULL (But Why? So that we can get more paths to choose from and opt for the path that gives the worst-case time among all).

Since the left subtree has a height h+1, it will have 2^(h+1) no. of leaf nodes, and, therefore, 2^(h+1) number of paths from the root. This is the maximum number of possible paths in a tree T of h+1 height.

Note: Please hold on to it if you are still reading, maybe just for the sake of crystal clarity.

Here's the image of the tree structure in the worst-case situation.

In the above image, as you can see, consider that the left (in yellow) sub-tree and the right (in pink) sub-tree each has x nodes. The pink portion is a complete right sub-tree and the yellow portion is the left sub-tree excluding the last level.

Notice that both the left (yellow) and the right (pink) sub-trees have a height of h.

Now, from the start, we have considered the left subtree to be of height h+1 as a whole (i.e. including the yellow portion and the last level).

Now, if I may ask, how many nodes do we have to add in the last level i.e. below the yellow portion to make the left sub-tree completely FULL?

Well, the bottom-most layer of the yellow portion has ⌈x/2⌉ nodes (i.e. Total number of leaves in a tree/subtree having n nodes = ⌈n/2⌉; for proof visit this link), and now if we add 2 children to each of these nodes or leaves => total x (≈x) nodes have been added (How? ⌈x/2⌉ leaves * 2 ≈ x nodes).

With this addition, we make the left sub-tree of height h+1 (i.e. the yellow portion with height h and the one last level added) FULL, hence meeting the worst-case criteria.

Since the left sub-tree is FULL, the whole Tree is HALF FULL.

Now someone might as well ask: What if we add more nodes, or, specifically, what if we add nodes in the right sub-tree? Well, we don't. And that's because now if we happen to add more nodes, the nodes will be added in the right sub-tree (as the left sub-tree is FULL), which, in turn, will tend to balance out the tree more. Now as the tree is starting to get more balanced, we are tending to move towards the best-case scenario and not the worst-case.

Final question : How many nodes do we have in total?

Total nodes in the tree, n = x (from the yellow portion) + x (from the pink portion) + x (addition of the last level below the yellow portion) = 3x

Can you notice something? As a by-product, the left sub-tree in total contains at most 2x nodes i.e. 2n/3 nodes (bcoz x = n/3).

Curium answered 16/5, 2021 at 10:35 Comment(1)
Best answer. Cleared all the doubts.Lamont

© 2022 - 2024 — McMap. All rights reserved.