Trying to understand max heapify

Asked 28/1, 2016 at 0:59 Answered 11/2, 2021 at 21:24

Solved algorithm sorting heap heapsort max-heap

I tried watching http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-4-heaps-and-heap-sort/ to understand heaps and heapsort but did not find this clear.

I do not understand the function of max-heapify. It seems like a recursive function, but then somehow it's said to run in logarithmic time because of the height of the tree.

To me this makes no sense. In the worst case, won't it have to reverse every single node? I don't see how this can be done without it touching every single node, repeatedly.

Cloudless answered 28/1, 2016 at 0:59 Comment(4)

I use the book CLRS and MAX_HEAPIFY is clearly stated as O(N) time, which is an easy proof. Could you tell which part of the video tells it is O(log N)? – Obscurant 28/1, 2016 at 1:20

@Obscurant Start at 26:06 and watch onward from there – Cloudless 28/1, 2016 at 2:27

@beaker: I had mixed up max_heapify and build_max_heap, my bad. – Obscurant 28/1, 2016 at 5:5

@Obscurant No worries, I mixed up max_heapify and insert in the comments below. ;) – Mosenthal 28/1, 2016 at 15:23

Here's what MAX-HEAPIFY does:

Given a node at index i whose left and right subtrees are max-heaps, MAX-HEAPIFY moves the node at i down the max-heap until it no longer violates the max-heap property (that is, the node is not smaller than its children).

The longest path that a node can take before it is in the proper position is equal to the starting height of the node. Each time the node needs to go down one more level in the tree, the algorithm will choose exactly one branch to take and will never backtrack. If the node being heapified is the root of the max-heap, then the longest path it can take is the height of the tree, or O(log n).

MAX-HEAPIFY moves only one node. If you want to convert an array to a max-heap, you have to ensure that all of the subtrees are max-heaps before moving on to the root. You do this by calling MAX-HEAPIFY on n/2 nodes (leaves always satisfy the max-heap property).

From CLRS:

for i = floor(length(A)/2) downto 1
   do MAX-HEAPIFY(A,i)

Since you call MAX-HEAPIFY O(n) times, building the entire heap is O(n log n).*

_{* As mentioned in the comments, a tighter upper-bound of O(n) can be shown. See Section 6.3 of the 2nd and 3rd editions of CLRS for the analysis. (My 1st edition is packed away, so I wasn't able to verify the section number.)}

Mosenthal answered 28/1, 2016 at 3:32 Comment(6)

Actually building the entire heap is O(n) if you compute the time complexity more carefully. – Fibrosis 11/5, 2020 at 18:51

@ZenoRogue Yes, a tighter upper-bound can be found by taking into consideration the depth at which each node is inserted. You can find the analysis in the Second edition of CLRS in section 6.3, which is reproduced here in section 1.3.2. – Mosenthal 11/5, 2020 at 19:11

@ZenoRogue I've added a note to my answer. Please let me know if I've missed anything. – Mosenthal 11/5, 2020 at 19:30

Looks great! (I have no access to section numbers either.) – Fibrosis 12/5, 2020 at 7:44

So, heapify function heapifies a node; and NOT the whole heap(or array). – Sherysherye 24/11, 2021 at 8:22

@DeepamGupta Exactly. Following the pseudocode in CLRS, MAX-HEAPIFY assumes that the left and right subtrees of the (single) input element A[i] are already max-heaps. The input element A[i] is then moved to its proper position so that A[i] and its subtrees now form a heap. – Mosenthal 24/11, 2021 at 15:27

In the worst case, won't it have to reverse every single node?

You don't have to go through every node. The standard max-heapify algorithm is: (taken from Wikipedia)

Max-Heapify (A, i):
    left ← 2*i  // ← means "assignment"
    right ← 2*i + 1
    largest ← i

    if left ≤ heap_length[A] and A[left] > A[largest] then:
        largest ← left
    if right ≤ heap_length[A] and A[right] > A[largest] then:
        largest ← right

    if largest ≠ i then:
        swap A[i] and A[largest]
    Max-Heapify(A, largest)

You can see that on each recursive call you either stop or continue with the subtree left or right. In the latter case you decrease the tree height with 1. Since the heap tree is balanced by definition you would do at most log(N) steps.

Exact answered 28/1, 2016 at 1:35 Comment(7)

Now I am even more confused, because @Obscurant said that in CLRS, it is shown as O(N) time – Cloudless 28/1, 2016 at 2:28

I give you random list [7, 9, 8, 15, 13, 11, 10, 5, 6, 1, 14, 3, 2, 4, 12], (first level 1 node, second level 2 nodes, third level 4 nodes, fourth level 8 nodes), you're saying you can convert this to a max heap without touching all elements? – Cloudless 28/1, 2016 at 2:32

@Cloudless You don't have to touch all of the elements, only the elements in a single path between the leaf where the element is first added up to the root. As it says in CLRS, another way to think of this is O(h) where h is the height of the tree. – Mosenthal 28/1, 2016 at 3:9

When I call max_heapify on my list, the resulting list doesn't seem to be a max heap – Cloudless 28/1, 2016 at 3:10

Was it a max heap before? Because if the rest of the tree isn't a heap then you can't expect heapifying one node to fix it all. (Note that I stated the direction of the path incorrectly above. The value being heapified moves from the root to a leaf.) – Mosenthal 28/1, 2016 at 3:17

I don't understand then what max heapify is supposed to do, then. Like I have a random list and I want to turn it into a max heap. – Cloudless 28/1, 2016 at 3:18

@Cloudless I've added an answer since it was getting too long to fit in a comment. – Mosenthal 28/1, 2016 at 3:41

Here's an argument for why it's O(N).

Assume it's a full heap, so every non-leaf node has two children. (It still works even if that's not the case, but it's more annoying.)

Put a coin on each node in the tree. Each time we do a swap, we're going to spend one of those coins. (Note that when elements swap in the heap, the coins don't swap with them.) If we run MAX-HEAPIFY, and there's any coins left over, that means we've done fewer swaps than there are nodes in the tree, and thus MAX-HEAPIFY performs O(N) swaps.

Claim: after MAX-HEAPIFY is done running, a heap will always have at least one path from the root to a leaf with coins on every node of the path.

Proof by induction: For a single-node heap, we don't need to do any swaps, so we don't need to spend any coins. Thus, the one node gets to keep its coin, and we have a full path from root to leaf (of length 1) with coin intact.

Now, assume we have a heap with left and right subheaps, and MAX-HEAPIFY has already run on both. By inductive hypothesis, each has at least one path from root to leaf with coins on it, so we have at least two root-to-leaf paths with coins, one for each child. The farthest the root would ever need to go in order to establish the MAX-HEAP property is to swap all the way to the bottom of the tree. Let's say it swaps down into the left subtree, and it swaps all the way to down to the bottom. For each swap, we need to spend the coin, so we spend it from the node that the root swapped to.

In doing this, we spent all the coins on one of the root-to-leaf paths, but remember we originally had at least two! Therefore, we still have a root-to-leaf path complete with coins after MAX-HEAPIFY runs on the whole heap. Therefore, MAX-HEAPIFY spent fewer coins than there are nodes in the tree. Therefore, the number of swaps is O(N). QED.

Damiano answered 11/2, 2021 at 21:24 Comment(0)

Recommended topics

Hot tags