Python heapify() time complexity
Asked Answered
E

2

25
def heapify(A):
    for root in xrange(len(A)//2-1, -1, -1):
        rootVal = A[root]
        child = 2*root+1
        while child < len(A):
            if child+1 < len(A) and A[child] > A[child+1]:
                child += 1
            if rootVal <= A[child]:
                break
            A[child], A[(child-1)//2] = A[(child-1)//2], A[child]
            child = child *2 + 1

This is a similar implementation of python heapq.heapify(). It is said in the doc this function runs in O(n). But it looks like for n/2 elements, it does log(n) operations. Why is it O(n)?

Encratia answered 7/8, 2018 at 21:29 Comment(3)
inside the loop, child = child * 2 + 1 until it gets to len(A)Encratia
Also see #49774737Commonage
I don't understand why @typing suggested the child = child*2 + 1. You move from the current node (root) to the child once you have finished, but if you go to the child's child you are actually jumping a level of a tree, try to heapify this array [2|10|9|5|6]Offset
O
25

It requires more careful analysis, such as you'll find here. The basic insight is that only the root of the heap actually has depth log2(len(a)). Down at the nodes one above a leaf - where half the nodes live - a leaf is hit on the first inner-loop iteration.

"Exact" derivation

Waving hands some, when the algorithm is looking at a node at the root of a subtree with N elements, there are about N/2 elements in each subtree, and then it takes work proportional to log(N) to merge the root and those sub-heaps into a single heap. So the total time T(N) required is about

T(N) = 2*T(N/2) + O(log(N))

That's an uncommon recurrence. The Akra–Bazzi method can be used to deduce that it's O(N), though.

I think more informative, and certainly more satifsying, is to derive an exact solution from scratch. Toward that end, I'll only talk about complete binary trees: as full as possible on every level. Then there 2**N - 1 elements in total, and all subtrees are also complete binary trees. This sidesteps mounds of pointless details about how to proceed when things aren't exactly balanced.

When we're looking at a subtree with 2**k - 1 elements, its two subtrees have exactly 2**(k-1) - 1 elements each, and there are k levels. For example, for a tree with 7 elements, there's 1 element at the root, 2 elements on the second level, and 4 on the third. After the subtrees are heapified, the root has to moved into place, moving it down 0, 1, or 2 levels. This requires doing comparisons between levels 0 and 1, and possibly also between levels 1 and 2 (if the root needs to move down), but no more that that: the work required is proportional to k-1. In all, then,

T(2**k - 1) = 2 * T(2**(k-1) - 1) + (k - 1)*C

for some constant C bounding the worst case for comparing elements at a pair of adjacent levels.

What about T(1)? That's free! A tree with only 1 element is a already a heap - there's nothing to do.

T(1) = 0

One level above those leaves, trees have 3 elements. It costs (no more than) C to move the smallest (for a min-heap; largest for a max-heap) to the top.

T(3) = C

One level above that trees have 7 elements. It costs T(3) to heapify each of the subtrees, and then no more than 2*C to move the root into place:

T(7) = 2*C + 2*C = 4*C

Continuing in the same way:

T(15) = 2* 4*C + 3*C = 11*C
T(31) = 2*11*C + 4*C = 26*C
T(63) = 2*26*C + 5*C = 57*C
...
T(2**k - 1) = (2**k - k - 1)*C

where the last line is a guess at the general form. You can verify that "it works" for all the specific lines before it, and then it's straightforward to prove it by induction.

So, where N = 2**k - 1,

T(N) = (N - log2(N+1)) * C

which shows that T(N) is bounded above by C*N, so is certainly O(N).

Overcritical answered 7/8, 2018 at 21:35 Comment(3)
I do not understand. This does not explain why the heapify() takes O(log(N)). TH(n) = c, if n=1 worst case when the largest if never root: TH(n) = c + ? * TH( ? ) how to write the recursive expression?Reagent
Already gave a link to a detailed analysis. It doesn't use a recursive formulation, and there's no need to. And the claim isn't that heapify takes O(log(N)) time, but that it takes O(N) time. Consider opening a different issue if you have a focused question.Overcritical
@user3742309, see edit for a full derivation from scratch.Overcritical
S
0

For more mathy folks: Consider a complete binary tree with L levels, labeled from 0 (root) to L-1. Such a tree has N=2^L-1 nodes. Conversely, a complete binary tree with N nodes has L=log2(N-1) levels.

For any given node at a level l, the bubble/heapify down operation makes L-1-l comparisons. For the total number of operations, we thus need to compute the sum:

enter image description here

which is equal to

enter image description here

Note: strictly speaking, the sum goes from 0 to L-2, as the leaves are not to be operated on. However, L-(L-1)-1 is zero, so we can leave the sum upper limit to be L-1

Salmons answered 18/2 at 21:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.