It requires more careful analysis, such as you'll find here. The basic insight is that only the root of the heap actually has depth log2(len(a))
. Down at the nodes one above a leaf - where half the nodes live - a leaf is hit on the first inner-loop iteration.
"Exact" derivation
Waving hands some, when the algorithm is looking at a node at the root of a subtree with N
elements, there are about N/2
elements in each subtree, and then it takes work proportional to log(N)
to merge the root and those sub-heaps into a single heap. So the total time T(N)
required is about
T(N) = 2*T(N/2) + O(log(N))
That's an uncommon recurrence. The Akra–Bazzi method can be used to deduce that it's O(N)
, though.
I think more informative, and certainly more satifsying, is to derive an exact solution from scratch. Toward that end, I'll only talk about complete binary trees: as full as possible on every level. Then there 2**N - 1
elements in total, and all subtrees are also complete binary trees. This sidesteps mounds of pointless details about how to proceed when things aren't exactly balanced.
When we're looking at a subtree with 2**k - 1
elements, its two subtrees have exactly 2**(k-1) - 1
elements each, and there are k
levels. For example, for a tree with 7 elements, there's 1 element at the root, 2 elements on the second level, and 4 on the third. After the subtrees are heapified, the root has to moved into place, moving it down 0, 1, or 2 levels. This requires doing comparisons between levels 0 and 1, and possibly also between levels 1 and 2 (if the root needs to move down), but no more that that: the work required is proportional to k-1
. In all, then,
T(2**k - 1) = 2 * T(2**(k-1) - 1) + (k - 1)*C
for some constant C
bounding the worst case for comparing elements at a pair of adjacent levels.
What about T(1)
? That's free! A tree with only 1 element is a already a heap - there's nothing to do.
T(1) = 0
One level above those leaves, trees have 3 elements. It costs (no more than) C
to move the smallest (for a min-heap; largest for a max-heap) to the top.
T(3) = C
One level above that trees have 7 elements. It costs T(3)
to heapify each of the subtrees, and then no more than 2*C
to move the root into place:
T(7) = 2*C + 2*C = 4*C
Continuing in the same way:
T(15) = 2* 4*C + 3*C = 11*C
T(31) = 2*11*C + 4*C = 26*C
T(63) = 2*26*C + 5*C = 57*C
...
T(2**k - 1) = (2**k - k - 1)*C
where the last line is a guess at the general form. You can verify that "it works" for all the specific lines before it, and then it's straightforward to prove it by induction.
So, where N = 2**k - 1
,
T(N) = (N - log2(N+1)) * C
which shows that T(N)
is bounded above by C*N
, so is certainly O(N)
.