`heapq` is a binary heap, with O(log n) push and O(log n) pop. See the `heapq` source code.
The algorithm you show takes O(n log n) to push all the items onto the heap, and then O((n-k) log n) to find the kth largest element. So the complexity would be O(n log n). It also requires O(n) extra space.
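For comparison, here is a push-everything approach of the kind described above. The question's exact code isn't reproduced here, so this is a hypothetical reconstruction, and the name `kth_largest_push_all` is made up. Every item goes onto a min-heap, then the n-k smallest are popped, which leaves the kth largest at the root.

```python
import heapq

def kth_largest_push_all(nums, k):
    """Hypothetical push-everything version: O(n log n) time, O(n) extra space.

    Assumes 1 <= k <= len(nums).
    """
    heap = []
    for num in nums:                  # n pushes, each O(log n)
        heapq.heappush(heap, num)
    for _ in range(len(nums) - k):    # discard the n-k smallest, each pop O(log n)
        heapq.heappop(heap)
    return heap[0]                    # smallest remaining item = kth largest

print(kth_largest_push_all([3, 1, 5, 12, 2, 11], 3))  # 5
```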
You can do this in O(n log k) time, using O(k) extra space, by modifying the algorithm slightly. I'm not a Python programmer, so you'll have to translate the pseudocode:
    # create a new min-heap
    # push the first k nums onto the heap
    for the rest of the nums:
        if num > heap.peek():
            heap.pop()
            heap.push(num)
    # at this point, the k largest items are on the heap.
    # The kth largest is the root:
    return heap.pop()
The key here is that the heap contains only the k largest items seen so far. If an item is smaller than the kth largest seen so far, it's never put onto the heap. The worst case is O(n log k).
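One possible Python translation of that pseudocode, as a sketch: the function name `kth_largest` is hypothetical, "peek" is read as `heap[0]` (the smallest item in a `heapq` min-heap), and `1 <= k <= len(nums)` is assumed. The input is walked with an iterator so that the extra space stays at O(k) instead of copying a slice of the list.

```python
import heapq
from itertools import islice

def kth_largest(nums, k):
    """Min-heap of the k largest items seen so far: O(n log k) time, O(k) extra space.

    Assumes 1 <= k <= len(nums).
    """
    items = iter(nums)
    heap = []
    for num in islice(items, k):      # push the first k nums onto the min-heap
        heapq.heappush(heap, num)
    for num in items:                 # the rest of the nums
        if num > heap[0]:             # heap[0] is the root, i.e. peek()
            heapq.heappop(heap)
            heapq.heappush(heap, num)
    return heap[0]                    # the root is the kth largest

print(kth_largest([3, 1, 5, 12, 2, 11], 3))  # 5
```

Items that are not larger than the current root are skipped without touching the heap, which is where the saving over pushing everything comes from.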
Actually, `heapq` has a `heapreplace` function, so you could replace this:
    if num > heap.peek():
        heap.pop()
        heap.push(num)
with
    if num > heap.peek():
        heap.replace(num)
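In real `heapq` terms, `heap.replace(num)` corresponds to `heapq.heapreplace(heap, num)`, which pops the smallest item and pushes the new one in a single call. A minimal sketch (the function name is hypothetical; `1 <= k <= len(nums)` is assumed):

```python
import heapq
from itertools import islice

def kth_largest_replace(nums, k):
    """Bounded min-heap using heapq.heapreplace. Assumes 1 <= k <= len(nums)."""
    items = iter(nums)
    heap = []
    for num in islice(items, k):          # push the first k nums
        heapq.heappush(heap, num)
    for num in items:
        if num > heap[0]:                 # only larger items can enter the top k
            heapq.heapreplace(heap, num)  # pop the root and push num in one call
    return heap[0]
```

`heapq` also has `heappushpop`, which pushes first and then pops; when the new item is not larger than the root it hands that item straight back and leaves the heap alone, so it could stand in for the whole `if` block.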
Also, instead of pushing the first k items one at a time, you can build a list of the first k items and call `heapify` on it, which takes O(k) time. A more optimized (but still O(n log k)) algorithm is:
    # create array of the first k items
    heap = heapify(array)
    for each remaining num:
        if num > heap.peek():
            heap.replace(num)
    return heap.pop()
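A Python sketch of that version, again with a hypothetical function name and assuming `1 <= k <= len(nums)`. One detail differs from the pseudocode: `heapq.heapify` rearranges a list in place and returns `None`, so `heap = heapify(array)` becomes two statements.

```python
import heapq
from itertools import islice

def kth_largest_heapify(nums, k):
    """Heapify the first k items in O(k), then scan the rest in O((n-k) log k).

    Assumes 1 <= k <= len(nums).
    """
    items = iter(nums)
    heap = list(islice(items, k))     # array of the first k items
    heapq.heapify(heap)               # in place; returns None
    for num in items:                 # each remaining num
        if num > heap[0]:
            heapq.heapreplace(heap, num)
    return heap[0]                    # the root is the kth largest
```

For what it's worth, the standard library's `heapq.nlargest(k, nums)[-1]` returns the same value.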
You could also call `heapify` on the entire array, then pop the first n-k items, and then take the top:
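    import heapq

    def kth_largest(nums, k):
        heapq.heapify(nums)                 # O(n); rearranges nums in place
        for _ in range(len(nums) - k):      # pop exactly the n-k smallest items
            heapq.heappop(nums)
        return heapq.heappop(nums)          # the top is now the kth largest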
That's simpler. I'm not sure whether it's faster than my previous suggestion, and it modifies the original array. The complexity is O(n) to build the heap, then O((n-k) log n) for the pops, so O(n + (n-k) log n) overall, which is O(n log n) in the worst case.
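Written out as one expression, using only the costs stated above:

$$
T_{\text{heapify all}}(n, k) = \underbrace{O(n)}_{\text{heapify}} + \underbrace{O\big((n-k)\log n\big)}_{n-k \text{ pops}} = O\big(n + (n-k)\log n\big)
$$

With k = 1 this is O(n log n); with k = n it drops to O(n). The bounded-heap versions stay at O(n log k) regardless, so which one wins depends on how k compares to n.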
`lg x` generally means `log(x)`. – Precious

… `heappush()` and `heappop()`? Do you understand that the loop in the 4th and 5th lines is inefficient, and indeed the entire routine is less efficient than necessary? – Miramirabeau

… `O()` complexity, but this specific code isn't close. – Stringed

… priority queue and heap and perhaps binary tree before answering this question. If the size of the heap named `heap` is n, then the complexity of either `heappush()` or `heappop()` is O(log(n)). This is because the heap is conceptually a complete binary tree, which has about log(n) levels. – Miramirabeau