What's the time complexity of functions in heapq library
B

5

78

My question is about the LeetCode solution below. I can't understand why it is O(k + (n-k) log k).

Supplement: Maybe the complexity isn't that. In fact, I don't know the time complexity of heappush() and heappop().

# O(k+(n-k)lgk) time, min-heap
def findKthLargest(self, nums, k):
    heap = []
    for num in nums:
        heapq.heappush(heap, num)
    for _ in range(len(nums)-k):
        heapq.heappop(heap)
    return heapq.heappop(heap)
Baillie answered 6/8, 2016 at 16:6 Comment(5)
@ValentinLorentz, I believe lgx generally means log(x).Precious
We need more context. Do you understand the time complexity of heappush() and heappop()? Do you understand that the loop in the 4th and 5th lines is inefficient, and indeed the entire routine is less efficient than necessary?Miramirabeau
It isn't. There is a reasonably straightforward way to use heaps giving the stated O() complexity, but this specific code isn't close.Stringed
@RoryDaulton well, I don't know the time complexity of heappush() and heappop(). I can't find them anywhere...Baillie
Then you need to study the concepts of priority queue and heap, and perhaps binary tree, before answering this question. If the size of the heap named heap is n, then the complexity of either heappush() or heappop() is O(log(n)). This is because the heap is conceptually a complete binary tree, which has about log(n) levels.Miramirabeau
R
92

heapq is a binary heap, with O(log n) push and O(log n) pop. See the heapq source code.

The algorithm you show takes O(n log n) to push all the items onto the heap, and then O((n-k) log n) to find the kth largest element. So the complexity would be O(n log n). It also requires O(n) extra space.

You can do this in O(n log k), using O(k) extra space by modifying the algorithm slightly. I'm not a Python programmer, so you'll have to translate the pseudocode:

# create a new min-heap
# push the first k nums onto the heap
for the rest of the nums:
    if num > heap.peek()
        heap.pop()
        heap.push(num)

# at this point, the k largest items are on the heap.
# The kth largest is the root:

return heap.pop()

The key here is that the heap contains just the largest items seen so far. If an item is smaller than the kth largest seen so far, it's never put onto the heap. The worst case is O(n log k).
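
One possible Python translation of the pseudocode above, as a minimal sketch. The function name `kth_largest` is illustrative, and the sketch assumes 1 <= k <= len(nums); `heap[0]` plays the role of `heap.peek()`.

```python
import heapq

def kth_largest(nums, k):
    """Return the kth largest item, assuming 1 <= k <= len(nums)."""
    heap = []                          # min-heap holding at most k items
    for num in nums:
        if len(heap) < k:
            heapq.heappush(heap, num)  # fill the heap with the first k items
        elif num > heap[0]:            # heap[0] is the smallest item kept so far
            heapq.heappop(heap)
            heapq.heappush(heap, num)
    return heap[0]                     # root = kth largest overall

print(kth_largest([3, 2, 1, 5, 6, 4], 2))
```

Because the heap never grows beyond k items, each push or pop costs O(log k), which is where the O(n log k) bound comes from.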

Actually, heapq has a heapreplace function, so you could replace this:

    if num > heap.peek()
        heap.pop()
        heap.push(num)

with

    if num > heap.peek()
        heap.replace(num)

Also, an alternative to pushing the first k items is to create a list of the first k items and call heapify. A more optimized (but still O(n log k)) algorithm is:

# create array of first `k` items
heap = heapify(array)
for remaining nums
    if (num > heap.peek())
        heap.replace(num)
return heap.pop()
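
In Python, the heapify-plus-replace variant might look like the sketch below. Note that `heapq.heapify` works in place and returns `None`, so the pseudocode's `heap = heapify(array)` becomes two steps. The name `kth_largest_replace` is illustrative, and the sketch assumes `nums` is a list with 1 <= k <= len(nums).

```python
import heapq

def kth_largest_replace(nums, k):
    """Return the kth largest item, assuming 1 <= k <= len(nums)."""
    heap = nums[:k]                  # copy of the first k items, O(k) extra space
    heapq.heapify(heap)              # in place, O(k)
    for i in range(k, len(nums)):
        if nums[i] > heap[0]:
            # pop the smallest and push the new item in one O(log k) step
            heapq.heapreplace(heap, nums[i])
    return heap[0]

print(kth_largest_replace([3, 2, 1, 5, 6, 4], 2))
```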

You could also call heapify on the entire array, then pop the first n-k items, and then take the top:

heapq.heapify(nums)               # O(n), in place
for _ in range(len(nums) - k):    # pop the n-k smallest items
    heapq.heappop(nums)
return heapq.heappop(nums)        # the kth largest

That's simpler. Not sure if it's faster than my previous suggestion, but it modifies the original array. The complexity is O(n) to build the heap, then O((n-k) log n) for the pops. So it would be O((n-k) log n). Worst case O(n log n).

Ruberta answered 8/8, 2016 at 15:29 Comment(7)
I just came back here because I remembered posting something wrong. I ran a test on this, and heapify was faster (needing 80% of the time on the same input). But using direct index into sorted(thelist) was considerably faster than either.Titanic
@KennyOstrom: No surprise that the last option is fastest. If the OP can modify the original array, then that's the one he probably should use.Ruberta
For all measurements, I used versions that made a separate copy of the array. For example heap=nums[:]; heapify(heap)Titanic
Why is the last solution not O(n + (n-k) log n)? Why not include the O(n) from the heapify?Tercel
@user2361174: because the '(n-k)log n' term will dwarf the O(n) term in the general case.Ruberta
Based on this line the heappop complexity doesn't seem to be O(log n) github.com/python/cpython/blob/…Clearness
@Clearness It's difficult to say from the sparse data (three runs on a heap of 1000 items), but it appears that the improvement to heappop is on the order of 42%. So the complexity there would be O(0.58 * log n). That's still considered O(log n). You'd have to run more exhaustive tests with much larger n to see if that 0.58 constant holds.Ruberta
N
15

heapify() actually takes linear time because its approach is different from calling heapq.heappush() N times.

heapq.heappush() and heapq.heappop() each take O(log n) time because they move one item along a single path between the root and a leaf, and a heap of n items has about log n levels.

when you pass an array in heapify() it makes sure that the left and right children of the node are already maintaining the heap property whether it is a min heap or max heap.
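
The bottom-up idea can be sketched in pure Python as follows. This is an illustration only, not the code `heapq` itself runs (CPython's implementation differs in detail), and `sift_down` and `build_min_heap` are hypothetical names. Nodes are processed from the last parent back to the root, so when a node is sifted down, both of its subtrees are already heaps.

```python
def sift_down(a, i, n):
    """Move a[i] down until neither child is smaller (min-heap order)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and a[left] < a[smallest]:
            smallest = left
        if right < n and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest

def build_min_heap(a):
    """Turn any list into a min-heap in place."""
    n = len(a)
    # Leaves are already one-item heaps, so start at the last parent.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)

items = [9, 4, 7, 1, 8, 2, 6, 3, 5]
build_min_heap(items)
print(items[0])
```

Most nodes sit near the bottom of the tree and can only move a few levels, which is why the total work sums to O(n) rather than O(n log n).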

You can see these videos: https://www.youtube.com/watch?v=HqPJF2L5h9U

https://www.youtube.com/watch?v=B7hVxCmfPtM

Hope this helps.

Numismatics answered 12/9, 2020 at 8:54 Comment(2)
Please avoid posting only links, and provide solution code snippets if possible. Consider adding video links as a last choice, and consider those who are visually impaired as well.Cholesterol
"when you pass an array in heapify() it makes sure that the left and right children of the node are already maintaining the heap property" I think this statement is wrong. In Python, heapify() will create a heap from any list.Pompei
B
8

Summarizing @Shivam purbia's post:

  1. Using heapq.heapify() can reduce both time and space complexity because heapq.heapify() heapifies in place and runs in linear time.
  2. Both heapq.heappush() and heapq.heappop() cost O(log N) time.

Final code will be like this ...

import heapq

def findKthLargest(self, nums, k):
    heapq.heapify(nums)            # in-place heapify -> cost O(N) time

    for _ in range(len(nums)-k):   # run (N-k) times
        heapq.heappop(nums)        # cost O(logN) time
    return heapq.heappop(nums)
  • Total time complexity is O(N + (N - k) log N)
  • Total extra space complexity is O(1), because nums is modified in place
Bray answered 29/4, 2021 at 7:41 Comment(2)
It helped me a lot!Suzettesuzi
Small correction, as per the docs, it must be heapq.heapify(nums) instead of heaps.heapify(nums).Lamebrain
U
0

Building a heap by pushing the elements one at a time is O(n log n). Calling heapify on an existing list is O(n).
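
One way to see the difference is to count comparisons, as in the rough sketch below. The `Counted` wrapper and the helper names are hypothetical, introduced only for this experiment; it relies on `heapq` ordering items with `<`. A descending input is used because every push into a min-heap then travels all the way to the root.

```python
import heapq

class Counted:
    """Wraps a value and counts every `<` comparison made on it."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value

def count_comparisons(build, values):
    """Run build(items) on fresh wrappers and return the comparison count."""
    Counted.comparisons = 0
    build([Counted(v) for v in values])
    return Counted.comparisons

def build_by_pushing(items):
    heap = []
    for item in items:
        heapq.heappush(heap, item)

n = 1023
descending = list(range(n, 0, -1))   # worst case for pushes into a min-heap
pushes = count_comparisons(build_by_pushing, descending)
heapified = count_comparisons(heapq.heapify, descending)
print("pushes:", pushes, "heapify:", heapified)
```

The push count grows roughly like n log n on this input, while the heapify count stays within a small constant multiple of n.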

Ugrian answered 7/9, 2021 at 19:20 Comment(0)
V
0

In the question, popping the smallest items from the heap is not the best approach.

Say your input has 1 million items: then you need to pop 1,000,000 - k times.

Instead, in Python, we can use a max-heap. Then you need only O(k) pops instead of O(n-k), which matters when n is very large and k is small.

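from heapq import _heapify_max, _heappop_max  # private, undocumented helpers
from typing import List

def findKthLargest(self, nums: List[int], k: int) -> int:
    _heapify_max(nums)            # in-place max-heap, O(n)
    for _ in range(k - 1):        # discard the k-1 largest items
        _heappop_max(nums)
    return _heappop_max(nums)     # the kth largest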
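
Since `_heapify_max` and `_heappop_max` are underscore-prefixed private helpers, a version that sticks to the documented `heapq` functions may be safer. The sketch below gets max-heap behaviour by negating the values; it assumes the items are numbers and that 1 <= k <= len(nums), and the name `kth_largest_negated` is illustrative.

```python
import heapq

def kth_largest_negated(nums, k):
    """Return the kth largest number, assuming 1 <= k <= len(nums)."""
    heap = [-x for x in nums]     # O(n) copy; smallest entry is -max(nums)
    heapq.heapify(heap)           # O(n)
    for _ in range(k - 1):        # discard the k-1 largest values, O(k log n)
        heapq.heappop(heap)
    return -heap[0]               # undo the negation

print(kth_largest_negated([3, 2, 1, 5, 6, 4], 2))
```

Unlike the in-place version, this leaves `nums` untouched at the cost of O(n) extra space.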
Volteface answered 1/6, 2022 at 14:1 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.