Python: Find running median with Max-Heap and Min-Heap
Asked Answered
C

1

8

I'm trying to return the running median for a series of streaming numbers. To do that I use a max-heap (which stores the values on the lower half of the series) and a min-heap (which stores the values on the higher half of the series).

In particular I'm using the Python (2.0) built-in min-heap data structure from the heapq module (https://docs.python.org/2/library/heapq.html). To build the max-heap instead I simply use the negative of the numbers I need to push into my heap.

My Python code is the following:

import heapq

maxh = []
minh = []
vals=[1,2,3,4,5,6,7,8,9,10]
for val in vals:

    # Initialize the data-structure and insert/push the 1st streaming value
    if not maxh and not minh:
        heapq.heappush(maxh,-val)
        print float(val)
    elif maxh:

        # Insert/push the other streaming values
        if val>-maxh[0]:
            heapq.heappush(minh,val)
        elif val<-maxh[0]:
            heapq.heappush(maxh,-val)

        # Calculate the median
        if len(maxh)==len(minh):
            print float(-maxh[0]+minh[0])/2
        elif len(maxh)==len(minh)+1:
            print float(-maxh[0])
        elif len(minh)==len(maxh)+1:
            print float(minh[0])

        # If min-heap and max-heap grow unbalanced we rebalance them by
        # removing/popping one element from a heap and inserting/pushing
        # it into the other heap, then we calculate the median
        elif len(minh)==len(maxh)+2:
            heapq.heappush(maxh,-heapq.heappop(minh))
            print float(-maxh[0]+minh[0])/2
        elif len(maxh)==len(minh)+2:
            heapq.heappush(minh,-heapq.heappop(maxh))
            print float(-maxh[0]+minh[0])/2

Below is the full list of test cases I've built to check my code:

vals=[1,2,3,4,5,6,7,8,9,10] # positive numbers, increasing series
vals=[10,9,8,7,6,5,4,3,2,1] # positive numbers, decreasing series
vals=[10,9,11,8,12,7,13,6,14,5] # positive numbers, jumping series (keeping
                                # heaps balanced)

vals=[-10,-9,-8,-7,-6,-5,-4,-3,-2,-1] # negative numbers, increasing series
vals=[-1,-2,-3,-4,-5,-6,-7,-8,-9,-10] # negative numbers, decreasing series
vals=[-10,-9,-11,-8,-12,-7,-13,-6,-14,-5] # negative numbers
                                          # jumping series (keeping heaps
                                          # balanced)

vals=[-5,-4,-3,-2,-1,0,1,2,3,4,5] # mixed positive-negative numbers,
                                  # increasing series
vals=[5,4,3,2,1,0,-1,-2,-3,-4,-5] # mixed positive-negative numbers,
                                  # decreasing series
vals=[0,-1,1,-2,2,-3,3,-4,4,-5,5] # mixed positive-negative numbers,
                                  # jumping series (keeping heaps balanced)

My code seems ok to me but I cannot pass 4 out of 10 test cases with an online judge (https://www.hackerrank.com/challenges/ctci-find-the-running-median/problem).

Do you have any hint?

Christian answered 3/8, 2017 at 10:54 Comment(7)
That problem states that the first number tells how many values will be input. Did you account for thst?Thrave
No, I didn't account for that because I thought it was irrelevant for sake of a solution. How do you think I could use such information?Christian
Your code can fail if there are duplicate values. If the next item is equal to the value that's currently at the top of maxh.Batey
You probably don't need that information to find a solution, but it could potentially be a cause of failure if you don't have direct control over your input when submitting your code for judgin. But it sounds like @JimMischel has found something more important for you to worry about.Thrave
Somehow, I truncated my comment. If the next item is equal to the value that's currently at the top of maxh, it won't be added to either heap. The test case [1,1,2] should reveal the error.Batey
Great! Thanks you @JimMischel for the hint, I feel a bit of a dumb for not thinking about it :) . I've added the block elif val==-maxh[0]: heapq.heappush(minh,val) and now I've passed all the test cases!Christian
Pardon, I've substituted the block if val>-maxh[0]: heapq.heappush(minh,val) with if val>=-maxh[0]: heapq.heappush(minh,val)Christian
B
8

The problem is here:

    # Insert/push the other streaming values
    if val>-maxh[0]:
        heapq.heappush(minh,val)
    elif val<-maxh[0]:
        heapq.heappush(maxh,-val)

If val == maxh[0], then the item is never pushed onto either heap. You should be able to reveal the error with the test case [1,1,2].

A simple fix would be:

    # Insert/push the other streaming values
    if val >= -maxh[0]:
        heapq.heappush(minh,val)
    else
        heapq.heappush(maxh,-val)
Batey answered 3/8, 2017 at 15:46 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.