I borrowed some code trying to implement a function to calculate the running median for a ton of data. The current one is too slow for me (The tricky part is that I need to exclude all zeros from the running box). Below is the code:
from itertools import islice
from collections import deque
from bisect import bisect_left,insort
def median(s):
sp = [nz for nz in s if nz!=0]
print sp
Mnow = len(sp)
if Mnow == 0:
return 0
else:
return np.median(sp)
def RunningMedian(seq, M):
seq = iter(seq)
s = []
# Set up list s (to be sorted) and load deque with first window of seq
s = [item for item in islice(seq,M)]
d = deque(s)
# Sort it in increasing order and extract the median ("center" of the sorted window)
s.sort()
medians = [median(s)]
for item in seq:
old = d.popleft() # pop oldest from left
d.append(item) # push newest in from right
del s[bisect_left(s, old)] # locate insertion point and then remove old
insort(s, item) # insert newest such that new sort is not required
medians.append(median(s))
return medians
It works well, the only drawback is that it is too slow. Any one could help me to improve the code to be more efficient?
After I explored all the possibilities, the following simple code can calculate comparably efficiently:
def RunningMedian(x,N):
idx = np.arange(N) + np.arange(len(x)-N+1)[:,None]
b = [row[row>0] for row in x[idx]]
return np.array(map(np.median,b))
#return np.array([np.median(c) for c in b]) # This also works