Most efficient way to filter a long time series Python

I have a large time series, say 1e10, that results from recording neural activity, i.e. voltages. Before doing further analysis I want to band pass filter that data between 300 Hz and 7000 Hz. Below, I post the code for the Butterworth filter I designed. How do I make this filter faster? It takes too long to run.

Update: Sample Data

Here is a link to data such as would be in each row of data.

As to formatting, each row represents a different recording source and each column represents a point in time. The data are sampled at 20, 000 Hz.

 def butter_bandpass(lowcut,highcut,fs,order=8):
     nyq = 0.5*fs
     low = lowcut/nyq
     high = highcut/nyq

     b,a = butter(order, [low, high], btype='band')
     return b,a

 def butter_bandpass_filter(data,lowcut,highcut,fs,order=8):
     b,a = butter_bandpass(lowcut,highcut,fs,order=order)
     return lfilter(b,a,data) 

 print 'Band-pass filtering data'
 data = map(lambda channel: butter_bandpass_filter(channel,300,7000,20000),data)

Update

Like most NumPy, SciPy functions lfilter can take a multidimensional input and so map creates unnecessary overhead. That is, one can rewrite

 data = map(lambda channel:butter_bandpass_filter(channel,300,7000,20000),data)

 data = butter_bandpass_filter(data,300,7000,20000)

By default lfilter operates on the last non-singleton axis. For a 2D matrix, this means the function is applied to each row, which is exactly what I need. Sadly, it's still too slow though.

First, your data sample is in a proprietary format, am I right? Even using the biosig toolbox for Python this format cannot be read. Maybe I'm wrong, but I didn't succeed to read it.

Thus, I'll base my answer on artificial data, generated from a Rössler-oscillator. It is a chaotic, 3d-oscillator, often used in the field of nonlinear timeseries analysis.

import numpy as np
from scipy.signal import butter, lfilter

##############################################################
# For generating sample-data
##############################################################
from scipy.integrate import odeint

def roessler_ode(y,t,omega=1,a=0.165,b=0.2,c=10):
    dy = np.zeros((3))
    dy[0] = -1.0*(omega*y[1] + y[2]) #+ e1*(y[3]-y[0])
    dy[1] = omega * y[0] + a * y[1]
    dy[2] = b + y[2] * (y[0] - c)
    return dy

class Roessler(object):
    """A single coupled Roessler oscillators"""
    def __init__(self, y=None, omega=1.0, a=0.165,b=0.2,c=10):
        self.omega = omega
        self.a = a
        self.b = b
        self.c = c
        if y==None:
            self.y = np.random.random((3))+0.5
        else:
            self.y = y

    def ode(self,y,t):
        dy = roessler_ode(y[:],t,self.omega,self.a,self.b,self.c)
        return dy

    def integrate(self,ts):
        rv = odeint(self.ode,self.y,ts)
        self.y = rv[-1,:]
        return rv
###############################################################

Now come your function definitions:

def butter_bandpass(lowcut,highcut,fs,order=8):
    nyq = 0.5*fs
    low = lowcut/nyq
    high = highcut/nyq

    b,a = butter(order, [low, high], btype='band')
    return b,a

def butter_bandpass_filter(data,lowcut,highcut,fs,order=8):
    b,a = butter_bandpass(lowcut,highcut,fs,order=order)
    return lfilter(b,a,data)

I left them unchanged. I generate some data with my oscillator, but I only take the 3rd component of it. I add some gaussian noise in order to have something to filter out.

# generate sample data
data = Roessler().integrate(np.arange(0,1000,0.1))[:,2]
data += np.random.normal(size=data.shape)

Now let's come to the speed question. I use the timeit-module to check execution times. These statements execute the filtering 100 times, and measure the overall time. I do this measurement for order 2 and order 8 (yes, you want sharper spectral edges, I know, but wait)

# time execution
from timeit import timeit
time_order8 = timeit("butter_bandpass_filter(data,300,2000,20000,8)", "from __main__ import butter_bandpass_filter, butter_bandpass, data", number=100)
time_order2 = timeit("butter_bandpass_filter(data,300,2000,20000,2)", "from __main__ import butter_bandpass_filter, butter_bandpass, data", number=100)
print "For order 8: %.2f seconds" % time_order8
print "For order 2: %.2f seconds" % time_order2

The output of these two print statements is:

For order 8: 11.70 seconds
For order 2: 0.54 seconds

This makes a factor of 20! Using order 8 for a butterworth filter is definitely not a good idea. I cannot think of any situation where this would make sense. To prove the other problems that arise when using such a filter, let's look at the effect of those filters. We apply bandpass filtering to our data, once with order 8 and once with order 2:

data_bp8 = butter_bandpass_filter(data,300,2000,20000,8)
data_bp2 = butter_bandpass_filter(data,300,2000,20000,2)

Now let's do some plots. First, simple lines (I didn't care about the x-axis)

# plot signals
import matplotlib.pyplot as plt
plt.figure(1)
plt.plot(data, label="raw")
plt.plot(data_bp8, label="order 8")
plt.plot(data_bp2, label="order 2")
plt.legend()

This gives us:

Signals

Oh, what happened to the green line? Weird, isn't it? The reason is that butterworth filters of order 8 become a rather instable thing. Ever heard of resonance disaster / catastrophe? This is what it looks like.

The power spectral densities of these signals can be plotted like:

# plot power spectral densities
plt.figure(2)
plt.psd(data, Fs=200000, label="raw")
plt.psd(data_bp8, Fs=20000, label="order 8")
plt.psd(data_bp2, Fs=20000, label="order 2")
plt.legend()

plt.show()

PSD

Here, you see that the green line has sharper edges, but at what price? Artificial peak at approx. 300 Hz. The signal is completely distorted.

So what shall you do?

Never use butterworth filter of order 8
Use lower order, if it is sufficient.
If not, create some FIR filter with the Parks-McGlellan or Remez-Exchange-Algorithms. There is scipy.signal.remez, for example.

Another hint: if you care about the phase of your signal, you should definitely filter forwards and backwards in time. lfilter does shift the phases otherwise. An implementation of such an algorithm, commonly refered to as filtfilt, can be found at my github repository.

One more programming hint:

If you have the situation of pass-through parameters (four parameters of butter_bandpass_filter are only passed through to butter_bandpass, you could make use of *args and **kwargs.

def butter_bandpass(lowcut,highcut,fs,order=8):
    nyq = 0.5*fs
    low = lowcut/nyq
    high = highcut/nyq

    b,a = butter(order, [low, high], btype='band')
    return b,a

def butter_bandpass_filter(data, *args, **kwargs):
    b,a = butter_bandpass(*args, **kwargs)
    return lfilter(b,a,data)

This reduces code redundancy and will make your code less error-prone.

Finally, here is the complete script, for easy copy-pasting to try it out.

import numpy as np
from scipy.signal import butter, lfilter

##############################################################
# For generating sample-data
##############################################################
from scipy.integrate import odeint

def roessler_ode(y,t,omega=1,a=0.165,b=0.2,c=10):
    dy = np.zeros((3))
    dy[0] = -1.0*(omega*y[1] + y[2]) #+ e1*(y[3]-y[0])
    dy[1] = omega * y[0] + a * y[1]
    dy[2] = b + y[2] * (y[0] - c)
    return dy

class Roessler(object):
    """A single coupled Roessler oscillators"""
    def __init__(self, y=None, omega=1.0, a=0.165,b=0.2,c=10):
        self.omega = omega
        self.a = a
        self.b = b
        self.c = c
        if y==None:
            self.y = np.random.random((3))+0.5
        else:
            self.y = y

    def ode(self,y,t):
        dy = roessler_ode(y[:],t,self.omega,self.a,self.b,self.c)
        return dy

    def integrate(self,ts):
        rv = odeint(self.ode,self.y,ts)
        self.y = rv[-1,:]
        return rv
###############################################################


def butter_bandpass(lowcut,highcut,fs,order=8):
    nyq = 0.5*fs
    low = lowcut/nyq
    high = highcut/nyq

    b,a = butter(order, [low, high], btype='band')
    return b,a

def butter_bandpass_filter(data,lowcut,highcut,fs,order=8):
    b,a = butter_bandpass(lowcut,highcut,fs,order=order)
    return lfilter(b,a,data) 

# generate sample data
data = Roessler().integrate(np.arange(0,1000,0.1))[:,2]
data += np.random.normal(size=data.shape)

# time execution
from timeit import timeit
time_order8 = timeit("butter_bandpass_filter(data,300,2000,20000,8)", "from __main__ import butter_bandpass_filter, butter_bandpass, data", number=100)
time_order2 = timeit("butter_bandpass_filter(data,300,2000,20000,2)", "from __main__ import butter_bandpass_filter, butter_bandpass, data", number=100)
print "For order 8: %.2f seconds" % time_order8
print "For order 2: %.2f seconds" % time_order2

data_bp8 = butter_bandpass_filter(data,300,2000,20000,8)
data_bp2 = butter_bandpass_filter(data,300,2000,20000,2)

# plot signals
import matplotlib.pyplot as plt
plt.figure(1)
plt.plot(data, label="raw")
plt.plot(data_bp8, label="order 8")
plt.plot(data_bp2, label="order 2")
plt.legend()

# plot power spectral densities
plt.figure(2)
plt.psd(data, Fs=200000, label="raw")
plt.psd(data_bp8, Fs=20000, label="order 8")
plt.psd(data_bp2, Fs=20000, label="order 2")
plt.legend()

plt.show()

Recommended topics

Hot tags