Most efficient way to filter a long time series Python
Asked Answered
C

1

9

I have a large time series, say 1e10, that results from recording neural activity, i.e. voltages. Before doing further analysis I want to band pass filter that data between 300 Hz and 7000 Hz. Below, I post the code for the Butterworth filter I designed. How do I make this filter faster? It takes too long to run.

Update: Sample Data

Here is a link to data such as would be in each row of data.

As to formatting, each row represents a different recording source and each column represents a point in time. The data are sampled at 20, 000 Hz.

 def butter_bandpass(lowcut,highcut,fs,order=8):
     nyq = 0.5*fs
     low = lowcut/nyq
     high = highcut/nyq

     b,a = butter(order, [low, high], btype='band')
     return b,a

 def butter_bandpass_filter(data,lowcut,highcut,fs,order=8):
     b,a = butter_bandpass(lowcut,highcut,fs,order=order)
     return lfilter(b,a,data) 

 print 'Band-pass filtering data'
 data = map(lambda channel: butter_bandpass_filter(channel,300,7000,20000),data)

Update

Like most NumPy, SciPy functions lfilter can take a multidimensional input and so map creates unnecessary overhead. That is, one can rewrite

 data = map(lambda channel:butter_bandpass_filter(channel,300,7000,20000),data)

as

 data = butter_bandpass_filter(data,300,7000,20000)

By default lfilter operates on the last non-singleton axis. For a 2D matrix, this means the function is applied to each row, which is exactly what I need. Sadly, it's still too slow though.

Conjugate answered 4/2, 2013 at 20:48 Comment(11)
I'd run cprofile on it, and see where things are getting hung up. Then, focus there: docs.python.org/2/library/profile.htmlFry
An example of data would be helpful. This is interesting dsp.stackexchange.com/questions/2864/…Uncalledfor
As a general comment, you better off using packages like numpy to perform the calculations than a pure python implementation. numpy uses C and Fortran compiled code to implement functions and that will prove to be much much faster (factor of 4 or more).Ametropia
@Ametropia I think he is already using scipy.signal.lfilter to do the meat of the job.Accrue
yeah, you're right. I missed that :(Ametropia
It hangs on lfilter, which is a scipy function.Conjugate
@Uncalledfor I added some sample dataConjugate
Usually I'd go for an FIR filter when doing offline DSP. Does it have to be Butterworth? Butterworth is often chosen partly because it is easy to implement in the analogue domain... whereas FIR can be done in parallel, and it's much easier to tune the frequency / phase response when designing the filter.Salvador
It doesn't have to be Butterworth. The most important thing is for is to remove low frequency (> 200 Hz) oscillations.Conjugate
Similar to the bandpass FIR seen here? mpastell.com/2010/01/18/fir-with-scipyConjugate
How do you read the .ddt-format of your sample data? Seems like even the python-version of biosig-toolbox doesn'T support it. Please provide another format, e.g., load it as numpy array and then dump it to a file, or pickle it or whatever.Skirr
S
12

First, your data sample is in a proprietary format, am I right? Even using the biosig toolbox for Python this format cannot be read. Maybe I'm wrong, but I didn't succeed to read it.

Thus, I'll base my answer on artificial data, generated from a Rössler-oscillator. It is a chaotic, 3d-oscillator, often used in the field of nonlinear timeseries analysis.

import numpy as np
from scipy.signal import butter, lfilter

##############################################################
# For generating sample-data
##############################################################
from scipy.integrate import odeint

def roessler_ode(y,t,omega=1,a=0.165,b=0.2,c=10):
    dy = np.zeros((3))
    dy[0] = -1.0*(omega*y[1] + y[2]) #+ e1*(y[3]-y[0])
    dy[1] = omega * y[0] + a * y[1]
    dy[2] = b + y[2] * (y[0] - c)
    return dy

class Roessler(object):
    """A single coupled Roessler oscillators"""
    def __init__(self, y=None, omega=1.0, a=0.165,b=0.2,c=10):
        self.omega = omega
        self.a = a
        self.b = b
        self.c = c
        if y==None:
            self.y = np.random.random((3))+0.5
        else:
            self.y = y

    def ode(self,y,t):
        dy = roessler_ode(y[:],t,self.omega,self.a,self.b,self.c)
        return dy

    def integrate(self,ts):
        rv = odeint(self.ode,self.y,ts)
        self.y = rv[-1,:]
        return rv
###############################################################

Now come your function definitions:

def butter_bandpass(lowcut,highcut,fs,order=8):
    nyq = 0.5*fs
    low = lowcut/nyq
    high = highcut/nyq

    b,a = butter(order, [low, high], btype='band')
    return b,a

def butter_bandpass_filter(data,lowcut,highcut,fs,order=8):
    b,a = butter_bandpass(lowcut,highcut,fs,order=order)
    return lfilter(b,a,data) 

I left them unchanged. I generate some data with my oscillator, but I only take the 3rd component of it. I add some gaussian noise in order to have something to filter out.

# generate sample data
data = Roessler().integrate(np.arange(0,1000,0.1))[:,2]
data += np.random.normal(size=data.shape)

Now let's come to the speed question. I use the timeit-module to check execution times. These statements execute the filtering 100 times, and measure the overall time. I do this measurement for order 2 and order 8 (yes, you want sharper spectral edges, I know, but wait)

# time execution
from timeit import timeit
time_order8 = timeit("butter_bandpass_filter(data,300,2000,20000,8)", "from __main__ import butter_bandpass_filter, butter_bandpass, data", number=100)
time_order2 = timeit("butter_bandpass_filter(data,300,2000,20000,2)", "from __main__ import butter_bandpass_filter, butter_bandpass, data", number=100)
print "For order 8: %.2f seconds" % time_order8
print "For order 2: %.2f seconds" % time_order2

The output of these two print statements is:

For order 8: 11.70 seconds
For order 2: 0.54 seconds

This makes a factor of 20! Using order 8 for a butterworth filter is definitely not a good idea. I cannot think of any situation where this would make sense. To prove the other problems that arise when using such a filter, let's look at the effect of those filters. We apply bandpass filtering to our data, once with order 8 and once with order 2:

data_bp8 = butter_bandpass_filter(data,300,2000,20000,8)
data_bp2 = butter_bandpass_filter(data,300,2000,20000,2)

Now let's do some plots. First, simple lines (I didn't care about the x-axis)

# plot signals
import matplotlib.pyplot as plt
plt.figure(1)
plt.plot(data, label="raw")
plt.plot(data_bp8, label="order 8")
plt.plot(data_bp2, label="order 2")
plt.legend()

This gives us:

Signals

Oh, what happened to the green line? Weird, isn't it? The reason is that butterworth filters of order 8 become a rather instable thing. Ever heard of resonance disaster / catastrophe? This is what it looks like.

The power spectral densities of these signals can be plotted like:

# plot power spectral densities
plt.figure(2)
plt.psd(data, Fs=200000, label="raw")
plt.psd(data_bp8, Fs=20000, label="order 8")
plt.psd(data_bp2, Fs=20000, label="order 2")
plt.legend()

plt.show()

PSD

Here, you see that the green line has sharper edges, but at what price? Artificial peak at approx. 300 Hz. The signal is completely distorted.

So what shall you do?

  • Never use butterworth filter of order 8
  • Use lower order, if it is sufficient.
  • If not, create some FIR filter with the Parks-McGlellan or Remez-Exchange-Algorithms. There is scipy.signal.remez, for example.

Another hint: if you care about the phase of your signal, you should definitely filter forwards and backwards in time. lfilter does shift the phases otherwise. An implementation of such an algorithm, commonly refered to as filtfilt, can be found at my github repository.

One more programming hint:

If you have the situation of pass-through parameters (four parameters of butter_bandpass_filter are only passed through to butter_bandpass, you could make use of *args and **kwargs.

def butter_bandpass(lowcut,highcut,fs,order=8):
    nyq = 0.5*fs
    low = lowcut/nyq
    high = highcut/nyq

    b,a = butter(order, [low, high], btype='band')
    return b,a

def butter_bandpass_filter(data, *args, **kwargs):
    b,a = butter_bandpass(*args, **kwargs)
    return lfilter(b,a,data) 

This reduces code redundancy and will make your code less error-prone.

Finally, here is the complete script, for easy copy-pasting to try it out.

import numpy as np
from scipy.signal import butter, lfilter

##############################################################
# For generating sample-data
##############################################################
from scipy.integrate import odeint

def roessler_ode(y,t,omega=1,a=0.165,b=0.2,c=10):
    dy = np.zeros((3))
    dy[0] = -1.0*(omega*y[1] + y[2]) #+ e1*(y[3]-y[0])
    dy[1] = omega * y[0] + a * y[1]
    dy[2] = b + y[2] * (y[0] - c)
    return dy

class Roessler(object):
    """A single coupled Roessler oscillators"""
    def __init__(self, y=None, omega=1.0, a=0.165,b=0.2,c=10):
        self.omega = omega
        self.a = a
        self.b = b
        self.c = c
        if y==None:
            self.y = np.random.random((3))+0.5
        else:
            self.y = y

    def ode(self,y,t):
        dy = roessler_ode(y[:],t,self.omega,self.a,self.b,self.c)
        return dy

    def integrate(self,ts):
        rv = odeint(self.ode,self.y,ts)
        self.y = rv[-1,:]
        return rv
###############################################################


def butter_bandpass(lowcut,highcut,fs,order=8):
    nyq = 0.5*fs
    low = lowcut/nyq
    high = highcut/nyq

    b,a = butter(order, [low, high], btype='band')
    return b,a

def butter_bandpass_filter(data,lowcut,highcut,fs,order=8):
    b,a = butter_bandpass(lowcut,highcut,fs,order=order)
    return lfilter(b,a,data) 

# generate sample data
data = Roessler().integrate(np.arange(0,1000,0.1))[:,2]
data += np.random.normal(size=data.shape)

# time execution
from timeit import timeit
time_order8 = timeit("butter_bandpass_filter(data,300,2000,20000,8)", "from __main__ import butter_bandpass_filter, butter_bandpass, data", number=100)
time_order2 = timeit("butter_bandpass_filter(data,300,2000,20000,2)", "from __main__ import butter_bandpass_filter, butter_bandpass, data", number=100)
print "For order 8: %.2f seconds" % time_order8
print "For order 2: %.2f seconds" % time_order2

data_bp8 = butter_bandpass_filter(data,300,2000,20000,8)
data_bp2 = butter_bandpass_filter(data,300,2000,20000,2)

# plot signals
import matplotlib.pyplot as plt
plt.figure(1)
plt.plot(data, label="raw")
plt.plot(data_bp8, label="order 8")
plt.plot(data_bp2, label="order 2")
plt.legend()

# plot power spectral densities
plt.figure(2)
plt.psd(data, Fs=200000, label="raw")
plt.psd(data_bp8, Fs=20000, label="order 8")
plt.psd(data_bp2, Fs=20000, label="order 2")
plt.legend()

plt.show()
Skirr answered 5/2, 2013 at 7:40 Comment(3)
I'm so sorry I forgot to mention. It is a binary format. If you seek to byte 432, the remaining bytes are all int16.Conjugate
What is the difference between your filtfilt and that from SciPy?Conjugate
i guess that my implementation is older, from a time when there was none in scipy...Skirr

© 2022 - 2024 — McMap. All rights reserved.