Filtering audio signal in TensorFlow

I am building an audio-based deep learning model. As part of the preprocessing I want to augment the audio in my datasets. One augmentation that I want to do is to apply a RIR (room impulse response) filter to the audio. I am working with Python 3.9.5 and TensorFlow 2.8.

In Python, if the RIR is given as a finite impulse response (FIR) filter of n taps, the standard way to apply it is with SciPy's lfilter:

import numpy as np
from scipy import signal
import soundfile as sf

h = np.load("rir.npy")
x, fs = sf.read("audio.wav")

y = signal.lfilter(h, 1, x)

Running this in a loop over all the files may take a long time. Doing it with the TensorFlow map utility on a TensorFlow dataset:

# define filter function
def h_filt(audio, label):
    h = np.load("rir.npy")
    x = audio.numpy()
    y = signal.lfilter(h, 1, x)
    return tf.convert_to_tensor(y, dtype=tf.float32), label

# apply it via TF map on dataset
aug_ds = ds.map(h_filt)

Using tf.numpy_function:

tf_h_filt = tf.numpy_function(h_filt, [audio, label], [tf.float32, tf.string])

# apply it via TF map on dataset
aug_ds = ds.map(tf_h_filt)

I have two questions:

  1. Is this way correct and fast enough (less than a minute for 50,000 files)?
  2. Is there a faster way to do it, e.g. by replacing the SciPy function with a built-in TensorFlow function? I didn't find a TensorFlow equivalent of lfilter or SciPy's convolve.
Erb asked 11/4, 2022 at 13:39
I don't have an answer to your question, but I might have a few useful indications: you can't use TF's eager execution (.numpy()) inside a tf.data Dataset's map. You will need to wrap the function in a tf.numpy_function (see the sketch after these comments). You might also want to look at tf.nn.conv1d for 1-D convolutions. – Answer
When it comes to the speed requirements, tf.data uses a streaming model, so the data will be processed in batches while your model is training. That might or might not be fast enough for your purposes. – Answer
@Answer I added an example of using tf.numpy_function. Is it correct? Will it work? – Erb
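
A minimal sketch of that wrapping (assuming, as in the snippets above, that the dataset ds yields (audio, label) pairs with float32 audio and string labels; the num_parallel_calls argument is optional):

import numpy as np
from scipy import signal
import tensorflow as tf

h = np.load("rir.npy")

def h_filt(audio, label):
    # called through tf.numpy_function, so audio and label arrive as NumPy values
    y = signal.lfilter(h, 1, audio)
    return y.astype(np.float32), label

def tf_h_filt(audio, label):
    y, label = tf.numpy_function(h_filt, [audio, label],
                                 [tf.float32, tf.string])
    return y, label

aug_ds = ds.map(tf_h_filt, num_parallel_calls=tf.data.AUTOTUNE)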

Here is one way you could do it.

Notice that the TensorFlow function is designed to receive batches of inputs with multiple channels, and the filter can have multiple input channels and multiple output channels. Let N be the size of the batch, I the number of input channels, F the filter width, L the input width and O the number of output channels. Using padding='SAME', it maps an input of shape (N, L, I) and a filter of shape (F, I, O) to an output of shape (N, L, O).
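
For example, with made-up sizes N = 2, L = 100, I = 3, F = 11 and O = 4, the shapes map as described:

import numpy as np
import tensorflow as tf

x = np.random.randn(2, 100, 3).astype(np.float32)  # input, shape (N, L, I)
w = np.random.randn(11, 3, 4).astype(np.float32)   # filter, shape (F, I, O)

y = tf.nn.conv1d(x, w, stride=1, padding='SAME', data_format='NWC')
print(y.shape)  # (2, 100, 4), i.e. (N, L, O)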

import numpy as np
from scipy import signal
import tensorflow as tf

# data to compare the two approaches
x = np.random.randn(100)
h = np.random.randn(11)

# reference output: lfilter with denominator 1 (a pure FIR filter)
y_lfilt = signal.lfilter(h, 1, x)

# Since the denominator of your filter transfer function is 1,
# the output of lfilter matches the full convolution
y_np = np.convolve(h, x)
assert np.allclose(y_lfilt, y_np[:len(y_lfilt)])

# now let's do the convolution using tensorflow
y_tf = tf.nn.conv1d(
    # x must be padded with half of the size of h
    # to use padding 'SAME'
    np.pad(x, len(h) // 2).reshape(1, -1, 1), 
    # the time axis of h must be flipped
    h[::-1].reshape(-1, 1, 1), # a 1x1 matrix of filters
    stride=1, 
    padding='SAME', 
    data_format='NWC')

assert np.allclose(y_lfilt, np.squeeze(y_tf)[:len(y_lfilt)])
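
To tie this back to the dataset in the question, the same convolution can be applied inside the dataset map. Below is a minimal sketch (rir_filt and the use of AUTOTUNE are illustrative; ds is assumed to yield (audio, label) pairs with 1-D float32 audio). Since the filter taps are constants, the map stays entirely inside the TensorFlow graph and avoids the tf.numpy_function round trip.

h = np.load("rir.npy").astype(np.float32)
kernel = tf.constant(np.ascontiguousarray(h[::-1]).reshape(-1, 1, 1))  # shape (F, 1, 1)
pad = len(h) // 2

def rir_filt(audio, label):
    x = tf.pad(audio, [[pad, pad]])          # same padding as np.pad above
    x = tf.reshape(x, (1, -1, 1))            # shape (N=1, L, I=1)
    y = tf.nn.conv1d(x, kernel, stride=1, padding='SAME')
    y = tf.squeeze(y)[:tf.shape(audio)[0]]   # trim back to the input length
    return y, label

aug_ds = ds.map(rir_filt, num_parallel_calls=tf.data.AUTOTUNE)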
Procto answered 12/4, 2022 at 6:27
Note that the padding should be done at the beginning of the audio waveform (pad from the start). – Erb
Sorry, I didn't follow what you said. – Procto
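
To illustrate the first comment above: padding only at the start (causal padding) combined with padding='VALID' reproduces the lfilter output exactly, with no trailing samples to trim. A minimal sketch, reusing x, h and y_lfilt from the answer:

y_causal = tf.nn.conv1d(
    # zeros only at the beginning of the signal
    np.pad(x, (len(h) - 1, 0)).reshape(1, -1, 1),
    h[::-1].reshape(-1, 1, 1),
    stride=1,
    padding='VALID')

assert np.allclose(y_lfilt, np.squeeze(y_causal))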
