I am building an audio-based deep learning model. As part of the preprocessing I want to augment the audio in my datasets. One augmentation that I want to do is to apply a RIR (room impulse response) function. I am working with Python 3.9.5 and TensorFlow 2.8.
In Python, the standard way to do it, if the RIR is given as a finite impulse response (FIR) of n taps, is to use SciPy's lfilter:
import numpy as np
from scipy import signal
import soundfile as sf
h = np.load("rir.npy")
x, fs = sf.read("audio.wav")
y = signal.lfilter(h, 1, x)
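For a pure FIR filter, lfilter(h, 1, x) returns the first len(x) samples of the linear convolution of x with h, so an FFT-based convolution gives the same result and is usually faster for long RIRs. The following is a minimal sketch of that equivalence. The synthetic signal and decaying RIR are assumptions standing in for "audio.wav" and "rir.npy".

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
# Assumed stand-ins: one second of noise at 16 kHz and a synthetic decaying RIR.
x = rng.standard_normal(16000)
h = rng.standard_normal(2048) * np.exp(-np.arange(2048) / 300.0)

# Direct-form FIR filtering, as in the snippet above.
y_lfilter = signal.lfilter(h, 1, x)

# FFT-based linear convolution, truncated to the input length.
y_fft = signal.fftconvolve(x, h, mode="full")[: len(x)]

# The two outputs should agree up to floating-point rounding.
max_abs_diff = float(np.max(np.abs(y_lfilter - y_fft)))
print(max_abs_diff)
```

Whether the FFT route is actually faster depends on the RIR length and the signal length, so it is worth timing both on real data.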
Running this in a loop over all the files may take a long time. Doing it with the TensorFlow map utility on TensorFlow datasets:
# define filter function
def h_filt(audio, label):
    h = np.load("rir.npy")
    x = audio.numpy()
    y = signal.lfilter(h, 1, x)
    return tf.convert_to_tensor(y, dtype=tf.float32), label
# apply it via TF map on dataset
aug_ds = ds.map(h_filt)
Using tf.numpy_function:
tf_h_filt = tf.numpy_function(h_filt, [audio, label], [tf.float32, tf.string])
# apply it via TF map on dataset
aug_ds = ds.map(tf_h_filt)
I have two questions:
- Is this way correct and fast enough (less than a minute for 50,000 files)?
- Is there a faster way to do it? E.g. replace the SciPy function with a built-in TensorFlow function. I didn't find the equivalent of lfilter or SciPy's convolve.
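One possible built-in route is tf.nn.conv1d. The sketch below is an assumption about how it could be used for this purpose, not a drop-in lfilter from the TensorFlow API: fir_filter_tf is a hypothetical helper name, and the random signal and taps are placeholders. Because conv1d computes a cross-correlation, the taps are reversed, and the input is left-padded so that the output is causal and has the same length as the input.

```python
import numpy as np
import tensorflow as tf

def fir_filter_tf(x, h):
    """Causal FIR filtering of a 1-D signal x with taps h (hypothetical helper)."""
    x = tf.cast(x, tf.float32)
    h = tf.cast(h, tf.float32)
    n_taps = tf.shape(h)[0]
    # Left-pad with n_taps - 1 zeros so the output has len(x) samples.
    x_padded = tf.pad(x, [[n_taps - 1, 0]])
    # conv1d expects input [batch, width, channels] and filters [width, in, out].
    x_3d = x_padded[tf.newaxis, :, tf.newaxis]
    # conv1d is a cross-correlation, so reverse the taps to get a convolution.
    kernel = tf.reverse(h, axis=[0])[:, tf.newaxis, tf.newaxis]
    y = tf.nn.conv1d(x_3d, kernel, stride=1, padding="VALID")
    return y[0, :, 0]

# Placeholder data, compared against a NumPy reference.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000).astype(np.float32)
h = rng.standard_normal(64).astype(np.float32)
y_tf = fir_filter_tf(x, h).numpy()
y_ref = np.convolve(x, h)[: len(x)]
```

Since the helper uses only TensorFlow ops, it can be called inside ds.map without tf.numpy_function. For RIRs with many thousands of taps, a direct convolution may be slow, so this should be benchmarked before relying on it.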
.numpy()) inside a tf.dataset's map. You will need to wrap the function in a tf.numpy_function. You might also want to look at tf.nn.conv1d for 1d convolutions. – Answer
tf.data is a streaming model, so the data will be processed in batches while your model is training. That might be fast enough for your purposes or not. – Answer
tf.numpy_function. Is it correct? Will it work? – Erb
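As a minimal sketch of the wrapping suggested in the comments: tf.numpy_function is called inside the function passed to ds.map, on the tensors that map provides, rather than once up front. The small in-memory dataset, the synthetic RIR, and the names np_filt and tf_h_filt are assumptions for illustration. The RIR is loaded once outside the mapped function so it is not re-read for every element.

```python
import numpy as np
import tensorflow as tf
from scipy import signal

# Assumed stand-ins for the real RIR file and dataset.
h = np.random.default_rng(0).standard_normal(32)
audio = np.random.default_rng(1).standard_normal((4, 100)).astype(np.float32)
labels = tf.constant(["a", "b", "c", "d"])
ds = tf.data.Dataset.from_tensor_slices((audio, labels))

def np_filt(x):
    # Runs as plain Python; x arrives here as a NumPy array.
    return signal.lfilter(h, 1, x).astype(np.float32)

def tf_h_filt(audio, label):
    # Wrap only the NumPy/SciPy part; the label passes through untouched.
    y = tf.numpy_function(np_filt, [audio], tf.float32)
    # numpy_function loses static shape information, so restore it.
    y.set_shape(audio.shape)
    return y, label

aug_ds = ds.map(tf_h_filt)
first_audio, first_label = next(iter(aug_ds))
```

The wrapped function still runs as ordinary Python code, so the speed of this version is bounded by SciPy rather than by TensorFlow.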