Implement realtime signal processing in Python - how to capture audio continuously?
I'm planning to implement a "DSP-like" signal processor in Python. It should capture small fragments of audio via ALSA, process them, then play them back via ALSA.

To get things started, I wrote the following (very simple) code.

import alsaaudio

# Capture device: mono, 96 kHz, unsigned 32-bit little-endian samples,
# read in periods of 1920 frames (20 ms).
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
inp.setchannels(1)
inp.setrate(96000)
inp.setformat(alsaaudio.PCM_FORMAT_U32_LE)
inp.setperiodsize(1920)

# Playback device, configured identically.
outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
outp.setchannels(1)
outp.setrate(96000)
outp.setformat(alsaaudio.PCM_FORMAT_U32_LE)
outp.setperiodsize(1920)

while True:
    l, data = inp.read()    # blocks until one period has been captured
    # TODO: Perform some processing.
    outp.write(data)

The problem is that the audio "stutters" and is not gapless. I tried experimenting with the PCM mode, setting it to either PCM_ASYNC or PCM_NONBLOCK, but the problem remains. I think the problem is that samples "between" two subsequent calls to "inp.read()" get lost.

Is there a way to capture audio "continuously" in Python (preferably without the need for too "specific"/"non-standard" libraries)? I'd like the signal to always be captured "in the background" into some buffer, from which I can read some "momentary state", while audio continues to be captured into the buffer even while I perform my read operations. How can I achieve this?

Even if I use a dedicated process/thread to capture the audio, this process/thread will always at least have to (1) read audio from the source, (2) then put it into some buffer (from which the "signal processing" process/thread then reads). These two operations will therefore still be sequential in time and thus samples will get lost. How do I avoid this?

Thanks a lot for your advice!

EDIT 2: Now I have it running.

import alsaaudio
from multiprocessing import Process, Queue
import numpy as np
import struct

"""
A class implementing buffered audio I/O.
"""
class Audio:

    """
    Initialize the audio buffer.
    """
    def __init__(self):
        #self.__rate = 96000
        self.__rate = 8000
        self.__stride = 4
        self.__pre_post = 4
        self.__read_queue = Queue()
        self.__write_queue = Queue()

    """
    Reads audio from an ALSA audio device into the read queue.
    Supposed to run in its own process.
    """
    def __read(self):
        inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
        inp.setchannels(1)
        inp.setrate(self.__rate)
        inp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
        inp.setperiodsize(self.__rate / 50)

        while True:
            _, data = inp.read()
            self.__read_queue.put(data)

    """
    Writes audio to an ALSA audio device from the write queue.
    Supposed to run in its own process.
    """
    def __write(self):
        outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
        outp.setchannels(1)
        outp.setrate(self.__rate)
        outp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
        outp.setperiodsize(self.__rate / 50)

        while True:
            data = self.__write_queue.get()
            outp.write(data)

    """
    Pre-post data into the output buffer to avoid buffer underrun.
    """
    def __pre_post_data(self):
        zeros = np.zeros(self.__rate / 50, dtype = np.uint32)

        for i in range(0, self.__pre_post):
            self.__write_queue.put(zeros)

    """
    Runs the read and write processes.
    """
    def run(self):
        self.__pre_post_data()
        read_process = Process(target = self.__read)
        write_process = Process(target = self.__write)
        read_process.start()
        write_process.start()

    """
    Reads audio samples from the queue captured from the reading thread.
    """
    def read(self):
        return self.__read_queue.get()

    """
    Writes audio samples to the queue to be played by the writing thread.
    """
    def write(self, data):
        self.__write_queue.put(data)

    """
    Pseudonymize the audio samples from a binary string into an array of integers.
    """
    def pseudonymize(self, s):
        return struct.unpack(">" + ("I" * (len(s) / self.__stride)), s)

    """
    Depseudonymize the audio samples from an array of integers into a binary string.
    """
    def depseudonymize(self, a):
        s = ""

        for elem in a:
            s += struct.pack(">I", elem)

        return s

    """
    Normalize the audio samples from an array of integers into an array of floats with unity level.
    """
    def normalize(self, data, max_val):
        data = np.array(data)
        bias = int(0.5 * max_val)
        fac = 1.0 / (0.5 * max_val)
        data = fac * (data - bias)
        return data

    """
    Denormalize the data from an array of floats with unity level into an array of integers.
    """
    def denormalize(self, data, max_val):
        bias = int(0.5 * max_val)
        fac = 0.5 * max_val
        data = np.array(data)
        data = (fac * data).astype(np.int64) + bias
        return data

debug = True
audio = Audio()
audio.run()

while True:
    data = audio.read()
    pdata = audio.pseudonymize(data)

    if debug:
        print "[PRE-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))

    ndata = audio.normalize(pdata, 0xffffffff)

    if debug:
        print "[PRE-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))
        print "[PRE-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))

    #ndata += 0.01 # Uncommenting this line wreaks complete havoc!

    if debug:
        print "[POST-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))
        print "[POST-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))

    pdata = audio.denormalize(ndata, 0xffffffff)

    if debug:
        print "[POST-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))
        print ""

    data = audio.depseudonymize(pdata)
    audio.write(data)

However, when I perform even the slightest modification to the audio data (e.g. uncomment that line), I get a lot of noise and extreme distortion at the output. It seems like I don't handle the PCM data correctly. The strange thing is that the output of the "level meter", etc. all appears to make sense. However, the output is completely distorted (but continuous) when I offset it just slightly.

EDIT 3: I just found out that my algorithms (not included here) work when I apply them to wave files. So the problem really appears to boil down to the ALSA API.

EDIT 4: I finally found the problems. They were the following:

1st - ALSA quietly "fell back" to PCM_FORMAT_U8_LE upon requesting PCM_FORMAT_U32_LE, so I interpreted the data incorrectly by assuming that each sample was 4 bytes wide. It works when I request PCM_FORMAT_S32_LE (see the sketch after this list).

2nd - The ALSA output seems to expect the period size in bytes, even though the specification explicitly states that it is expected in frames. So you have to set the period size four times as high for output if you use a 32-bit sample depth.

3rd - Even in Python (where there is a "global interpreter lock"), processes are slow compared to threads. You can reduce latency a lot by changing to threads, since the I/O threads basically don't do anything computationally intensive.
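
To guard against the first two problems programmatically, one can compare what ALSA actually configured against what was requested. Below is a minimal sketch, assuming pyalsaaudio's legacy API. It relies on each set* call returning the value ALSA actually configured, which matches my reading of the pyalsaaudio documentation but is worth verifying for your version:

import alsaaudio

RATE = 8000
PERIOD = RATE / 50  # 160 frames = 20 ms

inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
inp.setchannels(1)
inp.setrate(RATE)

# Assumption: setformat() returns the format ALSA actually configured,
# so a silent fallback (problem 1) can be caught early instead of
# corrupting the sample interpretation later.
actual = inp.setformat(alsaaudio.PCM_FORMAT_S32_LE)
if actual != alsaaudio.PCM_FORMAT_S32_LE:
    raise RuntimeError("ALSA fell back to format %d" % actual)

# Same idea for the period size (problem 2): compare what comes back
# against what was requested.
got = inp.setperiodsize(PERIOD)
if got != PERIOD:
    print "Warning: period size is %d frames, not %d" % (got, PERIOD)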

Jural answered 5/1, 2016 at 19:30. Comments (7):
Using a thread to read and post to a queue should work. PCM has a buffer controlled by setperiodsize (it appears to default to 32 frames), which gives you time to post the data and return. – Sulfate
I think the problem is that "read()" only reads from the audio device while it runs. If it returns, the read operation is finished (otherwise it could not return any meaningful data). Even if I have a second thread running, doing "read()", then appending the returned data to a buffer, it will not "read()" while appending, and therefore there will be a gap in the capture. – Jural
Wow. Then that interface is seriously broken. Interfaces that have traditional blocking/non-blocking modes need intermediate buffers for the reason you describe. A real-time interface requires preposting buffers before data is generated. But alsaaudio doesn't seem to work that way. I can't imagine how that module would work without buffering. So... are you sure that's how it works, or are you speculating? I think it buffers X frames at a time, and if you don't read them by the time the next X come in, then they're lost. Just a guess on my part! – Sulfate
I'm not sure. The documentation says it "blocks until a full period is available, then returns it". I don't see any way of telling it when to start capturing that "period" though. Might be that the driver captures continuously in the background and I have to call "read()" at least once per period in order not to miss any data, and it then blocks until the end of that period? Possible. Unfortunately, the documentation is not too specific here, but at least that would make considerably more sense. I might have to read lower-level (ALSA, not Python) docs or just do some trial and error. ;-) – Jural
It's cheap to experiment... reimplement your example with a thread and a queue, then see how "gappy" it is. If you don't mind a delay between source and output, increase the buffer size. – Sulfate
@Sulfate I now wrote an "improved version", though it didn't actually improve in "real-world performance". It still stutters like before. I don't get it. :-( I can't increase the period size a lot. 1920 is OK for ALSA. I didn't check exactly how high it could get (I chose this number since 20 ms seemed like a reasonable fragment size), but 3840 already seems too high and ALSA throws an error. – Jural
Let us continue this discussion in chat. – Jural

When you

  1. read one chunk of data,
  2. write one chunk of data,
  3. then wait for the second chunk of data to be read,

then the output device's buffer will run empty while you wait, because capturing the second chunk takes at least as long as playing back the first.

You should fill up the output device's buffer with silence before starting the actual processing. Then small delays in either the input or output processing will not matter.
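
A minimal sketch of that pre-fill, assuming pyalsaaudio and the mono, signed 32-bit setup from the question (the count of four periods is taken from the comment below):

import alsaaudio
import numpy as np

RATE = 8000
PERIOD = RATE / 50   # frames per period
PRE_FILL = 4         # periods of silence to queue up front

outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
outp.setchannels(1)
outp.setrate(RATE)
outp.setformat(alsaaudio.PCM_FORMAT_S32_LE)
outp.setperiodsize(PERIOD)

# Write a few periods of silence before the first real chunk, so small
# delays on the input side no longer drain the output buffer to empty.
silence = np.zeros(PERIOD, dtype=np.int32).tostring()
for _ in range(PRE_FILL):
    outp.write(silence)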

Schear answered 6/1, 2016 at 8:35. Comments (1):
Thanks a lot! I found out that I need at least four full periods in my output buffer for audio to be continuous. Since this is probably different from system to system (and probably also changes with system load), I will make this a configurable variable. Bigger buffer --> higher latency, but lower chance of "stuttering"; smaller buffer --> lower latency, but higher chance of "stuttering". Later, I might even "auto-tune" the system by dynamically increasing the buffer size on underruns and reducing it when the output buffer is "far from underrunning" at the moment the application posts new samples. – Jural

You can do all that manually, as @CL recommends in their answer, but I'd recommend just using GNU Radio instead:

It's a framework that takes care of all the "getting small chunks of samples into and out of your algorithm"; it scales very well, and you can write your signal processing in either Python or C++.

In fact, it comes with an Audio Source and an Audio Sink that directly talk to ALSA and just give/take continuous samples. I'd recommend reading through GNU Radio's Guided Tutorials; they explain exactly what is necessary to do your signal processing for an audio application.

A really minimal flow graph would look like:

[Figure: flow graph with an Audio Source, a High Pass Filter, and an Audio Sink]

You can replace the high pass filter with your own signal processing block, or use any combination of the existing blocks.
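
In code, such a flow graph might look roughly like the sketch below. This targets the GNU Radio 3.7-era Python API from memory, and the filter parameters (300 Hz cutoff, 100 Hz transition width) are made up for illustration:

from gnuradio import gr, audio, filter
from gnuradio.filter import firdes

class HighPassGraph(gr.top_block):
    def __init__(self, samp_rate=48000):
        gr.top_block.__init__(self)
        src = audio.source(samp_rate)   # continuous samples from ALSA
        # Hypothetical design: high pass at 300 Hz, 100 Hz transition.
        taps = firdes.high_pass(1.0, samp_rate, 300, 100)
        hpf = filter.fir_filter_fff(1, taps)
        snk = audio.sink(samp_rate)     # continuous samples to ALSA
        self.connect(src, hpf, snk)

HighPassGraph().run()   # blocks; the framework handles all buffering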

There are helpful things like file and WAV-file sinks and sources, filters, resamplers, amplifiers (OK, multipliers), …

Flyboat answered 6/1, 2016 at 21:43. Comments (3):
Well, I'd love to do this in the most "standalone" fashion. My application isn't radio reception/demodulation either. It might save me from interfacing with the audio device myself, though. Perhaps I should try working on audio files until I can get the interface to ALSA right. That seems to be the problem. I can now read/write correctly, but I cannot process the data. It appears to be in some weird format. Even adding a small constant ("DC offset") to each sample will result in the output becoming just noise, which is weird, as one would expect it to "basically do nothing". – Jural
"standalone" != python, I'd argue. Really, GNU Radio would become a dependency of your python program, but that's really it: a library that needs to be installed in order for your program to function, very much like your python needs to have ALSA support itself.Jolo
Well, it can't be so difficult to interface with ALSA "directly", can it? I think the problem is that the samples are in some "obscure data format" that I don't treat correctly. What do you think? – Jural

I finally found the problems. They were the following:

1st - ALSA quietly "fell back" to PCM_FORMAT_U8_LE upon requesting PCM_FORMAT_U32_LE, so I interpreted the data incorrectly by assuming that each sample was 4 bytes wide. It works when I request PCM_FORMAT_S32_LE.

2nd - The ALSA output seems to expect the period size in bytes, even though the specification explicitly states that it is expected in frames. So you have to set the period size four times as high for output if you use a 32-bit sample depth.

3rd - Even in Python (where there is a "global interpreter lock"), processes are slow compared to threads. You can reduce latency a lot by changing to threads, since the I/O threads basically don't do anything computationally intensive (see the sketch below).

Audio is gapless and undistorted now, but latency is far too high.
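
For illustration, the thread-based capture from the 3rd point might look roughly like this sketch (Python 2 style to match the code above; the names are mine). The capture thread spends nearly all its time blocked inside ALSA's read, so the GIL is barely contended:

import alsaaudio
import threading
import Queue  # "queue" on Python 3

RATE = 8000
PERIOD = RATE / 50

read_queue = Queue.Queue()

def capture():
    inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
    inp.setchannels(1)
    inp.setrate(RATE)
    inp.setformat(alsaaudio.PCM_FORMAT_S32_LE)
    inp.setperiodsize(PERIOD)
    while True:
        _, data = inp.read()   # blocks until a full period is captured
        read_queue.put(data)   # hand off; ALSA keeps capturing meanwhile

t = threading.Thread(target=capture)
t.daemon = True
t.start()

# The processing loop is now decoupled from the capture hardware:
# data = read_queue.get()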

Jural answered 5/2, 2016 at 20:16.
