PyAudio - first few chunks of recording are zero
I've been having some issues when trying to play back and record audio synchronously to/from a device, in this case my laptop's speakers and microphone.

The problem

I've tried to implement this using the Python modules "sounddevice" and "pyaudio", but both implementations have the same odd issue: the first few frames of recorded audio are always zero. Has anyone else experienced this? The issue seems to be independent of the chunk size used (i.e., it's always the same number of samples that are zero).

Is there anything I can do to prevent this from happening?
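
For reference, a quick way I check the size of the offset once the recording loop below has finished (this is how I verified that the count is independent of the chunk size):

# Run after the recording loop below has filled mic_buffer: count the
# leading samples that are exactly zero (assumes the recording contains
# at least one nonzero sample).
leading_zeros = int(np.argmax(mic_buffer[:, 0] != 0))
print(f"first {leading_zeros} samples are zero")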

Code

import queue

import matplotlib.pyplot as plt
import numpy as np
import pyaudio
import soundfile as sf

FRAME_SIZE = 512
excitation, fs = sf.read("excitation.wav", dtype=np.float32)

# Instantiate PyAudio
p = pyaudio.PyAudio()
q = queue.Queue()

output_idx = 0
mic_buffer = np.zeros((excitation.shape[0] + FRAME_SIZE
                       - (excitation.shape[0] % FRAME_SIZE), 1))


def rec_play_callback(in_data, framelength, time_info, status):
    global output_idx

    # print status of playback in case of event
    if status:
        print(f"status: {status}")

    chunksize = min(excitation.shape[0] - output_idx, framelength)

    # write data to output buffer
    out_data = excitation[output_idx:output_idx + chunksize]
    # write input data to input buffer
    inputsamples = np.frombuffer(in_data, dtype=np.float32)

    # np.any is a safer all-zero test than np.sum, whose positive and
    # negative samples could cancel out
    if not np.any(inputsamples):
        print("Empty frame detected")

    # send input data to buffer for main thread
    q.put(inputsamples)

    if chunksize < framelength:
        # out_data is only chunksize samples long (and a view into
        # excitation), so pad a zero-filled copy out to framelength
        # instead of assigning to the empty slice out_data[chunksize:]
        out_data = np.concatenate(
            (out_data, np.zeros(framelength - chunksize, dtype=np.float32)))
        return (out_data.tobytes(), pyaudio.paComplete)

    output_idx += chunksize
    return (out_data.tobytes(), pyaudio.paContinue)


# Define playback and record stream
stream = p.open(rate=fs,
                channels=1,
                frames_per_buffer=FRAME_SIZE,
                format=pyaudio.paFloat32,
                input=True,
                output=True,
                input_device_index=1,  # Macbook Pro microphone
                output_device_index=2,  # Macbook Pro speakers
                stream_callback=rec_play_callback)

stream.start_stream()

input_idx = 0
while stream.is_active():
    try:
        data = q.get(timeout=1)
    except queue.Empty:  # the stream may finish between checks
        break
    mic_buffer[input_idx:input_idx + FRAME_SIZE, 0] = data
    input_idx += FRAME_SIZE

stream.stop_stream()
stream.close()
p.terminate()

# Plot captured microphone signal
plt.plot(mic_buffer)
plt.show()

Output

Empty frame detected

[Plot of the captured microphone signal, showing the leading run of zero samples.]

Edit: I'm running this on macOS using Core Audio. This might be relevant, as pointed out by @2e0byo.

Kalvin answered 10/3, 2022 at 8:51 Comment(14)
What OS is your laptop running / what sound stack?Gilboa
@2e0byo, Running macOS 11.2.3 with the default sound stack. So I guess that would be Core Audio?Kalvin
Indeed. Sadly I don't have a mac to test on. I would add this to the q: it's probably relevantGilboa
How are you running the code - in VSCode or in a terminal? Look hereVince
@Tony, I am running the code in VSCode. However, permissions are set up correctly and I am receiving audio from the microphone. I get the same result when I run this in a terminal.Kalvin
@Kalvin If the timing is the same, can you try adding a time.sleep after starting the stream? sourceVince
@Tony, thanks for the suggestion. I just tried this and unfortunately, this doesn't seem to solve the issue.Kalvin
@Kalvin last idea - try with channels=2Vince
@Tony, Unfortunately, the MacBook I have only supports a single microphone channel. What's your motivation for setting the channel count to two? How could that solve this issue?Kalvin
Let us continue this discussion in chat.Kalvin
Did you try experimenting with the GIL? It makes Python threads block, each waiting until all the others have released the lock. The default GIL quota of time (~100 [ms], roughly 100 interpreter instructions) can factor into waiting for the head of the Queue to arrive, or any other inter-thread data flow, since the GIL forces threads to interleave in time. I cannot test a way to oversample the GIL lending so as to avoid the initial wait for the first data delivery, but it might help to mention it. SIGNAL handlers can play similar GIL-thrashing tricksGastrovascular
@user3666197, right. I'm having difficulty understanding your proposed solution because I'm not really familiar with the Python GIL. Assuming the problem were the GIL blocking the recording thread, wouldn't that be mitigated by using the blocking queue, as I do in the code? Again, sorry if I misunderstood your explanation.Kalvin
I tried to shed light on the GIL, as it is an internal property of the Python interpreter (used principally to avoid any and all forms of concurrency among the interpreter's own threads). That means no matter how many threads you have, and no matter what soft locks you operate inside the Python interpreter's ecosystem, you always face purely serial execution: all threads wait while the one that acquired the GIL runs for a block of about 100 [ms] before being forcefully deprived of GIL ownership. So the initial zeros mean no data had arrived so farGastrovascular
(Sorry, a comment is a weird place to put a meaningful piece of text in support of the explanation. The GIL can prevent data from being pre-fetched and delivered in due time: delivery happens only after all parts have had their turn, each enjoying no more than ~100 [ms] of consecutive code-interpretation time and then waiting for another chance to acquire the GIL and do some more work. That is naturally an uncontrollable scheduling strategy without any guarantee of order, fair share, or priority.)Gastrovascular
This is a general question, and we are missing a complete view of your architecture, so the best we can do is point to some general concepts.

In digital signal processing systems, there is very often a leading blank and a constant delay in the processed signal. This is most often related to the size of a buffer and the sampling rate. In some systems you may not even be aware that the buffer is there, for example when it is part of a device driver that is not accessible through the user-level API.
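
To put rough numbers on that, here is an illustrative calculation with assumed values (a 512-frame buffer at 44.1 kHz, not figures taken from your setup):

# Illustrative only: one hidden 512-frame buffer at 44.1 kHz holds
# 512 / 44100 s of signal, i.e. about 11.6 ms.
frames_per_buffer = 512
fs = 44100
delay_ms = 1000 * frames_per_buffer / fs  # ~11.6 ms
# a couple of such buffers in the driver chain would already account
# for roughly a thousand leading zero samples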

To reduce an offset due to buffering, you have to make the buffer smaller or sample faster. Your system then has to process smaller packets more often, and changes in either packet size or sampling clock can affect your signal processing, depending on your signal content and the kind of processing you are doing. Either change therefore increases the overhead per unit of data processed through the system, and may also affect performance in other ways.
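
As a concrete sketch of the first option, the stream from your question could be opened with a smaller buffer (128 here is an illustrative value, not a recommendation):

# Sketch: same stream as in the question, but with a smaller buffer.
# A smaller frames_per_buffer means a smaller buffering offset, at the
# cost of more frequent callbacks.
stream = p.open(rate=fs,
                channels=1,
                frames_per_buffer=128,  # was 512
                format=pyaudio.paFloat32,
                input=True,
                output=True,
                stream_callback=rec_play_callback)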

An approach that I use for debugging problems of this sort is to find the buffer that is setting the offset, tracing through source code if needed, and then see whether you can adjust its size or the sampling rate and still achieve the throughput and accuracy you need.
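
If the offset turns out to be constant but you cannot shrink it, you can also measure and trim it after the fact. A minimal sketch, assuming the excitation and mic_buffer arrays from your code:

import numpy as np

# Estimate the constant playback/record offset from the peak of the
# cross-correlation, then trim the recording accordingly.
def trim_delay(excitation, recording):
    corr = np.correlate(recording, excitation, mode="full")
    delay = int(np.argmax(corr)) - (len(excitation) - 1)
    delay = max(delay, 0)  # guard against a spurious negative lag
    return recording[delay:], delay

aligned, delay = trim_delay(excitation, mic_buffer[:, 0])
print(f"estimated offset: {delay} samples")

Note that np.correlate is quadratic in the signal length; for long signals an FFT-based correlation (e.g. scipy.signal.correlate with method="fft") is the usual choice.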

Athalee answered 22/3, 2022 at 4:27 Comment(0)
