Joining .wav files without writing on disk in Python

Asked 24/5, 2018 at 19:58 Answered 24/5, 2018 at 20:39

I have a list of .wav files in binary format (they are coming from a websocket), which I want to join into a single binary .wav file and then run speech recognition on it. I have been able to make it work with the following code:

import io
import wave

audio = [binary_wav1, binary_wav2,..., binary_wavN] # a list of .wav binary files coming from a socket
audio = [io.BytesIO(x) for x in audio]

# Join wav files
with wave.open('/tmp/input.wav', 'wb') as temp_input:
    params_set = False
    for audio_file in audio:
        with wave.open(audio_file, 'rb') as w:
            if not params_set:
                temp_input.setparams(w.getparams())
                params_set = True
            temp_input.writeframes(w.readframes(w.getnframes()))

# Do speech recognition
with open('/tmp/input.wav', 'rb') as f:
    binary_audio = f.read()
ASR(binary_audio)

The problem is that I don't want to write the file '/tmp/input.wav' to disk. Is there any way to do this without writing any file to disk?

Thanks.

Streak asked 24/5, 2018 at 19:58 Comment(2)

Sound can be represented as a 1D array when mono, or a 2D array when stereo. Use something like wavefile to get the raw data. – Ornis 24/5, 2018 at 20:29

wave.open accepts either a file path or a file-like object. You've already imported BytesIO, so just use one of those as a file-like buffer. Here's an example of someone doing basically just that with gzip (note the slightly different argument names). – Portentous 24/5, 2018 at 20:32

The general solution for having a file without ever writing it to disk is an in-memory stream. For this we use the io module, which is the standard library module for working with streams. You already use BytesIO earlier in your code.

import io
import wave

audio = [binary_wav1, binary_wav2,..., binary_wavN] # a list of .wav binary files coming from a socket
audio = [io.BytesIO(x) for x in audio]

# Join wav files

params_set = False
temp_file = io.BytesIO()
with wave.open(temp_file, 'wb') as temp_input:
    for audio_file in audio:
        with wave.open(audio_file, 'rb') as w:
            if not params_set:
                temp_input.setparams(w.getparams())
                params_set = True
            temp_input.writeframes(w.readframes(w.getnframes()))

# Move the cursor back to the beginning of the "file"
temp_file.seek(0)
# Do speech recognition
binary_audio = temp_file.read()
ASR(binary_audio)
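The seek(0) step matters because a BytesIO keeps a single position that every write advances, and read() only returns what lies after that position. Here is a minimal sketch of that behaviour using placeholder bytes (not real audio). getvalue() is an alternative that returns the whole buffer regardless of the position:

```python
import io

buf = io.BytesIO()
buf.write(b"RIFF....")      # placeholder payload, not a real WAV header

at_end = buf.read()         # position sits at the end after writing
whole = buf.getvalue()      # full contents, independent of the position

buf.seek(0)                 # rewind to the start
after_seek = buf.read()     # now read() sees everything that was written
```

So in the code above, `temp_file.getvalue()` could replace the seek(0) plus read() pair if you only need the bytes.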

Note: I don't have any .wav files to try this out on. It's up to the wave library to handle the difference between real files and in-memory streams properly.
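If you want to try the approach without real recordings, here is a self-contained sketch. The names join_wavs and make_silence are hypothetical helpers I made up for illustration; make_silence just builds a short mono 16-bit WAV of silence in memory. The sketch assumes at least one input and that all inputs share the same channels, sample width and frame rate, since the parameters are copied from the first file only:

```python
import io
import wave


def join_wavs(chunks):
    """Concatenate complete WAV files (given as bytes) into one WAV, in memory."""
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        for i, chunk in enumerate(chunks):
            with wave.open(io.BytesIO(chunk), "rb") as reader:
                if i == 0:
                    # Copy channels, sample width and frame rate from the first file.
                    writer.setparams(reader.getparams())
                writer.writeframes(reader.readframes(reader.getnframes()))
    # wave does not close a file object it was handed, so the buffer is still usable.
    return out.getvalue()


def make_silence(n_frames, rate=16000):
    """Hypothetical test helper: a mono 16-bit WAV containing n_frames of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


joined = join_wavs([make_silence(100), make_silence(250)])

# Re-open the result to check that the header reflects all frames.
with wave.open(io.BytesIO(joined), "rb") as check:
    total_frames = check.getnframes()
```

The frame count in the header is corrected when the writer is closed, which is why the result is taken only after the with block has finished.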

Portentous answered 24/5, 2018 at 20:39 Comment(1)

Thanks, this works! I had tried it before, but I was missing the temp_file.seek(0) statement, so I was just reading an empty binary object. – Petrology 25/5, 2018 at 8:33

With scipy and numpy you can read the wav files as numpy arrays and then make the modifications you want.

from scipy.io import wavfile
import numpy as np

# load files (wavfile.read returns the sample rate and the samples)
rate, arr1 = wavfile.read('song.wav')
_, arr2 = wavfile.read('Aaron_Copland-Quiet_City.wav')  # assumed to have the same sample rate

print(arr1.shape)  # (1323001,)
print(arr2.shape)  # (1323000,)

# make new array by concatenating the two audio waves along the time axis
new_arr = np.concatenate((arr1, arr2))
print(new_arr.shape)  # (2646001,)

# save new audio wave (write needs the sample rate and the data)
wavfile.write('new_audio.wav', rate, new_arr)

Libratory answered 24/5, 2018 at 20:33 Comment(1)

This works, but adding a dependency to scipy and/or numpy seems overkill. As @Portentous pointed out in his answer, you can simply write to file-like objects like BytesIO. – Dative 24/5, 2018 at 21:17
