How to properly decode .wav with Python

I am coding a basic frequency analysis of WAVE audio files, but I have trouble converting WAVE frames to integers.

Here is the relevant part of my code:

import wave
track = wave.open('/some_path/my_audio.wav', 'r')

byt_depth = track.getsampwidth() #Byte depth of the file in BYTES
frame_rate = track.getframerate()
buf_size = 512

def byt_sum(word):
    # Convert a string of n bytes (least significant byte first) into an int in [0; 256**n - 1]
    return sum( (256**k)*word[k] for k in range(len(word)) )

raw_buf = track.readframes(buf_size)
'''
One frame is a string of n bytes, where n = byt_depth.
For instance, with a 24-bit encoded file, track.readframes(1) could be:
b'\xff\xfe\xfe'.
raw_buf[i] returns an int in [0;255]
'''

sample_buf = [byt_sum(raw_buf[byt_depth*k:byt_depth*(k+1)])
              - 2**(8*byt_depth-1) for k in range(buf_size)]

The problem is: when I plot sample_buf for a single sine signal, I get an alternating, wrecked sine signal. I can't figure out why the signal overlaps upside-down.

Any idea?

P.S.: Since I'm French, my English is a bit hesitant. Feel free to edit if there are ugly mistakes.

Literalism answered 22/12, 2015 at 12:48 Comment(1)
What resource are you using to produce the graph of the wrecked signal? - Tenebrae

It might be because you need to use an unsigned value to represent the 16-bit samples. See https://en.wikipedia.org/wiki/Pulse-code_modulation

Try adding 32767 to each sample.

Also, you should use the Python struct module to decode the buffer.

import struct
buf_size = 512
# 'H' is an unsigned 16-bit integer; try 'h' (signed) also.
# '<' selects little-endian byte order, which is what WAV files use.
sample_buf = struct.unpack('<' + 'H'*buf_size, raw_buf)
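
For reference, uncompressed PCM WAV files normally store 16-, 24- and 32-bit samples as signed little-endian integers, and only 8-bit samples as unsigned bytes. That would explain the upside-down halves seen when the bytes are read as unsigned and a fixed offset is subtracted. Below is a minimal sketch of a decoder that works for any sample width; the helper name decode_frames is hypothetical, and the code assumes integer PCM data:

```python
def decode_frames(raw, sampwidth):
    """Decode interleaved little-endian PCM bytes into a list of ints.

    Assumes uncompressed integer PCM: 8-bit samples are unsigned
    (centred on 128), wider samples are signed two's complement.
    """
    n_samples = len(raw) // sampwidth
    signed = sampwidth > 1
    samples = [
        int.from_bytes(raw[k * sampwidth:(k + 1) * sampwidth],
                       byteorder="little", signed=signed)
        for k in range(n_samples)
    ]
    if not signed:
        # Re-centre unsigned 8-bit samples around zero.
        samples = [s - 128 for s in samples]
    return samples
```

With the question's variables this would be called as decode_frames(track.readframes(buf_size), track.getsampwidth()). For multi-channel files the samples are interleaved, so samples[c::track.getnchannels()] selects channel c.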
Tamarind answered 22/12, 2015 at 13:57 Comment(2)
Well, it does work for 16-bit encoded audio, but, in an ideal world, this script is supposed to work for any arbitrary byte depth. (I'm not familiar with the struct module; how should I replace the h for a 24-bit int?) Thank you anyway! - Literalism
@Literalism: To learn about it (if you haven't already), just import struct and call help(struct) in the interpreter. All depths are supported except 24-bit, i.e. sampwidth=3. But that's not a problem: you unpack it into 32 bits. Pad each sample with an extra zero byte ('\x00'), then remove the zero from the integer after struct.unpack() by shifting it 8 bits. - Ley
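
One possible reading of the padding trick described in this comment is sketched below. The function name unpack_24bit is hypothetical, and the code assumes signed little-endian 24-bit PCM, with the zero byte placed at the low end of each sample so that the sign bit ends up in the right place:

```python
import struct

def unpack_24bit(raw):
    """Decode little-endian signed 24-bit PCM bytes with struct."""
    n_samples = len(raw) // 3
    # Prepend a zero byte to each 3-byte sample so that the sample
    # fills the upper 24 bits of a little-endian 32-bit integer.
    padded = b"".join(b"\x00" + raw[3 * k:3 * k + 3]
                      for k in range(n_samples))
    values = struct.unpack("<%di" % n_samples, padded)
    # The arithmetic right shift drops the padding byte and keeps the sign.
    return [v >> 8 for v in values]
```

For the frame b'\xff\xfe\xfe' from the question, this gives -65793.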

The easiest way is to use a library that does the decoding for you. There are several Python libraries available; my favorite is the soundfile module:

import soundfile as sf
signal, samplerate = sf.read('/some_path/my_audio.wav')
Seeseebeck answered 30/7, 2016 at 10:7 Comment(2)
This module has been flagged as spam - Lucent
Thanks for the comment @MathieuRodic! The URL has been changed from pysoundfile.readthedocs.io to python-soundfile.readthedocs.io. I have updated the link above. - Seeseebeck
