Asked 13/1, 2010 at 22:3 Answered 2/7, 2023 at 18:14

125

I need to analyze sound written in a .wav file. For that I need to transform this file into set of numbers (arrays, for example). I think I need to use the wave package. However, I do not know how exactly it works. For example I did the following:

import wave
w = wave.open('/usr/share/sounds/ekiga/voicemail.wav', 'r')
for i in range(w.getnframes()):
    frame = w.readframes(i)
    print frame

As a result of this code I expected to see sound pressure as a function of time. Instead I see a lot of strange, mysterious symbols (which are not hexadecimal numbers). Can anybody please help me with that?

Apology answered 13/1, 2010 at 22:3 Comment(0)

158

Per the documentation, scipy.io.wavfile.read(somefile) returns a tuple of two items: the first is the sampling rate in samples per second, the second is a numpy array with all the data read from the file:

from scipy.io import wavfile
samplerate, data = wavfile.read('./output/audio.wav')

Caird answered 13/1, 2010 at 23:44 Comment(10)

You can combine this with command line conversion tools to open other formats. – Authenticity 31/12, 2010 at 2:31

It seriously lacks the number of channels though. How are you supposed to work with audio without knowing the number of channels? – Hooknosed 2/3, 2011 at 10:58

Throws some weird struct unpacking errors on my computer. I think it's using struct.unpack('<i',data) instead of the struct.unpack('<h',data) nak used below. – Bywaters 2/7, 2013 at 9:16

Does this library work? I run into a number of problems: scipy.io.wavfile.read('/usr/lib/python2.7/dist-packages/pygame/examples/data/house_lo.wav') -> No data. scipy.io.wavfile.read('/usr/lib/python2.7/dist-packages/pygame/examples/data/secosmic_lo.wav') -> ZeroDivisionError: integer division or modulo by zero – Grope 26/9, 2013 at 13:7

But what about 24 bit stereo file? – Donkey 15/6, 2016 at 12:33

github.com/scipy/scipy/issues/1930#issuecomment-28460402 For anyone who needs to read 24 bit files, use wavio by Warren Weckesser: github.com/WarrenWeckesser/wavio – Milton 15/5, 2018 at 17:9

@Hooknosed data is a 2-D numpy array so data.shape returns a tuple of (num_samples, num_channels) – Looselimbed 6/7, 2019 at 5:49

Using scipy.io.wavfile() won't work on 24bit WAV files. It will fail with an "Unsupported bit depth: the wav file has 24-bit data" error. – Cartwright 18/12, 2019 at 10:6

@FinnÅrupNielsen Yes it works. But WAV is a container format which can contain samples in many different formats/audio codecs. wavfile knows PCM samples and IEEE float samples. The house_lo.wav is 8 bit PCM and can be read on my machine. The secosmic_lo.wav contains Microsoft ADPCM samples which can't be read and result in a ValueError: Unknown wave file format here. – Fixer 11/6, 2020 at 16:19

@Looselimbed @Hooknosed correction: wavfile.read(...)[1] is a np.ndarray, so for a mono track it will be a 1-D array. For stereo it is a 2-D array with .shape = (num_samples, 2) – Looselimbed 8/5, 2022 at 22:50

Using the struct module, you can unpack the wave frames, which are stored in two's complement binary between -32768 and 32767 (i.e. 0x8000 and 0x7FFF). This reads a MONO, 16-BIT, WAVE file. I found this webpage quite useful in formulating this:

import wave, struct

wavefile = wave.open('sine.wav', 'r')

length = wavefile.getnframes()
for i in range(0, length):
    wavedata = wavefile.readframes(1)
    data = struct.unpack("<h", wavedata)
    print(int(data[0]))

This snippet reads 1 frame. To read more than one frame (e.g., 13), use

wavedata = wavefile.readframes(13)
data = struct.unpack("<13h", wavedata)
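The same idea can be extended to files with more than one channel by unpacking one tuple per frame. The following is only a sketch, not the answerer's code: the name read_int16_frames and the chunk_frames parameter are made up for illustration, it handles 16-bit PCM only, and it uses the native byte-order prefix "=" instead of "<" because, as one of the comments below points out, the wave module hands back samples in the machine's own byte order.

```python
import struct
import wave


def read_int16_frames(path, chunk_frames=1024):
    """Return a list of per-frame tuples from a 16-bit PCM WAV file.

    Hypothetical helper: each tuple holds one sample per channel,
    e.g. (left, right) for stereo.
    """
    frames = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        nchannels = wav.getnchannels()
        # "=" means native byte order with standard sizes; "h" is int16.
        frame_format = "=%dh" % nchannels
        while True:
            raw = wav.readframes(chunk_frames)  # at most chunk_frames frames
            if not raw:
                break
            frames.extend(struct.iter_unpack(frame_format, raw))
    return frames
```

Reading in chunks keeps the number of readframes calls small without requiring the whole file to be unpacked in a single struct call.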

Ludewig answered 12/3, 2011 at 7:21 Comment(12)

how to handle 24bits stereo files ? – Housebreaking 13/11, 2013 at 20:10

this gives me the error: "struct.error: unpack requires a string argument of length 2" – Droopy 14/10, 2014 at 16:5

If you run this piece of code with a very big audio file, your computer will die due to the memory needed by this program. Big audio files need to be processed block by block – Relationship 28/4, 2015 at 12:48

@Droopy You probably have a stereo wave file, or a different bit depth. – Forthright 18/6, 2015 at 1:49

For those who, like me, are wondering what is 2s complementary binary, see here https://mcmap.net/q/16771/-what-is-two-39-s-complement – Hensley 28/9, 2015 at 12:32

@Housebreaking <hh maybe? – Dispensable 27/3, 2017 at 7:32

@Droopy To use struct.unpack, you have to specify the number of values encoded. For example, to read 11 values using struct.unpack, use struct.unpack("<11h", waveData) – Monogenic 4/5, 2018 at 20:3

This is exactly what I was looking for, thanks for doing the digging! I don't know if I would have ever figured this out on my own. – Marinamarinade 15/5, 2018 at 16:29

You can also try struct.iter_unpack. – That 30/8, 2019 at 18:55

@Relationship It won't die due to memory exhaustion. It will be slow but it doesn't use much memory at all. – Fixer 11/6, 2020 at 16:23

Instead of the struct module I would use the array module or Numpy arrays. With struct the "<" is wrong by the way. The wave module converts the endianness to "native" endianness, so "<" makes the code platform dependent. – Fixer 11/6, 2020 at 16:31

If you would like to convert the wavedata bytes object to a numpy array you can use something like data = np.frombuffer(wavedata, dtype=np.int16) . Data type can be int16/int8 or uint16/uint8 if you have 16/8 bit, signed/unsigned audio. – Substituent 27/7, 2023 at 13:0

Different Python modules to read wav:

There are at least the following libraries to read wave audio files:

SoundFile
scipy.io.wavfile (from scipy)
wave (to read streams. Included in Python 2 and 3)
scikits.audiolab (unmaintained since 2010)
sounddevice (play and record sounds, good for streams and real-time)
pyglet
librosa (music and audio analysis)
madmom (strong focus on music information retrieval (MIR) tasks)

The most simple example:

This is a simple example with SoundFile:

import soundfile as sf
data, samplerate = sf.read('existing_file.wav')

Format of the output:

Warning, the data are not always in the same format, that depends on the library. For instance:

from scikits import audiolab
from scipy.io import wavfile
from sys import argv
for filepath in argv[1:]:
    x, fs, nb_bits = audiolab.wavread(filepath)
    print('Reading with scikits.audiolab.wavread:', x)
    fs, x = wavfile.read(filepath)
    print('Reading with scipy.io.wavfile.read:', x)

Output:

Reading with scikits.audiolab.wavread: [ 0.          0.          0.         ..., -0.00097656 -0.00079346 -0.00097656]
Reading with scipy.io.wavfile.read: [  0   0   0 ..., -32 -26 -32]

SoundFile and Audiolab return floats between -1 and 1 (as MATLAB does; that is the convention for audio signals). Scipy and wave return integers, which you can convert to floats according to the number of bits of encoding, for example:

from scipy.io.wavfile import read as wavread
samplerate, x = wavread(audiofilename)  # x is a numpy array of integers, representing the samples
# scale to the range -1.0 to 1.0
if x.dtype == 'int16':
    nb_bits = 16  # -> 16-bit wav files
elif x.dtype == 'int32':
    nb_bits = 32  # -> 32-bit wav files
else:
    raise ValueError('unhandled sample dtype: %s' % x.dtype)
max_nb_bit = float(2 ** (nb_bits - 1))
samples = x / max_nb_bit  # samples is a numpy array of floats representing the samples
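For samples obtained without NumPy (for instance from the wave module), the same scaling can be sketched in plain Python. This is an illustration only: pcm_to_float is a hypothetical helper, and it assumes integer PCM where 8-bit data is unsigned and wider data is signed, which is the usual WAV convention.

```python
def pcm_to_float(samples, sampwidth):
    """Scale integer PCM samples to floats in the range [-1.0, 1.0).

    samples   -- iterable of ints as decoded from the file
    sampwidth -- bytes per sample (1, 2, 3 or 4)
    """
    full_scale = float(2 ** (8 * sampwidth - 1))
    if sampwidth == 1:
        # 8-bit WAV data is unsigned (0..255) with silence at 128.
        return [(s - 128) / full_scale for s in samples]
    # Wider PCM is signed, so dividing by half the range is enough.
    return [s / full_scale for s in samples]
```

For 16-bit data the divisor is 32768, so the most negative sample maps to exactly -1.0 and the most positive one to just under 1.0.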

Chinaman answered 3/11, 2014 at 14:13 Comment(0)

IMHO, the easiest way to get audio data from a sound file into a NumPy array is SoundFile:

import soundfile as sf
data, fs = sf.read('/usr/share/sounds/ekiga/voicemail.wav')

This also supports 24-bit files out of the box.

There are many sound file libraries available, I've written an overview where you can see a few pros and cons. It also features a page explaining how to read a 24-bit wav file with the wave module.

Brook answered 17/9, 2015 at 12:9 Comment(3)

Note: soundfile.read() normalizes by 2^(n_bits - 1) as in sandoval's scipy.io.wavfile example – Gonsalve 25/3, 2017 at 18:49

But when executed, the read returns an error: Error opening '../audio.wav': File contains data in an unimplemented format. The file I’m trying to open begins with: OggS Any idea what’s wrong here? – Catullus 11/2, 2021 at 5:7

@Matthias: I can see that you are the maintainer of soundfile, and also that you are posting on many stackoverflow audio-related posts and promoting soundfile as the solution everywhere. Whether or not your solution works, this isn't your personal advertising platform or Github. (You can be banned for this.) – Pita 8/3, 2022 at 3:8

You can accomplish this using the scikits.audiolab module. It requires NumPy and SciPy to function, and also libsndfile.

Note, I was only able to get it to work on Ubuntu and not on OS X.

from scikits.audiolab import wavread

filename = "testfile.wav"

data, sample_frequency, encoding = wavread(filename)

Now you have the wav data

Annotate answered 17/6, 2011 at 22:10 Comment(1)

scikits.audiolab has not been updated since 2010 and it's probably Python 2 only. – Matrilineage 2/9, 2020 at 16:13

If you want to process audio block by block, some of the given solutions are quite awful in the sense that they imply loading the whole audio into memory, producing many cache misses and slowing down your program. python-wavefile provides some pythonic constructs to do NumPy block-by-block processing using efficient and transparent block management by means of generators. Other pythonic niceties are a context manager for files, metadata as properties... and if you want the whole-file interface, because you are developing a quick prototype and you don't care about efficiency, the whole-file interface is still there.

A simple example of processing would be:

import sys
from wavefile import WaveReader, WaveWriter

with WaveReader(sys.argv[1]) as r:
    with WaveWriter(
        'output.wav',
        channels=r.channels,
        samplerate=r.samplerate,
    ) as w:

        # Just to set the metadata
        w.metadata.title = r.metadata.title + " II"
        w.metadata.artist = r.metadata.artist

        # This is the processing loop
        for data in r.read_iter(size=512):
            data[1] *= .8     # lower volume on the second channel
            w.write(data)

The example reuses the same block to read the whole file, even in the case of the last block, which is usually shorter than the requested size. In that case you get a slice of the block. So trust the returned block length instead of using a hardcoded size of 512 for any further processing.
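If python-wavefile is not available, block-by-block reading can be approximated with the standard library alone. The sketch below is not part of python-wavefile: iter_blocks and peak are hypothetical names, only 16-bit PCM is handled, and a fresh array is allocated per block rather than reusing one buffer.

```python
import array
import wave


def iter_blocks(path, block_frames=512):
    """Yield (samples, nchannels) blocks from a 16-bit PCM WAV file.

    samples is an array of interleaved int16 values; the last block
    is usually shorter than block_frames * nchannels.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        nchannels = wav.getnchannels()
        while True:
            raw = wav.readframes(block_frames)
            if not raw:
                return
            # "h" is a C short in native byte order; this assumes it is
            # 2 bytes wide, which holds on common platforms.
            block = array.array("h")
            block.frombytes(raw)
            yield block, nchannels


def peak(path):
    """Largest absolute sample value, computed one block at a time."""
    return max(
        (max(abs(s) for s in block) for block, _ in iter_blocks(path)),
        default=0,
    )
```

Only one block is held in memory at a time, so the memory use stays roughly constant regardless of the file length.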

Malek answered 16/9, 2014 at 9:54 Comment(1)

This is an excellent answer. Now almost 8 years later, watch out for the Python 3.7: StopIteration (PEP 479) with the r.read_iter, rather you may prefer to use the plain-C like style (no iters) to get beyond it, if you install from PyPI – Batangas 14/2, 2023 at 7:13

My dear, as far as I understand what you are looking for, you are getting into a field called Digital Signal Processing (DSP). This engineering area ranges from simple analysis of discrete-time signals to complex adaptive filters. A nice idea is to think of a discrete-time signal as a vector, where each element is a sampled value of the original, continuous-time signal. Once you have the samples in vector form, you can apply different digital signal processing techniques to this vector.

Unfortunately, in Python, moving from audio files to a NumPy array is rather cumbersome, as you may have noticed... If you don't idolize one programming language over another, I highly suggest trying out MATLAB/Octave. MATLAB makes accessing the samples from files straightforward: audioread() does this task for you :) And there are a lot of toolboxes designed specifically for DSP.

Nevertheless, if you really intend to get into Python for this, I'll give you a step-by-step to guide you.

1. Get the samples

The easiest way to get the samples from the .wav file is:

from scipy.io import wavfile

sampling_rate, samples = wavfile.read('/path/to/file.wav')

Alternatively, you could use the wave and struct package to get the samples:

import numpy as np
import wave, struct

wav_file = wave.open('/path/to/file.wav', 'rb')
# from .wav file to raw binary data (a bytes object)
binary_data = wav_file.readframes(wav_file.getnframes())
# from binary data to samples (assumes 16-bit samples, hence the 'h' format)
s = np.array(struct.unpack('{n}h'.format(n=wav_file.getnframes()*wav_file.getnchannels()), binary_data))

Answering your question: binary_data is a bytes object, which is not human-readable and can only make sense to a machine. You can validate this statement typing type(binary_data). If you really want to understand a little bit more about this bunch of odd characters, click here.

If your audio is stereo (that is, has 2 channels), you can reshape this signal to achieve the same format obtained with scipy.io

s_like_scipy = s.reshape(-1, wav_file.getnchannels())

Each column is a channel. Either way, the samples obtained from the .wav file can be used to plot and understand the temporal behavior of the signal.

In both alternatives, the samples obtained from the files are represented in Linear Pulse Code Modulation (LPCM).
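The interleaving that the reshape above undoes can also be illustrated without NumPy. The helper below is a hypothetical sketch (split_channels is not a library function) that assumes the samples arrive as a flat sequence in frame order, i.e. one value per channel, frame after frame.

```python
def split_channels(interleaved, nchannels):
    """Split [L0, R0, L1, R1, ...] into one list per channel."""
    if len(interleaved) % nchannels:
        raise ValueError("sample count is not a multiple of the channel count")
    # Channel ch owns every nchannels-th sample, starting at offset ch.
    return [list(interleaved[ch::nchannels]) for ch in range(nchannels)]


left, right = split_channels([1, -1, 2, -2, 3, -3], 2)
```

Here left holds the samples at even positions and right those at odd positions, which corresponds to the two columns of the reshaped array.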

2. Do digital signal processing on the audio samples

I'll leave that part up to you :) But this is a nice book to take you through DSP. Unfortunately, I don't know of any good books using Python; they are usually horrible... But do not worry about it, the theory can be applied in the very same way using any programming language, as long as you master that language.

Whichever book you pick up, stick with the classical authors, such as Proakis, Oppenheim, and so on... Do not worry about the programming language they use. For a more practical guide to DSP for audio using Python, see this page.

3. Play the filtered audio samples

import pyaudio

p = pyaudio.PyAudio()
stream = p.open(format = p.get_format_from_width(wav_file.getsampwidth()),
                channels = wav_file.getnchannels(),
                rate = wav_file.getframerate(),
                output = True)
# from samples to the new binary file
new_binary_data = struct.pack('{}h'.format(len(s)), *s)
stream.write(new_binary_data)

where wav_file.getsampwidth() is the number of bytes per sample, and wav_file.getframerate() is the sampling rate. Just use the same parameters of the input audio.

4. Save the result in a new `.wav` file

wav_file = wave.open('/path/to/new_file.wav', 'w')

wav_file.setparams((nchannels, sampwidth, sampling_rate, nframes, "NONE", "not compressed"))

for sample in s:
    wav_file.writeframes(struct.pack('h', int(sample)))

where nchannels is the number of channels, sampwidth is the number of bytes per sample, sampling_rate is the sampling rate, and nframes is the total number of frames (that is, samples per channel).
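As a self-contained illustration of this last step, the sketch below writes a short test tone with a single writeframes call instead of one call per sample. The function name write_sine and all of its default values are arbitrary choices made for this example; it assumes mono, 16-bit output.

```python
import math
import struct
import wave


def write_sine(path, freq=440.0, seconds=0.5, rate=8000, amplitude=0.5):
    """Write a mono 16-bit PCM sine tone and return the number of frames."""
    nframes = int(seconds * rate)
    samples = [
        int(amplitude * 32767 * math.sin(2 * math.pi * freq * n / rate))
        for n in range(nframes)
    ]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)       # bytes per sample
        wav.setframerate(rate)
        # Pack every sample at once; "=" is native byte order, which is
        # what the wave module expects to receive.
        wav.writeframes(struct.pack("=%dh" % nframes, *samples))
    return nframes
```

The frame count does not have to be set in advance here, since the wave module fills in the header when the file is closed.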

Foreign answered 9/2, 2022 at 19:46 Comment(0)

If you're going to perform transforms on the waveform data then perhaps you should use SciPy, specifically scipy.io.wavfile.

Induce answered 13/1, 2010 at 22:11 Comment(2)

OK. I just installed the SciPy but I cannot find any example of the usage of scipy.io.wavfile. – Apology 13/1, 2010 at 22:25

Nothing like the interactive interpreter for figuring out how things work! Be ambitious! – Induce 13/1, 2010 at 22:44

Here's a Python 3 solution using the built in wave module [1], that works for n channels, and 8,16,24... bits.

import sys
import wave

def read_wav(path):
    with wave.open(path, "rb") as wav:
        nchannels, sampwidth, framerate, nframes, _, _ = wav.getparams()
        print(wav.getparams(), "\nBits per sample =", sampwidth * 8)

        signed = sampwidth > 1  # 8 bit wavs are unsigned
        byteorder = sys.byteorder  # wave module uses sys.byteorder for bytes

        values = []  # e.g. for stereo, values[i] = [left_val, right_val]
        for _ in range(nframes):
            frame = wav.readframes(1)  # read next frame
            channel_vals = []  # mono has 1 channel, stereo 2, etc.
            for channel in range(nchannels):
                as_bytes = frame[channel * sampwidth: (channel + 1) * sampwidth]
                as_int = int.from_bytes(as_bytes, byteorder, signed=signed)
                channel_vals.append(as_int)
            values.append(channel_vals)

    return values, framerate

You can turn the result into a NumPy array.

import numpy as np

data, rate = read_wav(path)
data = np.array(data)

Note, I've tried to make it readable rather than fast. I found reading all the data at once was almost 2x faster. E.g.

with wave.open(path, "rb") as wav:
    nchannels, sampwidth, framerate, nframes, _, _ = wav.getparams()
    all_bytes = wav.readframes(-1)

framewidth = sampwidth * nchannels
frames = (all_bytes[i * framewidth: (i + 1) * framewidth]
            for i in range(nframes))

for frame in frames:
    ...

Although python-soundfile is roughly 2 orders of magnitude faster (hard to approach this speed with pure CPython).

[1] https://docs.python.org/3/library/wave.html

Muddleheaded answered 20/12, 2020 at 9:32 Comment(1)

Thanks, this works crazy well on .wav files with any bit-width, and I think is actually better than one of the top-voted solutions – Krysta 9/7, 2022 at 16:33

I needed to read a 1-channel 24-bit WAV file. The post above by Nak was very useful. However, as mentioned above by basj 24-bit is not straightforward. I finally got it working using the following snippet:

import numpy as np
from scipy.io import wavfile
TheFile = 'example24bit1channelFile.wav'
[fs, x] = wavfile.read(TheFile)

# convert the loaded data into a 24bit signal

nx = len(x)
ny = nx // 3 * 4    # four 3-byte samples are contained in three int32 words

y = np.zeros((ny,), dtype=np.int32)    # initialise array

# build the data left aligned in order to keep the sign bit operational.
# result will be factor 256 too high

y[0:ny:4] = ((x[0:nx:3] & 0x000000FF) << 8) | \
  ((x[0:nx:3] & 0x0000FF00) << 8) | ((x[0:nx:3] & 0x00FF0000) << 8)
y[1:ny:4] = ((x[0:nx:3] & 0xFF000000) >> 16) | \
  ((x[1:nx:3] & 0x000000FF) << 16) | ((x[1:nx:3] & 0x0000FF00) << 16)
y[2:ny:4] = ((x[1:nx:3] & 0x00FF0000) >> 8) | \
  ((x[1:nx:3] & 0xFF000000) >> 8) | ((x[2:nx:3] & 0x000000FF) << 24)
y[3:ny:4] = (x[2:nx:3] & 0x0000FF00) | \
  (x[2:nx:3] & 0x00FF0000) | (x[2:nx:3] & 0xFF000000)

y = y/256   # correct for building 24 bit data left aligned in 32bit words

Some additional scaling is required if you need results between -1 and +1. Maybe some of you out there might find this useful.
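The additional scaling mentioned above amounts to dividing by 2^23, half the range of a 24-bit value. A minimal sketch of that step, working directly on packed 3-byte samples rather than on the int32 words used in this answer, could look as follows. decode_pcm24 is a hypothetical helper, and the byteorder default is an assumption: bytes taken straight from a WAV file are little-endian, whereas data returned by the wave module is in the machine's own byte order, so sys.byteorder would be the value to pass in that case.

```python
def decode_pcm24(raw, byteorder="little"):
    """Decode packed 3-byte signed samples into floats in [-1.0, 1.0)."""
    if len(raw) % 3:
        raise ValueError("byte count is not a multiple of 3")
    full_scale = float(2 ** 23)  # half of the 24-bit range
    return [
        int.from_bytes(raw[i:i + 3], byteorder, signed=True) / full_scale
        for i in range(0, len(raw), 3)
    ]


# Three samples: half scale, most negative value, silence.
values = decode_pcm24(b"\x00\x00\x40" + b"\x00\x00\x80" + b"\x00\x00\x00")
```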

Ironic answered 14/6, 2015 at 17:1 Comment(0)

PyDub (http://pydub.com/) has not been mentioned and that should be fixed. IMO this is the most comprehensive library for reading audio files in Python right now, although not without its faults. Reading a wav file:

from pydub import AudioSegment

audio_file = AudioSegment.from_wav('path_to.wav')
# or
audio_file = AudioSegment.from_file('path_to.wav')

# do whatever you want with the audio, change bitrate, export, convert, read info, etc.
# Check out the API docs http://pydub.com/

PS. The example is about reading a wav file, but PyDub can handle a lot of various formats out of the box. The caveat is that it's based on both native Python wav support and ffmpeg, so you have to have ffmpeg installed and a lot of the pydub capabilities rely on the ffmpeg version. Usually if ffmpeg can do it, so can pydub (which is quite powerful).

Non-disclaimer: I'm not related to the project, but I am a heavy user.

Ninurta answered 22/4, 2020 at 11:7 Comment(0)

If it's just two files and the sample rate is sufficiently high, you could just interleave them.

from scipy.io import wavfile
rate1,dat1 = wavfile.read(File1)
rate2,dat2 = wavfile.read(File2)

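if len(dat2) > len(dat1):  # make dat1 the longer of the two
    dat1, dat2 = dat2, dat1

output = dat1.copy()
# replace every second sample of the longer signal with the shorter one
for i in range(len(dat2) // 2):
    output[i * 2] = dat2[i * 2]

# assumes both files share the same sample rate
wavfile.write(OUTPUT, rate1, output)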

Badajoz answered 23/8, 2013 at 16:51 Comment(0)

As the other answers lay out, there are many ways to read a wav file in Python. Using the built-in wave module has the advantage that no external dependencies are needed. First the solution: this reads a mono or stereo wav file and prints the first 100 samples of the first channel:

import wave
import sys

w = wave.open('/path/to/your-file.wav', 'rb')
channels = w.getnchannels()
samplewidth = w.getsampwidth()
print(f"Audio has {channels} channels and each sample is {samplewidth} bytes ({samplewidth * 8} bits) wide")
samples = []

# Iterate over the frames
for n in range(w.getnframes()):
    # Read the bytes of one frame
    frame = w.readframes(1)
    # Skip empty frames
    if frame != b'':
        # Convert the frame into a list of integers, assuming the systems
        # endianess and signed integers
        frame_data = [int.from_bytes(frame[i:i+samplewidth], byteorder=sys.byteorder, signed=True) for i in range(0, len(frame), samplewidth)]
        # If we have more than one channel the samples of each channel
        # should be interleaved
        if channels == 1:
            # Mono is simple: each frame can contain multiple samples
            for sample in frame_data:
                samples.append(sample)
        elif channels == 2:
            # Stereo samples are interleaved: (L/R/L/R/...)
            # Iterate in steps of 2 over the frames and deinterleave
            # them into the samples for left and right
            for c in range(0, len(frame_data), 2):
                left, right = zip(frame_data[c:c+2])
                left, right = left[0], right[0]
                samples.append(left)
        else:
            # Print lame excuse and exit
            print(f"Error: Sorry, we do not support wave files with {channels} channels", file=sys.stderr)
            sys.exit(1)

# Print first 100 samples
print(samples[:100])

Details

Samples

Ultimately everything in a binary file is bytes (those weird characters you got). A byte consists of 8 bits that can each be either 0 or 1. With a bit of knowledge of audio files you might know that wav files come in different bit depths. A sample in consumer audio (say a CD, the audio from a YouTube video etc.) is typically 16 bits, which gives us a vertical resolution of 2^16 or 65536 steps. But there are also 24-bit files for sound studio applications and, more and more, 32-bit (float) files. That means that in order to interpret the bytes of our sample in the right way, we need to know how many bytes are used for one sample and how they are ordered. Fortunately the .getsampwidth() method will tell us the number of bytes:

For example, I read a 24-bit wav file and the sample width I got was 3 bytes; 3×8 bits does indeed give 24. So I need to get 3 bytes from the frame and convert them to an integer number:

sample = [int.from_bytes(frame[i:i+samplewidth], byteorder=sys.byteorder, signed=True) for i in range(0, len(frame), samplewidth)]

byteorder=sys.byteorder describes the endianness of the bytes, that is, whether we have to read them from left to right ("big") or from right to left ("little") in order to construct our number. In this case we just take whatever the endianness of our system is. Note that for 8-bit audio this can be ignored, as there is only one byte and there is no direction in which it can be read.

signed=True says that we expect signed integers, as opposed to unsigned ones which are only positive. Signed should work for most common 16 and 24 bit audio files.
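The effect of both arguments can be seen on a couple of hand-written byte pairs. This is only an illustration with made-up bytes, not data from a real file:

```python
raw = bytes([0x01, 0x00])  # the two bytes of one 16-bit sample

# The same bytes give different numbers depending on the byte order.
as_little = int.from_bytes(raw, byteorder="little", signed=True)
as_big = int.from_bytes(raw, byteorder="big", signed=True)

# The same bytes give different numbers depending on signedness.
top_bit = bytes([0x00, 0x80])
as_signed = int.from_bytes(top_bit, byteorder="little", signed=True)
as_unsigned = int.from_bytes(top_bit, byteorder="little", signed=False)

print(as_little, as_big, as_signed, as_unsigned)  # 1 256 -32768 32768
```

Reading with the wrong byte order or the wrong signedness does not raise an error; it silently produces wrong sample values, which is why both have to match the data.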

If you want to convert the audio to e.g. a float between -1.0 and +1.0 you need to work out the number of possible values in one half (e.g. 2**24 // 2) and divide your sample by that.

Channels

A wave file can carry more than one audio channel. It could be mono, stereo, surround or other multichannel-configurations. Mono would be the simple case, but in multichannel wavs the samples are typically interleaved. That means one frame will carry samples from all channels in alternating fashion. Assuming Stereo, that might be:

L/R/L/R/L/R

I use Python's zip function to unpack the samples into separate left and right variables.

Caveat

I think the major challenge in reading wave files is handling all the possible ways a wave file can look. Wave files can get even more complicated than that (e.g. meta-data headers, chapter marks, ...), so for full compatibility it might be wise to rely on something else. But if you know the wave files you want to read, something like this might work fine.

Doublure answered 2/7, 2023 at 18:14 Comment(0)

-1

You can also simply import and use the wavio library. You also need some basic knowledge of sound.

Stygian answered 27/1, 2018 at 5:50 Comment(0)
