Importing sound files into Python as NumPy arrays (alternatives to audiolab)
E

5

14

I've been using Audiolab to import sound files in the past, and it worked quite well. However:

In [2]: from scikits import audiolab
--------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

C:\Python26\Scripts\<ipython console> in <module>()

C:\Python26\lib\site-packages\scikits\audiolab\__init__.py in <module>()
     23 __version__ = _version
     24
---> 25 from pysndfile import formatinfo, sndfile
     26 from pysndfile import supported_format, supported_endianness, \
     27                       supported_encoding, PyaudioException, \

C:\Python26\lib\site-packages\scikits\audiolab\pysndfile\__init__.py in <module>()
----> 1 from _sndfile import Sndfile, Format, available_file_formats, available_encodings
      2 from compat import formatinfo, sndfile, PyaudioException, PyaudioIOError
      3 from compat import supported_format, supported_endianness, supported_encoding

ImportError: DLL load failed: The specified module could not be found.

So I would like to either:

  • Figure out why it's not working in 2.6 (something wrong with _sndfile.pyd?) and maybe find a way to extend it to work with unsupported formats
  • Find a complete replacement for audiolab
Encumbrancer answered 1/3, 2010 at 15:20 Comment(3)
The problem is specific to python 2.6 on windows (i.e. you won't see it on python 2.5). I have not found a way to fix it yetSapp
And I finally took the time between two flights, it ended up being a mingw bug. I have posted a new 0.11.0 version, which should fix this issue.Sapp
David, you have made a wonderful tool in audiolab! I use it often. Thank you.Electromagnetism
E
2

I've been using PySoundFile lately instead of Audiolab. It installs easily with conda.

Like most libraries, it does not support MP3. MP3 is no longer patented, so nothing prevents support in principle; someone just has to write it into libsndfile.

Encumbrancer answered 26/2, 2018 at 14:53 Comment(0)
E
15

Audiolab is working for me on Ubuntu 9.04 with Python 2.6.2, so it might be a Windows problem. In your link to the forum, the author also suggests that it is a Windows error.

In the past, this option has worked for me, too:

from scipy.io import wavfile
fs, data = wavfile.read(filename)

Beware that data may have an integer dtype, in which case it is not scaled to [-1,1). For example, if data is int16, you must divide it by 2**15 to scale it to [-1,1).
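
A minimal sketch of that scaling step, generalised to other integer widths. The helper name pcm_to_float is hypothetical (it is not part of scipy), and the unsigned branch assumes the samples are centred on the midpoint of the range, as 8-bit WAV data is.

```python
import numpy as np

def pcm_to_float(data):
    """Scale integer PCM samples to floats in [-1, 1); pass floats through.

    Hypothetical helper, not part of scipy or numpy.
    """
    data = np.asarray(data)
    half_range = float(2 ** (8 * data.dtype.itemsize - 1))
    if data.dtype.kind == 'i':
        # Signed PCM: divide by 2**(bits-1), e.g. 2**15 for int16.
        return data.astype(np.float64) / half_range
    if data.dtype.kind == 'u':
        # Unsigned PCM (assumed centred on the midpoint, e.g. 128 for uint8).
        return (data.astype(np.float64) - half_range) / half_range
    # Already floating point: leave the values as they are.
    return data.astype(np.float64)
```

It would be applied to the array returned by wavfile.read, for example data = pcm_to_float(data).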

Electromagnetism answered 1/3, 2010 at 22:46 Comment(4)
I'm not certain about that. 16- or 32-bit should be fine, but I don't know about 24-bit.Electromagnetism
It doesn't read much of anything. Even 16-bit files come out inverted, with wraparound errors for a value of -1. 24-bit gets "TypeError: data type not understood" Surely there's something better...Encumbrancer
Can you post a file which gives you this error? Also, does the test suite pass correctly (scikits.audiolab.test())? audiolab uses libsndfile, which is by far the best and most reliable audio IO library that I know. There may be an error in audiolab itself, of courseSapp
I don't see the wraparound bug now, but that was a bug with scipy.io, not audiolab.Encumbrancer
R
6

Sox http://sox.sourceforge.net/ can be your friend for this. It can read many many different formats and output them as raw in whatever datatype you prefer. In fact, I just wrote the code to read a block of data from an audio file into a numpy array.

I decided to go this route for portability (sox is very widely available) and to maximize the flexibility of input audio types I could use. From initial testing, it isn't noticeably slower for what I'm using it for, which is reading short segments (a few seconds) of audio from very long (hours) files.

Variables you need:

SOX_EXEC # the sox / sox.exe executable filename
filename # the audio filename of course
num_channels # duh... the number of channels
out_byps # Bytes per sample you want, must be 1, 2, 4, or 8

start_samp # sample number to start reading at
len_samp   # number of samples to read

The actual code is really simple. If you want to extract the whole file, you can remove the start_samp, len_samp, and 'trim' stuff.

import subprocess # need the subprocess module
import numpy as NP # I'm lazy and call numpy NP

cmd = [SOX_EXEC,
       filename,              # input filename
       '-t','raw',            # output file type raw
       '-e','signed-integer', # output encode as signed ints
       '-L',                  # output little endian
       '-b',str(out_byps*8),  # output bits per sample
       '-',                   # output to stdout
       'trim',str(start_samp)+'s',str(len_samp)+'s'] # only extract requested part

# frombuffer replaces the deprecated fromstring for binary input
data = NP.frombuffer(subprocess.check_output(cmd),'<i%d'%(out_byps))
data = data.reshape(-1, num_channels) # make samples x channels (-1 avoids float division on Python 3)
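
The decoding step can be tried without sox installed. This is only a sketch: decode_raw_pcm is a hypothetical helper name, and it assumes the bytes are headerless, little-endian, signed-integer PCM with interleaved channels, which is what the command above asks sox to produce.

```python
import numpy as np

def decode_raw_pcm(raw, bytes_per_sample, num_channels):
    """Turn headerless little-endian signed PCM bytes into a (frames, channels) array.

    Hypothetical helper; assumes interleaved channels.
    """
    if bytes_per_sample not in (1, 2, 4, 8):
        raise ValueError("bytes_per_sample must be 1, 2, 4, or 8")
    samples = np.frombuffer(raw, dtype='<i%d' % bytes_per_sample)
    # Drop a trailing partial frame, if any, so the reshape cannot fail.
    usable = len(samples) - (len(samples) % num_channels)
    return samples[:usable].reshape(-1, num_channels)

# Three stereo frames of 16-bit audio, built by hand instead of read from sox.
raw = np.array([1, -1, 2, -2, 3, -3], dtype='<i2').tobytes()
frames = decode_raw_pcm(raw, bytes_per_sample=2, num_channels=2)
```

With real data, raw would be the bytes returned by subprocess.check_output(cmd).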

PS: Here is code to read stuff from audio file headers using sox...

    import sys

    comments = ''             # initialise the accumulators the loop appends to
    other = ''
    output_unhandled = False  # set to True to echo unrecognised lines to stderr

    # check_output returns bytes on Python 3, so decode before parsing
    info = subprocess.check_output([SOX_EXEC,'--i',filename]).decode()
    reading_comments_flag = False
    for l in info.splitlines():
        if( not l.strip() ):
            continue
        if( reading_comments_flag and l.strip() ):
            if( comments ):
                comments += '\n'
            comments += l
        else:
            if( l.startswith('Input File') ):
                input_file = l.split(':',1)[1].strip()[1:-1]
            elif( l.startswith('Channels') ):
                num_channels = int(l.split(':',1)[1].strip())
            elif( l.startswith('Sample Rate') ):
                sample_rate = int(l.split(':',1)[1].strip())
            elif( l.startswith('Precision') ):
                bits_per_sample = int(l.split(':',1)[1].strip()[0:-4])
            elif( l.startswith('Duration') ):
                tmp = l.split(':',1)[1].strip()
                tmp = tmp.split('=',1)
                duration_time = tmp[0]
                duration_samples = int(tmp[1].split(None,1)[0])
            elif( l.startswith('Sample Encoding') ):
                encoding = l.split(':',1)[1].strip()
            elif( l.startswith('Comments') ):
                comments = ''
                reading_comments_flag = True
            else:
                if( other ):
                    other += '\n'+l
                else:
                    other = l
                if( output_unhandled ):
                    sys.stderr.write("Unhandled: %s\n" % l)
Reichenberg answered 21/3, 2012 at 4:40 Comment(2)
Interesting, though kind of kludgy and maybe not cross-platform? There's pysox for interfacing directly with the libSoX library. Looks like SoX supports a bunch of formats on its own and can use several other libraries for more. I have had many problems getting audiolab to work, and it doesn't support MP3s, etc., so pysox might be worth a try.Encumbrancer
I will look at pysox... thanks. Though the subprocess approach using sox isn't really pythonic or pretty, it is very powerful and relatively portable (since sox binaries/installers can be found for most systems).Reichenberg
C
5

FFmpeg supports mp3s and works on Windows (http://zulko.github.io/blog/2013/10/04/read-and-write-audio-files-in-python-using-ffmpeg/).

Reading an mp3 file:

import subprocess as sp

FFMPEG_BIN = "ffmpeg.exe"

command = [ FFMPEG_BIN,
        '-i', 'mySong.mp3',
        '-f', 's16le',
        '-acodec', 'pcm_s16le',
        '-ar', '44100', # output will have 44100 Hz
        '-ac', '2', # stereo (set to '1' for mono)
        '-']
pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

Format data into numpy array:

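raw_audio = pipe.stdout.read(88200*4)  # 352800 bytes = 2 seconds of 16-bit stereo at 44100 Hz

import numpy

# frombuffer replaces the deprecated fromstring for binary input
audio_array = numpy.frombuffer(raw_audio, dtype="int16")
audio_array = audio_array.reshape((-1, 2))  # frames x 2 channels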
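
For long files it may be preferable to read the pipe in blocks rather than in one call. A rough sketch follows: iter_pcm_blocks is a hypothetical name, and it assumes a blocking, buffered binary stream (such as pipe.stdout above) that carries interleaved s16le samples and only returns a short read at end of stream.

```python
import io
import numpy as np

def iter_pcm_blocks(stream, frames_per_block, channels=2):
    """Yield (frames, channels) int16 arrays from a binary stream of s16le PCM.

    Hypothetical helper; assumes short reads happen only at end of stream.
    """
    frame_bytes = 2 * channels  # 2 bytes per int16 sample
    while True:
        raw = stream.read(frames_per_block * frame_bytes)
        if len(raw) < frame_bytes:
            break  # nothing left, or only a partial frame
        # Discard a trailing partial frame so the reshape is always valid.
        raw = raw[:len(raw) - len(raw) % frame_bytes]
        yield np.frombuffer(raw, dtype='<i2').reshape(-1, channels)

# Stand-in for pipe.stdout: five stereo frames held in memory.
fake_pipe = io.BytesIO(np.arange(10, dtype='<i2').tobytes())
blocks = list(iter_pcm_blocks(fake_pipe, frames_per_block=2))
```

With FFmpeg, the same loop would be driven by iter_pcm_blocks(pipe.stdout, 44100) to get one second of audio per block.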
Comedic answered 1/6, 2016 at 15:38 Comment(0)
M
4

In case you want to do this for MP3

Here's what I'm using: It uses pydub and scipy.

Full setup (on Mac, may differ on other systems):

import tempfile
import os
import pydub
import scipy
import scipy.io.wavfile


def read_mp3(file_path, as_float = False):
    """
    Read an MP3 File into numpy data.
    :param file_path: String path to a file
    :param as_float: Cast data to float and normalize to [-1, 1]
    :return: Tuple(rate, data), where
        rate is an integer indicating samples/s
        data is an ndarray(n_samples, 2)[int16] if as_float = False
            otherwise ndarray(n_samples, 2)[float] in range [-1, 1]
    """

    _, ext = os.path.splitext(file_path)
    assert ext=='.mp3'
    mp3 = pydub.AudioSegment.from_mp3(file_path)
    fd, path = tempfile.mkstemp()
    os.close(fd)  # close the descriptor so repeated calls do not leak open files
    mp3.export(path, format="wav")
    rate, data = scipy.io.wavfile.read(path)
    os.remove(path)
    if as_float:
        data = data/float(2**15)  # float divisor so Python 2 does not truncate
    return rate, data

Credit to James Thompson's blog

Mcburney answered 26/2, 2018 at 6:37 Comment(3)
You need os.close(_) (and probably rename _ to fd) to close the temp file descriptor. Otherwise, when run in a for loop you will eventually get [Errno 24] Too many open files.Scummy
Instead of exporting to a wav file and reloading it with scipy, you can directly convert to a numpy array: data = np.reshape(mp3.get_array_of_samples(), (-1, 2)).Comedo
what's going on with file_path and FILEPATH ?Meda
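
The direct conversion suggested in the comments can be sketched as below. samples_to_array is a hypothetical helper name; the pydub calls in the trailing comment (from_mp3, get_array_of_samples, channels, frame_rate) are not run here because they need pydub plus an ffmpeg or libav install, and "song.mp3" is a placeholder file name.

```python
import array
import numpy as np

def samples_to_array(samples, channels):
    """Reshape interleaved samples into a (frames, channels) NumPy array.

    Hypothetical helper; using -1 lets it handle mono as well as stereo.
    """
    return np.asarray(samples).reshape(-1, channels)

# get_array_of_samples() returns an array.array, so one is built by hand here.
demo = samples_to_array(array.array('h', [1, 2, 3, 4]), channels=2)

# Intended use with pydub (assumption: pydub and ffmpeg are installed):
#   seg = pydub.AudioSegment.from_mp3("song.mp3")
#   data = samples_to_array(seg.get_array_of_samples(), seg.channels)
#   rate = seg.frame_rate
```

This avoids the temporary WAV file entirely, so there is no file descriptor to close.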
