Trying to convert an mp3 file to a Numpy Array, and ffmpeg just hangs

Asked 4/7, 2016 at 21:54 Answered 26/2, 2018 at 6:39

Solved python numpy ffmpeg stream scikit-learn

I'm working on a music classification methodology with Scikit-learn, and the first step in that process is converting a music file to a numpy array.

After unsuccessfully trying to call ffmpeg from a python script, I decided to simply pipe the file in directly:

FFMPEG_BIN = "ffmpeg"
cwd = (os.getcwd())
dcwd = (cwd + "/temp")
if not os.path.exists(dcwd): os.makedirs(dcwd)

folder_path = sys.argv[1]
f = open("test.txt","a")

for f in glob.glob(os.path.join(folder_path, "*.mp3")):
    ff = f.replace("./", "/")
    print("Name: " + ff)
    aa = (cwd + ff)

    command = [ FFMPEG_BIN,
        '-i',  aa,
        '-f', 's16le',
        '-acodec', 'pcm_s16le',
        '-ar', '22000', # output sample rate in Hz (22000 here, not 44100)
        '-ac', '1', # mono (use '2' for stereo)
        '-']

    pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)
    raw_audio = pipe.proc.stdout.read(88200*4)
    audio_array = numpy.fromstring(raw_audio, dtype="int16")
    print (str(audio_array))
    f.write(audio_array + "\n")

The problem is, when I run the file, it starts ffmpeg and then does nothing:

[mp3 @ 0x1446540] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/home/don/Code/Projects/MC/Music/Spaz.mp3':
  Metadata:
    title           : Spaz
    album           : Seeing souns
    artist          : N*E*R*D
    genre           : Hip-Hop
    encoder         : Audiograbber 1.83.01, LAME dll 3.96, 320 Kbit/s, Joint Stereo, Normal quality
    track           : 5/12
    date            : 2008
  Duration: 00:03:50.58, start: 0.000000, bitrate: 320 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
Output #0, s16le, to 'pipe:':
  Metadata:
    title           : Spaz
    album           : Seeing souns
    artist          : N*E*R*D
    genre           : Hip-Hop
    date            : 2008
    track           : 5/12
    encoder         : Lavf56.4.101
    Stream #0:0: Audio: pcm_s16le, 22000 Hz, mono, s16, 352 kb/s
    Metadata:
      encoder         : Lavc56.1.100 pcm_s16le
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help

It just sits there, hanging, for far longer than the song lasts. What am I doing wrong here?

Cyanohydrin asked 4/7, 2016 at 21:54 Comment(4)

Why are you reading 88200*4 bytes? – Harness 4/7, 2016 at 21:56

That's what the code sample said. – Cyanohydrin 4/7, 2016 at 22:15

Where is the code from? – Harness 4/7, 2016 at 22:16

Here: zulko.github.io/blog/2013/10/04/… – Cyanohydrin 4/7, 2016 at 22:26
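For reference, the 88200*4 in the question is a byte count, so how much audio it covers depends on the output format. A quick sketch of the arithmetic, assuming 16-bit (2-byte) samples as produced by s16le; the helper name pcm_seconds is made up for illustration:

```python
def pcm_seconds(n_bytes, sample_rate, channels, bytes_per_sample=2):
    """Duration in seconds covered by n_bytes of raw interleaved PCM."""
    return n_bytes / (sample_rate * channels * bytes_per_sample)


n_bytes = 88200 * 4
# 44100 Hz stereo: 352800 bytes is exactly 2 seconds
print(pcm_seconds(n_bytes, 44100, 2))
# 22000 Hz mono (the settings in the question): roughly 8 seconds
print(pcm_seconds(n_bytes, 22000, 1))
```

So a single read of that size only ever returns the first few seconds of the song, not the whole file.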

I recommend pymedia, audioread, or decoder.py. There are also pyffmpeg and similar modules that do just what you want. Take a look at pypi.python.org.

Of course, these will not help you turn the data into a NumPy array.

Anyway, here is a crude way to do it by piping through ffmpeg:

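from subprocess import Popen, PIPE
import numpy as np

def decode(fname):
    # Adjust to your ffmpeg binary; on Windows use the full path to ffmpeg.exe
    cmd = ["./ffmpeg.exe", "-i", fname, "-f", "wav", "-"]
    # On Windows, add creationflags=0x8000000 to keep another console window from popping up
    p = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    # communicate() drains stdout and stderr, so ffmpeg cannot block on a full pipe
    data = p.communicate()[0]
    # Skip the WAV header: samples start after the b"data" tag and its 4-byte size field
    start = data.find(b"data") + 8
    # np.frombuffer replaces the deprecated np.fromstring for binary data
    return np.frombuffer(data[start:], np.int16)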
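Searching for the first occurrence of "data" is crude, because the same four bytes could also appear inside a metadata chunk. A slightly more careful sketch walks the RIFF chunks instead. It assumes 16-bit little-endian PCM, and wav_data_samples is a hypothetical helper name, not part of any library:

```python
import struct

import numpy as np


def wav_data_samples(wav_bytes):
    """Return the int16 samples of the 'data' chunk of an in-memory WAV stream."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    pos = 12  # first chunk starts right after the 12-byte RIFF header
    while pos + 8 <= len(wav_bytes):
        chunk_id, size = struct.unpack_from("<4sI", wav_bytes, pos)
        body = pos + 8
        if chunk_id == b"data":
            # When writing to a pipe the size field may be a placeholder,
            # so never read past the bytes we actually have.
            end = min(body + size, len(wav_bytes))
            end -= (end - body) % 2  # keep whole 16-bit samples only
            return np.frombuffer(wav_bytes[body:end], dtype="<i2")
        pos = body + size + (size & 1)  # chunks are padded to even length
    raise ValueError("no data chunk found")
```

It would be used on the bytes returned by p.communicate()[0] in place of the find-and-slice step.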

This is how it should work for basic use.

It should work because ffmpeg's WAV output is 16-bit audio by default. But if you change the sample format, be aware that NumPy has no int24 type, so you would have to do some bit manipulation and represent 24-bit audio as 32-bit. Just don't use 24-bit, and the world is happy. :D
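If you do end up with 24-bit data, the bit manipulation mentioned above could look roughly like this. It is only a sketch, assuming packed little-endian signed 24-bit samples (what ffmpeg calls s24le); the function name s24le_to_int32 is made up:

```python
import numpy as np


def s24le_to_int32(raw):
    """Unpack packed little-endian signed 24-bit samples into an int32 array."""
    usable = len(raw) - len(raw) % 3  # drop a trailing partial sample, if any
    b = np.frombuffer(raw[:usable], dtype=np.uint8).reshape(-1, 3).astype(np.int32)
    # Assemble the three bytes: least significant byte first
    value = b[:, 0] | (b[:, 1] << 8) | (b[:, 2] << 16)
    # Sign-extend from 24 to 32 bits
    return (value ^ 0x800000) - 0x800000
```

The result keeps the original 24-bit value range, so dividing by 2**23 would scale it to [-1, 1).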

We may discuss refining the code in comments, if you need something more sophisticated.

Conservation answered 4/7, 2016 at 22:35 Comment(19)

That seems to be what I need, but it gave me an EOFError in python 2.7, and an ImportError in python 3. – Cyanohydrin 4/7, 2016 at 22:54

Am I missing a requirement? – Cyanohydrin 4/7, 2016 at 22:54

No, except numpy, all is standard lib. I'll make a check now. – Conservation 4/7, 2016 at 22:57

OK, this code is checked and it works. I jumped over all that should be done properly to have nice code etc. because memory usage can really grow when flitting data around. I have no idea why ImportError should occur, perhaps some naming change in Python3. This works on Win, and if you want I can check it on Linux later. – Conservation 4/7, 2016 at 23:23

And I now have a 40.7 MB txt file. Thanks, you really helped me. – Cyanohydrin 4/7, 2016 at 23:47

Wait, the txt file is full of gibberish. I think I broke it somehow. – Cyanohydrin 5/7, 2016 at 0:3

What should I have gotten out? – Cyanohydrin 5/7, 2016 at 0:3

Out of the decode() function you get a numpy.ndarray instance with dtype numpy.int16. How did you go about saving it to txt? – Conservation 5/7, 2016 at 0:35

I use f = open("test.wav","a") and f.write(a), but when I open it in gedit it's just "/FF/FF/FF/FF/00/00/00/00", and in nano it's "��^@^@^@^@^@^@^@^@^@^@^@" – Cyanohydrin 5/7, 2016 at 0:55

If you want to write the array to a file in a human-readable way, use: np.savetxt("<file_name>.txt", decode("<input_file>.mp3")). You'll get the values as text. Your file will be way over 40 MB. :D – Conservation 5/7, 2016 at 0:58

What you are doing now is just saving binary raw data back to a file. You said you need an array, why do you want to save it? – Conservation 5/7, 2016 at 1:7

Ok, I used np.savetxt("array.txt", a). It's at 500 MB and still growing. :( – Cyanohydrin 5/7, 2016 at 1:21

Ok, so: It stopped at 518.6 MB, but when I tried to open it my computer froze, I had to reboot. – Cyanohydrin 5/7, 2016 at 1:33

The second time, it worked. It's full of this:

4.832000000000000000e+03, 1.736800000000000000e+04, 3.851000000000000000e+03, 1.755400000000000000e+04, 3.134000000000000000e+03

I guess that's what I wanted? – Cyanohydrin 5/7, 2016 at 1:35

OK, yes, but change the output format of np.savetxt so that it writes exactly what you want. Read help(np.savetxt) to learn how to use formatting in the fmt argument. The default is a float in scientific notation. Sorry, I forgot to mention it. I think that fmt="%i" should be enough, but read the help anyway. Nano will not freeze, but you'll have to wait a bit. – Conservation 5/7, 2016 at 1:50

Update: now it's only 116 MB, and full of: "-373, 12658, 939, 16178, -797, 14072, -1943, 12372" Thanks, you really helped me out, a lot. – Cyanohydrin 5/7, 2016 at 3:40
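For anyone following along, the fmt change discussed above can be sketched like this. The sample values are the ones quoted in the comment; samples_to_text is a hypothetical helper that writes to an in-memory buffer instead of a file so the result is easy to inspect:

```python
import io

import numpy as np


def samples_to_text(samples, one_line=False):
    """Format integer samples as text with np.savetxt."""
    buf = io.StringIO()
    if one_line:
        # A 1-row 2-D array is written as one delimited line
        np.savetxt(buf, samples.reshape(1, -1), fmt="%d", delimiter=", ")
    else:
        # A 1-D array is written one value per line
        np.savetxt(buf, samples, fmt="%d")
    return buf.getvalue()


samples = np.array([-373, 12658, 939], dtype=np.int16)
print(samples_to_text(samples))
print(samples_to_text(samples, one_line=True))
```

Passing a file name instead of the buffer writes the same text to disk. If the data is only going to be loaded back into NumPy, a binary format such as np.save is much smaller than text.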

That is OK. If you want to scale the signal to some interval for easier classification, normalize the array before saving it. Keep in mind that when your audio is stereo, the integers at even indices belong to the left channel and those at odd indices to the right. You may have to separate them, depending on the classification you want to use. You can easily reshape the array to 2D, with one column for the left channel and the other for the right. You may even have to turn all your signals to mono. – Conservation 5/7, 2016 at 17:8

Actually, I edited the command so it is mono: cmd = ["ffmpeg", "-i", fname, "-ss", "0", "-t", "120", "-ac", "1", "-ar", "22000", "-f", "wav", "-"] – Cyanohydrin 5/7, 2016 at 17:10

Bravo! Although channel separation and turning to mono in numpy is a piece of cake, it is better that way, because ffmpeg will also deal with mono compatibility by correctly applying M and S components. – Conservation 5/7, 2016 at 17:21
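The channel handling and normalization described in the comments above can be sketched in NumPy as follows. This assumes interleaved 16-bit samples as returned by decode(); the function names are made up, and the plain average used for mono is a simple stand-in, not what ffmpeg does internally:

```python
import numpy as np


def split_channels(interleaved, channels=2):
    """Reshape interleaved samples to (n_frames, channels); column 0 is left."""
    usable = len(interleaved) - len(interleaved) % channels
    return interleaved[:usable].reshape(-1, channels)


def to_mono(frames):
    """Average the channels of each frame (a naive mixdown)."""
    return frames.astype(np.float32).mean(axis=1)


def normalize(samples):
    """Scale int16 samples to floats in [-1, 1)."""
    return samples.astype(np.float32) / 32768.0


interleaved = np.array([1, 10, 2, 20, 3, 30], dtype=np.int16)
frames = split_channels(interleaved)
left, right = frames[:, 0], frames[:, 1]
mono = to_mono(frames)
```

With mono output from ffmpeg (-ac 1) the array is already one value per frame and only normalize() is needed.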

Here's what I'm using: It uses pydub (which uses ffmpeg) and scipy.

Full setup (on Mac, may differ on other systems):

pip install scipy
pip install pydub
brew install ffmpeg  # or, on Linux, probably "sudo apt-get install ffmpeg"

Then to read the mp3:

import os
import tempfile

import pydub
import scipy.io.wavfile


def read_mp3(file_path, as_float=False):
    """
    Read an MP3 file into numpy data.
    :param file_path: String path to a file
    :param as_float: Cast data to float and normalize to [-1, 1]
    :return: Tuple(rate, data), where
        rate is an integer indicating samples/s
        data is an ndarray(n_samples, 2)[int16] if as_float = False
            otherwise ndarray(n_samples, 2)[float] in range [-1, 1]
            (for mono input the array is 1-D, of shape (n_samples,))
    """

    _, ext = os.path.splitext(file_path)
    assert ext == '.mp3'
    mp3 = pydub.AudioSegment.from_mp3(file_path)
    # mkstemp returns an open file descriptor; close it so it does not leak
    fd, path = tempfile.mkstemp()
    os.close(fd)
    # Convert to a temporary WAV file, read it back, then clean up
    mp3.export(path, format="wav")
    rate, data = scipy.io.wavfile.read(path)
    os.remove(path)
    if as_float:
        data = data / (2**15)
    return rate, data

Credit to James Thompson's blog

Geniagenial answered 26/2, 2018 at 6:39 Comment(1)

You need os.close(_) (and probably rename _ to fd) to close the temp file descriptor. Otherwise, when run in a for loop you will eventually get [Errno 24] Too many open files. – Classroom 7/8, 2018 at 21:51
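One way to make that cleanup hard to forget is to wrap the temp file in a small context manager. This is only a sketch using the standard library; temp_wav_path is a made-up name:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_wav_path():
    """Yield the path of a fresh temp file; remove the file on exit."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # close the descriptor right away so it cannot leak
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Inside read_mp3 it would be used as "with temp_wav_path() as path:", with the export and scipy.io.wavfile.read calls in the body, so the file is removed even if reading fails.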
