get_array_of_samples (not found on [ReadTheDocs.AudioSegment]: audiosegment module) returns a 1-dimensional array, which doesn't work well since it loses information about the audio stream (frames, channels, ...).
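To illustrate what gets lost: pydub stores multi-channel samples interleaved (L0, R0, L1, R1, ...), so the flat array can be put back into a (frames, channels) shape as long as the channel count is known. Below is a minimal sketch of that idea; `samples_to_frames` is a hypothetical helper name, and the `array("h", ...)` is only a stand-in for what `get_array_of_samples()` would return on a 16-bit stereo segment.

```python
from array import array

import numpy as np


def samples_to_frames(samples, channels):
    """Reshape interleaved samples (L0, R0, L1, R1, ...) to (frames, channels)."""
    flat = np.asarray(samples)
    if flat.size % channels:
        raise ValueError("sample count is not a multiple of the channel count")
    return flat.reshape(-1, channels)


# Stand-in for asg.get_array_of_samples() on a 3-frame, 16-bit stereo segment.
interleaved = array("h", [1, -1, 2, -2, 3, -3])
frames = samples_to_frames(interleaved, channels=2)

print(frames.shape)  # (3, 2)
print(frames[:, 0])  # left channel only
```

With a real segment, the channel count would come from `asg.channels`.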
A couple of days ago, I ran into this problem, as I was using [PyPI]: sounddevice (which expects a numpy.ndarray) to play the sound (I needed to play it on different output audio devices). Here's what I came up with.
code00.py:
#!/usr/bin/env python

import sys
from pprint import pprint as pp

import numpy as np
import pydub
import sounddevice as sd


def audio_file_to_np_array(file_name):
    asg = pydub.AudioSegment.from_file(file_name)
    dtype = getattr(np, "int{:d}".format(asg.sample_width * 8))  # Or could create a mapping: {1: np.int8, 2: np.int16, 4: np.int32, 8: np.int64}
    arr = np.ndarray((int(asg.frame_count()), asg.channels), buffer=asg.raw_data, dtype=dtype)
    print("\n", asg.frame_rate, arr.shape, arr.dtype, arr.size, len(asg.raw_data), len(asg.get_array_of_samples()))  # @TODO: Comment this line!!!
    return arr, asg.frame_rate


def main(*argv):
    pp(sd.query_devices())  # @TODO: Comment this line!!!
    a, fr = audio_file_to_np_array("./test00.mp3")
    dvc = 5  # Index of an OUTPUT device (from sd.query_devices() on YOUR machine)
    #sd.default.device = dvc  # Change default OUTPUT device
    sd.play(a, samplerate=fr)
    sd.wait()


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.")
    sys.exit(rc)
Output:
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q038015319]> set PATH=%PATH%;f:\Install\pc064\FFMPEG\FFMPEG\4.3.1\bin
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q038015319]> dir /b
code00.py
test00.mp3
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q038015319]> "e:\Work\Dev\VEnvs\py_pc064_03.09.01_test0\Scripts\python.exe" code00.py
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec 7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] 064bit on win32
0 Microsoft Sound Mapper - Input, MME (2 in, 0 out)
> 1 Microphone (Logitech USB Headse, MME (2 in, 0 out)
2 Microphone (Realtek Audio), MME (2 in, 0 out)
3 Microsoft Sound Mapper - Output, MME (0 in, 2 out)
< 4 Speakers (Logitech USB Headset), MME (0 in, 2 out)
5 Speakers / Headphones (Realtek , MME (0 in, 2 out)
6 Primary Sound Capture Driver, Windows DirectSound (2 in, 0 out)
7 Microphone (Logitech USB Headset), Windows DirectSound (2 in, 0 out)
8 Microphone (Realtek Audio), Windows DirectSound (2 in, 0 out)
9 Primary Sound Driver, Windows DirectSound (0 in, 2 out)
10 Speakers (Logitech USB Headset), Windows DirectSound (0 in, 2 out)
11 Speakers / Headphones (Realtek Audio), Windows DirectSound (0 in, 2 out)
12 Realtek ASIO, ASIO (2 in, 2 out)
13 Speakers (Logitech USB Headset), Windows WASAPI (0 in, 2 out)
14 Speakers / Headphones (Realtek Audio), Windows WASAPI (0 in, 2 out)
15 Microphone (Logitech USB Headset), Windows WASAPI (1 in, 0 out)
16 Microphone (Realtek Audio), Windows WASAPI (2 in, 0 out)
17 Microphone (Realtek HD Audio Mic input), Windows WDM-KS (2 in, 0 out)
18 Speakers (Realtek HD Audio output), Windows WDM-KS (0 in, 2 out)
19 Stereo Mix (Realtek HD Audio Stereo input), Windows WDM-KS (2 in, 0 out)
20 Microphone (Logitech USB Headset), Windows WDM-KS (1 in, 0 out)
21 Speakers (Logitech USB Headset), Windows WDM-KS (0 in, 2 out)
44100 (82191, 2) int16 164382 328764 164382
--- (Manually inserted line) Sound is playing :) ---
Done.
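The numbers on the line printed just before the sound plays can be cross-checked against each other. A small sketch of the arithmetic, assuming 2 bytes per sample (which is what the int16 dtype implies):

```python
frame_rate = 44100
frames, channels = 82191, 2  # arr.shape from the output above
sample_width = 2             # bytes per int16 sample

sample_count = frames * channels          # should match arr.size
byte_count = sample_count * sample_width  # should match len(asg.raw_data)
duration_s = frames / frame_rate          # clip length in seconds

print(sample_count, byte_count, round(duration_s, 3))
```

So the test file is a bit under 2 seconds of stereo audio.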
Notes:
As seen, no value is hardcoded (in terms of dimensions, dtype, ...)
I also need to return the sample rate (as it can't be embedded in the array), and it's required by the device (in this case it's 44.1k, which is the default, but I've also tested with files having half that value)
All the existing answers use float to represent a sample. That doesn't work for me, as in most of the test files a sample is 16 bits wide, and np.float16 is not supported (by my FPU), so I had to use int
As a side note, when testing on various files, an .m4a could not be played on my Win laptop by SoundDevice (most likely because of its 32k sample rate), but PyDub was able to play it
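For anyone who does want float samples (as in the other answers), the integer array can be rescaled afterwards instead of reinterpreting the raw bytes as floats. This is only a sketch under the assumption that the samples are signed integers; `int_pcm_to_float32` is a hypothetical helper, not part of PyDub or SoundDevice.

```python
import numpy as np


def int_pcm_to_float32(arr):
    """Scale signed integer PCM samples to float32 in [-1.0, 1.0)."""
    if not np.issubdtype(arr.dtype, np.signedinteger):
        raise TypeError("expected a signed integer array")
    # Full-scale magnitude for the sample width, e.g. 32768.0 for int16.
    full_scale = float(2 ** (8 * arr.dtype.itemsize - 1))
    return (arr / full_scale).astype(np.float32)


# Two stereo frames of made-up int16 data covering the extremes.
pcm = np.array([[0, -32768], [16384, 32767]], dtype=np.int16)
as_float = int_pcm_to_float32(pcm)

print(as_float.dtype)
print(as_float)
```

The resulting float32 array keeps the (frames, channels) shape, so it can be passed to sd.play the same way as the int one.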