Convert audio files for CMU Sphinx 4 input

I have a big batch of files I'd like to run recognition on using CMU Sphinx 4. Sphinx requires the following format:

  • 16 kHz
  • 16 bit
  • mono
  • little-endian

My files are something like 44.1 kHz (44100 Hz), 32 bit stereo mp3 files. I tried using Tritonus, and then its updated version JavaZoom, to convert using code from bakuzen. However, AudioSystem.getAudioInputStream(File) throws an UnsupportedAudioFileException, and I haven't been able to figure out why, so I have moved on.

Now I am trying ffmpeg. The command ffmpeg -i input.mp3 -ac 1 -ab 16 -ar 16000 output.wav seems like it should do the trick (except for little endian), but when I check the output with Audacity, it still labels it as "32-bit float". The command I found on this site also uses -acodec pcm_s16le, which from its name seems to be outputting 16 bit little endian; however, Audacity still tells me the output is 32 bit float.

Can anyone tell me how to convert audio files into the format required by CMU Sphinx 4?

Subdeacon answered 3/12, 2012 at 22:36 Comment(0)

Did you actually try the output from ffmpeg in CMU Sphinx 4? 32-bit float is probably your default sampling format in Audacity (Edit > Preferences > Quality). I'm guessing it converts any imported file to these settings, so it may not be reporting the parameters of the actual file, but perhaps the working file in Audacity.

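Remove -ab 16. The -ab option sets the audio bitrate, not the bit depth, so -ab 16 asks the encoder for 16 bits/s. ffmpeg ignores it for pcm_s16le anyway, because the codec itself fixes the sample format at 16-bit little-endian. So your command will look like: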

ffmpeg -i input.mp3 -acodec pcm_s16le -ac 1 -ar 16000 output.wav
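As a quick check on why -ab is redundant here: for uncompressed PCM the bitrate follows directly from sample rate, bit depth, and channel count. The sketch below does that arithmetic; pcm_bitrate_kbps is a hypothetical helper name, not part of ffmpeg.

```shell
# Bitrate of uncompressed PCM in kb/s: sample rate * bits per sample * channels.
# pcm_bitrate_kbps is a hypothetical helper used only for illustration.
pcm_bitrate_kbps() {
    rate=$1
    bits=$2
    channels=$3
    echo $(( rate * bits * channels / 1000 ))
}

# 16 kHz, 16 bit, mono
pcm_bitrate_kbps 16000 16 1   # prints 256
```

The result agrees with the 256 kb/s that ffmpeg reports for the converted file.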

To convert all mp3 files in a directory in Linux:

for f in *.mp3; do ffmpeg -i "$f" -acodec pcm_s16le -ac 1 -ar 16000 "${f%.mp3}.wav"; done
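For a large batch, a slightly more defensive variant of the loop above may be useful. This is only a sketch: convert_all and its two directory arguments are hypothetical names, and it assumes ffmpeg is on the PATH. It writes the WAV files to a separate directory, skips files that were already converted, and reports failures without stopping.

```shell
# Convert every .mp3 in a source directory to 16 kHz, 16-bit, mono WAV files
# in a separate output directory. convert_all is a hypothetical helper name.
convert_all() {
    src=$1
    out=$2
    mkdir -p "$out" || return 1
    for f in "$src"/*.mp3; do
        [ -e "$f" ] || continue            # the glob matched nothing
        name=$(basename "$f" .mp3)
        target="$out/$name.wav"
        [ -e "$target" ] && continue       # skip files converted on an earlier run
        # -nostdin stops ffmpeg from reading the terminal while inside a loop
        ffmpeg -nostdin -loglevel error -i "$f" \
            -acodec pcm_s16le -ac 1 -ar 16000 "$target" \
            || echo "failed: $f" >&2
    done
}

# Usage: convert_all ./mp3 ./wav
```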

Or Windows:

for /r %i in (*.mp3) do ffmpeg -i "%i" -acodec pcm_s16le -ac 1 -ar 16000 "%~dpni.wav"

In Windows Batch file:

for /r %%i in (*.mp3) do ffmpeg -i "%%i" -acodec pcm_s16le -ac 1 -ar 16000 "%%~dpni.wav"

You can see file information with file, ffmpeg, ffprobe, mediainfo among other utilities:

$ file hjl0bC.wav 
hjl0bC.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz

$ ffmpeg -i hjl0bC.wav
[...]
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
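
If none of those tools are at hand, the same fields can be read straight from the WAV header. This is a rough sketch under two assumptions: the "fmt " chunk directly follows the RIFF header (the usual layout, and the one shown by file above), and the machine is little-endian, since od reads multi-byte integers in host byte order. wav_info is a hypothetical helper name.

```shell
# Print format tag, channel count, sample rate and bit depth from a WAV header.
# Assumes the "fmt " chunk starts at byte 12 and a little-endian host.
# wav_info is a hypothetical helper name.
wav_info() {
    f=$1
    # od flags: -An no offsets, -j skip bytes, -N bytes to read, -tuN unsigned N-byte int
    fmt=$(od -An -j20 -N2 -tu2 "$f" | tr -d ' \n')
    channels=$(od -An -j22 -N2 -tu2 "$f" | tr -d ' \n')
    rate=$(od -An -j24 -N4 -tu4 "$f" | tr -d ' \n')
    bits=$(od -An -j34 -N2 -tu2 "$f" | tr -d ' \n')
    echo "format=$fmt channels=$channels rate=$rate bits=$bits"
}

# Usage: wav_info output.wav
```

A format tag of 1 means integer PCM and 3 means IEEE float, so a file in the format asked for in the question should show format=1 channels=1 rate=16000 bits=16, whatever Audacity displays after import.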
Tyus answered 4/12, 2012 at 1:46 Comment(4)
Thank you, this appears to be the correct format. My output files still do not run with Sphinx 4, however. May have to ask @Nikolay Shmyrev directly... - Subdeacon
The format was right. My file just had zero energy level regions, so once I added dither into the frontend everything worked great. - Subdeacon
@NateGlenn I added your edit that was rejected by other users. I'm not a Windows user, so I didn't test it. - Tyus
Thanks. I guess if my edits are being rejected that I need to review editing policy. - Subdeacon
