Reading a WAV file from the TIMIT database in Python
I'm trying to read a WAV file from the TIMIT database in Python, but I get an error:

When I'm using wave:

wave.Error: file does not start with RIFF id

When I'm using scipy:

ValueError: File format b'NIST'... not understood.

When I'm using librosa, the program hangs. I tried to convert the file to WAV using sox:

cmd = "sox " + wav_file + " -t wav " + new_wav
subprocess.call(cmd, shell=True)

That didn't help. I saw an old answer referring to the package scikits.audiolab, but it looks like it is no longer supported.

How can I read these files to get an ndarray of the data?

Thanks

Presnell answered 25/6, 2017 at 16:19 Comment(1)
You could try reading the file with the soundfile module or any of the other libsndfile wrappers, which should support the NIST format. - Trichloromethane
B
8

Your file is not a WAV file. Apparently it is a NIST SPHERE file. From the LDC web page: "Many LDC corpora contain speech files in NIST SPHERE format." According to the description of the NIST File Format, the first four characters of the file are NIST. That's what the scipy error is telling you: it doesn't know how to read a file that begins with NIST.
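
If converting is not an option, the header can also be parsed by hand. The following is only a rough sketch, not a full SPHERE reader: it assumes the file is uncompressed 16-bit PCM (which I believe is the case for TIMIT, but check the sample_coding field of your own files), and the function name read_sphere is made up for this example. It uses only the standard library; the returned samples can be turned into an ndarray with numpy.array(samples).

```python
import array
import sys


def read_sphere(path):
    """Return (header_fields, samples) for an uncompressed 16-bit PCM SPHERE file.

    Hypothetical helper for illustration; compressed files are not handled.
    """
    with open(path, "rb") as f:
        magic = f.readline()  # expected to be b"NIST_1A\n"
        if not magic.startswith(b"NIST"):
            raise ValueError("not a NIST SPHERE file")
        # The second line holds the total header size in bytes (commonly 1024).
        header_size = int(f.readline().decode("ascii").strip())
        f.seek(0)
        header_text = f.read(header_size).decode("ascii", errors="replace")
        payload = f.read()

    # Header lines look like "<name> <type> <value>", terminated by "end_head".
    fields = {}
    for line in header_text.splitlines()[2:]:
        if line.strip() == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, kind, value = parts
        if kind == "-i":
            fields[name] = int(value)
        elif kind == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of length N
            fields[name] = value

    if fields.get("sample_n_bytes", 2) != 2:
        raise ValueError("this sketch only handles 16-bit samples")
    if not fields.get("sample_coding", "pcm").startswith("pcm") or \
            "shorten" in fields.get("sample_coding", ""):
        raise ValueError("this sketch only handles uncompressed PCM")

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(payload[: len(payload) // 2 * 2])
    # sample_byte_format "01" means little-endian, "10" big-endian.
    file_is_little = fields.get("sample_byte_format", "01") == "01"
    if file_is_little != (sys.byteorder == "little"):
        samples.byteswap()
    return fields, samples
```

The sample rate is available as fields["sample_rate"] when the header provides it.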

I suspect you'll have to convert the file to WAV if you want to read the file with any of the libraries that you tried. To force the conversion to WAV using the program sph2pipe, use the command option -f wav (or equivalently, -f rif), e.g.

sph2pipe -f wav input.sph output.wav
Berk answered 26/6, 2017 at 0:37 Comment(2)
I updated my answer with a note about using -f wav. - Berk
An easy way to run this over all files under the current directory recursively is find . -name '*.WAV' -exec sph2pipe -f wav {} {}.wav \;. The only drawback is that you end up with files ending with .WAV.wav. - Haul
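
The same batch conversion can be driven from Python, which also avoids the .WAV.wav names. This is a sketch under a few assumptions: sph2pipe is installed and on PATH, the function names plan_conversions and convert_all and the _riff.wav suffix are made up for this example, and only files whose first four bytes are NIST are converted, so files that are already RIFF are skipped.

```python
import shutil
import subprocess
from pathlib import Path


def is_sphere(path):
    """True if the file starts with the NIST SPHERE magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"NIST"


def plan_conversions(root, suffix="_riff.wav"):
    """Yield (source, target) pairs for every SPHERE *.WAV file under root."""
    for src in sorted(Path(root).rglob("*.WAV")):
        if is_sphere(src):
            # e.g. SA1.WAV -> SA1_riff.wav in the same directory
            yield src, src.with_name(src.stem + suffix)


def convert_all(root):
    """Run `sph2pipe -f wav <src> <dst>` for each planned pair."""
    exe = shutil.which("sph2pipe")
    if exe is None:
        raise RuntimeError("sph2pipe not found on PATH")
    for src, dst in plan_conversions(root):
        # Passing a list (no shell=True) keeps paths with spaces safe.
        subprocess.run([exe, "-f", "wav", str(src), str(dst)], check=True)
```

Calling convert_all("./TIMIT") would then convert everything below that folder; plan_conversions can be used on its own to preview what would be written.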
B
3

Run this from the command line to check whether it is actually a WAV file:

xxd -b myaudiofile.wav | head

If it is in WAV format, the output will look something like this:

00000000: 01010010 01001001 01000110 01000110 10111100 10101111  RIFF..
00000006: 00000001 00000000 01010111 01000001 01010110 01000101  ..WAVE
0000000c: 01100110 01101101 01110100 00100000 00010000 00000000  fmt ..
00000012: 00000000 00000000 00000001 00000000 00000001 00000000  ......
00000018: 01000000 00011111 00000000 00000000 01000000 00011111  @...@.
0000001e: 00000000 00000000 00000001 00000000 00001000 00000000  ......
00000024: 01100100 01100001 01110100 01100001 10011000 10101111  data..
0000002a: 00000001 00000000 10000001 10000000 10000001 10000000  ......
00000030: 10000001 10000000 10000001 10000000 10000001 10000000  ......
00000036: 10000001 10000000 10000001 10000000 10000001 10000000  ......

Here is another way to display the contents of a binary file such as a WAV:

od -A x -t x1z -v  audio_util_test_file_custom.wav   | head 
000000 52 49 46 46 24 80 00 00 57 41 56 45 66 6d 74 20  >RIFF$...WAVEfmt <
000010 10 00 00 00 01 00 01 00 44 ac 00 00 88 58 01 00  >........D....X..<
000020 02 00 10 00 64 61 74 61 00 80 00 00 00 00 78 05  >....data......x.<
000030 ed 0a 5e 10 c6 15 25 1b 77 20 ba 25 eb 2a 08 30  >..^...%.w .%.*.0<
000040 0e 35 fc 39 cf 3e 84 43 1a 48 8e 4c de 50 08 55  >.5.9.>.C.H.L.P.U<
000050 0b 59 e4 5c 91 60 12 64 63 67 85 6a 74 6d 30 70  >.Y.\.`.dcg.jtm0p<
000060 b8 72 0a 75 25 77 09 79 b4 7a 26 7c 5d 7d 5a 7e  >.r.u%w.y.z&|]}Z~<
000070 1c 7f a3 7f ee 7f fd 7f d0 7f 67 7f c3 7e e3 7d  >..........g..~.}<
000080 c9 7c 74 7b e6 79 1e 78 1f 76 e8 73 7b 71 d9 6e  >.|t{.y.x.v.s{q.n<
000090 03 6c fa 68 c1 65 57 62 c0 5e fd 5a 0f 57 f8 52  >.l.h.eWb.^.Z.W.R<

Notice that the WAV file begins with the characters RIFF, the mandatory marker of the RIFF container that WAV files use. If your system (I'm on Linux) does not have the xxd command-line utility, use any hex editor, such as wxHexEditor, to examine your file in the same way and confirm that you see RIFF. If there is no RIFF, it is simply not a WAV file.
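
To show what the bytes after RIFF mean, here is a small sketch that decodes the first 44 bytes of the od dump above with the struct module. It assumes the simplest layout, where the fmt chunk is 16 bytes long and is immediately followed by the data chunk; real files can contain extra chunks, so treat it as an illustration rather than a general parser. The field names are my own labels.

```python
import struct

# First 44 bytes of the od dump above (audio_util_test_file_custom.wav).
header = bytes.fromhex(
    "52494646 24800000 57415645 666d7420"
    "10000000 01000100 44ac0000 88580100"
    "02001000 64617461 00800000"
)

names = (
    "riff_id", "riff_size", "wave_id",
    "fmt_id", "fmt_size", "audio_format", "channels",
    "sample_rate", "byte_rate", "block_align", "bits_per_sample",
    "data_id", "data_size",
)

# "<" = little-endian, no padding; 4s = 4-byte tag, I = uint32, H = uint16.
fields = dict(zip(names, struct.unpack("<4sI4s4sIHHIIHH4sI", header)))

for name in names:
    print(f"{name:16} {fields[name]}")
```

For this dump the decoded values are PCM (audio_format 1), 1 channel, 44100 Hz, 16 bits per sample, and 32768 bytes of sample data.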

Here are details of wav format specs

http://soundfile.sapp.org/doc/WaveFormat/

http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html

http://unusedino.de/ec64/technical/formats/wav.html

http://www.drdobbs.com/database/inside-the-riff-specification/184409308

https://www.gamedev.net/articles/programming/general-and-gameplay-programming/loading-a-wave-file-r709

http://www.topherlee.com/software/pcm-tut-wavformat.html

http://www.labbookpages.co.uk/audio/javaWavFiles.html

http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html

http://nagasm.org/ASL/sound05/

Blessington answered 25/6, 2017 at 20:47 Comment(0)
S
2

If you want a generic command that works for every .wav file inside the folder (on Windows), run:

forfiles /s /m *.wav /c "cmd /c sph2pipe -f wav @file @fnameRIFF.wav"

It searches recursively for every .wav file it can find and, for each one, creates a WAV file that both scipy and wave can read, named <base_name>RIFF.wav.

Surrey answered 18/11, 2017 at 21:38 Comment(2)
This complements Warren Weckesser's sph2pipe solution. I would have posted it as a comment, but I don't have the required reputation yet. - Surrey
Use find . -name '*.WAV' -exec sph2pipe -f wav {} {}.wav \; if you don't want to install forfiles. - Haul
I
2

Please use sounddevice and soundfile to obtain the numpy array data (and playback) using the following code:

import matplotlib.pyplot as plt
import soundfile as sf
import sounddevice as sd
# https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav
data, fs = sf.read('LDC93S1.wav')
print(data.shape,fs)
sd.play(data, fs, blocking=True)
plt.plot(data)
plt.show()

Output

(46797,) 16000

A sample TIMIT database wav file: https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav

Interplay answered 10/2, 2021 at 7:41 Comment(0)
K
1

I have written a Python script that converts all the NIST-format .WAV files, spoken by all speakers from all dialects, to .wav files that can be played on your system.

Note: All the dialect folders are assumed to be in ./TIMIT/TRAIN/. You may have to change dialects_path to match your project structure (or if you are on Windows).

import glob
import os

from sphfile import SPHFile

dialects_path = "./TIMIT/TRAIN/"

# Each subdirectory of TRAIN is one dialect region.
dialects = os.listdir(dialects_path)

for dialect in dialects:
    dialect_path = os.path.join(dialects_path, dialect)
    for speaker in os.listdir(dialect_path):
        speaker_path = os.path.join(dialect_path, speaker)

        for wav_file in glob.glob(os.path.join(speaker_path, "*.WAV")):
            sph = SPHFile(wav_file)

            # The matching .TXT file starts with the first and last sample index.
            txt_file = wav_file[:-3] + "TXT"
            with open(txt_file, "r") as f:
                for line in f:
                    words = line.split(" ")
                    # Sample indices divided by the 16 kHz rate give seconds.
                    start_time = int(words[0]) / 16000
                    end_time = int(words[1]) / 16000

            print("writing file ", wav_file)
            sph.write_wav(wav_file.replace(".WAV", ".wav"), start_time, end_time)
Kenelm answered 27/3, 2019 at 3:12 Comment(0)
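
The conversion from sample indices to seconds in the script above can be isolated into a small helper. This is a sketch: the function name parse_timit_txt is made up, and the example line is an assumed .TXT line whose end index matches the 46797 samples reported for LDC93S1.wav in another answer; check it against your own files.

```python
def parse_timit_txt(line, sample_rate=16000):
    """Split a '<start> <end> <transcript>' line into seconds and text.

    Hypothetical helper; start and end are sample indices in the file.
    """
    start, end, transcript = line.strip().split(" ", 2)
    return int(start) / sample_rate, int(end) / sample_rate, transcript


# Assumed example line, for illustration only.
example = "0 46797 She had your dark suit in greasy wash water all year.\n"
start_s, end_s, text = parse_timit_txt(example)
print(start_s, end_s, text)
```

With a 16 kHz sample rate, 46797 samples correspond to about 2.92 seconds.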
C
0

Sometimes this is caused by extracting the 7zip archive incorrectly. I had a similar issue and resolved it by extracting the dataset with 7z x <datasetname>.7z

Chromatology answered 2/7, 2021 at 20:21 Comment(0)
