Determining Bit-Depth of a wav file

I am looking for a fast, preferably standard-library mechanism to determine the bit-depth of a wav file, e.g. '16-bit' or '24-bit'.

I am using a subprocess call to Sox to get a plethora of audio metadata, but a subprocess call is very slow, and the only piece of information that I can currently get reliably only from Sox is the bit-depth.

The built-in wave module does not have a function like "getbitdepth()" and is also not compatible with 24-bit wav files. I could use a 'try except' to access the file's metadata using the wave module (if it works, manually record that it is 16-bit), then on except call sox instead (where sox will perform the analysis to accurately record its bit-depth). My concern is that this approach feels like guesswork. What if an 8-bit file is read? I would be manually assigning 16-bit when it is not.

scipy.io.wavfile is also not compatible with 24-bit audio, so it creates a similar issue.

This tutorial is really interesting and even includes some really low level (low level for Python at least) scripting examples to extract information from the wav files headers - unfortunately these scripts don't work for 16-bit audio.

Is there any way to simply (and without calling sox) determine what bit-depth the wav file I'm checking has?

The wave header parser script I'm using is as follows:

import struct
import os

def print_wave_header(f):
    '''
    Function takes an audio file path as a parameter and 
    returns a dictionary of metadata parsed from the header
    '''
    r = {} #the results of the header parse
    r['path'] = f
    fin = open(f,"rb") # Read wav file, "r flag" - read, "b flag" - binary 
    ChunkID=fin.read(4) # First four bytes are ChunkID which must be "RIFF" in ASCII
    r["ChunkID"]=ChunkID
    ChunkSizeString=fin.read(4) # Total Size of File in Bytes - 8 Bytes
    ChunkSize=struct.unpack('I',ChunkSizeString) # 'I' format treats the 4 bytes as an unsigned 32-bit integer
    TotalSize=ChunkSize[0]+8 # The subscript is used because struct unpack returns everything as tuple
    r["TotalSize"]=TotalSize
    DataSize=TotalSize-44 # This is the number of bytes of data
    r["DataSize"]=DataSize
    Format=fin.read(4) # "WAVE" in ASCII
    r["Format"]=Format
    SubChunk1ID=fin.read(4) # "fmt " in ASCII
    r["SubChunk1ID"]=SubChunk1ID
    SubChunk1SizeString=fin.read(4) # Should be 16 (PCM, Pulse Code Modulation)
    SubChunk1Size=struct.unpack("I",SubChunk1SizeString) # 'I' format to treat as unsigned 32-bit integer
    r["SubChunk1Size"]=SubChunk1Size[0] # unpack returns a tuple, so take the first element
    AudioFormatString=fin.read(2) # Should be 1 (PCM)
    AudioFormat=struct.unpack("H",AudioFormatString) ## 'H' format to treat as unsigned 16-bit integer
    r["AudioFormat"]=AudioFormat[0]
    NumChannelsString=fin.read(2) # Should be 1 for mono, 2 for stereo
    NumChannels=struct.unpack("H",NumChannelsString) # 'H' unsigned 16-bit integer
    r["NumChannels"]=NumChannels[0]
    SampleRateString=fin.read(4) # Should be 44100 (CD sampling rate)
    SampleRate=struct.unpack("I",SampleRateString)
    r["SampleRate"]=SampleRate[0]
    ByteRateString=fin.read(4) # 44100*NumChan*2 (88200 - Mono, 176400 - Stereo)
    ByteRate=struct.unpack("I",ByteRateString) # 'I' unsigned 32 bit integer
    r["ByteRate"]=ByteRate[0]
    BlockAlignString=fin.read(2) # NumChan*2 (2 - Mono, 4 - Stereo)
    BlockAlign=struct.unpack("H",BlockAlignString) # 'H' unsigned 16-bit integer
    r["BlockAlign"]=BlockAlign[0]
    BitsPerSampleString=fin.read(2) # 16 (CD has 16-bits per sample for each channel)
    BitsPerSample=struct.unpack("H",BitsPerSampleString) # 'H' unsigned 16-bit integer
    r["BitsPerSample"]=BitsPerSample[0]
    SubChunk2ID=fin.read(4) # "data" in ASCII
    r["SubChunk2ID"]=SubChunk2ID
    SubChunk2SizeString=fin.read(4) # Number of Data Bytes, Same as DataSize
    SubChunk2Size=struct.unpack("I",SubChunk2SizeString)
    r["SubChunk2Size"]=SubChunk2Size[0]
    S1String=fin.read(2) # Read first data, number between -32768 and 32767
    S1=struct.unpack("h",S1String)
    r["S1"]=S1[0]
    S2String=fin.read(2) # Read second data, number between -32768 and 32767
    S2=struct.unpack("h",S2String)
    r["S2"]=S2[0]
    S3String=fin.read(2) # Read third data, number between -32768 and 32767
    S3=struct.unpack("h",S3String)
    r["S3"]=S3[0]
    S4String=fin.read(2) # Read fourth data, number between -32768 and 32767
    S4=struct.unpack("h",S4String)
    r["S4"]=S4[0]
    S5String=fin.read(2) # Read fifth data, number between -32768 and 32767
    S5=struct.unpack("h",S5String)
    r["S5"]=S5[0]
    fin.close()
    return r
Testes answered 13/9, 2017 at 17:59 Comment(5)
Every wav file has bit_depth in its header (the first 44 bytes) ... every wav library must parse the header ... it's quite easy to perform this header parse yourself. - Virtues
Using the tutorial I flagged in the example I was already able to parse the header, but the bit-depth was not always clear, e.g.: ChunkID=b'RIFF', TotalSize=602914, DataSize=602870, Format=b'WAVE', SubChunk1ID=b'JUNK', SubChunk1Size=92, AudioFormat=0, NumChannels=0, SampleRate=0, ByteRate=0, BlockAlign=0, BitsPerSample=0, SubChunk2ID=b'\x00\x00\x00\x00', SubChunk2Size=0, S1=0, S2=0, S3=0, S4=0, S5=0. Depending on the file compression the header is readable or not, but I want to be able to read it regardless of the file format/compression without any conversion process. - Testes
It's a red flag to see 0 for all those header settings - either the file is corrupt or the library is wrong ... even if the wav file is compressed (I have never seen compression on wav files) the header certainly will NOT be compressed ... here is a concise wav spec summary soundfile.sapp.org/doc/WaveFormat ... if you write your own header parser pay particular attention to endianness of both header fields and data section ... you can write your own wav parser in two pages of code. - Virtues
Thanks - the link you posted is a fantastic source of info. Will take some time to digest it. When you say "either the file is corrupt or the library is wrong", the file plays just fine so I don't think it's corrupt. What library were you referring to when you said it might be wrong? I'll add the parser I'm using to the question. - Testes
Some WAV files have a JUNK chunk that is apparently meant to align RIFF chunks to certain boundaries (daubnet.com/en/file-format-riff). This JUNK chunk comes immediately after the WAVE bytes and before the fmt bytes, so if you're expecting fixed byte offsets that could cause some of the 0 values you were seeing. - Rebeccarebecka
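
Following the JUNK chunk explanation in the comment above, one way to avoid fixed byte offsets is to walk the RIFF chunks until the "fmt " chunk turns up. The following is only a minimal sketch using the standard library: read_wav_bit_depth is a name made up for this example, and it assumes a little-endian RIFF/WAVE file whose "fmt " chunk has at least the 16 standard bytes. For WAVE_FORMAT_EXTENSIBLE files the value returned is the container size, which may be larger than the number of valid bits.

```python
import struct

def read_wav_bit_depth(path):
    """Return BitsPerSample from the 'fmt ' chunk, skipping other chunks."""
    with open(path, "rb") as f:
        riff_header = f.read(12)
        if len(riff_header) < 12:
            raise ValueError("file too short to be a WAV file")
        # '<' forces little-endian, which is what RIFF uses.
        riff_id, _riff_size, wave_id = struct.unpack("<4sI4s", riff_header)
        if riff_id != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")

        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)

            if chunk_id == b"fmt ":
                fmt = f.read(16)
                if len(fmt) < 16:
                    raise ValueError("truncated 'fmt ' chunk")
                (_audio_format, _channels, _sample_rate,
                 _byte_rate, _block_align, bits_per_sample) = struct.unpack(
                    "<HHIIHH", fmt)
                return bits_per_sample

            # Not the chunk we want (e.g. JUNK, LIST): skip its payload.
            # Chunks are padded to an even number of bytes.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

Because only chunk headers are read and everything else is skipped with seek(), the function touches a few dozen bytes even for large files.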

I highly recommend the soundfile module (but mind you, I'm very biased because I wrote a large part of it).

There you can open your file as a soundfile.SoundFile object, which has a subtype attribute that holds the information you are looking for.

In your case that would probably be 'PCM_16' or 'PCM_24'.
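
If a number is needed rather than the subtype string, a small lookup table can translate it. This is a rough sketch, not part of the soundfile API: SUBTYPE_BITS, subtype_to_bits and bit_depth are hypothetical helper names, and the table only covers the fixed-width PCM and float subtypes, so anything else (compressed formats, for instance) comes back as None.

```python
# Bits per sample for subtype names that imply a fixed sample width.
SUBTYPE_BITS = {
    "PCM_S8": 8,
    "PCM_U8": 8,
    "PCM_16": 16,
    "PCM_24": 24,
    "PCM_32": 32,
    "FLOAT": 32,
    "DOUBLE": 64,
}

def subtype_to_bits(subtype):
    """Return the bit depth for a subtype string, or None if unknown."""
    return SUBTYPE_BITS.get(subtype)

def bit_depth(path):
    """Look up the subtype of an audio file and convert it to bits."""
    # Third-party module, imported here so the table above is usable
    # even where soundfile is not installed.
    import soundfile as sf
    return subtype_to_bits(sf.info(path).subtype)
```

Calling bit_depth('example.wav') on a 24-bit PCM file would then be expected to give 24.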

Helicline answered 15/9, 2017 at 13:54 Comment(4)
I will try this. Do you know if soundfile objects can be instantiated asynchronously? - Testes
Can you clarify what you mean by "asynchronous"? If you mean whether any functions (I guess you are talking about the constructor) are await-able, then no. Are there any sound file modules that support that? - Helicline
You understood correctly. I'm not aware of any available audio modules which are awaitable. If soundfile is fast enough then maybe I can try to work without async. - Testes
Creating a SoundFile object from an audio file needs only to read a few dozen header bytes (in the simplest WAV case), so it should be fast, I guess. If the access to your file data is slow, you could try to use some kind of pre-buffered file-like object and await on that. The soundfile module can deal with file-like objects. - Helicline

Essentially the same answer as the one from Matthias, but with copy-pastable code.

Requirements

pip install soundfile

Code

import soundfile as sf

# The with-block closes the file once the header info has been printed
with sf.SoundFile('example.wav') as ob:
    print('Sample rate: {}'.format(ob.samplerate))
    print('Channels: {}'.format(ob.channels))
    print('Subtype: {}'.format(ob.subtype))

Explanation

  • Channels: Usually 2 (stereo), meaning one channel for the left speaker and one for the right speaker.
  • Sample rate: Audio signals are analog, but we want to represent them digitally, meaning we want to discretize them in value and in time. The sample rate gives how many times per second we get a value. The unit is Hz. The sample rate needs to be at least double the highest frequency in the original sound, otherwise you get aliasing. Human hearing ranges from ~20 Hz to ~20 kHz, so you can cut off anything above 20 kHz. This means a sample rate far above 40 kHz adds little for playback, which is why rates such as 44.1 kHz and 48 kHz are common.
  • Bit-depth: The higher the bit-depth, the more dynamic range can be captured. Dynamic range is the difference between the quietest and loudest volume of an instrument, part or piece of music. A typical value seems to be 16 bit or 24 bit. A bit-depth of 16 bit has a theoretical dynamic range of 96 dB, whereas 24 bit has a dynamic range of 144 dB (source).
  • Subtype: PCM_16 means 16 bit depth, where PCM stands for Pulse-Code Modulation.
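
The 96 dB and 144 dB figures in the bit-depth bullet follow from 20 * log10(2^n), which works out to roughly 6.02 dB per bit. A quick check, assuming an ideal quantizer (dynamic_range_db is just a name chosen for this sketch):

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range in dB of an ideal quantizer with `bits` bits."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print('{}-bit: {:.1f} dB'.format(bits, dynamic_range_db(bits)))
# 8-bit: 48.2 dB
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```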

Alternative

If you are only looking for a command-line tool, I can recommend MediaInfo:

$ mediainfo example.wav
General
Complete name                            : example.wav
Format                                   : Wave
File size                                : 83.2 MiB
Duration                                 : 8 min 14 s
Overall bit rate mode                    : Constant
Overall bit rate                         : 1 411 kb/s

Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 8 min 14 s
Bit rate mode                            : Constant
Bit rate                                 : 1 411.2 kb/s
Channel(s)                               : 2 channels
Sampling rate                            : 44.1 kHz
Bit depth                                : 16 bits
Stream size                              : 83.2 MiB (100%)
Compote answered 13/1, 2019 at 8:57 Comment(0)

It is not clear when this update went out, but the built-in wave module appears to be compatible with 24-bit wav files. I'm using Python 3.10.5.

The documentation for Wave_read.getsampwidth() states that it returns the sample width in bytes, so taking this value and multiplying by 8 should give the bit depth. For example:

import wave

with wave.open(path, 'rb') as wav:  # path: location of the wav file
    bit_depth = wav.getsampwidth() * 8  # sample width is in bytes

getsampwidth() returns 2 for a 16-bit file and 3 for a 24-bit file. No additional modules or subprocesses needed!
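
To try this without an existing 24-bit file at hand, a round trip through the wave module itself is enough. This is only a sketch: wav_bit_depth and the file name silence24.wav are made up for the example, and the test file holds 100 frames of silence. Note that wave.open raises wave.Error for files it cannot parse, so callers may still want a fallback for those.

```python
import os
import tempfile
import wave

def wav_bit_depth(path):
    """Return the bit depth of a PCM wav file using only the standard library."""
    with wave.open(path, 'rb') as wav:
        return wav.getsampwidth() * 8  # sample width is reported in bytes

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'silence24.wav')
    # Write a tiny mono 24-bit file: 3 bytes per sample, 100 frames of zeros.
    with wave.open(path, 'wb') as out:
        out.setnchannels(1)
        out.setsampwidth(3)
        out.setframerate(44100)
        out.writeframes(bytes(3 * 100))
    print(wav_bit_depth(path))  # 24
```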

Array answered 9/8, 2022 at 16:51 Comment(0)
