I'm extracting mel-spectrogram features from my baby-cry sound dataset. The WAV files are 8 kHz, 16-bit, mono, and about 7 seconds long.
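One way to confirm those properties for every file before extracting features is to read the WAV header with Python's standard-library `wave` module. This is a minimal sketch; `describe_wav` is a hypothetical helper name, not part of librosa.

```python
import wave

def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels, duration_s) from a WAV header."""
    with wave.open(path, "rb") as wf:
        sr = wf.getframerate()
        bits = wf.getsampwidth() * 8      # sample width is reported in bytes
        channels = wf.getnchannels()
        duration = wf.getnframes() / sr   # frames per channel / rate = seconds
    return sr, bits, channels, duration
```

For these recordings, `describe_wav(path + '001.wav')` should report 8000 Hz, 16 bits and 1 channel, with a duration of roughly 7 seconds.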
[Figures: Mel-spectrogram with sr = 16000 | Mel-spectrogram with sr = 44100]
As you can see, whenever I extract features with a different sampling rate `sr`, the values of the mel-spectrogram change.
I assumed that, because the WAV files are sampled at 8 kHz, any sampling rate of 16 kHz or higher would give the same frequency (Hz) values. I also resampled the WAV files from 8 kHz to 44.1 kHz myself and extracted the features again, but nothing changed.
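The expectation that upsampling cannot add anything above 4 kHz (the Nyquist limit of 8 kHz audio) can be checked numerically. The sketch below upsamples a synthetic 8 kHz signal to 44.1 kHz and measures the share of its power above 4 kHz. It assumes a band-limited FFT zero-padding resampler as a stand-in; this is not the resampler librosa or a conversion tool actually uses, and `fft_upsample` and `fraction_above` are hypothetical helpers.

```python
import numpy as np

def fft_upsample(x, sr_in, sr_out):
    """Band-limited upsampling by zero-padding the spectrum (assumes sr_out > sr_in)."""
    n_in = len(x)
    n_out = int(round(n_in * sr_out / sr_in))
    X = np.fft.rfft(x)
    Y = np.zeros(n_out // 2 + 1, dtype=complex)
    Y[: len(X)] = X  # keep the original bins; everything above sr_in / 2 stays zero
    if n_in % 2 == 0:
        Y[n_in // 2] *= 0.5  # split the old Nyquist bin between +f and -f
    # irfft divides by n_out; rescale so amplitudes match the input
    return np.fft.irfft(Y, n=n_out) * (n_out / n_in)

def fraction_above(x, sr, cutoff_hz):
    """Share of the signal's power at frequencies above cutoff_hz."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return power[freqs > cutoff_hz].sum() / power.sum()

sr_in, sr_out = 8000, 44100
t = np.arange(sr_in) / sr_in  # 1 s of audio at 8 kHz
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
y = fft_upsample(x, sr_in, sr_out)
print(len(y), fraction_above(y, sr_out, 4000.0))
```

The power above 4 kHz comes out at floating-point noise level, which is consistent with resampling the files yourself and letting `librosa.load` resample them producing essentially the same audio. A difference between the two plots is therefore more likely to come from how the features are computed at each rate.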
This is my code:
```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

sr = 44100  # or 16000; librosa resamples the 8 kHz files to this rate on load
frame_length = 0.1   # analysis window length in seconds
frame_stride = 0.01  # hop between windows in seconds
path = '...'
train = []

# Recent librosa versions require sr (and y below) to be passed by keyword
j, sr = librosa.load(path + '001.wav', sr=sr, duration=5.0)

# Convert the window and hop from seconds to samples at the chosen rate
input_nfft = int(round(sr * frame_length))
input_stride = int(round(sr * frame_stride))

mel = librosa.feature.melspectrogram(y=j, sr=sr, n_mels=128,
                                     n_fft=input_nfft, hop_length=input_stride)
train.append(mel)

plt.figure(figsize=(10, 4))
librosa.display.specshow(librosa.power_to_db(train[0], ref=np.max),
                         y_axis='mel', sr=sr, hop_length=input_stride, x_axis='time')
plt.colorbar(format='%+2.0f dB')
plt.title('Mel-Spectrogram')
plt.tight_layout()
plt.show()
```
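Because `frame_length` and `frame_stride` are given in seconds, `n_fft` and `hop_length` scale with `sr`. Working through the arithmetic shows that the number of frames and the width of each STFT frequency bin come out the same at 16 kHz and 44.1 kHz. This is a sketch: `frame_params` is a hypothetical helper, and the frame count assumes librosa's default `center=True` with an even `n_fft`.

```python
def frame_params(sr, frame_length=0.1, frame_stride=0.01, duration=5.0):
    """Window/hop in samples, STFT bin width in Hz, and frame count for a clip."""
    n_fft = int(round(sr * frame_length))
    hop = int(round(sr * frame_stride))
    bin_spacing_hz = sr / n_fft                # equals 1 / frame_length
    n_frames = 1 + int(sr * duration) // hop   # assumes center=True and even n_fft
    return n_fft, hop, bin_spacing_hz, n_frames

for sr in (16000, 44100):
    print(sr, frame_params(sr))
```

Both rates give 10 Hz bins and 501 frames for a 5-second clip, so the framing alone does not explain the difference between the two plots.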
I expected the y-axis values to be the same whether sr = 44100 or sr = 16000, and I don't understand why they differ.
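A likely factor, assuming librosa's documented default for the mel filter bank, is that its upper edge `fmax` defaults to `sr / 2`. The 128 mel bands then span 0 to 8 kHz at `sr = 16000` but 0 to 22.05 kHz at `sr = 44100`, even though the 8 kHz recordings contain nothing above 4 kHz. The sketch below compares band centers under both settings. It uses the HTK mel formula for brevity (librosa's default is the Slaney variant, so the exact numbers differ), and `mel_band_centers` is a hypothetical helper.

```python
import numpy as np

def hz_to_mel(f):
    """HTK mel scale (assumption: librosa defaults to the Slaney variant instead)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_band_centers(sr, n_mels=128, fmin=0.0, fmax=None):
    """Center frequencies (Hz) of n_mels triangular mel filters."""
    if fmax is None:
        fmax = sr / 2.0  # same default as librosa: the Nyquist frequency of sr
    # n_mels triangles need n_mels + 2 edge points spaced evenly in mel;
    # the interior points are the filter centers.
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    return mel_to_hz(mels)[1:-1]

for sr in (16000, 44100):
    default = mel_band_centers(sr)              # fmax = sr / 2
    pinned = mel_band_centers(sr, fmax=4000.0)  # Nyquist of the original 8 kHz audio
    print(f"sr={sr}: top band {default[-1]:.0f} Hz, "
          f"{int((default > 4000).sum())} of 128 bands above 4 kHz; "
          f"pinned top band {pinned[-1]:.0f} Hz")
```

With the default, more bands land above 4 kHz at 44.1 kHz than at 16 kHz, and those bands see only silence in the upsampled audio. If this is the cause, passing the same `fmax` (for example `fmax=4000`) to both `librosa.feature.melspectrogram` and `librosa.display.specshow` should make the mel axis independent of `sr`.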