Why do MFCC extraction libs return different values?

Asked 31/8, 2018 at 9:14 Answered 22/10, 2018 at 20:59

Solved python voice-recognition voice speech mfcc

I am extracting the MFCC features using two different libraries:

The python_speech_features lib
The BOB lib

However, the outputs of the two are different, and even the shapes are not the same. Is that normal? Or is there a parameter that I am missing?

The relevant section of my code is the following:

import bob.ap
import numpy as np
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc, delta, logfbank

def bob_extract_features(audio, rate):
    #get MFCC
    rate              = 8000  # NOTE: overrides the rate argument; assumes 8 kHz audio
    win_length_ms     = 30    # The window length of the cepstral analysis in milliseconds
    win_shift_ms      = 10    # The window shift of the cepstral analysis in milliseconds
    n_filters         = 26    # The number of filter bands
    n_ceps            = 13    # The number of cepstral coefficients
    f_min             = 0.    # The minimal frequency of the filter bank
    f_max             = 4000. # The maximal frequency of the filter bank
    delta_win         = 2     # The integer delta value used for computing the first and second order derivatives
    pre_emphasis_coef = 0.97  # The coefficient used for the pre-emphasis
    dct_norm          = True  # A factor by which the cepstral coefficients are multiplied
    mel_scale         = True  # Tell whether cepstral features are extracted on a linear (LFCC) or Mel (MFCC) scale

    c = bob.ap.Ceps(rate, win_length_ms, win_shift_ms, n_filters, n_ceps, f_min,
                    f_max, delta_win, pre_emphasis_coef, mel_scale, dct_norm)
    c.with_delta       = False
    c.with_delta_delta = False
    c.with_energy      = False

    signal = np.asarray(audio, dtype=float)    # input must be float (np.cast was removed in NumPy 2.0)
    example_mfcc = c(signal)                   # MFCCs only: deltas and energy are disabled above
    return example_mfcc


def psf_extract_features(audio, rate):
    signal = np.asarray(audio, dtype=float)  # input must be float (np.cast was removed in NumPy 2.0)
    mfcc_feature = mfcc(signal, rate, winlen = 0.03, winstep = 0.01, numcep = 13,
                        nfilt = 26, nfft = 512,appendEnergy = False)

    #mfcc_feature = preprocessing.scale(mfcc_feature)
    deltas       = delta(mfcc_feature, 2)
    fbank_feat   = logfbank(audio, rate)
    combined     = np.hstack((mfcc_feature, deltas))
    return mfcc_feature



track = 'test-sample.wav'
rate, audio = read(track)

features1 = psf_extract_features(audio, rate)
features2 = bob_extract_features(audio, rate)

print("--------------------------------------------")
t = (features1 == features2)
print(t)

Raphael answered 31/8, 2018 at 9:14 Comment(0)

"However the output of the two is different and even the shapes are not the same. Is that normal?"

Yes. There are several varieties of the algorithm, and each implementation chooses its own flavor.

"or is there a parameter that I am missing?"

It is not just about parameters. There are algorithmic differences too: the window shape (Hamming vs. Hann), the shape of the mel filters, where the mel filters start, the normalization of the mel filters, liftering, the DCT flavor, and so on.
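To make this concrete, here is a minimal sketch of how two of those choices change the numbers. It is a toy pipeline (window, power spectrum, log, DCT-II) with no mel filterbank, pre-emphasis, or liftering, and toy_cepstrum is a hypothetical helper written for this illustration. It does not reproduce what python_speech_features or bob.ap do internally.

```python
import numpy as np

def toy_cepstrum(frame, window, ortho):
    """Toy cepstrum of one frame: window -> power spectrum -> log -> DCT-II.

    Illustration only: no mel filterbank, pre-emphasis, or liftering.
    """
    n = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * window(n))) ** 2
    log_spec = np.log(spectrum + 1e-10)                 # small offset avoids log(0)
    m = len(log_spec)
    k = np.arange(m)[:, None]                           # cepstral (output) index
    j = np.arange(m)[None, :]                           # frequency-bin (input) index
    basis = np.cos(np.pi * k * (2 * j + 1) / (2 * m))   # DCT-II basis
    ceps = 2.0 * (basis @ log_spec)                     # unnormalized DCT-II
    if ortho:                                           # orthonormal scaling
        ceps[0] *= np.sqrt(1.0 / (4 * m))
        ceps[1:] *= np.sqrt(1.0 / (2 * m))
    return ceps[:13]

rng = np.random.default_rng(0)
frame = rng.standard_normal(240)                  # 30 ms of noise at 8 kHz

a = toy_cepstrum(frame, np.hamming, ortho=True)
b = toy_cepstrum(frame, np.hanning, ortho=True)   # only the window differs
c = toy_cepstrum(frame, np.hamming, ortho=False)  # only the DCT scaling differs

print(np.allclose(a, b), np.allclose(a, c))
```

Each single change is enough to make the coefficients disagree, and a real implementation stacks many such choices on top of each other.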

If you want the same results, just use a single library for extraction. Trying to make the two agree is pretty hopeless.

Wolver answered 22/10, 2018 at 20:59 Comment(0)

Have you tried comparing the two with some tolerance? I believe the two MFCCs are arrays of floating-point numbers, so testing for exact equality might not be wise. Try numpy.testing.assert_allclose with some tolerance, and decide whether that tolerance is good enough.
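A minimal sketch of such a comparison, assuming both feature matrices are NumPy arrays. compare_features is a hypothetical helper name, the tolerances are arbitrary placeholders, and the arrays below are synthetic stand-ins for real MFCC output.

```python
import numpy as np

def compare_features(x, y, rtol=1e-5, atol=1e-8):
    """Return (same_shape, all_close, max_abs_diff) for two feature matrices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        # Elementwise comparison is meaningless if the shapes differ.
        return False, False, None
    close = bool(np.allclose(x, y, rtol=rtol, atol=atol))
    return True, close, float(np.max(np.abs(x - y)))

# Synthetic stand-ins for two MFCC matrices (2 frames x 13 coefficients).
ref = np.linspace(-1.0, 1.0, 26).reshape(2, 13)
noisy = ref + 1e-7

print(compare_features(ref, noisy, atol=1e-6))   # same shape, within tolerance
print(compare_features(ref, ref[:1]))            # shape mismatch is reported first

# assert_allclose raises AssertionError with a diff summary if the check fails.
np.testing.assert_allclose(noisy, ref, rtol=0, atol=1e-6)
```

Checking the shapes first matters here, because a plain == on arrays of different shapes does not give a useful elementwise result.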

That said, I initially missed that even the shapes mismatch, and I am not experienced enough with bob.ap to comment on that confidently. However, it is often the case that libraries pad the input with zeros at the beginning or the end of the array for windowing reasons, and that may be responsible if one of the two does it differently.
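As a rough illustration of how padding alone changes the number of rows, the sketch below counts frames under two common conventions: keeping a zero-padded final partial frame, or keeping only complete frames. Both function names are hypothetical, and which convention each library actually follows is an assumption you would need to check against its source.

```python
import math

def frames_padded(n_samples, win_len, win_shift):
    """Frame count when the final partial frame is zero-padded and kept."""
    if n_samples <= win_len:
        return 1
    return 1 + math.ceil((n_samples - win_len) / win_shift)

def frames_truncated(n_samples, win_len, win_shift):
    """Frame count when only complete frames are kept."""
    if n_samples < win_len:
        return 0
    return 1 + (n_samples - win_len) // win_shift

rate = 8000                        # Hz, as in the question
win_len = 30 * rate // 1000        # 30 ms window  -> 240 samples
win_shift = 10 * rate // 1000      # 10 ms shift   -> 80 samples
n_samples = 8100                   # length that is not a whole number of shifts

print(frames_padded(n_samples, win_len, win_shift),
      frames_truncated(n_samples, win_len, win_shift))
```

For this signal length the two conventions differ by one frame, which is the kind of off-by-one row mismatch that shows up when comparing feature matrices from different libraries.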

Fluxmeter answered 3/10, 2018 at 8:42 Comment(1)

Not part of the answer but, if you are looking around for libraries for MFCCs, librosa may also be an option for you. – Fluxmeter 3/10, 2018 at 8:52
