How to generate MFCC Algorithm's triangular windows and how to use them?

I am implementing the MFCC algorithm in Java.

There is sample code in Matlab at http://www.ee.columbia.edu/~dpwe/muscontent/practical/mfcc.m, but I have some problems with the mel filter bank step. How do I generate the triangular windows, and how do I use them?

PS1: An article with a section that describes MFCC: http://arxiv.org/pdf/1003.4083

PS2: A document that describes the basic steps of the MFCC algorithm would also be helpful.

PS3: My main question is related to this one: MFCC with Java Linear and Logarithmic Filters. Some implementations use both linear and logarithmic filters and some do not. What are these filters, and what is the concept of a center frequency? I am following this code: MFCC Java. What is the difference between it and this code: MFCC Matlab?

Mazonson answered 20/5, 2011 at 20:55 Comment(1)
If you use Matlab at all, the VoiceBox toolbox has Matlab code to do that. Perhaps you can port it. – Clearness

Triangular windows as frequency band filters aren't hard to implement. You basically want to integrate the FFT data within each band, weighting each bin by the triangle (band i is the frequency range between center frequency i-1 and center frequency i+1).

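You're basically looking for something like this:

// centerFreqs holds numBands + 2 FFT bin indices: the lower edge of the first
// band, the numBands center bins, and the upper edge of the last band.
// fftData holds the spectrum; bandData (double[numBands]) starts at zero.
for (int bandIdx = 0; bandIdx < numBands; bandIdx++) {
    int startFreqIdx  = centerFreqs[bandIdx];
    int centerFreqIdx = centerFreqs[bandIdx + 1];
    int stopFreqIdx   = centerFreqs[bandIdx + 2];

    // Rising edge: weight grows linearly from 0 at startFreqIdx to 1 at centerFreqIdx.
    double riseScale = centerFreqIdx - startFreqIdx;
    for (int freq = startFreqIdx; freq < centerFreqIdx; freq++) {
        bandData[bandIdx] += fftData[freq] * (freq - startFreqIdx) / riseScale;
    }

    // Falling edge: weight shrinks linearly from 1 at centerFreqIdx to 0 at stopFreqIdx.
    // Math.max avoids 0/0 if centerFreqIdx == stopFreqIdx.
    double fallScale = Math.max(1, stopFreqIdx - centerFreqIdx);
    for (int freq = centerFreqIdx; freq <= stopFreqIdx; freq++) {
        bandData[bandIdx] += fftData[freq] * (stopFreqIdx - freq) / fallScale;
    }
}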
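The loop above works on FFT bin indices, not on frequencies in Hz. A minimal sketch of the conversion, assuming a real FFT of `fftSize` samples whose first `fftSize / 2 + 1` bins cover 0 Hz up to the Nyquist frequency; `FreqToBin`, `hzToBin` and `hzToBins` are hypothetical names:

```java
import java.util.Arrays;

class FreqToBin {
    // Bin k of an fftSize-point FFT sits at k * sampleRate / fftSize Hz,
    // so frequency f maps to the nearest bin round(f * fftSize / sampleRate).
    static int hzToBin(double hz, int fftSize, double sampleRate) {
        return (int) Math.round(hz * fftSize / sampleRate);
    }

    // Converts edge/center frequencies (Hz) to bin indices,
    // clamped to the valid range [0, fftSize / 2].
    static int[] hzToBins(double[] hz, int fftSize, double sampleRate) {
        int[] bins = new int[hz.length];
        for (int i = 0; i < hz.length; i++) {
            int bin = hzToBin(hz[i], fftSize, sampleRate);
            bins[i] = Math.max(0, Math.min(fftSize / 2, bin));
        }
        return bins;
    }

    public static void main(String[] args) {
        // 512-point FFT at 16 kHz: each bin is 31.25 Hz wide.
        double[] freqs = {300.0, 1000.0, 8000.0};
        System.out.println(Arrays.toString(hzToBins(freqs, 512, 16000.0))); // [10, 32, 256]
    }
}
```

Rounding to the nearest bin is one choice; some tutorials use `floor((fftSize + 1) * f / sampleRate)` instead. Either way, if the FFT is short, neighbouring center frequencies can land on the same bin and produce degenerate triangles.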

If you do not understand the concept of a "center frequency" or a "band" or a "filter," pick up an elementary signals textbook--you shouldn't be implementing this algorithm without understanding what it does.

As for what the exact center frequencies are, it's up to you. Experiment and pick (or find in publications) values that capture the information you want to isolate from the data. The reason there are no definitive values, or even a definitive scale for the values, is that this algorithm tries to approximate the human ear, which is a very complicated listening device. One scale may work better for, say, speech, and another for music. It's up to you to choose what is appropriate.
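One possible scheme for choosing center frequencies is linear spacing up to a break frequency and a constant ratio (even spacing on a log scale) above it. This roughly matches the "linear and logarithmic filters" arrangement the question asks about. This is a sketch only. `HybridCenters`, the 100 Hz step, the 1000 Hz break and the doubling ratio are illustrative assumptions and do not come from any particular MFCC implementation:

```java
import java.util.ArrayList;
import java.util.List;

class HybridCenters {
    // Returns center frequencies in Hz: linearStepHz, 2 * linearStepHz, ... up to breakHz,
    // then each next center = previous * logRatio, stopping before maxHz.
    // All parameter values are illustrative, not standard MFCC settings.
    static List<Double> centers(double linearStepHz, double breakHz,
                                double logRatio, double maxHz) {
        if (linearStepHz <= 0 || logRatio <= 1.0) {
            throw new IllegalArgumentException("need linearStepHz > 0 and logRatio > 1");
        }
        List<Double> result = new ArrayList<>();
        // Linear region: evenly spaced in Hz.
        double f = linearStepHz;
        while (f <= breakHz + 1e-9) {
            result.add(f);
            f += linearStepHz;
        }
        // Logarithmic region: evenly spaced in log(Hz), i.e. a constant ratio.
        double last = result.isEmpty() ? breakHz : result.get(result.size() - 1);
        f = last * logRatio;
        while (f < maxHz) {
            result.add(f);
            f *= logRatio;
        }
        return result;
    }

    public static void main(String[] args) {
        // 100 Hz steps up to 1 kHz, then doubling, below a Nyquist frequency of 11025 Hz.
        System.out.println(centers(100.0, 1000.0, 2.0, 11025.0));
    }
}
```

With these values the centers are 100, 200, ..., 1000 Hz followed by 2000, 4000 and 8000 Hz. To turn the centers into triangles you would typically also add one edge point below the first center and one above the last, because each triangle runs from its left neighbour to its right neighbour.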

Neurath answered 3/6, 2011 at 23:27 Comment(10)
MFCC Java says: mel[0] = freqToMel(lowerFilterFreq); mel[1] = freqToMel(samplingRate / 2); What does that mean? Does it mean the filters run from lowerFilterFreq to samplingRate / 2? If I only want to cover up to 1000 Hz, should I write 1000 instead of samplingRate / 2? – Mazonson
@kamaci: Without knowing the context or the details of freqToMel, it's impossible to say. I'm assuming samplingRate means the sampling rate. That is, if you're sampling audio at 44.1kHz, then samplingRate/2 is 22.05kHz. If you're sampling at 1000 Hz, then samplingRate/2 is 500 Hz. If you're unclear about the concept of "sampling rate", you should pick up a signals book like I said before. – Neurath
Isn't samplingRate/2 the maximum frequency? – Mazonson
samplingRate/2 is the Nyquist frequency. Any frequency above samplingRate/2 will be folded (aliased) back to lie between 0 and samplingRate/2. – Neurath
One more question, and it is very important for me. My lecturer wants linear filters up to 1000 Hz (I think MFCC Java uses a linear filter). How can I implement that in the code? Should I write 1000 Hz instead of samplingRate / 2? – Mazonson
MFCC is not a filter. MFCC is a spectrum analysis algorithm that uses filters. As far as I can tell, this requirement means that the center frequencies of the individual filters need to be linearly spaced up to 1000 Hz, then exponentially (or according to some other function) after that. For example, you could have center frequencies at every 100Hz up to 1kHz, then 2kHz, 4kHz, 8kHz, etc. However, asking your professor for clarification would be far more effective than asking on a programming board. – Neurath
I know it is not a filter; however, can you tell me what is going on in the centerFreq() method? – Mazonson
@Thomas Minor, you say MFCC is a spectrum analysis algorithm that uses filters, so how can I limit this filter to 1000 Hz in that Java code? Can you give a code example of how to modify that code to limit it to 1000 Hz? Is that Java code using a linear filter? – Mazonson
I have a question, and it might be a dumb one, but here goes: is it possible for these triangles to overlap? From what you've described it sounds like it's not, but I always try to check corner cases. Also, can the leftmost triangle extend into the negative frequency region? – Dormer
@ThomasMinor Sorry to point out the typo, but is "i" in the inner loops "freq" or "bandIdx"? – Aggappera

Answer to the second PS: I found this tutorial, which really helped me compute the MFCCs.

As for the triangular windows and the filterbanks, from what I understood, they do overlap and they do not extend to negative frequencies. The whole process of computing them from the FFT spectrum and applying them to it goes something like this:

  1. Choose a minimum and a maximum frequency for the filters (for example, min freq = 300 Hz, a commonly used lower limit for voice, and max freq = your sample rate / 2. This may be where you should apply the 1000 Hz limit you were talking about).
  2. Compute the mel values of the chosen min and max frequencies (formula here; a common form is m = 2595 * log10(1 + f / 700)).
  3. Compute N equally spaced values between these two mel values. (I've seen examples with different values of N, and you can even find an efficiency comparison of different values in this work. For my tests I picked 26.)
  4. Convert these values, together with the two endpoints, back to Hz (you can find the formula on the same wiki page) => an array of N + 2 filter frequencies.
  5. Compute a filterbank (filter triangle) for each three consecutive values, either as Thomas suggested above (being careful with the indexes) or as in the tutorial recommended at the beginning of this post => an array of arrays of size N x M, assuming your FFT returned 2*M values and you only use M.
  6. Pass the whole power spectrum (the M values obtained from the FFT) through each triangular filter to get a "filterbank energy" for each filter: for each filter (N loop), multiply each power value obtained from the FFT by the corresponding value of that filter (M loop) and add up the M products => an N-sized array of energies.
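The six steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the tutorial's or any library's code. It assumes `powerSpectrum` holds the `fftSize / 2 + 1` non-negative power values of one frame (so here M includes the Nyquist bin), uses the mel formula m = 2595 * log10(1 + f / 700), and maps Hz to bins with `floor((fftSize + 1) * f / sampleRate)`. `MelFilterbank`, `build` and `energies` are hypothetical names:

```java
import java.util.Arrays;

class MelFilterbank {
    // Step 2 (and its inverse for step 4): mel = 2595 * log10(1 + f / 700).
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    // Steps 1-5: numFilters triangles over numBins = fftSize / 2 + 1 FFT bins,
    // with edges equally spaced in mel between minHz and maxHz (maxHz <= sampleRate / 2).
    static double[][] build(int numFilters, int fftSize, double sampleRate,
                            double minHz, double maxHz) {
        int numBins = fftSize / 2 + 1;
        double minMel = hzToMel(minHz);
        double maxMel = hzToMel(maxHz);

        // Steps 3-4: numFilters + 2 points (both endpoints included), mel -> Hz -> bin.
        int[] bin = new int[numFilters + 2];
        for (int i = 0; i < numFilters + 2; i++) {
            double mel = minMel + i * (maxMel - minMel) / (numFilters + 1);
            bin[i] = (int) Math.floor((fftSize + 1) * melToHz(mel) / sampleRate);
        }

        // Step 5: filter m rises from bin[m] to a peak of 1 at bin[m + 1],
        // then falls back to 0 at bin[m + 2]; neighbouring triangles overlap.
        double[][] fb = new double[numFilters][numBins];
        for (int m = 0; m < numFilters; m++) {
            int lo = bin[m], mid = bin[m + 1], hi = bin[m + 2];
            for (int k = lo; k < mid && k < numBins; k++) {
                fb[m][k] = (double) (k - lo) / (mid - lo);
            }
            for (int k = mid; k <= hi && k < numBins; k++) {
                fb[m][k] = (hi == mid) ? 1.0 : (double) (hi - k) / (hi - mid);
            }
        }
        return fb;
    }

    // Step 6: energy of filter m = sum over bins k of fb[m][k] * powerSpectrum[k].
    static double[] energies(double[][] fb, double[] powerSpectrum) {
        double[] e = new double[fb.length];
        for (int m = 0; m < fb.length; m++) {
            if (fb[m].length != powerSpectrum.length) {
                throw new IllegalArgumentException("spectrum length must match filter length");
            }
            for (int k = 0; k < powerSpectrum.length; k++) {
                e[m] += fb[m][k] * powerSpectrum[k];
            }
        }
        return e;
    }

    public static void main(String[] args) {
        // Example values: 26 filters between 300 Hz and 8 kHz, 512-point FFT at 16 kHz.
        double[][] fb = build(26, 512, 16000.0, 300.0, 8000.0);
        double[] flatSpectrum = new double[fb[0].length];
        Arrays.fill(flatSpectrum, 1.0);
        double[] e = energies(fb, flatSpectrum);
        System.out.println(fb.length + " filters x " + fb[0].length + " bins");
        System.out.println("energies: " + Arrays.toString(e));
    }
}
```

Building the N x M matrix once and reusing it for every frame keeps step 6 a plain weighted sum, which is the design the answer describes.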

These are your filterbank energies. You can then take their log, apply the DCT, and keep the resulting coefficients as the MFCCs.
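For that last step, here is a minimal sketch that takes the natural log of each filterbank energy and applies an unnormalized DCT-II, keeping the first `numCoeffs` coefficients. Several details are assumptions, and implementations differ on all of them: the small floor before the log (to avoid log(0)), the log base, the DCT scaling, and whether coefficient 0 is kept. `MfccFromEnergies` and `mfcc` are hypothetical names:

```java
import java.util.Arrays;

class MfccFromEnergies {
    // Log of each filterbank energy, then an unnormalized DCT-II:
    // c[k] = sum_{m=0}^{N-1} log(E[m]) * cos(pi * k * (m + 0.5) / N), k = 0..numCoeffs-1.
    static double[] mfcc(double[] energies, int numCoeffs) {
        int n = energies.length;
        double[] logE = new double[n];
        for (int m = 0; m < n; m++) {
            // Floor at 1e-10 (an arbitrary choice) so a zero energy does not give -infinity.
            logE[m] = Math.log(Math.max(energies[m], 1e-10));
        }
        double[] c = new double[numCoeffs];
        for (int k = 0; k < numCoeffs; k++) {
            double sum = 0.0;
            for (int m = 0; m < n; m++) {
                sum += logE[m] * Math.cos(Math.PI * k * (m + 0.5) / n);
            }
            c[k] = sum;
        }
        return c;
    }

    public static void main(String[] args) {
        double[] energies = {4.0, 9.0, 16.0, 9.0, 4.0, 1.0}; // made-up example energies
        System.out.println(Arrays.toString(mfcc(energies, 4)));
    }
}
```

A common choice is 26 energies in and 12 or 13 coefficients out. If every energy is equal, only c[0] is non-zero, which is a quick sanity check for the DCT.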

Mephitis answered 22/3, 2013 at 15:5 Comment(0)
