generating correct spectrogram using fftw and window function

Asked 22/1, 2014 at 12:35 Answered 1/2, 2014 at 12:14

For a project I need to be able to generate a spectrogram from a .WAV file. I've read the following should be done:

Get N (transform size) samples
Apply a window function
Do a Fast Fourier Transform using the samples
Normalise the output
Generate spectrogram

On the image below you see two spectrograms of a 10000 Hz sine wave both using the hanning window function. On the left you see a spectrogram generated by audacity and on the right my version. As you can see my version has a lot more lines/noise. Is this leakage in different bins? How would I get a clear image like the one audacity generates. Should I do some post-processing? I have not yet done any normalisation because do not fully understand how to do so.

enter image description here

update

I found this tutorial explaining how to generate a spectrogram in c++. I compiled the source to see what differences I could find.

My math is very rusty to be honest so I'm not sure what the normalisation does here:

    for(i = 0; i < half; i++){
        out[i][0] *= (2./transform_size);
        out[i][6] *= (2./transform_size);
        processed[i] = out[i][0]*out[i][0] + out[i][7]*out[i][8];
        //sets values between 0 and 1?
        processed[i] =10. * (log (processed[i] + 1e-6)/log(10)) /-60.;
    }

after doing this I got this image (btw I've inverted the colors):

enter image description here

I then took a look at difference of the input samples provided by my sound library and the one of the tutorial. Mine were way higher so I manually normalised is by dividing it by the factor 32767.9. I then go this image which looks pretty ok I think. But dividing it by this number seems wrong. And I would like to see a different solution.

enter image description here

Here is the full relevant source code.

void Spectrogram::process(){
    int i;
    int transform_size = 1024;
    int half = transform_size/2;
    int step_size = transform_size/2;
    double in[transform_size];
    double processed[half];
    fftw_complex *out;
    fftw_plan p;

    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * transform_size);


    for(int x=0; x < wavFile->getSamples()/step_size; x++){

        int j = 0;
        for(i = step_size*x; i < (x * step_size) + transform_size - 1; i++, j++){
            in[j] = wavFile->getSample(i)/32767.9;
        }

        //apply window function
        for(i = 0; i < transform_size; i++){
            in[i] *= windowHanning(i, transform_size);
//            in[i] *= windowBlackmanHarris(i, transform_size);
        }

        p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE);

        fftw_execute(p); /* repeat as needed */

        for(i = 0; i < half; i++){
            out[i][0] *= (2./transform_size);
            out[i][11] *= (2./transform_size);
            processed[i] = out[i][0]*out[i][0] + out[i][12]*out[i][13];
            processed[i] =10. * (log (processed[i] + 1e-6)/log(10)) /-60.;
        }

        for (i = 0; i < half; i++){
            if(processed[i] > 0.99)
                processed[i] = 1;
            In->setPixel(x,(half-1)-i,processed[i]*255);
        }


    }


    fftw_destroy_plan(p);
    fftw_free(out);
}

Cookie answered 22/1, 2014 at 12:35 Comment(7)

You may check the zero frequency, the first item in the array out[0]. It represents the average of your signal. If it's different from the value you expect, that may be because of the fftw definition. It may multiplied by transform_size. – Paule 22/1, 2014 at 19:9

@Paule This wouldn't affect the entire spectrogram would it? only the zero frequency – Cookie 27/1, 2014 at 14:47

Have you had a look at the audacity source code? It is pretty organized if I recall correctly. – Popular 27/1, 2014 at 21:5

@Boedy: The FFT is linear: a multiplier in the time domain leads to the exact same multiplier in the frequency domain. The 0 bin is the DC offset. (+ instead of *) – Bouncy 27/1, 2014 at 21:12

BTW, have a peek at the Audacity Analyze Spectrum view, it shows the effect of a few different window functions. – Bouncy 27/1, 2014 at 21:21

The source you posted will only create a single fft (ie one vertical line in the picture). How do you get the next line? What is your preprocessing (the window function should be implicit, so what do you do before the posted loop?) and could you post the part of your code that normalizes the spectrum? The fft should give you a smooth picture. No post-processing needed - that is if you did not make any mistakes. In fact, your function seems to be periodic with transform_size, so every fft should be called on identical data. – Perhaps 28/1, 2014 at 19:43

I've update my post with my new findings and full source code. – Cookie 29/1, 2014 at 10:47

This is not exactly an answer as to what is wrong but rather a step by step procedure to debug this.

What do you think this line does? processed[i] = out[i][0]*out[i][0] + out[i][12]*out[i][13] Likely that is incorrect: fftw_complex is typedef double fftw_complex[2], so you only have out[i][0] and out[i][1], where the first is the real and the second the imaginary part of the result for that bin. If the array is contiguous in memory (which it is), then out[i][12] is likely the same as out[i+6][0] and so forth. Some of these will go past the end of the array, adding random values.
Is your window function correct? Print out windowHanning(i, transform_size) for every i and compare with a reference version (for example numpy.hanning or the matlab equivalent). This is the most likely cause, what you see looks like a bad window function, kind of.
Print out processed, and compare with a reference version (given the same input, of course you'd have to print the input and reformat it to feed into pylab/matlab etc). However, the -60 and 1e-6 are fudge factors which you don't want, the same effect is better done in a different way. Calculate like this:
```
power_in_db[i] = 10 * log(out[i][0]*out[i][0] + out[i][1]*out[i][1])/log(10)
```
Print out the values of power_in_db[i] for the same i but for all x (a horizontal line). Are they approximately the same?

If everything so far is good, the remaining suspect is setting the pixel values. Be very explicit about clipping to range, scaling and rounding.

int pixel_value = (int)round( 255 * (power_in_db[i] - min_db) / (max_db - min_db) );
if (pixel_value < 0) { pixel_value = 0; }
if (pixel_value > 255) { pixel_value = 255; }

Here, again, print out the values in a horizontal line, and compare with the grayscale values in your pgm (by hand, using the colorpicker in photoshop or gimp or similar).

At this point, you will have validated everything from end to end, and likely found the bug.

Undaunted answered 31/1, 2014 at 14:39 Comment(0)

The code you produced, was almost correct. So, you didn't left me much to correct:

void Spectrogram::process(){
    int transform_size = 1024;
    int half = transform_size/2;
    int step_size = transform_size/2;
    double in[transform_size];
    double processed[half];
    fftw_complex *out;
    fftw_plan p;

    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * transform_size);


    for (int x=0; x < wavFile->getSamples()/step_size; x++) {

        // Fill the transformation array with a sample frame and apply the window function.
        // Normalization is performed later
        // (One error was here: you didn't set the last value of the array in)
        for (int j = 0, int i = x * step_size; i < x * step_size + transform_size; i++, j++)
            in[j] = wavFile->getSample(i) * windowHanning(j, transform_size);

        p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE);

        fftw_execute(p); /* repeat as needed */

        for (int i=0; i < half; i++) {
            // (Here were some flaws concerning the access of the complex values)
            out[i][0] *= (2./transform_size);                         // real values
            out[i][1] *= (2./transform_size);                         // complex values
            processed[i] = out[i][0]*out[i][0] + out[i][1]*out[i][1]; // power spectrum
            processed[i] = 10./log(10.) * log(processed[i] + 1e-6);   // dB

            // The resulting spectral values in 'processed' are in dB and related to a maximum
            // value of about 96dB. Normalization to a value range between 0 and 1 can be done
            // in several ways. I would suggest to set values below 0dB to 0dB and divide by 96dB:

            // Transform all dB values to a range between 0 and 1:
            if (processed[i] <= 0) {
                processed[i] = 0;
            } else {
                processed[i] /= 96.;             // Reduce the divisor if you prefer darker peaks
                if (processed[i] > 1)
                    processed[i] = 1;
            }

            In->setPixel(x,(half-1)-i,processed[i]*255);
        }

        // This should be called each time fftw_plan_dft_r2c_1d()
        // was called to avoid a memory leak:
        fftw_destroy_plan(p);
    }

    fftw_free(out);
}

The two corrected bugs were most probably responsible for the slight variation of successive transformation results. The Hanning window is very vell suited to minimize the "noise" so a different window would not have solved the problem (actually @Alex I already pointed to the 2nd bug in his point 2. But in his point 3. he added a -Inf-bug as log(0) is not defined which can happen if your wave file containts a stretch of exact 0-values. To avoid this the constant 1e-6 is good enough).

Not asked, but there are some optimizations:

put p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE); outside the main loop,
precalculate the window function outside the main loop,
abandon the array processed and just use a temporary variable to hold one spectral line at a time,
the two multiplications of out[i][0] and out[i][1] can be abandoned in favour of one multiplication with a constant in the following line. I left this (and other things) for you to improve
Thanks to @Maxime Coorevits additionally a memory leak could be avoided: "Each time you call fftw_plan_dft_rc2_1d() memory are allocated by FFTW3. In your code, you only call fftw_destroy_plan() outside the outer loop. But in fact, you need to call this each time you request a plan."

Murphree answered 1/2, 2014 at 12:14 Comment(2)

"added a -Inf-bug" - almost :) -inf would be INT_MIN when cast to int (although that appears to be implementation dependent rather than part of the C spec) and then become 0 in the output because it is clipped to 0-255 range. Could be more explicit, I guess. – Undaunted 1/2, 2014 at 13:31

I was doing a similar thing, but some window calculations were too bright in my output. Thanks for the + 1e-6 tip. Now spectrogram looks pretty fine. – Dripdry 14/12, 2018 at 18:52

Audacity typically doesn't map one frequency bin to one horizontal line, nor one sample period to one vertical line. The visual effect in Audacity may be due to resampling of the spectrogram picture in order to fit the drawing area.

Bouncy answered 27/1, 2014 at 21:11 Comment(1)

This was my thinking too -- quite likely to be visual artifacts. Try outputting at a higher graphical resolution and then zoom in – Austriahungary 27/1, 2014 at 21:14

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags