Further understanding of fftw processing of portaudio signals

Asked 29/2, 2016 at 13:55 Answered 29/2, 2016 at 15:30

I want to analyze a signal I get from my microphone port by using portaudio and fftwpp. For that I followed the explanation provided here. My questions concerning that are now:
There it is stated that I should chunk a window out of the incoming data. My data is already chunked, since I only record for a short time and process it afterwards. Thus I am assuming that a rectangular window is already applied to my data. Is that correct?
Now I am getting 200k data points, should I directly put them into an array:

    Array::array1<Complex> F(np,align);
    Array::array1<double> f(n,align);               // For out-of-place transforms
    //  array1<double> f(2*np,(double *) F()); // For in-place transforms

    fftwpp::rcfft1d Forward(n,f,F);
    fftwpp::crfft1d Backward(n,F,f);
    qDebug() << "Putting " << numSamples << " into an array!";
    for(int i = 0; i < numSamples; i++)
        f[i] = this->data.recordedSamples[i];

or should I split them up? If I put them all in one array, which resolution do I get then? My sample rate is set to 44.1 kHz.

Aguascalientes answered 29/2, 2016 at 13:55 Comment(0)

Assuming your data is not stationary (in other words the spectral content is time-varying, as would be the case for e.g. speech or music), then you would typically want to pick a window size during which the data can be considered to be somewhat stationary. For speech and music a typical window size might be of the order of 20 ms. For a sample rate of 44.1 kHz this corresponds to 882 samples, so an FFT size of 1024 might be a good starting point.
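The arithmetic above can be sketched as follows. This is only an illustration: the helper names `samplesForDuration` and `nextPowerOfTwo` are hypothetical and not part of portaudio or fftwpp, and rounding up to a power of two is an assumption taken from the "FFT size of 1024" suggestion.

```cpp
#include <cassert>
#include <cstddef>

// Number of samples covered by a window of `durationSeconds`
// at a sample rate of `sampleRate` Hz, rounded to the nearest sample.
std::size_t samplesForDuration(double sampleRate, double durationSeconds) {
    assert(sampleRate > 0.0 && durationSeconds > 0.0);
    return static_cast<std::size_t>(sampleRate * durationSeconds + 0.5);
}

// Smallest power of two that is greater than or equal to n (n >= 1).
std::size_t nextPowerOfTwo(std::size_t n) {
    assert(n >= 1);
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

With a 44.1 kHz sample rate and a 20 ms window, `samplesForDuration(44100.0, 0.020)` gives 882 and `nextPowerOfTwo(882)` gives 1024.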

It's also common to overlap successive windows, to get better time resolution for the time-varying components of your signal. A 50% overlap is commonly used, so your first block of samples would be 0..1023, the second block would be 512..1535, etc.
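One possible way to enumerate the overlapping blocks is sketched below. The function name `blockStarts` is hypothetical, and stopping as soon as a block reaches the end of the recording is a design choice of this sketch, not something the answer prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Start indices of successive analysis blocks of length `fftSize`,
// advanced by `hop` samples each time (hop = fftSize / 2 gives 50% overlap).
// Enumeration stops once a block reaches or passes the end of the data;
// that last block may be only partially filled and then needs zero padding.
std::vector<std::size_t> blockStarts(std::size_t numSamples,
                                     std::size_t fftSize,
                                     std::size_t hop) {
    assert(fftSize > 0 && hop > 0 && hop <= fftSize);
    std::vector<std::size_t> starts;
    for (std::size_t s = 0; s < numSamples; s += hop) {
        starts.push_back(s);
        if (s + fftSize >= numSamples) {
            break;  // this block already covers the final sample
        }
    }
    return starts;
}
```

For `fftSize = 1024` and `hop = 512` the first starts are 0, 512, 1024, which matches the blocks 0..1023, 512..1535, and so on.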

As has already been suggested in @Stefan's answer, you should apply a suitable window function to each block of samples, prior to the FFT. Commonly used windows are Hamming and von Hann (aka Hanning). Obviously the window function needs to be the same size as the FFT (e.g. N = 1024).
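A minimal sketch of a von Hann window, using the symmetric textbook definition w[i] = 0.5 * (1 - cos(2*pi*i / (N - 1))). The names `hannWindow` and `applyWindow` are hypothetical helpers, and plain `std::vector<double>` is used instead of the fftwpp array types to keep the example self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric Hann window of length n (n >= 2).
std::vector<double> hannWindow(std::size_t n) {
    assert(n >= 2);
    const double pi = std::acos(-1.0);
    std::vector<double> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = 0.5 * (1.0 - std::cos(2.0 * pi * static_cast<double>(i) /
                                     static_cast<double>(n - 1)));
    }
    return w;
}

// Multiply one block of samples by the window, element by element, in place.
// Block and window must have the same length (the FFT size).
void applyWindow(std::vector<double>& block, const std::vector<double>& window) {
    assert(block.size() == window.size());
    for (std::size_t i = 0; i < block.size(); ++i) {
        block[i] *= window[i];
    }
}
```

The window only needs to be computed once per FFT size and can then be reused for every block.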

For any remaining block of samples of size < N at the end of your data you can just pad with zeroes.
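Zero padding the final block could look roughly like this. `extractBlock` is a hypothetical helper; it assumes the recorded samples have already been copied into a `std::vector<double>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy up to `fftSize` samples starting at index `start` into a new block.
// Positions that fall past the end of `samples` stay at zero.
std::vector<double> extractBlock(const std::vector<double>& samples,
                                 std::size_t start,
                                 std::size_t fftSize) {
    assert(fftSize > 0);
    std::vector<double> block(fftSize, 0.0);  // pre-filled with zeroes
    if (start < samples.size()) {
        const std::size_t available = std::min(fftSize, samples.size() - start);
        std::copy_n(samples.begin() + static_cast<std::ptrdiff_t>(start),
                    available, block.begin());
    }
    return block;
}
```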

The commonly used term for the above operation is generating a spectrogram. It's essentially a 3D data structure of time vs. frequency vs. magnitude/phase, which can be displayed in various different ways or used for further frequency-domain processing.
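As a rough sketch, one time slice of such a spectrogram can be derived from the complex FFT output of a single block as shown below. The `MagPhase` struct and `toMagnitudePhase` function are hypothetical, and the code assumes the FFT bins are available as `std::complex<double>` values.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude and phase of every frequency bin of one FFT block.
struct MagPhase {
    std::vector<double> magnitude;
    std::vector<double> phase;  // radians, in the range [-pi, pi]
};

MagPhase toMagnitudePhase(const std::vector<std::complex<double>>& bins) {
    MagPhase out;
    out.magnitude.reserve(bins.size());
    out.phase.reserve(bins.size());
    for (const std::complex<double>& c : bins) {
        out.magnitude.push_back(std::abs(c));  // |c|
        out.phase.push_back(std::arg(c));      // atan2(imag, real)
    }
    return out;
}
```

Collecting one such result per block, in time order, gives the time vs. frequency vs. magnitude/phase structure.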

See also these closely related StackOverflow questions and answers:

Fraley answered 29/2, 2016 at 15:30 Comment(2)

If my window is smaller than 20 ms, I am cutting everything below a certain frequency. Is that correct? – Aguascalientes 1/3, 2016 at 15:55

Um, no - you're not "cutting" anything - the main thing that window size determines is frequency resolution. A longer window allows you to have higher frequency resolution at the expense of poorer time resolution - it's always a trade-off. – Fraley 1/3, 2016 at 16:8

Thus I am assuming that a rectangular window is already applied to my data. Is that correct?

In a way, yes. A window is commonly used to filter out the high-frequency distortion caused by the sudden on/off state of the signal, or to reduce or redistribute spectral leakage (https://en.wikipedia.org/wiki/Spectral_leakage).

It is recommended to apply a (non-rectangular) window, especially if you want to visualize the FFT. See https://en.wikipedia.org/wiki/Window_function#Hann_.28Hanning.29_window for options.

Be aware that the window has to be applied before the FFT.

or should I split them up?

Well, that depends on your requirements. In general it's better not to: because of the windowing, the longer the sample, the more accurate the FFT for that period of time. That said, splitting the data is not an uncommon technique to speed things up.

which resolution do I get then?

The resolution is the sample rate divided by the sample count.
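As a small worked example of this formula, the sketch below computes the bin width and the centre frequency of bin k. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Width of one FFT bin in Hz: sample rate divided by the number of samples.
double binWidthHz(double sampleRate, std::size_t sampleCount) {
    assert(sampleRate > 0.0 && sampleCount > 0);
    return sampleRate / static_cast<double>(sampleCount);
}

// Centre frequency in Hz of bin k, valid for 0 <= k <= sampleCount / 2.
double binFrequencyHz(std::size_t k, double sampleRate, std::size_t sampleCount) {
    assert(k <= sampleCount / 2);
    return static_cast<double>(k) * binWidthHz(sampleRate, sampleCount);
}
```

For the numbers in the question, a single transform over 200000 samples at 44.1 kHz gives 44100 / 200000 = 0.2205 Hz per bin, whereas a 1024-sample block gives 44100 / 1024, roughly 43.07 Hz per bin.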

Playbook answered 29/2, 2016 at 14:18 Comment(6)

Should I apply the window in each case, even if I don't want to visualize the data? – Aguascalientes 29/2, 2016 at 14:20

Well, that depends on the application. The main reason to apply a window is to reduce (or reposition) spectral leakage (see en.wikipedia.org/wiki/Spectral_leakage). Whether this is necessary depends entirely on the application. As said, if you want to visualize the spectrum, leakage is unwanted. If you need to test for the existence of a single frequency component, a rectangular window will do. Applying no window at all gives you practically the same results (provided you ensure the 2^n-fft rule). – Playbook 29/2, 2016 at 14:45

An alternative approach is mentioned here: #4082858 – Playbook 29/2, 2016 at 14:45

2^n-fft rule means that my sample input size should be equivalent to 2^n? – Aguascalientes 29/2, 2016 at 14:50

@arc_lupus: yes, that's the one I mean. Btw, applying a window is not much CPU effort compared to the FFT. I would go for the windowing, maybe create an option to turn it on or off, so you can see the different outcomes of your algorithm ;-) – Playbook 29/2, 2016 at 14:53

Thus, if my sample data has more points than 2^n, but less than 2^(n+1), I should cut the last samples away, or fill the buffer with zeros up to 2^(n+1)? – Aguascalientes 29/2, 2016 at 15:8
