How to extract semi-precise frequencies from a WAV file using Fourier Transforms

Asked 21/5, 2010 at 11:25 Answered 27/5, 2010 at 12:15

Let us say that I have a WAV file containing a series of sine tones at precise 1-second intervals. I want to use the FFTW library to extract these tones in sequence. Is this particularly hard to do? How would I go about it?

Also, what is the best way to write tones of this kind into a WAV file? I assume I would only need a simple audio library for the output.

My language of choice is C.

Partly answered 21/5, 2010 at 11:25 Comment(0)

To get the power spectrum of a section of your file:

1. Collect N samples, where N is a power of 2. For example, if your sample rate is 44.1 kHz and you want roughly one spectrum per second, N = 32768 samples (about 0.74 s of audio) is a reasonable choice.
2. Apply a suitable window function to the samples, e.g. Hann (Hanning).
3. Pass the windowed samples to an FFT routine. Ideally you want a real-to-complex FFT, but if all you have is a complex-to-complex FFT then pass 0 for all the imaginary input parts.
4. Calculate the squared magnitude of each FFT output bin (re * re + im * im).
5. (Optional) Calculate 10 * log10 of each squared-magnitude bin to get a magnitude value in dB.

Now that you have your power spectrum you just need to identify the peak(s), which should be pretty straightforward if you have a reasonable S/N ratio. Note that frequency resolution improves with larger N. For the above example of 44.1 kHz sample rate and N = 32768 the frequency resolution of each bin is 44100 / 32768 = 1.35 Hz.
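
Peak picking and the bin-to-frequency conversion might look like the sketch below. The helper names `peak_bin` and `bin_to_hz` are hypothetical, and the code assumes you already have an array of squared magnitudes for bins 0..N/2.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the largest value in power[0..nbins-1]. */
static size_t peak_bin(const double *power, size_t nbins)
{
    assert(nbins > 0);
    size_t best = 0;
    for (size_t k = 1; k < nbins; k++)
        if (power[k] > power[best])
            best = k;
    return best;
}

/* Centre frequency in Hz of bin k for an n-point transform.
 * With k = 1 this is also the bin spacing (frequency resolution). */
static double bin_to_hz(size_t k, double sample_rate, size_t n)
{
    assert(n > 0);
    return (double)k * sample_rate / (double)n;
}
```

For the 44.1 kHz, N = 32768 example, `bin_to_hz(1, 44100.0, 32768)` gives the 1.35 Hz bin spacing, so a 440 Hz tone would peak around bin 327.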

Sociometry answered 21/5, 2010 at 21:51 Comment(2)

Note that the Hanning window function will smear the input over several bins; the 1.35 Hz suggested is quite optimistic. As Wikipedia notes, it may in fact make sense not to window at all. – Joly 21/8, 2013 at 7:45

Hann or Hamming windows tend to be the most useful general purpose window functions. They both give a reasonable compromise in that the magnitude and frequency of a peak will be fairly reliable (unlike the no window case) and the peak will also be reasonably sharp. If you're looking to identify separate peaks that are very close together though then there are probably better choices for window function. Using no window at all (i.e. rectangular window function) usually only makes sense if you are looking at components which align exactly with bin frequencies. – Sociometry 21/8, 2013 at 7:51

You are basically interested in estimating a spectrum, assuming you've already gone past the stage of reading the WAV file and converting it into a discrete-time signal.

Among the various methods, the most basic is the periodogram, which amounts to taking a windowed Discrete Fourier Transform (with an FFT) and keeping its squared magnitude. This corresponds to Paul's answer. You need a window which spans several periods of the lowest frequency you want to detect. Example: if your sinusoids can be as low as 10 Hz (period = 100 ms), you should take a window of 200 ms or 300 ms or so (or more). The periodogram is simple to compute and more than enough if high precision is not required, but it has some disadvantages:

The raw periodogram is not a good spectral estimate because of spectral bias and the fact that the variance at a given frequency does not decrease as the number of samples used in the computation increases.

The periodogram can perform better if you average it over several windows, with a judicious choice of widths (Bartlett's method). There are also many other methods for estimating the spectrum (e.g. AR modelling).

Actually, you are not exactly interested in estimating a full spectrum, but only the location of a single frequency. This can be done seeking a peak of an estimated spectrum (done as explained), but also by more specific and powerful (and complicated) methods (Pisarenko, MUSIC algorithm). They would probably be overkill in your case.

Ria answered 27/5, 2010 at 12:15 Comment(0)

WAV files contain linear pulse code modulated (LPCM) data. That just means that it is a sequence of amplitude values at a fixed sample rate. A RIFF header is contained at the beginning of the file to convey information like sampling rate and bits per sample (e.g. 8 kHz signed 16-bit).
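
As a rough illustration, the fields mentioned above can be pulled out of the simplest ("canonical") 44-byte header like this. The sketch assumes uncompressed PCM with the `fmt ` chunk immediately followed by the `data` chunk; real files may carry extra chunks, which this code rejects rather than skips. The names `wav_info` and `parse_wav_header` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
    uint32_t data_bytes;
} wav_info;

/* WAV header fields are little-endian. */
static uint16_t rd_u16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is not a canonical
 * 44-byte PCM header (assumption: no extra chunks before "data"). */
static int parse_wav_header(const unsigned char *h, size_t len, wav_info *out)
{
    assert(h != NULL && out != NULL);
    if (len < 44)
        return -1;
    if (memcmp(h, "RIFF", 4) != 0 || memcmp(h + 8, "WAVE", 4) != 0)
        return -1;
    if (memcmp(h + 12, "fmt ", 4) != 0 || memcmp(h + 36, "data", 4) != 0)
        return -1;
    if (rd_u16le(h + 20) != 1)      /* format tag 1 = uncompressed PCM */
        return -1;

    out->channels        = rd_u16le(h + 22);
    out->sample_rate     = rd_u32le(h + 24);
    out->bits_per_sample = rd_u16le(h + 34);
    out->data_bytes      = rd_u32le(h + 40);
    return 0;
}
```

The sample data starts right after these 44 bytes in a file that passes this check.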

The format is very simple and you could easily roll your own reader and writer. However, there are several libraries available to speed up the process, such as libsndfile. Simple DirectMedia Layer (SDL)/SDL_mixer and PortAudio are two nice libraries for playback.

As for feeding the data into FFTW, you would need to buffer 1-second chunks (determine the size from the sample rate and bits per sample). Then convert all of the samples to IEEE floating-point (i.e. float or double depending on the FFTW configuration--libsndfile can do this for you). Next create another array to hold the frequency domain output. Finally, create and execute an FFTW plan by passing both buffers to fftw_plan_dft_r2c_1d and calling fftw_execute with the returned fftw_plan handle. (The single-precision build of FFTW exposes the same functions with an fftwf_ prefix.)
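
The buffering and conversion steps, done by hand rather than through libsndfile, could look something like this sketch. It assumes little-endian signed 16-bit PCM; the helper names are invented for the example, and the FFTW calls themselves are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes occupied by one second of audio. */
static size_t bytes_per_second(uint32_t sample_rate, uint16_t channels,
                               uint16_t bits_per_sample)
{
    assert(bits_per_sample % 8 == 0);
    return (size_t)sample_rate * channels * (bits_per_sample / 8u);
}

/* Convert little-endian signed 16-bit PCM to doubles in [-1.0, 1.0). */
static void pcm16le_to_double(const unsigned char *bytes, size_t nsamples,
                              double *out)
{
    assert(bytes != NULL && out != NULL);
    for (size_t i = 0; i < nsamples; i++) {
        uint16_t u = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        /* Undo two's complement without relying on a narrowing cast. */
        int32_t s = (u >= 0x8000) ? (int32_t)u - 0x10000 : (int32_t)u;
        out[i] = (double)s / 32768.0;
    }
}
```

The resulting double array is what you would hand to the real-to-complex plan as its input buffer; the output buffer needs room for N/2 + 1 complex values for N real inputs.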

Autocephalous answered 21/5, 2010 at 14:20 Comment(4)

Not actually the fftw version, but whether or not it was compiled with float support, no? – Canonry 21/5, 2010 at 14:31

True, it is a matter of the build configuration IIRC. I haven't used FFTW in many years. Perhaps "version" is not the most accurate word I could have chosen? – Autocephalous 21/5, 2010 at 16:15

Much of the audio DSP software for Linux (and other platforms) which uses FFTW requires FFTW built with float support, and having spent much time building this stuff from source, I can say that Debian at least, has packages for the various different build options of FFTW which can all be installed simultaneously. I expect this goes for most other Linux distros too. – Darren 21/5, 2010 at 22:25

libsndfile will take care of converting your WAV files to floating point format, automatically, in general it's really quite a breeze to use. – Darren 21/5, 2010 at 22:27
