Cleaning up noisy Cepstrum results

Asked 12/3, 2011 at 18:50 Answered 12/2, 2013 at 22:33

Solved iphone signal-processing fft pitch frequency-analysis

I've been working on a simple frequency detection setup on the iPhone. Analyzing in the frequency domain using FFT results has been somewhat unreliable in the presence of harmonics. I was hoping to use Cepstrum results to help decide which fundamental frequency is playing.

I am working with AudioQueues in the AudioToolbox framework, and doing the Fourier transforms with the Accelerate framework.

My process has been exactly what is listed on Wikipedia's Cepstrum article for the Real Power Cepstrum, specifically: signal → FT → abs() → square → log → FT → abs() → square → power cepstrum.
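As a rough illustration of the pipeline just listed, here is a minimal Python sketch (not the vDSP code from the question). The small recursive `fft` helper, the `eps` guard against log(0), and the test tone are all assumptions added for the example; on iOS the transforms would come from Accelerate instead.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_cepstrum(signal, eps=1e-12):
    """signal -> FT -> abs() -> square -> log -> FT -> abs() -> square."""
    power = [abs(c) ** 2 for c in fft(signal)]
    # eps is an assumption: it keeps log() finite for empty bins.
    log_power = [math.log(p + eps) for p in power]
    return [abs(c) ** 2 for c in fft(log_power)]


# Hypothetical test tone: 8 harmonics of a tone whose period is exactly 32 samples.
n, period = 1024, 32
tone = [sum(math.sin(2 * math.pi * h * i / period) / h for h in range(1, 9))
        for i in range(n)]
cepstrum = power_cepstrum(tone)
# Search away from the very large values near quefrency zero.
peak = max(range(16, 48), key=lambda q: cepstrum[q])
print(peak)  # 32, the period of the tone in samples
```

This idealized tone is exactly periodic within the frame, so the peak is clean. Real recordings rarely are, which is consistent with the noise described in the question.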

The problem I have is that the Cepstrum results are extremely noisy. I have to drop the first and last 20 values as they are astronomical compared to the other values. Even after "cleaning" the data, there is still a huge amount of variation, far more than I would expect given the first graph. See the pictures below for visualizations of the frequency domain and the quefrency domain.

Figures: FFT, FFT, Cepstrum

When I see such a clear winner in the frequency domain as on that graph, I expect to see a similarly clear result in the quefrency domain. I played A440 and would expect bin 82 or so to have the highest magnitude. The third peak on the graph represents bin 79, which is close enough. As I said, the first 20 or so bins are so astronomical in magnitude as to be unusable, and I had to delete them from the data set in order to see anything. Another odd quality of the cepstrum data is that the even bins seem to be much higher than the odd bins. Here are the values for bins 77-86:

77: 151150.0313
78:  22385.92773
79: 298753.1875
80:  56532.72656
81: 114177.4766
82:  31222.88281
83:   4620.785156
84:  13382.5332
85:     83.668259
86:   1205.023193

My question is how to clean up the frequency domain so that my Cepstrum domain results are not so wild. Alternatively, help me better understand how to interpret these results if they are what one would expect from a Cepstrum analysis. I can post examples of the code I'm using, but it mostly uses vDSP calls and I don't know how helpful that would be.

Unprincipled asked 12/3, 2011 at 18:50 Comment(2)

You might want to try applying a window function prior to the first FFT. – Tomfool 13/3, 2011 at 20:59

I spent the better part of the morning trying to understand why so many people suggest that. It isn't obvious why a window function will improve the transforms. I did not understand spectral leakage until now, and I believe that's contributing to the messiness. Thanks for the tip! – Unprincipled 14/3, 2011 at 13:55
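To make the windowing suggestion in these comments concrete, here is a small Python sketch comparing spectral leakage with and without a Hann window. The frame length, the test tone, and the helper names are assumptions for illustration only. On iOS, Accelerate's `vDSP_hann_window` can generate the window instead.

```python
import math


def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft_bin_magnitude(frame, k):
    """Magnitude of a single DFT bin, computed directly (slow but simple)."""
    n = len(frame)
    re = sum(s * math.cos(2.0 * math.pi * k * i / n) for i, s in enumerate(frame))
    im = -sum(s * math.sin(2.0 * math.pi * k * i / n) for i, s in enumerate(frame))
    return math.hypot(re, im)


n = 256
# 10.5 cycles per frame: the tone falls between bins, the worst case for leakage.
tone = [math.sin(2.0 * math.pi * 10.5 * i / n) for i in range(n)]
windowed = [s * w for s, w in zip(tone, hann_window(n))]

far_bin = 40  # a bin well away from the tone
leak_rect = dft_bin_magnitude(tone, far_bin)
leak_hann = dft_bin_magnitude(windowed, far_bin)
print(leak_rect, leak_hann)  # the windowed value is far smaller
```

Energy that leaks into distant bins raises the floor of the log spectrum, so reducing it before the log step is one plausible way to tidy the input to the second transform.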

A cepstrum, or cepstral analysis, is a technique used to try to separate a signal with high overtone content into two portions. The portion near DC represents the spectral envelope of all the overtones, or the speech formant, which might be useful for speaker or instrument recognition. Later peaks in the cepstrum result represent the exciter frequency or frequencies, if that frequency generates enough harmonic overtone content.

Since a cepstrum is usually done without any (non-rectangular) window, it can produce a sinc response even to a clean overtone sequence, with the width of the response roughly inversely proportional to the length of the overtone sequence or the number of overtones. And, of course, any slightly inharmonic overtones (as found in actual musical instruments) will make the cepstrum results even messier. So a cepstrum peak may only be good at giving one the approximate location of the fundamental frequency, which could still be a useful result in rejecting other frequency candidates when doing frequency estimation.

A "clean looking" cepstrum might be the result of a very long sequence of exactly harmonic overtones with a nearly flat frequency response, which is perhaps not what is found in real-life signals.

Foray answered 12/3, 2011 at 19:33 Comment(7)

So it is unrealistic to try to clean up the Cepstrum results. Is it common to throw away the first and last several bins? Are there any other clarifying techniques I could use? – Unprincipled 12/3, 2011 at 20:6

@fast4ear : The bins near DC contain information about the formant. If you don't care about the shape of the formant, you may not need the information in those bins. – Foray 12/3, 2011 at 20:12

So if I'm sampling at 44100Hz and I have a 4096 bin sample, and I'm interested in 440Hz, I would look in bin 82 ((22050/4096)*82) in the frequency domain. Should I also look in bin 82 in the quefrency domain? Or would I look in bin 50 (22050/440)? – Unprincipled 12/3, 2011 at 23:54

@Unprincipled Were you able to get the fundamental frequency estimate using cepstral analysis? For 440Hz, I guess you should look at bin 44100/440. I'm not exactly sure. – Tumble 30/11, 2012 at 7:59

@Tumble the information is there but it proved too noisy to be of use consistently. I can look at a graph and see it but the algorithm to detect via cepstrum what, if any, frequencies are present still eludes me. I would suggest a different direction. – Unprincipled 3/12, 2012 at 19:31

@Unprincipled Thank you for clarifying. I tried Cepstral analysis along almost the same lines as you, also on iOS. I too did not have much luck with it. Windowing did not seem to help me either. Maybe we missed something. Anyway, I started using HPS (harmonic product spectrum) and it works fine for me. – Tumble 4/12, 2012 at 4:17

@Foray Hi! Can you please help me calculate the estimated cepstrum of a time series? – Taciturn 20/12, 2021 at 20:23
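On the bin question raised in the comments above (bin 82, bin 50, or 44100/440?), the arithmetic can be sketched as follows. This assumes the figures quoted there imply an 8192-point first FFT (4096 usable bins spanning 0 to 22050 Hz), which is a guess. The function names are hypothetical. The expected quefrency index depends on how many log-spectrum bins are fed to the second transform.

```python
def frequency_bin(f0, fs, n_fft):
    """Bin index of frequency f0 in an n_fft-point FFT at sample rate fs."""
    return f0 * n_fft / fs


def quefrency_index(f0, fs, n_fft, n_second):
    """Index of the cepstral peak for fundamental f0.

    Harmonics are spaced frequency_bin(f0) bins apart in the log spectrum,
    so a second transform of length n_second peaks near n_second / spacing.
    """
    spacing = frequency_bin(f0, fs, n_fft)
    return n_second / spacing


fs, f0, n_fft = 44100.0, 440.0, 8192  # n_fft is an assumption (see above)

print(frequency_bin(f0, fs, n_fft))          # about 81.7, i.e. bin 82
print(quefrency_index(f0, fs, n_fft, 8192))  # full spectrum: 44100/440, about 100.2
print(quefrency_index(f0, fs, n_fft, 4096))  # half spectrum: 22050/440, about 50.1
```

Under these assumptions both guesses in the comments can be right: the peak sits near index 100 if the second transform covers the full mirrored spectrum, and near index 50 if it covers only the non-redundant half.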

The following analysis illustrates Cepstrum's performance on synthetic and real-world signals.

First we examine a synthetic signal.

The plot below shows a synthetic steady-state E2 note, synthesized using a typical near-DC component, a fundamental at 82.4 Hz, and a total of 8 harmonics at integer multiples of 82.4 Hz. The synthetic sinusoid was programmed to generate 4096 samples.

Synthetic E2 note spectrum
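For readers who want to reproduce a signal like the synthetic note described above, here is a minimal Python sketch. It is not the code used for the plots: the sample rate, the constant offset standing in for the near-DC component, the 1/h amplitude roll-off, and counting the fundamental as the first of the 8 harmonics are all assumptions.

```python
import math


def synth_note(f0=82.4, n_harmonics=8, n_samples=4096, fs=44100.0, dc=0.1):
    """Sum of sinusoids at integer multiples of f0, plus a constant offset.

    fs, dc, and the 1/h amplitudes are assumed values, not taken from the answer.
    """
    samples = []
    for i in range(n_samples):
        t = i / fs
        partials = sum(math.sin(2.0 * math.pi * f0 * h * t) / h
                       for h in range(1, n_harmonics + 1))
        samples.append(dc + partials)
    return samples


note = synth_note()
print(len(note))  # 4096
```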

The plot below shows a closeup of the input that was used for the Cepstrum calculation of the synthetic E2 note. It is the log(|FFT|^2) output from the synthetic E2 note.

Cepstrum input: synthetic E2 note's spectrum

The plot below shows the Cepstrum of the synthetic E2 note. Observe the prominent non-DC peak at 12.36. The Cepstrum width is 1024 (the output of the second FFT), so the peak corresponds to 1024/12.36 = 82.8 Hz, which is very close to the actual 82.4 Hz of the fundamental.

Synthetic E2 note cepstrum closeup

Now we examine a real-world signal.

The plot below shows the spectrum of the E2 note from a real acoustic guitar.

Guitar E2 note spectrum closeup

The plot below shows a closeup of the input that was used for the Cepstrum calculation of the acoustic guitar's E2 note. It is the log(|FFT|^2) output from the acoustic guitar's E2 note.

Cepstrum input: guitar E2 note's spectrum

The plot below shows the Cepstrum of the acoustic guitar's E2 note. Observe the prominent non-DC peak at 542.8. The Cepstrum width is 32768 (the output of the second FFT), so the peak corresponds to 32768/542.8 = 60.4 Hz, which is fairly far from the actual 82.4 Hz of the fundamental.

Guitar E2 note cepstrum closeup
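The two peak-to-frequency conversions quoted in this answer can be checked with a few lines of Python. The sketch simply applies the conversion as the answer states it (cepstrum width divided by peak position). The function names and the deviation in cents are additions for illustration.

```python
import math


def cepstrum_peak_to_hz(cepstrum_width, peak_position):
    """Conversion as used in the answer: width / peak position."""
    return cepstrum_width / peak_position


def cents_off(estimate_hz, reference_hz):
    """Signed deviation from the reference in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(estimate_hz / reference_hz)


synthetic_hz = cepstrum_peak_to_hz(1024, 12.36)
guitar_hz = cepstrum_peak_to_hz(32768, 542.8)

print(round(synthetic_hz, 1))  # 82.8
print(round(guitar_hz, 1))     # 60.4
print(cents_off(synthetic_hz, 82.4), cents_off(guitar_hz, 82.4))
```

Expressed in cents, the synthetic estimate lands within a small fraction of a semitone of E2, while the guitar estimate is off by several semitones.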

The recording of the E2 guitar note used for this analysis was sampled at 44.1 kHz with a high quality microphone under studio conditions. It contains essentially zero background noise and no other instruments or voices.

This illustrates the significant challenge of using Cepstral analysis for pitch determination in real-world audio signals.

References:

Real audio signal data, synthetic signal generation, plots, FFT, and Cepstral analysis were done here: Musical instrument cepstrum

Alicealicea answered 12/2, 2013 at 22:33 Comment(1)

Your graphs sum it pretty well. I abandoned cepstrum because it's intractably noisy to make an algorithmic decision. – Unprincipled 12/2, 2013 at 22:37

If I understand correctly, the primary problem is to detect a frequency from an audio signal.

Presumably you mean the strongest frequency in the spectrum, so I suggest using this excellent library: http://www.schmittmachine.com/dywapitchtrack.html

"The heart of the algorithm is a very powerful wavelet algorithm, described in a paper by Eric Larson and Ross Maddox : "Real-Time Time-Domain Pitch Tracking Using Wavelets" of UIUC Physics."

Hope this helps.

Nonchalance answered 12/2, 2013 at 11:30 Comment(2)

Thanks for adding this answer! Another tool in the toolbox always helps, especially for such a hard problem. I'll check this out tonight and respond with results. – Unprincipled 12/2, 2013 at 16:7

It's a pretty impressive algorithm. I tested using a digital piano using Steinway samples and it was reasonably accurate between C2 and C5. Neat find, thanks again for sharing. – Unprincipled 13/2, 2013 at 1:1
