Algorithm to determine fundamental frequency from potential harmonics
Asked Answered
B

3

17

I am attempting to extract a fundamental frequency from a sound source. maybe someone is singing A3 into the microphone, so I want to be detecting ~ 110Hz

my approach is:

  • FFT 1024 floats
  • use the phase of each bin to accurately determine its precise frequency
  • determine peaks (typically 50 or so)
  • order them with the loudest first

(Peak[0].power=1063.343750, .freq=2032.715088
(Peak[1].power=1047.764893, .freq=3070.605225
(Peak[2].power=1014.986877, .freq=5925.878418
(Peak[3].power=1011.707825, .freq=6963.769043
(Peak[4].power=1009.152954, .freq=4022.363037
(Peak[5].power=995.199585, .freq=4974.120605
(Peak[6].power=987.243713, .freq=8087.792480
(Peak[7].power=533.514832, .freq=908.691833

  • (MARKER1) start with the loudest, and match it against all remaining peaks, so if I had N peaks, I will have at this point N-1 peak-pairs
  • examine each peak-pair for harmonicity; ie how close is it to some fraction a/b, ie can we find a/b with b<20 such that |peakA.freq/peakB.freq - a/b| < 0.01 (this would match harmonics up to the 20th one)
  • we now have a refined list of peaks that are considered harmonic with one another

    Harmonic PeakPair: (0,1)=2/3, error:0.00468 => f0 @ 1019.946289
    Harmonic PeakPair: (0,2)=1/3, error:0.00969 => f0 @ 2004.003906
    Harmonic PeakPair: (0,3)=2/7, error:0.00618 => f0 @ 1005.590820
    Harmonic PeakPair: (0,4)=1/2, error:0.00535 => f0 @ 2021.948242
    Harmonic PeakPair: (0,5)=2/5, error:0.00866 => f0 @ 1005.590820
    Harmonic PeakPair: (0,6)=1/4, error:0.00133 => f0 @ 2027.331543
    Harmonic PeakPair: (0,7)=9/4, error:0.01303 => f0 @ 226.515106

My question is: how can I devise an algorithm that will correctly identify the above fundamental as ~1000Hz?

It is by no means guaranteed that there will be a higher concentration of values at ~1000 than at ~2000 or ~3000 etc. it isn't even guaranteed that there will be any entry ~1000. we could have ~5000 x one entry, ~4000 x three entries, ~3000 x 2 entries, and a couple of bogus values floating around, like the 226 in the above list.

I guess I can repeat the procedure again, weeding out suggested fundamentals which are not 'harmonic' with the rest of the list. this would at least get rid of the bogus values...

it may be that I'm not even asking the right question. Maybe this whole approach sucks. But I think it makes sense to pick the strongest peak and extract a set of harmonics associated with that peak.

in theory that should generate a load of ratios, say if how original strongest peak was the third harmonic, then this set of peaks should contain 3/1 3/2 3/3 3/4 3/5 3/6 3/7 etc ... although some may be missing.

realistically I have a feeling it's always going to be either a fundamental or the first harmonic that has the greatest strength. but I don't know if I can rely on this...

so many factors, it is making my head swim. I apologise in advance for such a messy question. Hopefully I can tidy it up posthumously.

Bender answered 17/1, 2011 at 18:42 Comment(5)
for some reason it is much harder to detect guitar tones than sung tones. Although maybe this is because I am trapping the first few frames of the note; 32 x 256 bytes = 8k, so that is about the first 1/5 sec. so I guess the note needs some time to stabilise...Bender
I have put in a marker because at some point I would want to pull out a complete tone, and loop through the process again for each tone until nothing is detected. This way I could catch the complete polyphony.Bender
You probably want to consider using autocorrelation or cepstral analysis.Pilaf
This question covers the same ground: https://mcmap.net/q/445710/-cepstral-analysis-for-pitch-detection/10468 But with "cepstral" in the title, someone not familiar with that term might not recognize the question as relevant. It does say "pitch detection", though.Copalm
You say "I have a feeling it's always going to be either a fundamental or the first harmonic that has the greatest strength". I'd rethink this assumption. I know that the fundamental is often missing in speech, and I've heard (pardon the pun) that the 2nd harmonic (i.e., twice the fundamental) is often missing in human singing. Do you have a bunch of representative spectra that you've looked at to make you think otherwise? hotpaw2's suggestion of cepstral analysis seems like the best way to get to within a bin or two of the fundamental, then you can try and refine from there.Televisor
B
1

I have rephrased the question, and provided an answer here: How to take in a set of numbers like {301,102,99,202,198,103} and throw out ~100?

I had looked at several approaches, and this is considerably more succinct than anything else I've found. I have tested it and it works very well.

Bender answered 1/2, 2011 at 5:14 Comment(0)
N
9

A Cepstum (or Cepstral analysis) and Harmonic Product Spectrum are two well studied algorithms that estimate the exciter frequency from an overtone series.

If the sequences of overtones are appropriately spaced, than a Cepstrum (FFT of the log of the FFT peaks) may be useful in estimating the period of the frequency spacing, which can then be used to estimate the frequency.

The Harmonic Product Spectrum basically compares the spectral peaks with nth multiple copies of themselves by decimating the spectrum by multiple low integer ratios and overlapping them.

Nupercaine answered 19/1, 2011 at 4:27 Comment(0)
H
2

You can go through following link for an article on speech recognition.

Article: Phase Space Point Disribution Parameter for Speech Recognition (subscription required for full text)

Hinrichs answered 21/11, 2011 at 13:40 Comment(0)
B
1

I have rephrased the question, and provided an answer here: How to take in a set of numbers like {301,102,99,202,198,103} and throw out ~100?

I had looked at several approaches, and this is considerably more succinct than anything else I've found. I have tested it and it works very well.

Bender answered 1/2, 2011 at 5:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.