C/C++/Obj-C Real-time algorithm to ascertain Note (not Pitch) from Vocal Input

E

9

3

I want to detect not the pitch, but the pitch class of a sung note.

So, whether it is C4 or C5 is not important: they must both be detected as C.

Imagine the 12 semitones arranged on a clock face, with the needle pointing to the pitch class. That's what I'm after! Ideally, I would like to be able to tell whether the sung note is spot-on or slightly off.

This is not a duplicate of previously asked questions, as it introduces the constraints that:

the sound source is a single human voice, hopefully with negligible background interference (although I may need to deal with this)
the octave is not important, only the pitch class

EDIT -- Links:
Real time pitch detection
Using the Apple FFT and Accelerate Framework

Earley answered 31/10, 2010 at 6:36 Comment(2)

Yeah, you're reinventing the wheel. Why doesn't this exist for PC? It looks much better than Sing & See. – Mim 5/1, 2011 at 13:9

Off-topic: Hmm, just found something new which might perform better, uh well... – Mim 5/1, 2011 at 13:25

T

5

Most of the frequency detection algorithms cited in other answers don't work well for voice. To see intuitively why this is so, consider that all the vowels in a language can be sung at one particular note. Even though all those vowels have very different frequency content, they would all have to be detected as the same note. Any note detection algorithm for voices must take this into account somehow. Furthermore, human speech and song contain many fricatives, many of which have no implicit pitch in them.

In the generic (non-voice) case, the feature you are looking for is called the chroma feature, and there is a fairly large body of work on the subject. It is equivalently known as the harmonic pitch class profile. The original reference paper on the concept is Takuya Fujishima's "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music". The Wikipedia entry has an overview of a more modern variant of the algorithm. There are a bunch of free papers and MATLAB implementations of chroma feature detection.

However, since you are focusing on the human voice only, and since the human voice naturally contains tons of overtones, what you are practically looking for in this specific scenario is a fundamental frequency detection algorithm, or f0 detection algorithm. There are several such algorithms explicitly tuned for voice. Also, here is a widely cited algorithm that works on multiple voices at once. You'd then check the detected frequency against the equal-tempered scale and then find the closest match.

Since I suspect that you're trying to build a pitch detector and/or corrector a la Autotune, you may want to use M. Morise's excellent WORLD implementation, which permits fast and good quality detection and modification of f0 on voice streams.

Lastly, be aware that there are only a few vocal pitch detectors that work well within the vocal fry register. Almost all of them, including WORLD, fail on vocal fry as well as very low voices. A number of papers refer to vocal fry as "creaky voice" and have developed specific algorithms to help with that type of voice input specifically.

Tie answered 18/12, 2015 at 5:0 Comment(0)

O

7

See my answer here for getting smooth FREQUENCY detection: https://mcmap.net/q/272191/-using-the-apple-fft-and-accelerate-framework

As far as snapping this frequency to the nearest note -- here is a method I created for my tuner app:

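- (int) snapFreqToMIDI: (float) frequency {
    // Semitones relative to referenceA, offset so that the reference A maps to 57.
    // Adding 0.5 before the conversion to int rounds to the nearest note.
    // Note: standard MIDI numbers A4 (440 Hz) as 69, so with referenceA = 440.0 this
    // value is one octave lower. The pitch class (value % 12) is unaffected.
    int midiNote = (12*(log10(frequency/referenceA)/log10(2)) + 57) + 0.5;
    return midiNote;
}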

This will return the MIDI note value (http://www.phys.unsw.edu.au/jw/notes.html)

In order to get a string from this MIDI note value:

- (NSString*) midiToString: (int) midiNote {
    // Pitch-class names indexed by midiNote % 12 (0 = C). midiNote is assumed to be non-negative.
    NSArray *noteStrings = @[@"C", @"C#", @"D", @"D#", @"E", @"F", @"F#", @"G", @"G#", @"A", @"A#", @"B"];
    return [noteStrings objectAtIndex:midiNote%12];
}

For an example implementation of the pitch detection with output smoothing, look at musicianskit.com/developer.php

Oration answered 14/6, 2012 at 23:12 Comment(3)

What is referenceA in int midiNote = (12*(log10(frequency/referenceA)/log10(2)) + 57) + 0.5; ? – Lyophilic 26/4, 2013 at 13:32

440.0 for most orchestral purposes. Check out wikipedia's entry on pitch standards - en.wikipedia.org/wiki/A440_(pitch_standard) – Oration 23/5, 2013 at 21:17

Thanks, I am looking at your AutoCorrelation example project, how does it handle noise from the mic detection? – Lyophilic 31/5, 2013 at 11:31

K

6

Pitch is a human psycho-perceptual phenomenon. Peak frequency content is not the same as either pitch or pitch class. FFT and DFT methods will not directly provide pitch, only frequency. Neither will zero crossing measurements work well for human voice sources. Try AMDF, ASDF, autocorrelation or cepstral methods. There are also plenty of academic papers on the subject of pitch estimation.
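As a rough illustration of the autocorrelation family, here is a bare-bones sketch in C. It multiplies the signal by a delayed copy of itself for every lag in the plausible pitch range and keeps the lag with the largest sum. The function name and parameters are hypothetical, and a usable vocal tracker would add voicing detection, sub-sample interpolation and smoothing on top of this.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Estimates the fundamental frequency of `samples` (mono PCM, `count` values)
 * by picking the lag with the largest autocorrelation sum inside the lag range
 * implied by [min_hz, max_hz]. Returns 0.0 if no usable lag is found.
 * The overlap shrinks as the lag grows, which biases the result toward
 * shorter lags and so makes subharmonic picks less likely.
 */
double estimate_f0_autocorrelation(const float *samples, size_t count,
                                   double sample_rate,
                                   double min_hz, double max_hz)
{
    assert(samples != NULL && sample_rate > 0.0);
    assert(min_hz > 0.0 && max_hz > min_hz);

    if (count < 2) {
        return 0.0;
    }

    size_t min_lag = (size_t)(sample_rate / max_hz);
    size_t max_lag = (size_t)(sample_rate / min_hz);
    if (min_lag < 1) {
        min_lag = 1;
    }
    if (max_lag > count - 1) {
        max_lag = count - 1;
    }

    size_t best_lag = 0;
    double best_sum = 0.0;
    for (size_t lag = min_lag; lag <= max_lag; ++lag) {
        double sum = 0.0;
        for (size_t n = 0; n + lag < count; ++n) {
            sum += (double)samples[n] * (double)samples[n + lag];
        }
        if (sum > best_sum) {
            best_sum = sum;
            best_lag = lag;
        }
    }

    return best_lag > 0 ? sample_rate / (double)best_lag : 0.0;
}
```

Restricting the search to something like 80 Hz to 1000 Hz keeps the cost down. The resolution is limited to whole-sample lags, so high notes come out coarser than low ones unless you interpolate around the best lag.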

There is another long list of pitch estimation algorithms here.

Edited addition: Apple's SpeakHere and aurioTouch sample apps (available from their iOS dev center) contain example source code for getting PCM sample blocks from the iPhone's mic.

Kealey answered 31/10, 2010 at 16:11 Comment(0)

H

3

If you are looking for the pitch class you should have a look at the chromagram (http://labrosa.ee.columbia.edu/matlab/chroma-ansyn/)

You can also simply detect the f0 (using something like the YIN algorithm) and return the appropriate semitone. Note that most fundamental frequency estimation algorithms suffer from octave errors.

Humpback answered 10/8, 2012 at 10:23 Comment(2)

The Chromagram from LabRosa is somewhat great work from Dan Ellis from Columbia University. The code is for Matlab. – Brassy 10/9, 2013 at 22:31

I don't think he invented the Chromagram, but he does give a free implementation in MATLAB on his web page – Humpback 26/9, 2013 at 12:47

S

2

Perform a Discrete Fourier Transform on samples from your input waveform, then sum values that correspond to equivalent notes in different octaves. Take the largest value as the dominant frequency.

You can likely find some existing DFT code in Objective C that suits your needs.

Stoneware answered 31/10, 2010 at 9:9 Comment(3)

This doesn't work for human voices, especially male voices, as many frequency values are more likely to belong to the overtone of a completely different "note" than to the obvious choice from a frequency-to-note table. – Kealey 31/10, 2010 at 18:4

The DFT is a building block that is used in many pitch detection algorithms, but it is no substitute for a real pitch detection algorithm. – Ardeb 1/11, 2010 at 22:29

hotpaw2, Could you elaborate on that comment? I have looked at a spectrograph of a female voice, and it seems to consist of a fundamental frequency together with its harmonics. now I am confused... – Earley 4/11, 2010 at 13:19

E

2

Putting up information as I find it...

Pitch detection algorithm on Wikipedia is a good place to start. It lists a few methods that fail for determining octave, which is okay for my purpose.

A good explanation of autocorrelation can be found here (why can't Wikipedia put things simply like that??).

Earley answered 31/10, 2010 at 14:39 Comment(0)

E

2

Finally I have closure on this one, thanks to this article from DSP Dimension

The article contains source code.

Basically, he performs an FFT. Then he explains that frequencies that don't coincide exactly with the centre of the bin they fall in will smear over nearby bins in a sort of bell-shaped curve, and how to extract the exact frequency from this data in a second pass (the FFT being the first pass).

The article then goes on to pitch shifting; I can simply delete that code.

Note that they supply a commercial library that does the same thing (and far more), only super optimised. There is a free version of the library that would probably do everything I need, although since I have worked through the iOS audio subsystem, I might as well just implement it myself.

For the record, I found an alternative way to extract the exact frequency by approximating a quadratic curve over the bin and its two neighbours here. I have no idea how the accuracy of these two approaches compares.
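The quadratic approach is short enough to sketch in C. It fits a parabola through the peak bin and its two neighbours and takes the vertex as the refined peak position. The function name and parameters are hypothetical; in practice the fit is often applied to log magnitudes, which I have left out here to keep the example minimal.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fits a parabola through the magnitudes at peak_bin - 1, peak_bin and
 * peak_bin + 1 and returns the frequency (Hz) at the parabola's vertex.
 * peak_bin must be an interior bin that is a local maximum.
 */
double interpolated_peak_frequency(const float *magnitudes, size_t bin_count,
                                   size_t peak_bin, double sample_rate,
                                   size_t fft_size)
{
    assert(magnitudes != NULL && fft_size > 0 && sample_rate > 0.0);
    assert(peak_bin >= 1 && peak_bin + 1 < bin_count);

    double left = magnitudes[peak_bin - 1];
    double centre = magnitudes[peak_bin];
    double right = magnitudes[peak_bin + 1];

    /* Vertex offset in bins, within -0.5..+0.5 for a true local maximum. */
    double curvature = left - 2.0 * centre + right;
    double offset = 0.0;
    if (curvature != 0.0) {
        offset = 0.5 * (left - right) / curvature;
    }

    return ((double)peak_bin + offset) * sample_rate / (double)fft_size;
}
```

If the three values come from a parabola whose vertex sits a quarter of a bin above bin 10, the function returns the frequency of bin 10.25, which is about 441.43 Hz for a 1024-point transform at 44.1 kHz.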

Earley answered 19/11, 2010 at 1:3 Comment(1)

That's a vocoder. A vocoder is a classic method for pitch scaling. What does that have to do with the original question? – Tie 19/12, 2015 at 9:24

A

1

As others have mentioned you should use a pitch detection algorithm. Since that ground is well-covered I will address a few particulars of your question. You said that you are looking for the pitch class of the note. However, the way to find this is to calculate the frequency of the note and then use a table to convert it to the pitch class, octave, and cents. I don't know of any way to obtain the pitch class without finding the fundamental frequency.

You will need a real-time pitch detection algorithm. In evaluating algorithms pay attention to the latency implied by each algorithm, compared with the accuracy you desire. Although some algorithms are better than others, fundamentally you must trade one for the other and cannot know both with certainty -- sort of like the Heisenberg uncertainty principle. (How can you know the note is C4 when only a fraction of a cycle has been heard?)

Your "smoothing" approach is equivalent to a digital filter, which will alter the frequency characteristics of the voice. In short, it may interfere with your attempts to estimate the pitch. If you have an interest in digital audio, digital filters are fundamental and useful tools in that field, and a fascinating subject besides. It helps to have a strong math background in understanding them, but you don't necessarily need that to get the basic idea.

Also, your zero crossing method is a basic technique to estimate the period of a waveform and thus the pitch. It can be done this way, but only with a lot of heuristics and fine-tuning. (Essentially, develop a number of "candidate" pitches and try to infer the dominant one. A lot of special cases will emerge that will confuse this. A quick one is the letter 's'.) You'll find it much easier to begin with a frequency domain pitch detection algorithm.

Ardeb answered 1/11, 2010 at 22:44 Comment(1)

ZCR really doesn't help you estimate the period of a waveform at all. It's kind of a poor-man's entropy detector if you are too lazy to detect entropy using a better algorithm. – Tie 18/12, 2015 at 20:22

F

1

If you are a beginner, this may be very helpful. It is available for both Java and iOS.

dywapitchtrack for ios

dywapitchtrack for java

Firelock answered 10/3, 2016 at 22:57 Comment(0)
