Real time pitch detection

7

29

I'm trying to do real-time pitch detection of a user's singing, but neither FFT nor autocorrelation gives a good result, and I can't find any C / C++ implementations.

The microphone input data is correct, and when I feed in a sine wave the result is more or less the correct pitch. I'm visualizing the autocorrelation by taking each value out of the results array and plotting its index on the X axis and the value on the Y axis (both are divided by 100,000; I'm using OpenGL, and VST plugins aren't an option). It looks like random dots. How should I visualize the raw audio and the autocorrelation data?

Transact answered 30/8, 2009 at 15:17 Comment(11)
I suspect that you've been "doing it wrong". Did you ever solve the underlying problem from those other questions? The whole "random results" thing just sounds like you haven't got those methods working right yet.Jacalynjacamar
@dmckee I fed the autocorrelation one a sine wave and it returned more or less the correct pitch, but when I fed it the mic input, the results were all over the place.Transact
Perhaps your sampling is too narrow? Try taking the average of a few values before and after the current one, and using that to display the pitch.Fallow
For debugging, try whistling. The sound of whistling contains one very strong frequency with few overtones. You should also visualise the output of the FFT, if you weren't doing so already.Kiley
Sorry if I sound stupid, but to visualize the FFT / autocorrelation, would I take each value in the result array, and plot that and the magnitude of that value?Transact
So, to visualize the output of the FFT / autocorrelation, I would run through the array and plot each value and the magnitude of each value?Transact
Niall: yes, you need to plot the magnitude of the frequencies from your FFT.Deserved
... or in case of autocorrelation, the correlation coefficient for each possible period.Selfassured
I have asked a similar question here: #4062599. EDIT: Performous contains a C++ module for realtime pitch detection. Also see the YIN pitch-tracking algorithm.Precipitin
Here is some working code with an explanation, in C: blog.bjornroche.com/2012/07/… You can do better than this, but it's a good start.Brandy
You could do real time pitch detection, be it of a singer's voice, with TarsosDSP github.com/JorenSix/TarsosDSP just in case anyone hasn't heard of it yet :-)Peruzzi
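The suggestion in the comments above (plot the correlation coefficient for each possible period) can be sketched roughly as follows. This is only a minimal illustration, not code from any of the libraries mentioned: the function names `autocorrelate` and `estimatePitchHz` and the search range parameters are assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized autocorrelation r[lag] for lag = 0..maxLag.
// The frame mean is removed first so a DC offset does not dominate.
// r[0] is 1 for any non-silent frame; plot r[lag] against lag to see the periods.
std::vector<double> autocorrelate(const std::vector<double>& x, std::size_t maxLag) {
    std::vector<double> r(maxLag + 1, 0.0);
    if (x.empty()) return r;
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(x.size());
    for (std::size_t lag = 0; lag <= maxLag && lag < x.size(); ++lag) {
        double sum = 0.0;
        for (std::size_t i = 0; i + lag < x.size(); ++i)
            sum += (x[i] - mean) * (x[i + lag] - mean);
        r[lag] = sum;
    }
    const double r0 = r[0];
    if (r0 > 0.0)
        for (double& v : r) v /= r0;
    return r;
}

// Picks the lag with the highest correlation inside [minHz, maxHz]
// and converts it to a frequency. Returns 0 if nothing usable is found.
double estimatePitchHz(const std::vector<double>& x, double sampleRate,
                       double minHz, double maxHz) {
    if (x.size() < 2 || minHz <= 0.0 || maxHz <= minHz) return 0.0;
    std::size_t minLag = static_cast<std::size_t>(sampleRate / maxHz);
    std::size_t maxLag = static_cast<std::size_t>(sampleRate / minHz);
    if (minLag < 1) minLag = 1;
    if (maxLag >= x.size()) maxLag = x.size() - 1;
    if (minLag > maxLag) return 0.0;
    const std::vector<double> r = autocorrelate(x, maxLag);
    std::size_t best = minLag;
    for (std::size_t lag = minLag; lag <= maxLag; ++lag)
        if (r[lag] > r[best]) best = lag;
    if (r[best] <= 0.0) return 0.0;  // silence or no positive correlation
    return sampleRate / static_cast<double>(best);
}
```

For a clean periodic input, plotting `r[lag]` against `lag` should give a smooth curve with peaks at multiples of the period. If the plot looks like random dots, the problem is probably upstream of the pitch detector (sample format, channel layout, or frame handling).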
36

Taking a step back... To get this working you MUST figure out a way to plot intermediate steps of this process. What you're trying to do is not particularly hard, but it is error prone and fiddly. Clipping, windowing, bad wiring, aliasing, DC offsets, reading the wrong channels, the weird FFT frequency axis, impedance mismatches, frame size errors... who knows. But if you can plot the raw data, and then plot the FFT, all will become clear.

Oospore answered 30/8, 2009 at 19:16 Comment(3)
How exactly does one plot the raw data and FFT?Jocundity
@Helium3: Waveform and spectrogram (2D). Check Audacity.Pereyra
Or, output the intermediate representations to CSV and view them in MATLAB or Octave.Elohist
22

I found several open source implementations of real-time pitch tracking:

  • dywapitchtrack uses a wavelet-based algorithm

  • "Realtime C# Pitch Tracker" uses a modified autocorrelation approach (now removed from CodePlex; try searching on GitHub)

  • aubio (mentioned by piem; several algorithms are available)

There are also some pitch trackers out there which might not be designed for real-time, but may be usable that way for all I know, and could also be useful as a reference to compare your real-time tracker to:

Grassofparnassus answered 9/9, 2011 at 3:10 Comment(0)
14

I know this answer isn't going to make everyone happy but here goes.

This stuff is hard, very hard. First, go and read as many tutorials as you can find on FFT, autocorrelation, and wavelets. Although I'm still struggling with DSP, I did get some insights from the following.

https://www.coursera.org/course/audio the course isn't running at the moment but the videos are still available.

http://miracle.otago.ac.nz/tartini/papers/Philip_McLeod_PhD.pdf thesis about the development of a pitch recognition algorithm.

http://dsp.stackexchange.com a whole site dedicated to digital signal processing.

If, like me, you didn't do enough maths to completely follow the tutorials, don't give up; some of the diagrams and examples still helped me to understand what was going on.

Next is test data and testing. Write yourself a library that generates test files to use in checking your algorithm/s.

1) A super simple pure sine wave generator. So say you are looking at writing YAT(Yet Another Tuner) then use your sine generator to create a series of files around 440Hz say from 420-460Hz in varying increments and see how sensitive and accurate your code is. Can it resolve to within 5Hz, 1Hz, finer still?

2) Then upgrade your sine wave generator so that it adds a series of weaker harmonics to the signal.

3) Next are real world variations on harmonics. So whilst for most stringed instruments you'll see a series of harmonics as simple multiples of the fundamental frequency F0, for instruments like clarinets, because of the way the air behaves in a tube that is closed at one end, the even harmonics will be missing or very weak. And for some instruments F0 is missing but can be determined from the distribution of the other harmonics, F0 being what the human ear perceives as pitch.

4) Throw in some deliberate distortion by shifting the harmonic peak frequencies up and down in an irregular manner.

The point being that if you are creating files with known results, then it's easier to verify that what you are building actually works, bugs aside of course.
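Steps 1 to 3 above could be sketched along these lines. It is only an illustration: the name `makeTone` and the convention that `amplitudes[k]` is the level of harmonic `k + 1` are assumptions for this example. Writing the samples to a WAV file is left out.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Generates a test tone as a sum of sine partials.
// amplitudes[0] is the fundamental, amplitudes[1] the 2nd harmonic, and so on.
// Use {1.0} for a pure sine, {1.0, 0.5, 0.25} for weaker overtones, or
// {1.0, 0.0, 0.5, 0.0, 0.25} for a tone with missing even harmonics.
std::vector<double> makeTone(double f0Hz, const std::vector<double>& amplitudes,
                             double sampleRate, std::size_t numSamples) {
    const double twoPi = 6.283185307179586;
    std::vector<double> out(numSamples, 0.0);
    for (std::size_t k = 0; k < amplitudes.size(); ++k) {
        const double freq = f0Hz * static_cast<double>(k + 1);
        if (freq >= sampleRate / 2.0) break;  // partials above Nyquist would alias
        for (std::size_t n = 0; n < numSamples; ++n)
            out[n] += amplitudes[k] *
                      std::sin(twoPi * freq * static_cast<double>(n) / sampleRate);
    }
    return out;
}
```

Generating a sweep such as 420, 425, ..., 460 Hz with this and feeding each buffer to the detector gives a table of expected versus detected pitch. Setting `amplitudes[0]` to zero gives the missing-fundamental case, and step 4 would need per-partial frequency offsets that this sketch does not include.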

There are also a number of "libraries" out there containing sound samples: https://freesound.org (from the Coursera series mentioned above) and http://theremin.music.uiowa.edu/MIS.html

Next, be aware that your microphone is not perfect and, unless you have spent thousands of dollars on it, will have a fairly uneven frequency response. In particular, if you are working with low notes, cheaper microphones (read: the built-in ones in your PC or phone) have significant rolloff starting at around 80-100Hz. With reasonably good external ones you might get down to 30-40Hz. Go find the data on your microphone.

You can also check what happens by playing the tone through speakers and then recording it with your favourite microphone. But of course now we are talking about 2 sets of frequency response curves.

When it comes to performance, there are a number of freely available libraries out there, although do be aware of the various licensing models.

Above all don't give up after your first couple of tries. Best of luck.

Coppinger answered 24/1, 2015 at 0:46 Comment(0)
9

Here's the C++ source code for an unusual two-stage algorithm that I devised, which can do realtime pitch detection on polyphonic MP3 files while they are being played on Windows. This free application (PitchScope Player, available on the web) is frequently used to detect the notes of a guitar or saxophone solo on an MP3 recording. The algorithm is designed to detect the most dominant pitch (a musical note) at any given moment in time within an MP3 music file. Note onsets are accurately inferred from a significant change in the most dominant pitch during the MP3 recording.

When a single key is pressed on a piano, what we hear is not just one frequency of sound vibration, but a composite of multiple sound vibrations occurring at different mathematically related frequencies. The elements of this composite of vibrations at differing frequencies are referred to as harmonics or partials. For instance, if we press the Middle C key on the piano, the individual frequencies of the composite's harmonics will start at 261.6 Hz as the fundamental frequency, 523 Hz would be the 2nd Harmonic, 785 Hz would be the 3rd Harmonic, 1046 Hz would be the 4th Harmonic, etc. The later harmonics are integer multiples of the fundamental frequency, 261.6 Hz (ex: 2 x 261.6 = 523, 3 x 261.6 = 785, 4 x 261.6 = 1046). Shown at the bottom is a snapshot of the actual harmonics which occur during a polyphonic MP3 recording of a guitar solo.

Instead of an FFT, I use a modified DFT with logarithmic frequency spacing to first detect these possible harmonics by looking for frequencies with peak levels (see diagram below). Because of the way that I gather data for my modified Log DFT, I do NOT have to apply a windowing function to the signal, nor do I need overlap-add. And I have created the DFT so that its frequency channels are logarithmically spaced, in order to align directly with the frequencies where harmonics are created by the notes on a guitar, saxophone, etc.
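To make the idea of logarithmically spaced frequency channels concrete, here is a naive sketch that evaluates a plain DFT at arbitrary frequencies one semitone apart. It is not the PitchScope implementation: it does not reproduce the data-gathering scheme described above that avoids windowing, and the names `dftMagnitudeAt` and `semitoneFrequencies` are made up for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude of the DFT of x evaluated at a single, arbitrary frequency
// (not restricted to FFT bin centers). Scaled so that a full-scale sine
// that fits a whole number of cycles into the frame gives about 1.0.
double dftMagnitudeAt(const std::vector<double>& x, double freqHz, double sampleRate) {
    if (x.empty()) return 0.0;
    const double twoPi = 6.283185307179586;
    double re = 0.0;
    double im = 0.0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        const double phase = twoPi * freqHz * static_cast<double>(n) / sampleRate;
        re += x[n] * std::cos(phase);
        im -= x[n] * std::sin(phase);
    }
    return 2.0 * std::sqrt(re * re + im * im) / static_cast<double>(x.size());
}

// Channel center frequencies one equal-tempered semitone apart,
// starting at lowestHz, so every 12th channel is one octave higher.
std::vector<double> semitoneFrequencies(double lowestHz, std::size_t count) {
    std::vector<double> freqs(count);
    for (std::size_t i = 0; i < count; ++i)
        freqs[i] = lowestHz * std::pow(2.0, static_cast<double>(i) / 12.0);
    return freqs;
}
```

Evaluating `dftMagnitudeAt` for every entry of `semitoneFrequencies` gives one column of a log-frequency spectrum. The cost is O(N) per channel, and without a window the channels leak into each other whenever the frame does not hold a whole number of cycles, which is one reason a real implementation needs more care than this.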

Now being retired, I have decided to release the source code for my pitch detection engine within a free demonstration app called PitchScope Player. PitchScope Player is available on the web, and you can download the executable for Windows to see my algorithm at work on an MP3 file of your choosing. The GitHub link below will lead you to my full source code, where you can view how I detect the harmonics with a custom Logarithmic DFT and then look for partials (harmonics) whose frequencies satisfy the correct integer relationship which defines a 'pitch'.

My Pitch Detection Algorithm is actually a two-stage process:

a) First the ScalePitch is detected ('ScalePitch' has 12 possible pitch values: {E, F, F#, G, G#, A, A#, B, C, C#, D, D#}).

b) After the ScalePitch is determined, the Octave is calculated by examining all the harmonics for the 4 possible Octave-Candidate notes.

The algorithm is designed to detect the most dominant pitch (a musical note) at any given moment in time within a polyphonic MP3 file. That usually corresponds to the notes of an instrumental solo. Those interested in the C++ source code for my Two-Stage Pitch Detection algorithm might want to start at the Estimate_ScalePitch() function within the SPitchCalc.cpp file at GitHub.com. https://github.com/CreativeDetectors/PitchScope_Player

Below is the image of a Logarithmic DFT (created by my C++ software) for 3 seconds of a guitar solo on a polyphonic mp3 recording. It shows how the harmonics appear for individual notes on a guitar, while playing a solo. For each note on this Logarithmic DFT we can see its multiple harmonics extending vertically, because each harmonic will have the same time-width. After the Octave of the note is determined, then we know the frequency of the Fundamental.

[Image: Logarithmic DFT of 3 seconds of a guitar solo on a polyphonic MP3 recording]

Hickory answered 30/7, 2016 at 18:38 Comment(0)
5

I had a similar problem with microphone input on a project I did a few years back - turned out to be due to a DC offset.

Make sure you remove any bias before attempting FFT or whatever other method you are using.

It is also possible that you are running into headroom or clipping problems.

Graphs are the best way to diagnose most problems with audio.

Melanimelania answered 30/8, 2009 at 16:22 Comment(7)
Sorry if I sound stupid, but how do I visualize the result of the FFT / autocorrelation? Would I take each value in the result array, and plot that and the magnitude of that value?Transact
You can remove DC bias with a high pass filter set to a very low cutoff. I usually go with 25-30 hertz, based on the lowest result from extended string (5- or 6-) bass guitars.Mapel
I suggest running your input through a host and using the free VSTs Fre(a)koscope and s(M)exoscope to see the frequency response and the waveform graphically.Mapel
Is there any other way to do it? The Fre(a)koscope and s(M)exoscope VSTs are for Windows and I'm on a Mac.Transact
I think there's a plugin adapter that lets you use PC VSTs on Intel Macs. The vast majority of free plugins are PC (which is why I still do music on my PC rather than my Mac). There are some similar Mac tools, but most of them are not free. Try BlueCat's stuff. He has a spectrum analyzer and an oscilloscope. Or search the audio plugin database at kvraudio. Or just ask on a forum there.Mapel
The spectrum analyzer will allow you to compare your FFT results with someone else's, which will let you know if you have a bug. The oscilloscope will let you see if you are clipping.Mapel
I'm now using a high pass filter to remove the DC offset (see #1353868), but it's still not working. Any idea why?Transact
2

Take a look at this sample application:

http://www.codeproject.com/KB/audio-video/SoundCatcher.aspx

I realize the app is in C# and you need C++, and I realize this is .NET/Windows and you're on a Mac... But I figured his FFT implementation might be a starting reference point. Try to compare your FFT implementation to his. (His is the iterative, breadth-first version of the Cooley-Tukey FFT.) Are they similar?

Also, the "random" behavior you're describing might be because you're grabbing data returned by your sound card directly without assembling the values from the byte array properly. Did you ask your sound card to sample 16-bit values, and then give it a byte array to store the values in? If so, remember that two consecutive bytes in the returned array make up one 16-bit audio sample.
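Assembling the samples might look like the sketch below. It assumes signed 16-bit, little-endian, mono PCM, which is common but needs to be checked against what your audio API actually delivers; the function name `pcm16leToDouble` is made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts a raw byte buffer of signed 16-bit little-endian PCM into
// doubles in the range [-1.0, 1.0). A trailing odd byte is ignored.
// For interleaved stereo, the result still alternates left/right samples.
std::vector<double> pcm16leToDouble(const std::uint8_t* bytes, std::size_t numBytes) {
    std::vector<double> samples;
    samples.reserve(numBytes / 2);
    for (std::size_t i = 0; i + 1 < numBytes; i += 2) {
        // Low byte first, then high byte (little-endian).
        int value = static_cast<int>(bytes[i]) | (static_cast<int>(bytes[i + 1]) << 8);
        if (value >= 32768) value -= 65536;  // reinterpret as two's complement
        samples.push_back(static_cast<double>(value) / 32768.0);
    }
    return samples;
}
```

Treating each byte as its own sample, swapping the byte order, or treating interleaved stereo as mono all produce output that looks close to noise when plotted, which would match the "random dots" described in the question.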

Harlanharland answered 2/9, 2010 at 2:33 Comment(0)
0

Here are some open source libraries that implement pitch detection:

  • WORLD : speech analysis/synthesis toolkit. This is especially suitable if your source signal is voice.
  • aubio : audio feature extraction library. Implements many pitch detection algorithms.
  • Pitch detection : a collection of pitch detection algorithms implemented in C++.
  • dywapitchtrack : a high quality pitch detection algorithm.
  • YIN : another implementation of the YIN algorithm in a single C++ source file.
Impoverish answered 24/2, 2020 at 17:59 Comment(0)
