Ive been experimenting with the FFT algorithm. I use NAudio along with a working code of the FFT algorithm from the internet. Based on my observations of the performance, the resulting pitch is inaccurate.
What happens is that I have an MIDI (generated from GuitarPro) converted to WAV file (44.1khz, 16-bit, mono) that contains a pitch progression starting from E2 (the lowest guitar note) up to about E6. What results is for the lower notes (around E2-B3) its generally very wrong. But reaching C4 its somewhat correct in that you can already see the proper progression (next note is C#4, then D4, etc.) However, the problem there is that the pitch detected is a half-note lower than the actual pitch (e.g. C4 should be the note but D#4 is displayed).
What do you think may be wrong? I can post the code if necessary. Thanks very much! Im still beginning to grasp the field of DSP.
Edit: Here is a rough scratch of what Im doing
byte[] buffer = new byte[8192];
int bytesRead;
do
{
bytesRead = stream16.Read(buffer, 0, buffer.Length);
} while (bytesRead != 0);
And then: (waveBuffer is simply a class that is there to convert the byte[] into float[] since the function only accepts float[])
public int Read(byte[] buffer, int offset, int bytesRead)
{
int frames = bytesRead / sizeof(float);
float pitch = DetectPitch(waveBuffer.FloatBuffer, frames);
}
And lastly: (Smbpitchfft is the class that has the FFT algo ... i believe theres nothing wrong with it so im not posting it here)
private float DetectPitch(float[] buffer, int inFrames)
{
Func<int, int, float> window = HammingWindow;
if (prevBuffer == null)
{
prevBuffer = new float[inFrames]; //only contains zeroes
}
// double frames since we are combining present and previous buffers
int frames = inFrames * 2;
if (fftBuffer == null)
{
fftBuffer = new float[frames * 2]; // times 2 because it is complex input
}
for (int n = 0; n < frames; n++)
{
if (n < inFrames)
{
fftBuffer[n * 2] = prevBuffer[n] * window(n, frames);
fftBuffer[n * 2 + 1] = 0; // need to clear out as fft modifies buffer
}
else
{
fftBuffer[n * 2] = buffer[n - inFrames] * window(n, frames);
fftBuffer[n * 2 + 1] = 0; // need to clear out as fft modifies buffer
}
}
SmbPitchShift.smbFft(fftBuffer, frames, -1);
}
And for interpreting the result:
float binSize = sampleRate / frames;
int minBin = (int)(82.407 / binSize); //lowest E string on the guitar
int maxBin = (int)(1244.508 / binSize); //highest E string on the guitar
float maxIntensity = 0f;
int maxBinIndex = 0;
for (int bin = minBin; bin <= maxBin; bin++)
{
float real = fftBuffer[bin * 2];
float imaginary = fftBuffer[bin * 2 + 1];
float intensity = real * real + imaginary * imaginary;
if (intensity > maxIntensity)
{
maxIntensity = intensity;
maxBinIndex = bin;
}
}
return binSize * maxBinIndex;
UPDATE (if anyone is still interested):
So, one of the answers below stated that the frequency peak from the FFT is not always equivalent to pitch. I understand that. But I wanted to try something for myself if that was the case (on the assumption that there are times in which the frequency peak IS the resulting pitch). So basically, I got 2 softwares (SpectraPLUS and FFTProperties by DewResearch ; credits to them) that is able to display the frequency-domain for the audio signals.
So here are the results of the frequency peaks in the time domain:
SpectraPLUS
and FFT Properties:
This was done using a test note of A2 (around 110Hz). Upon looking at the images, they have frequency peaks around the range of 102-112 Hz for SpectraPLUS and 108 Hz for FFT Properties. On my code, I get 104Hz (I use 8192 blocks and a samplerate of 44.1khz ... 8192 is then doubled to make it complex input so in the end, I get around 5Hz for binsize, as compared to the 10Hz binsize of SpectraPLUS).
So now Im a bit confused, since on the softwares they seem to return the correct result but on my code, I always get 104Hz (note that I have compared the FFT function that I used with others such as Math.Net and it seems to be correct).
Do you think that the problem may be with my interpretation of the data? Or do the softwares do some other thing before displaying the Frequency-Spectrum? Thanks!
prevBuffer
elements are never set, so the values are always 0. Is it correct behaviour? – Astrophysics