Android 2.3 Visualizer - Trouble understanding getFft()

First time here so sorry in advance for any butchered formatting.

I am completely new to DSP and have only a very general understanding of the Fourier transform. I am trying to build a visualizer app for Android SDK 9, which includes a Visualizer class in android.media.audiofx: http://developer.android.com/reference/android/media/audiofx/Visualizer.html

The javadoc for the method getFft(), which is what I am using, states:

"Returns a frequency capture of currently playing audio content. The capture is a 8-bit magnitude FFT. Note that the size of the FFT is half of the specified capture size but both sides of the spectrum are returned yielding in a number of bytes equal to the capture size."

First of all, what does "both sides of the spectrum" mean? How does this output differ from a standard FFT?

Here is some sample output of the byte array. getFft() was given 124 points to keep it simple, and these are the magnitudes of the first 31 bins:

{123, -2, -23, -3, 6, -16, 15, -10, -8, -12, 9, -9, 17, -6, -18, -22, -8, 4, -5, -2, 10, -3, -11, 3, -4, -11, -8, 15, 16, 11, -12, 12}

Any help or explanation would be greatly appreciated!

Edit: After staring at a bunch of graphs, it looks like part of my problem is that Google does not specify what unit is being used. Almost all other measurements are done in mHz, so would it be fair to assume that the FFT output is also in mHz? Is there a place where I can see the source code of the Visualizer class, so maybe I can figure out what the hell is actually going on under the hood?

I went ahead and grabbed all of the output of getFft():

93, -2, -28, -16, -21, 19, 44, -16, 3, 16, -9, -4, 0, -2, 21, 16, -3, 1, 2, 4, -3, 5, 5, 10, 6, 4, -9, 7, -2, -1, 2, 11, -1, 5, -8, -2, -1, 4, -5, 5, 1, 3, -6, -1, -5, 0, 0, 0, -3, 5, -4, -6, -2, -2, -1, 2, -3, 0, 1, -3, -4, -3, 1, 1, 0, -2, -1, -1, 0, -5, 0, 4, -1, 1, 1, -1, 1, -1, -3, 2, 1, 2, -2, 1, 0, -1, -2, 2, -3, 4, -2, -2, 0, 1, -4, 0, -4, 2, -1, 0, -3, -1, -1, -1, -5, 2, -2, -2, 0, -3, -2, 1, -5, -2, 0, 0, 0, -2, -2, -1, -1, -1, -2, 0, 3, -3, -1, 0

So if I understand this correctly, my output here should be from -N to 0 to N. -N to 0 should look just like 0 to N. But when I look at these amplitudes, I don't see any mirrored data. Google seems to indicate that the output should be from 0 to N just on both sides of the spectrum. So I should be able to take the data from (output.length-1)/2 to output.length-1. The negative amplitudes are moving faster than the sample rate and the positive amplitudes are moving slower than the sample rate. Did I understand this correctly?

Terranceterrane answered 18/1, 2011 at 4:32 Comment(8)
Thank you so much for all this information, I feel kind of bad that I can't do anything more than upvote your comments. - Terranceterrane
Edited original post with a complete set of output, the data does not seem to be symmetric even though it is supposed to be from both sides of the spectrum... - Terranceterrane
Yes, and what I did was use getCaptureSizeRange()[0] which returns the lowest capture size in the range. - Terranceterrane
That would probably make sense if they were assuming I might want to do more complicated things with the data. I will try the above out and see if it works any better. - Terranceterrane
Here is the output: 11, 0, 0, 0, 6, 6, 1, 4, 0, 1, 0, 4, 0, 0, 2, 0, 1, 3, 2, 0, 1, 2, 0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 1, 1, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 1, 0, 1, 2, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1 - Terranceterrane
Does the magnitude have any relation to the volume of the sound per bin? If it does, then the above is not working. - Terranceterrane
I ran some test mp3s through it which move from 16Hz to 20kHz and did not see any correlation between the raw data and sound nor the modified data and sound. I am beginning to think this is a lost cause. - Terranceterrane
By default (as of April 2019), sound does not affect the magnitudes: developer.android.com/reference/android/media/audiofx/… - Gormand

The frequency at FFT output sample k is given by:

Fk = k * Fs / N,    k = 0,1,...,N-1 

where

  • Fs is the sampling frequency of the time series input
  • N is the number of samples used to compute the FFT
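The formula above can be sketched in a few lines of Java. The class and method names are hypothetical, and the 44.1 kHz sample rate and 128-point size in `main` are assumed values chosen only for illustration, not something the Visualizer guarantees.

```java
final class BinFrequencies {
    /** Frequency in Hz of FFT output sample k, using Fk = k * Fs / N. */
    static double binFrequencyHz(int k, double sampleRateHz, int fftSize) {
        if (fftSize <= 0 || k < 0 || k >= fftSize) {
            throw new IllegalArgumentException("need 0 <= k < N and N > 0");
        }
        return k * sampleRateHz / fftSize;
    }

    public static void main(String[] args) {
        double fs = 44100.0; // assumed sample rate in Hz
        int n = 128;         // assumed number of samples fed to the FFT
        // Print every 16th bin from DC up to the Nyquist bin (k = N/2).
        for (int k = 0; k <= n / 2; k += 16) {
            System.out.printf("bin %3d -> %.2f Hz%n", k, binFrequencyHz(k, fs, n));
        }
    }
}
```

Note that on Android, `Visualizer.getSamplingRate()` reports the rate in milliHertz, so it would need to be divided by 1000 before being passed in as Hz.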

The two sides of the spectrum refer to the positive and negative frequencies in the output of the FFT. The FFT forces the frequency output to be periodic with a period of Fs, so the raw FFT output covers the frequencies from 0 to Fs. It is often more convenient to view the spectrum over the range -0.5*Fs to 0.5*Fs instead, by shifting the upper half of the output (0.5*Fs to Fs) down to -0.5*Fs to 0; the two are equal because of the periodicity.
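As a rough sketch of that shift, the helper below maps a raw bin index k to its signed frequency by treating the upper half of the bins as negative frequencies. The names are hypothetical, and the convention of assigning bin N/2 to -0.5*Fs for even N is an assumption (it is equally valid to call it +0.5*Fs).

```java
final class SignedBinFrequency {
    /**
     * Signed frequency in Hz for FFT bin k of an N-point FFT.
     * Bins in the lower half stay positive; bins in the upper half
     * are shifted down by one period (Fs), i.e. k is replaced by k - N.
     */
    static double signedFrequencyHz(int k, double sampleRateHz, int fftSize) {
        if (fftSize <= 0 || k < 0 || k >= fftSize) {
            throw new IllegalArgumentException("need 0 <= k < N and N > 0");
        }
        int shifted = (k < (fftSize + 1) / 2) ? k : k - fftSize;
        return shifted * sampleRateHz / fftSize;
    }

    public static void main(String[] args) {
        double fs = 8000.0; // assumed sample rate in Hz
        int n = 8;          // assumed FFT size
        for (int k = 0; k < n; k++) {
            System.out.printf("bin %d -> %.1f Hz%n", k, signedFrequencyHz(k, fs, n));
        }
    }
}
```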

For real-valued signals, like the ones you have in audio processing, the negative-frequency output is the complex conjugate of the positive-frequency output, so its magnitudes are a mirror image of the positive-frequency magnitudes. Because of this, often only one side of the spectrum is used when analyzing real signals.
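To see the mirror image concretely, here is a minimal sketch that computes a naive O(N^2) DFT of a real-valued cosine and returns the magnitude of every bin. It is a plain textbook DFT written for clarity, not what the Visualizer does internally, and the class name and test signal are made up for illustration.

```java
final class RealDftSymmetry {
    /** Magnitudes of the N-point DFT of a real-valued input (naive O(N^2) version). */
    static double[] dftMagnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    public static void main(String[] args) {
        int n = 16;
        double[] x = new double[n];
        // A real cosine with exactly 3 cycles in the window.
        for (int t = 0; t < n; t++) {
            x[t] = Math.cos(2.0 * Math.PI * 3 * t / n);
        }
        double[] mag = dftMagnitudes(x);
        // Bin k and bin N-k carry the same magnitude for a real input.
        for (int k = 1; k < n / 2; k++) {
            System.out.printf("|X[%d]| = %.3f, |X[%d]| = %.3f%n", k, mag[k], n - k, mag[n - k]);
        }
    }
}
```

For this input the energy lands in bin 3 and in its mirror, bin 13, which is why only bins 0 to N/2 are needed for display.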

Another important point is the significance of 0.5*Fs, which is known as the Nyquist frequency. A sampled signal can only accurately represent frequencies up to the Nyquist frequency; anything above it is aliased (folded) back into the spectrum, causing distortion.
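The folding can be illustrated with a small helper that computes where a pure tone would appear after sampling. This is only a sketch for an ideal sampler with no anti-aliasing filter, and the names and example frequencies are assumptions made for illustration.

```java
final class Aliasing {
    /**
     * Apparent frequency in Hz of a pure tone at toneHz after ideal sampling
     * at sampleRateHz: the distance from toneHz to the nearest multiple of Fs.
     * The result always lies between 0 and Fs/2.
     */
    static double aliasedFrequencyHz(double toneHz, double sampleRateHz) {
        double nearestMultiple = Math.rint(toneHz / sampleRateHz) * sampleRateHz;
        return Math.abs(toneHz - nearestMultiple);
    }

    public static void main(String[] args) {
        double fs = 44100.0; // assumed sample rate in Hz
        // A 1 kHz tone is below Nyquist and is unchanged.
        System.out.println(aliasedFrequencyHz(1000.0, fs));
        // A 30 kHz tone is above Nyquist (22.05 kHz) and folds back to 14.1 kHz.
        System.out.println(aliasedFrequencyHz(30000.0, fs));
    }
}
```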

So really all you should worry about for visualization purposes are the FFT output samples corresponding to the range of frequencies from 0 to Fs/2 since those are the meaningful samples for a real signal with sampling rate Fs.

Rosemaria answered 19/1, 2011 at 4:55 Comment(2)
Thank you so much for this! Super succinct and what I have been hunting for, for a week now! - Terranceterrane
So, in my case, an output of FFT magnitudes that are only 0 and 1.414214 is essentially giving me just the minimums and maximums? - Gormand

In case it helps anyone, I've created a Visualizer which takes the output from the MediaPlayer and displays a visualization. It works with both normal waveform and FFT data:

https://github.com/felixpalmer/android-visualizer

It includes code for converting the output of getFft() into something visually meaningful.
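For readers who want the gist without opening the repository, here is a minimal sketch of one way to turn the byte array into per-bin magnitudes. It is not taken from the linked project. It assumes the byte layout given in the current Android reference for getFft(): index 0 holds the real part of the DC bin, index 1 the real part of the Nyquist bin, and each following pair holds the real and imaginary parts of bins 1 to n/2 - 1. The class name, method names, and the 0 dB floor are hypothetical choices.

```java
final class FftMagnitudes {
    /**
     * Converts a getFft()-style byte array of length n into n/2 + 1 magnitudes.
     * Assumed layout: [Re(0), Re(n/2), Re(1), Im(1), Re(2), Im(2), ...].
     */
    static double[] magnitudes(byte[] fft) {
        int n = fft.length;
        if (n < 2 || n % 2 != 0) {
            throw new IllegalArgumentException("expected an even length >= 2");
        }
        double[] mag = new double[n / 2 + 1];
        mag[0] = Math.abs(fft[0]);     // DC bin: real part only
        mag[n / 2] = Math.abs(fft[1]); // Nyquist bin: real part only
        for (int k = 1; k < n / 2; k++) {
            mag[k] = Math.hypot(fft[2 * k], fft[2 * k + 1]);
        }
        return mag;
    }

    /** Magnitude in dB relative to 1, floored at 0 dB (an arbitrary display choice). */
    static double toDecibels(double magnitude) {
        return 20.0 * Math.log10(Math.max(magnitude, 1.0));
    }

    public static void main(String[] args) {
        // First eight bytes of the full capture posted in the question, for illustration.
        byte[] sample = {93, -2, -28, -16, -21, 19, 44, -16};
        double[] mag = magnitudes(sample);
        for (int k = 0; k < mag.length; k++) {
            System.out.printf("bin %d: magnitude %.2f, %.1f dB%n", k, mag[k], toDecibels(mag[k]));
        }
    }
}
```

Under that assumed layout the array already contains only one side of the spectrum, packed as real and imaginary pairs, which would explain why the raw bytes do not look mirrored.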

Plexiglas answered 14/12, 2011 at 1:9 Comment(2)
Thanks for sharing this, very helpful! About how you handle the FFTs... Is there any rhyme or reason to how you calculate the dB value, or did you just arbitrarily scale it to something that looked good? I've seen a tutorial about FFTs that says to normalize each frequency based on the highest amplitude measured, but it seems to me like that would make it look like the high notes are blaring. Maybe these dB values could be scaled based on A-weighting to get a realistic scale on each frequency. - Melodeemelodeon
TBH, it's been a while, but I think I just scaled it to something that looked good, as you suggest. If you really cared, then I guess you could just re-scale each time you hit a new maximum, perhaps even including a time-based decay to the maximum so that one loud section doesn't distort the rest. - Plexiglas
