UWP AudioGraph audio processing

I am working on a Windows IoT project that controls an LED strip based on an audio input. I have some code that reads the audio input and writes it to a buffer with the AudioGraph API, but I don't know how to process that audio into useful data.

My code so far:

private async void MainPage_Loaded(object sender, RoutedEventArgs eventArgs)
{
    try
    {
        // Initialize the led strip
        //await this.pixelStrip.Begin();

        sampleAggregator.FftCalculated += new EventHandler<FftEventArgs>(FftCalculated);
        sampleAggregator.PerformFFT = true;

        // Create graph
        AudioGraphSettings settings = new AudioGraphSettings(AudioRenderCategory.Media);
        settings.DesiredSamplesPerQuantum = fftLength;
        settings.DesiredRenderDeviceAudioProcessing = Windows.Media.AudioProcessing.Default;
        settings.QuantumSizeSelectionMode = QuantumSizeSelectionMode.ClosestToDesired;

        CreateAudioGraphResult result = await AudioGraph.CreateAsync(settings);
        if (result.Status != AudioGraphCreationStatus.Success)
        {
            // Cannot create graph
            return;
        }
        graph = result.Graph;

        // Create a device input node using the default audio input device
        CreateAudioDeviceInputNodeResult deviceInputNodeResult = await graph.CreateDeviceInputNodeAsync(MediaCategory.Other);
        if (deviceInputNodeResult.Status != AudioDeviceNodeCreationStatus.Success)
        {
            return;
        }
        deviceInputNode = deviceInputNodeResult.DeviceInputNode;

        // Create a frame output node and route the microphone input into it,
        // otherwise the frames returned by GetFrame() stay silent
        frameOutputNode = graph.CreateFrameOutputNode();
        deviceInputNode.AddOutgoingConnection(frameOutputNode);
        frameOutputNode.Start();
        graph.QuantumProcessed += AudioGraph_QuantumProcessed;

        // Handle device disconnections and other unrecoverable errors
        graph.UnrecoverableErrorOccurred += Graph_UnrecoverableErrorOccurred;

        graph.Start();
    }
    catch (Exception e)
    {
        Debug.WriteLine(e.ToString());
    }
}

private void AudioGraph_QuantumProcessed(AudioGraph sender, object args)
{
    AudioFrame frame = frameOutputNode.GetFrame();
    ProcessFrameOutput(frame);
}

unsafe private void ProcessFrameOutput(AudioFrame frame)
{
    using (AudioBuffer buffer = frame.LockBuffer(AudioBufferAccessMode.Write))
    using (IMemoryBufferReference reference = buffer.CreateReference())
    {
        byte* dataInBytes;
        uint capacityInBytes;
        float* dataInFloat;

        // Get the buffer from the AudioFrame
        ((IMemoryBufferByteAccess)reference).GetBuffer(out dataInBytes, out capacityInBytes);

        // The samples are 32-bit IEEE floats, so reinterpret the byte pointer
        dataInFloat = (float*)dataInBytes;
    }
}
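
(For reference, the (IMemoryBufferByteAccess) cast above assumes the standard COM interface declaration from the Microsoft AudioGraph samples is present somewhere in the project:)

[ComImport]
[Guid("5B0D3235-4DBA-4D44-865E-8F1D0E4FD04D")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
unsafe interface IMemoryBufferByteAccess
{
    void GetBuffer(out byte* buffer, out uint capacity);
}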

So I end up with my buffer as a float*. But how can I turn this into useful data that makes it possible to create something like a spectrum analyzer?

Edit:

Maybe I have to make this question less specific to AudioGraph. I use an API to get my audio input. The data I get from the API is a byte* and I can cast it to a float*. How can I turn the byte* or the float* into some other data that I can use to create some color codes?

I thought about doing some FFT analysis on the float* to get 164 LEDs * 3 (RGB) = 492 bins, and then processing this data further to get values between 0 and 255.

So how can I process this float* or byte* to get this useful data? Or how do I start?

Lamdin answered 10/1, 2016 at 14:45 Comment(1)
You may take a look at github.com/filoe/cscore; there is a sample included (see image down below). – Leda

That data is interleaved IEEE float, so it alternates channel data as you step through the array, and the data range for each sample is from -1 to 1. For example, a mono signal only has one channel, so it won't interleave data at all; but a stereo signal has two channels of audio, and so:

dataInFloat[0]

is the first sample of data from the left channel and

dataInFloat[1]

is the first sample of data from the right channel. Then,

dataInFloat[2]

is the second sample of data from the left channel, and they just keep alternating back and forth. All the other data you'll end up caring about is in Windows.Media.MediaProperties.AudioEncodingProperties.
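
As a rough sketch, de-interleaving into per-channel arrays looks something like this (it assumes a two-channel float buffer like the one above; samplesPerChannel is just a placeholder for capacityInBytes / sizeof(float) / 2):

// Sketch: split an interleaved stereo IEEE-float buffer into two channel arrays
unsafe void Deinterleave(float* dataInFloat, int samplesPerChannel, float[] left, float[] right)
{
    for (int i = 0; i < samplesPerChannel; i++)
    {
        left[i] = dataInFloat[2 * i];       // even indices: left channel
        right[i] = dataInFloat[2 * i + 1];  // odd indices: right channel
    }
}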

So, just knowing this, you (essentially) can immediately get the overall volume of the signal directly from this data by looking at the absolute value of each sample. You'll definitely want to average it out over some amount of time. You can even just attach EQ effects to different nodes, make separate Low, Mids, and High analyzer nodes, and never even get into FFT stuff. BUT WHAT FUN IS THAT? (it's actually still fun)
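
A quick-n-dirty sketch of that amplitude trick (mean absolute value of one channel's samples; smooth it over a few quanta for a steadier reading):

// Sketch: rough volume estimate from the mean absolute sample value (result is in 0..1)
float AverageAbsoluteLevel(float[] samples)
{
    float sum = 0;
    for (int i = 0; i < samples.Length; i++)
    {
        sum += Math.Abs(samples[i]);
    }
    return samples.Length > 0 ? sum / samples.Length : 0;
}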

And then, yeah, to get your complex harmonic data and make a truly sweet visualizer, you want to do an FFT on it. People enjoy using AForge for learning scenarios, like yours. See Sources/Imaging/ComplexImage.cs for usage and Sources/Math/FourierTransform.cs for the implementation.
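
Something along these lines, assuming the AForge.Math namespace is referenced (treat it as a sketch and check it against the AForge sources linked above; the Hamming window is optional but reduces spectral leakage):

// Sketch: magnitude spectrum of one channel using AForge.Math
// 'samples' must have a power-of-two length (e.g. 2048) for FourierTransform.FFT
double[] MagnitudeSpectrum(float[] samples)
{
    var data = new AForge.Math.Complex[samples.Length];
    for (int i = 0; i < samples.Length; i++)
    {
        // Apply a Hamming window; the imaginary part stays 0 for real audio input
        double window = 0.54 - 0.46 * Math.Cos(2 * Math.PI * i / (samples.Length - 1));
        data[i] = new AForge.Math.Complex(samples[i] * window, 0);
    }

    AForge.Math.FourierTransform.FFT(data, AForge.Math.FourierTransform.Direction.Forward);

    // For a real input signal only the first half of the bins is unique
    var magnitudes = new double[samples.Length / 2];
    for (int i = 0; i < magnitudes.Length; i++)
    {
        magnitudes[i] = Math.Sqrt(data[i].Re * data[i].Re + data[i].Im * data[i].Im);
    }
    return magnitudes;
}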

Then you can easily get your classic bin data and do the classic music visualizer stuff, or get more creative, or whatever! Technology is awesome!
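
For example, one hypothetical way to collapse those magnitude bins into per-LED values in the 0-255 range (maxExpected is a tuning constant you'd calibrate against your music):

// Sketch: group FFT magnitude bins into one 0-255 level per LED
byte[] BinsToLedLevels(double[] magnitudes, int ledCount, double maxExpected)
{
    var levels = new byte[ledCount];
    int binsPerLed = Math.Max(1, magnitudes.Length / ledCount);
    for (int led = 0; led < ledCount; led++)
    {
        double sum = 0;
        for (int b = led * binsPerLed; b < (led + 1) * binsPerLed && b < magnitudes.Length; b++)
        {
            sum += magnitudes[b];
        }
        double scaled = 255.0 * (sum / binsPerLed) / maxExpected;
        levels[led] = (byte)Math.Min(255.0, Math.Max(0.0, scaled));
    }
    return levels;
}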

Persimmon answered 27/1, 2016 at 0:2 Comment(5)
Thanks! I still have some questions. Most of the time I have a buffer length of 3840 bytes in a frame time of 0.01 seconds, so (3840 / sizeof(float)) / 2 means my left and right channels each have a length of 480 floats. Is this right? The encoding properties of my graph are a bitrate of 3072000, 32 bits per sample and a sample rate of 48000. – Lamdin
You are correct! Note that the range of the data is [-1, +1], so if you look at the average of the ABSOLUTE value of that data you'll get a rough estimate of the volume. (I'm revising my post above with this info too.) But really you should hand that off to the FFT to get true data values back; the amplitude trick (average absolute float value) works fine for a quick-n-dirty analysis and is much less CPU intensive. I do it all the time if I only have one external thing I want to trigger from music (one light, one motor, phone vibrate, etc.). – Persimmon
Okay, nice! So now I create a complex array with the real part holding the values of, for example, the left channel (is something like a Hamming window needed?), and the imaginary part will be 0 (always 0 for audio, right?). If that array has a length of 2^n, I run it through the FFT and it returns the frequency spectrum. What I then expect is that the second half is the same as the first half. But it isn't :( So, is what I'm thinking and doing right? – Lamdin
Sorry, that's just out of my knowledge base. I will say to be very mindful of data formats when you do audio work like this. PCM vs IEEE, etc... be certain you understand the data handoff points between functions and that the data is in the expected format. – Persimmon
Ah okay. Thanks anyway. You helped me in the right direction :) – Lamdin
// Quick peak meter: track the largest absolute sample value in the current quantum
dataInFloat = (float*)dataInBytes;
float max = 0;
for (int i = 0; i < graph.SamplesPerQuantum; i++)
{
    max = Math.Max(Math.Abs(dataInFloat[i]), max);
}

finalLevel = max;
Debug.WriteLine(max);
Canvass answered 22/9, 2016 at 16:12 Comment(0)
