OS X / iOS - Sample rate conversion for a buffer using AudioConverterFillComplexBuffer

I'm writing a CoreAudio backend for an audio library called XAL. Input buffers can arrive at various sample rates. I'm using a single audio unit for output. The idea is to convert and mix the buffers before sending them to the output audio unit.

Everything works as long as the input buffer has the same properties (sample rate, channel count, etc.) as the output audio unit, so the mixing part works.

However, I'm stuck on sample rate and channel count conversion. From what I've figured out, this is easiest to do with the Audio Converter Services API. I've managed to construct a converter; the idea is that its output format is the same as the output unit's format, possibly adjusted for the purposes of the converter.

The audio converter is constructed successfully, but calling AudioConverterFillComplexBuffer() returns status error -50.

I'd love to get another set of eyeballs on this code. The problem is probably somewhere below AudioConverterNew(). The variable stream contains the incoming (and outgoing) buffer data, and streamSize contains the byte size of the incoming (and outgoing) buffer data.

What did I do wrong?

void CoreAudio_AudioManager::_convertStream(Buffer* buffer, unsigned char** stream, int *streamSize)
{
    if (buffer->getBitsPerSample() != unitDescription.mBitsPerChannel || 
        buffer->getChannels() != unitDescription.mChannelsPerFrame || 
        buffer->getSamplingRate() != unitDescription.mSampleRate)
    {
        printf("INPUT STREAM SIZE: %d\n", *streamSize);
        // describe the input format's description
        AudioStreamBasicDescription inputDescription;
        memset(&inputDescription, 0, sizeof(inputDescription));
        inputDescription.mFormatID = kAudioFormatLinearPCM;
        inputDescription.mFormatFlags = kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsSignedInteger;
        inputDescription.mChannelsPerFrame = buffer->getChannels();
        inputDescription.mSampleRate = buffer->getSamplingRate();
        inputDescription.mBitsPerChannel = buffer->getBitsPerSample();
        inputDescription.mBytesPerFrame = (inputDescription.mBitsPerChannel * inputDescription.mChannelsPerFrame) / 8;
        inputDescription.mFramesPerPacket = 1; //*streamSize / inputDescription.mBytesPerFrame;
        inputDescription.mBytesPerPacket = inputDescription.mBytesPerFrame * inputDescription.mFramesPerPacket;
        printf("INPUT : %lu bytes per packet for sample rate %g, channels %d\n", inputDescription.mBytesPerPacket, inputDescription.mSampleRate, inputDescription.mChannelsPerFrame);

        // copy conversion output format's description from the
        // output audio unit's description.
        // then adjust framesPerPacket to match the input we'll be passing.

        // framecount of our input stream is based on the input bytecount.
        // output stream will have same number of frames, but different
        // number of bytes.
        AudioStreamBasicDescription outputDescription = unitDescription;
        outputDescription.mFramesPerPacket = 1; //inputDescription.mFramesPerPacket;
        outputDescription.mBytesPerPacket = outputDescription.mBytesPerFrame * outputDescription.mFramesPerPacket;
        printf("OUTPUT : %lu bytes per packet for sample rate %g, channels %d\n", outputDescription.mBytesPerPacket, outputDescription.mSampleRate, outputDescription.mChannelsPerFrame);

        // create an audio converter
        AudioConverterRef audioConverter;
        OSStatus acCreationResult = AudioConverterNew(&inputDescription, &outputDescription, &audioConverter);
        printf("Created audio converter %p (status: %d)\n", audioConverter, acCreationResult);
        if(!audioConverter)
        {
            // bail out
            free(*stream);
            *streamSize = 0;
            *stream = (unsigned char*)malloc(0);
            return;
        }

        // calculate number of bytes required for output of input stream.
        // allocate buffer of adequate size.
        UInt32 outputBytes = outputDescription.mBytesPerPacket * (*streamSize / inputDescription.mBytesPerFrame); // outputDescription.mFramesPerPacket * outputDescription.mBytesPerFrame;
        unsigned char *outputBuffer = (unsigned char*)malloc(outputBytes);
        memset(outputBuffer, 0, outputBytes);
        printf("OUTPUT BYTES : %d\n", outputBytes);

        // describe input data we'll pass into converter
        AudioBuffer inputBuffer;
        inputBuffer.mNumberChannels = inputDescription.mChannelsPerFrame;
        inputBuffer.mDataByteSize = *streamSize;
        inputBuffer.mData = *stream;

        // describe output data buffers into which we can receive data.
        AudioBufferList outputBufferList;
        outputBufferList.mNumberBuffers = 1;
        outputBufferList.mBuffers[0].mNumberChannels = outputDescription.mChannelsPerFrame;
        outputBufferList.mBuffers[0].mDataByteSize = outputBytes;
        outputBufferList.mBuffers[0].mData = outputBuffer;

        // set output data packet size
        UInt32 outputDataPacketSize = outputDescription.mBytesPerPacket;

        // convert
        OSStatus result = AudioConverterFillComplexBuffer(audioConverter, /* AudioConverterRef inAudioConverter */
                                                          CoreAudio_AudioManager::_converterComplexInputDataProc, /* AudioConverterComplexInputDataProc inInputDataProc */
                                                          &inputBuffer, /* void *inInputDataProcUserData */
                                                          &outputDataPacketSize, /* UInt32 *ioOutputDataPacketSize */
                                                          &outputBufferList, /* AudioBufferList *outOutputData */
                                                          NULL /* AudioStreamPacketDescription *outPacketDescription */
                                                          );
        printf("Result: %d wheee\n", result);

        // change "stream" to describe our output buffer.
        // even if an error occurred, we'd rather have silence than unconverted audio.
        free(*stream);
        *stream = outputBuffer;
        *streamSize = outputBytes;

        // dispose of the audio converter
        AudioConverterDispose(audioConverter);
    }
}


OSStatus CoreAudio_AudioManager::_converterComplexInputDataProc(AudioConverterRef inAudioConverter,
                                                                UInt32* ioNumberDataPackets,
                                                                AudioBufferList* ioData,
                                                                AudioStreamPacketDescription** ioDataPacketDescription,
                                                                void* inUserData)
{
    printf("Converter\n");
    if(*ioNumberDataPackets != 1)
    {
        xal::log("_converterComplexInputDataProc cannot provide input data; invalid number of packets requested");
        *ioNumberDataPackets = 0;
        ioData->mNumberBuffers = 0;
        return -50;
    }

    *ioNumberDataPackets = 1;
    ioData->mNumberBuffers = 1;
    ioData->mBuffers[0] = *(AudioBuffer*)inUserData;

    *ioDataPacketDescription = NULL;

    return 0;
}
Latreshia answered 7/7, 2011 at 12:57

Here is working code for Core Audio sample rate conversion and channel count conversion, using Audio Converter Services (now available as part of the BSD-licensed XAL audio library):

void CoreAudio_AudioManager::_convertStream(Buffer* buffer, unsigned char** stream, int *streamSize)
{
    if (buffer->getBitsPerSample() != unitDescription.mBitsPerChannel || 
        buffer->getChannels() != unitDescription.mChannelsPerFrame || 
        buffer->getSamplingRate() != unitDescription.mSampleRate)
    {
        // describe the input format's description
        AudioStreamBasicDescription inputDescription;
        memset(&inputDescription, 0, sizeof(inputDescription));
        inputDescription.mFormatID = kAudioFormatLinearPCM;
        inputDescription.mFormatFlags = kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsSignedInteger;
        inputDescription.mChannelsPerFrame = buffer->getChannels();
        inputDescription.mSampleRate = buffer->getSamplingRate();
        inputDescription.mBitsPerChannel = buffer->getBitsPerSample();
        inputDescription.mBytesPerFrame = (inputDescription.mBitsPerChannel * inputDescription.mChannelsPerFrame) / 8;
        inputDescription.mFramesPerPacket = 1; //*streamSize / inputDescription.mBytesPerFrame;
        inputDescription.mBytesPerPacket = inputDescription.mBytesPerFrame * inputDescription.mFramesPerPacket;

        // copy conversion output format's description from the
        // output audio unit's description.
        // then adjust framesPerPacket to match the input we'll be passing.

        // framecount of our input stream is based on the input bytecount.
        // output stream will have same number of frames, but different
        // number of bytes.
        AudioStreamBasicDescription outputDescription = unitDescription;
        outputDescription.mFramesPerPacket = 1; //inputDescription.mFramesPerPacket;
        outputDescription.mBytesPerPacket = outputDescription.mBytesPerFrame * outputDescription.mFramesPerPacket;

        // create an audio converter
        AudioConverterRef audioConverter = NULL;
        OSStatus acCreationResult = AudioConverterNew(&inputDescription, &outputDescription, &audioConverter);
        if(acCreationResult != noErr || !audioConverter)
        {
            // bail out
            free(*stream);
            *streamSize = 0;
            *stream = (unsigned char*)malloc(0);
            return;
        }

        // calculate number of bytes required for output of input stream.
        // allocate buffer of adequate size.
        UInt32 outputBytes = outputDescription.mBytesPerPacket * (*streamSize / inputDescription.mBytesPerPacket); // outputDescription.mFramesPerPacket * outputDescription.mBytesPerFrame;
        unsigned char *outputBuffer = (unsigned char*)malloc(outputBytes);
        memset(outputBuffer, 0, outputBytes);

        // describe input data we'll pass into converter
        AudioBuffer inputBuffer;
        inputBuffer.mNumberChannels = inputDescription.mChannelsPerFrame;
        inputBuffer.mDataByteSize = *streamSize;
        inputBuffer.mData = *stream;

        // describe output data buffers into which we can receive data.
        AudioBufferList outputBufferList;
        outputBufferList.mNumberBuffers = 1;
        outputBufferList.mBuffers[0].mNumberChannels = outputDescription.mChannelsPerFrame;
        outputBufferList.mBuffers[0].mDataByteSize = outputBytes;
        outputBufferList.mBuffers[0].mData = outputBuffer;

        // set output data packet size
        UInt32 outputDataPacketSize = outputBytes / outputDescription.mBytesPerPacket;

        // fill class members with data that we'll pass into
        // the InputDataProc
        _converter_currentBuffer = &inputBuffer;
        _converter_currentInputDescription = inputDescription;

        // convert
        OSStatus result = AudioConverterFillComplexBuffer(audioConverter, /* AudioConverterRef inAudioConverter */
                                                          CoreAudio_AudioManager::_converterComplexInputDataProc, /* AudioConverterComplexInputDataProc inInputDataProc */
                                                          this, /* void *inInputDataProcUserData */
                                                          &outputDataPacketSize, /* UInt32 *ioOutputDataPacketSize */
                                                          &outputBufferList, /* AudioBufferList *outOutputData */
                                                          NULL /* AudioStreamPacketDescription *outPacketDescription */
                                                          );

        // change "stream" to describe our output buffer.
        // even if an error occurred, we'd rather have silence than unconverted audio.
        free(*stream);
        *stream = outputBuffer;
        *streamSize = outputBytes;

        // dispose of the audio converter
        AudioConverterDispose(audioConverter);
    }
}


OSStatus CoreAudio_AudioManager::_converterComplexInputDataProc(AudioConverterRef inAudioConverter,
                                                                UInt32* ioNumberDataPackets,
                                                                AudioBufferList* ioData,
                                                                AudioStreamPacketDescription** ioDataPacketDescription,
                                                                void* inUserData)
{
    if(ioDataPacketDescription)
    {
        xal::log("_converterComplexInputDataProc cannot provide input data; it doesn't know how to provide packet descriptions");
        *ioDataPacketDescription = NULL;
        *ioNumberDataPackets = 0;
        ioData->mNumberBuffers = 0;
        return 501;
    }

    CoreAudio_AudioManager *self = (CoreAudio_AudioManager*)inUserData;

    ioData->mNumberBuffers = 1;
    ioData->mBuffers[0] = *(self->_converter_currentBuffer);

    *ioNumberDataPackets = ioData->mBuffers[0].mDataByteSize / self->_converter_currentInputDescription.mBytesPerPacket;
    return 0;
}

In the header, as part of the CoreAudio_AudioManager class, here are relevant instance variables:

    AudioStreamBasicDescription unitDescription;
    AudioBuffer *_converter_currentBuffer;
    AudioStreamBasicDescription _converter_currentInputDescription;
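
For context, this is roughly how those members and the callback might sit in the class declaration. The sketch below only shows the pieces used above; the surrounding XAL declarations are assumed rather than quoted:

// sketch of the relevant parts of the class declaration (assumed, not quoted from XAL)
class CoreAudio_AudioManager
{
public:
    // converts *stream in place to match the output unit's format
    void _convertStream(Buffer* buffer, unsigned char** stream, int* streamSize);

protected:
    // the callback must be static (or a free function) so it can be passed
    // to AudioConverterFillComplexBuffer as a plain C function pointer
    static OSStatus _converterComplexInputDataProc(AudioConverterRef inAudioConverter,
                                                   UInt32* ioNumberDataPackets,
                                                   AudioBufferList* ioData,
                                                   AudioStreamPacketDescription** ioDataPacketDescription,
                                                   void* inUserData);

    // format of the output audio unit, set up elsewhere
    AudioStreamBasicDescription unitDescription;

    // state shared between _convertStream and the input data proc
    AudioBuffer *_converter_currentBuffer;
    AudioStreamBasicDescription _converter_currentInputDescription;
};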

A few months later, I'm looking at this and I've realized that I didn't document the changes.

If you are interested in what the changes were:

  • look at the callback function CoreAudio_AudioManager::_converterComplexInputDataProc
  • one has to properly report the number of "output" packets via ioNumberDataPackets
  • this required introducing new instance variables to hold both the buffer (the previous inUserData) and the input description (used to calculate the number of packets to be fed into Core Audio's converter)
  • the calculation of these "output" packets (the ones fed into the converter) is based on the amount of data the callback received and on the number of bytes per packet in the input format (a small worked example follows this list)
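
As a worked example of that calculation, with assumed values rather than numbers from the code above: for 16-bit signed integer stereo PCM, each frame is 4 bytes and mFramesPerPacket is 1, so a 4096-byte buffer handed to the callback corresponds to 1024 packets.

// worked example with assumed values (16-bit signed integer, stereo PCM)
AudioStreamBasicDescription desc;
memset(&desc, 0, sizeof(desc));
desc.mBitsPerChannel = 16;
desc.mChannelsPerFrame = 2;
desc.mFramesPerPacket = 1;
desc.mBytesPerFrame = (desc.mBitsPerChannel * desc.mChannelsPerFrame) / 8; // 4 bytes
desc.mBytesPerPacket = desc.mBytesPerFrame * desc.mFramesPerPacket;        // 4 bytes

UInt32 bytesProvided = 4096;                                   // size of the buffer handed to the callback
UInt32 packetsProvided = bytesProvided / desc.mBytesPerPacket; // 1024 packets
// packetsProvided is the value the callback writes into *ioNumberDataPackets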

Hopefully this edit will help a future reader (myself included)!

Latreshia answered 8/7, 2011 at 12:03 Comment(5)
In your code, I saw you have this unitDescription.mSampleRate. I assumed this was your desired output sample rate, but I can't seem to find where you tell outputDescription the desired sample rate? – Heartache
This library does not need, nor serve for, arbitrary conversion; the only reason it's doing conversion at all is so it can play audio on the "output audio unit" -- i.e. speakers or headphones, for the most part. So when converting, it copies the entirety of the output audio description from the output audio unit and only modifies mFramesPerPacket. If you need a different rate, try changing the appropriate field in outputDescription (which is an AudioStreamBasicDescription-typed struct). And be careful; I've had a terrible time with this in later experimentation. – Brandybrandyn
But what happens when the bytes per packet the input format contains is 0, as is often the case with VBR formats? You can't divide by zero unless you are Chuck Norris! :p – Eustis
I got it: the sample code in ch. 6 of the book Learning Core Audio addresses this issue by asking the converter for the maximum packet size it can produce instead: CheckResult(AudioConverterGetProperty(streamer->audioConverter, kAudioConverterPropertyMaximumOutputPacketSize, &size, &sizePerPacket)); – Eustis
Can Audio Converter Services be used even if I just want to convert two mono streams of the same length and sample rate into a single stereo file? It doesn't seem clear how I can pass two input streams into a single buffer and then have it become an output buffer that will be stereo. – Botheration
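
Regarding the VBR question in the comments above, here is a minimal sketch (an assumption on my part, not part of the original answer) of asking the converter itself for a safe per-packet output size when mBytesPerPacket is 0:

// sketch: for VBR formats (mBytesPerPacket == 0), ask the converter for the
// largest packet it can produce and size the output buffer from that instead
UInt32 sizePerPacket = 0;
UInt32 propSize = sizeof(sizePerPacket);
OSStatus err = AudioConverterGetProperty(audioConverter,
                                         kAudioConverterPropertyMaximumOutputPacketSize,
                                         &propSize,
                                         &sizePerPacket);
if (err == noErr && sizePerPacket > 0)
{
    UInt32 packetsWanted = 512;                          // assumed packet count for this example
    UInt32 outputBytes = packetsWanted * sizePerPacket;  // worst-case output buffer size
    // allocate outputBytes and pass packetsWanted as ioOutputDataPacketSize
}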
