How do I use CoreAudio's AudioConverter to encode AAC in real-time?

Asked 16/5, 2015 at 2:21 Answered 5/1, 2017 at 5:1

Solved ios audio core-audio aac audio-converter

All the sample code I can find that uses AudioConverterRef focuses on use cases where I have all the data up-front (such as converting a file on disk). They commonly call AudioConverterFillComplexBuffer with the PCM to be converted as the inInputDataProcUserData and just hand it over in the callback. (Is that really how it's supposed to be used? Why does it need a callback, then?) For my use case, I'm trying to stream AAC audio from the microphone, so I have no file, and my PCM buffer is being filled in real time.

Since I don't have all the data up-front, I've tried setting *ioNumberDataPackets = 0 in the callback once I run out of input data, but that just puts the AudioConverter in a dead state where it has to be reset with AudioConverterReset(), and I don't get any data out of it.

One approach I've seen suggested online is to return an error from the callback if the data I have stored is too small, and just try again once I have more data, but that seems like such a waste of resources that I can't bring myself to even try it.

Do I really need to do the "retry until my input buffer is big enough", or is there a better way?

Hoarfrost answered 16/5, 2015 at 2:21 Comment(0)

AudioConverterFillComplexBuffer does not actually mean "fill the encoder with my input buffer that I have here". It means "fill this output buffer here with encoded data from the encoder". With this perspective, the callback suddenly makes sense -- it is used to fetch source data to satisfy the "fill this output buffer for me" request. Maybe this is obvious to others, but it took me a long time to understand this (and from all the AudioConverter sample code I see floating around where people send input data through inInputDataProcUserData, I'm guessing I'm not the only one).

The AudioConverterFillComplexBuffer call is blocking, and is expecting you to deliver data to it synchronously from the callback. If you are encoding in real time, you will thus need to call FillComplexBuffer on a separate thread that you set up yourself. In the callback, you can then check for available input data, and if it is not available, you need to block on a semaphore. Using an NSCondition, the encoder thread would then look something like this:

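- (void)startEncoder
{
    OSStatus creationStatus = AudioConverterNew(&_fromFormat, &_toFormat, &_converter);
    if (creationStatus != noErr) {
        // Without a converter there is nothing for the encoder thread to do.
        NSLog(@"AudioConverterNew failed: %d", (int)creationStatus);
        return;
    }

    _running = YES;
    _condition = [[NSCondition alloc] init];
    [self performSelectorInBackground:@selector(_encoderThread) withObject:nil];
}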

- (void)_encoderThread
{
    while(_running) {
        // Make quarter-second buffers.
        size_t bufferSize = (_outputBitrate/8) * 0.25;
        NSMutableData *outAudioBuffer = [NSMutableData dataWithLength:bufferSize];
        AudioBufferList outAudioBufferList;
        outAudioBufferList.mNumberBuffers = 1;
        outAudioBufferList.mBuffers[0].mNumberChannels = _toFormat.mChannelsPerFrame;
        outAudioBufferList.mBuffers[0].mDataByteSize = (UInt32)bufferSize;
        outAudioBufferList.mBuffers[0].mData = [outAudioBuffer mutableBytes];

        UInt32 ioOutputDataPacketSize = 1;

        _currentPresentationTime = kCMTimeInvalid; // you need to fill this in during FillComplexBuffer
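        const OSStatus conversionResult = AudioConverterFillComplexBuffer(_converter, FillBufferTrampoline, (__bridge void*)self, &ioOutputDataPacketSize, &outAudioBufferList, NULL);
        if (conversionResult != noErr || ioOutputDataPacketSize == 0) {
            // Conversion failed, or the callback reported end of stream and nothing was produced.
            break;
        }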

        // here I convert the AudioBufferList into a CMSampleBuffer, which I've omitted for brevity.
        // Ping me if you need it.
        [self.delegate encoder:self encodedSampleBuffer:outSampleBuffer];
    }
}

And the callback could look like this. (Note that I normally use this trampoline to immediately forward to a method on my instance, passing the instance in through inUserData. That step is omitted for brevity, which is why the code below refers to instance variables directly.)

static OSStatus FillBufferTrampoline(AudioConverterRef               inAudioConverter,
                                        UInt32*                         ioNumberDataPackets,
                                        AudioBufferList*                ioData,
                                        AudioStreamPacketDescription**  outDataPacketDescription,
                                        void*                           inUserData)
{
    [_condition lock];

    UInt32 countOfPacketsWritten = 0;

    while (true) {
        // If the condition fires and we have shut down the encoder, just pretend like we have written 0 bytes and are done.
        if(!_running) break;

        // Out of input data? Wait on the condition.
        if(_inputBuffer.length == 0) {
            [_condition wait];
            continue;
        }

        // We have data! Fill ioData from your _inputBuffer here.
        // Also save the input buffer's start presentationTime here.

        // Exit out of the loop, since we're done waiting for data
        break;
    }

    [_condition unlock];

    // Set ioNumberDataPackets to the amount of data remaining.
    // If _running is false, this will be 0, indicating end of stream.
    *ioNumberDataPackets = countOfPacketsWritten;

    return noErr;
}
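To make the "fill ioData from your _inputBuffer" step a little more concrete, here is a minimal sketch in plain C of the bookkeeping, assuming interleaved linear PCM input (where one packet is one frame). SketchAudioBuffer and SketchAudioBufferList are stand-ins that mirror the fields of CoreAudio's AudioBuffer and AudioBufferList so the sketch compiles on its own, and provide_pcm is a hypothetical helper, not part of any API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins mirroring CoreAudio's AudioBuffer / AudioBufferList fields.
   In real code, use the types from <CoreAudio/CoreAudioTypes.h>. */
typedef struct {
    uint32_t mNumberChannels;
    uint32_t mDataByteSize;
    void    *mData;
} SketchAudioBuffer;

typedef struct {
    uint32_t          mNumberBuffers;
    SketchAudioBuffer mBuffers[1];
} SketchAudioBufferList;

/* Hypothetical helper: points ioData at the pending interleaved PCM and
   returns how many whole packets were provided. bytesPerPacket would come
   from mBytesPerPacket of the input format. */
uint32_t provide_pcm(SketchAudioBufferList *ioData,
                     uint8_t *pending, size_t pendingBytes,
                     uint32_t channels, uint32_t bytesPerPacket)
{
    assert(bytesPerPacket > 0);

    /* Only whole packets can be handed over; a partial packet stays queued. */
    uint32_t packets = (uint32_t)(pendingBytes / bytesPerPacket);

    ioData->mNumberBuffers = 1;  /* one buffer for interleaved audio */
    ioData->mBuffers[0].mNumberChannels = channels;
    ioData->mBuffers[0].mDataByteSize = packets * bytesPerPacket;
    ioData->mBuffers[0].mData = packets ? pending : NULL;

    return packets;
}
```

With 16-bit stereo PCM, bytesPerPacket is 4, so 10 pending bytes yield 2 packets (8 bytes) and the remaining 2 bytes wait for the next call. The return value is what would be stored in *ioNumberDataPackets. As far as the header documentation suggests, the converter reads from mData without copying, so that memory should stay valid until the callback is invoked again.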

And for completeness, here's how you would then feed this encoder with data, and how to shut it down properly:

- (void)appendSampleBuffer:(CMSampleBufferRef)sampleBuffer
{
    [_condition lock];
    // Convert sampleBuffer and put it into _inputBuffer here
    [_condition broadcast];
    [_condition unlock];
}

- (void)stopEncoding
{
    [_condition lock];
    _running = NO;
    [_condition broadcast];
    [_condition unlock];
}

Hoarfrost answered 16/5, 2015 at 2:21 Comment(3)

I'm having some trouble filling ioData from _inputBuffer and setting ioNumberDataPackets; could you please fill in that part of the code? Some questions: Do we need to set ioData.mNumberBuffers to 1? Do we need to put all the data from _inputBuffer into ioData.mBuffers[0]? How do we calculate ioNumberDataPackets, or do we just set it to 1? What do you mean by "set ioNumberDataPackets to the amount of data remaining" when the documentation says "on exit, the number of packets of audio data actually being provided for input"? – Fredericton 16/2, 2017 at 4:34

@Hoarfrost I'm currently struggling with converting AudioBufferList into a CMSampleBuffer, which you've omitted here. Can you please share how you did that? Thanks! – Basra 17/12, 2019 at 23:56

Thank you! My initial approach was to put a semaphore to wait for incoming data atop the FillComplexBuffer (output) loop. But that's the wrong place for it. Moving the wait to a loop in the FillBuffer (input) callback puts the "backpressure" in exactly the right place. Your Q&A does a great job explaining why. – Stenophyllous 18/3, 2020 at 17:16

For future reference, there is a much easier option.

The CoreAudio headers state:

If the callback returns an error, it must return zero packets of data. AudioConverterFillComplexBuffer will stop producing output and return whatever output has already been produced to its caller, along with the error code. This mechanism can be used when an input proc has temporarily run out of data, but has not yet reached end of stream.

So, do exactly that. Instead of returning noErr with *ioNumberDataPackets = 0, return any error (just make one up, I used -1), and the already converted data will be returned, while the Audio Converter stays alive and does not need to be reset.
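A minimal sketch of that decision in plain C, under some assumptions: SketchStatus stands in for OSStatus, kSketchNoDataYet is a made-up error value (the -1 mentioned above), and input_proc_status is a hypothetical helper that a real input callback would consult before filling ioData:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t SketchStatus;  /* stand-in for OSStatus */

enum {
    kSketchNoErr     = 0,   /* same value as noErr */
    kSketchNoDataYet = -1   /* made-up code: "temporarily out of input" */
};

/* Hypothetical helper: decides what the input callback reports.
   - data available:       noErr, with the packet count
   - temporarily empty:    an error, with zero packets (converter stays alive)
   - end of stream:        noErr, with zero packets */
SketchStatus input_proc_status(size_t pendingPackets, int endOfStream,
                               uint32_t *ioNumberDataPackets)
{
    assert(ioNumberDataPackets != NULL);

    if (pendingPackets > 0) {
        *ioNumberDataPackets = (uint32_t)pendingPackets;
        return kSketchNoErr;
    }

    /* An error return must be accompanied by zero packets. */
    *ioNumberDataPackets = 0;
    return endOfStream ? kSketchNoErr : kSketchNoDataYet;
}
```

The caller of AudioConverterFillComplexBuffer would then treat the made-up code as "nothing more right now", use whatever output was already produced, and call again once more input has arrived.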

Fireeater answered 5/1, 2017 at 5:1 Comment(2)

I tried that approach; AudioConverter would give me a 12-byte buffer with just an MPEG header and then refuse to take more data. I took this to mean that AudioConverter needs enough data to emit full AAC frames to work. – Hoarfrost 6/1, 2017 at 13:37

Ahhh. It could be. I'm working with just PCM output and it's working great for me. AudioConverter does maintain its own internal buffering, so it's odd this wouldn't work for AAC too. But the API does state it has to output something, so maybe that puts them in a weird place. – Fireeater 8/1, 2017 at 22:12
