We have an app that is mostly a UIWebView wrapping a heavily JavaScript-based web app. The requirement we have come up against is being able to play audio to the user, record the user, play that recording back for confirmation, and then send the audio to a server. This works on Android (Chrome) and other platforms because that ability is built into the browser; no native code is required.
Sadly, the iOS (iOS 8/9) web view lacks the ability to record audio.
The first workaround we tried was recording the audio with an AudioQueue and passing the data (Linear PCM, 16-bit) to a JS AudioNode so the web app could process the iOS audio exactly the same way as the other platforms. We got to the point where we could pass the audio to JS, but the app would eventually crash with a bad memory access error, or the JavaScript side simply could not keep up with the data being sent.
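For scale, 44.1 kHz mono 16-bit PCM is roughly 88 KB of raw audio per second crossing the bridge. The per-buffer hand-off was along these lines (a rough reconstruction, not the code that shipped; the helper name is illustrative):
// Illustrative reconstruction of the first approach: wrap each AudioQueue
// buffer in NSData and base64-encode it for the JS bridge. At ~88 KB/s of
// PCM (~118 KB/s once base64-encoded) the JS side fell further and further
// behind, and the app eventually died with a bad memory access.
static NSString * EncodePCMChunkForJS(AudioQueueBufferRef inBuffer)
{
    NSData *chunk = [NSData dataWithBytes:inBuffer->mAudioData
                                   length:inBuffer->mAudioDataByteSize];
    return [chunk base64EncodedStringWithOptions:0];
}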
The next idea was to save the recording to a file and send partial audio data to JS for visual feedback: a basic audio visualizer displayed only while recording.
The audio records to a WAVE file as Linear PCM signed 16-bit and plays back fine. The JS visualizer is where we are stuck: it expects Linear PCM unsigned 8-bit, so I added a conversion step that may be wrong. I've tried several approaches, mostly found online, and none of them work, which makes me think something else is wrong or missing before we even get to the conversion step.
Since I don't know exactly what or where the problem is, I'll dump the code for the audio recording and playback classes below. Any suggestions for resolving, or somehow bypassing, this issue would be welcome.
One idea I had was to record in a different format (CAF) using different format flags. Looking at the values that are produced, none of the signed 16-bit ints come anywhere close to the maximum value; I rarely see anything above +/-1000. Is that because of the kLinearPCMFormatFlagIsPacked flag in the AudioStreamBasicDescription? Removing that flag causes the audio file not to be created because of an invalid format. Maybe switching to CAF would work, but we need to convert to WAVE before sending the audio back to our server.
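For reference, the original signed 16-bit format was set up roughly like this (a reconstruction, since the code below has since been changed to Float32; treat the exact flags as an approximation):
// Approximate reconstruction of the original signed 16-bit WAVE format.
// kLinearPCMFormatFlagIsPacked only says the samples occupy all the bits of
// mBytesPerFrame; it does not scale the sample values, so the small +/-1000
// readings are more likely just a low input level.
AudioStreamBasicDescription fmt = {0};
fmt.mSampleRate       = 44100.0;
fmt.mFormatID         = kAudioFormatLinearPCM;
fmt.mFormatFlags      = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
fmt.mChannelsPerFrame = 1;
fmt.mBitsPerChannel   = 16;
fmt.mBytesPerPacket   = fmt.mBytesPerFrame = 2;
fmt.mFramesPerPacket  = 1;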
Or maybe my conversion from signed 16-bit to unsigned 8-bit is wrong? I have also tried bit shifting and casting. The only difference is that with this conversion all of the audio values get compressed to between 125 and 130, while bit shifting and casting change that to 0-5 and 250-255. That doesn't really solve any problems on the JS side.
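To make the conversion step concrete, the two variants behaved along these lines (reconstructed from the numbers above, so treat the exact expressions as an assumption; the function names are illustrative):
// Variant 1 ("this conversion"): scale into an 8-bit range and re-centre
// around 128. With samples that never exceed +/-1000 the output sits at
// roughly 124-132, which matches the 125-130 squeeze described above.
static inline uint8_t Convert16To8Offset(SInt16 s)
{
    return (uint8_t)((s / 256) + 128);
}

// Variant 2 (bit shifting and casting): keep the top byte but skip the
// re-centring, so negative values wrap around to 250-255 and positive
// values stay near 0-5.
static inline uint8_t Convert16To8Cast(SInt16 s)
{
    return (uint8_t)(s >> 8);
}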
The next step would be, instead of passing the data to JS, to run it through an FFT function and produce values the JS audio visualizer can use directly. I'd rather figure out whether I have done something obviously wrong before going in that direction.
AQRecorder.h - EDIT: updated the audio format to LinearPCM 32-bit Float.
#ifndef AQRecorder_h
#define AQRecorder_h
#import <AudioToolbox/AudioToolbox.h>
#define NUM_BUFFERS 3
#define AUDIO_DATA_TYPE_FORMAT float
#define JS_AUDIO_DATA_SIZE 32
@interface AQRecorder : NSObject {
AudioStreamBasicDescription mDataFormat;
AudioQueueRef mQueue;
AudioQueueBufferRef mBuffers[ NUM_BUFFERS ];
AudioFileID mAudioFile;
UInt32 bufferByteSize;
SInt64 mCurrentPacket;
bool mIsRunning;
}
- (void)setupAudioFormat;
- (void)startRecording;
- (void)stopRecording;
- (void)processSamplesForJS:(UInt32)audioDataByteSize audioData:(void *)audioData;
- (Boolean)isRunning;
@end
#endif
AQRecorder.m - EDIT: updated the audio format to LinearPCM 32-bit Float and added an FFT step in processSamplesForJS instead of sending the audio data directly.
#import <AVFoundation/AVFoundation.h>
#import <Accelerate/Accelerate.h>
#import "AQRecorder.h"
#import "JSMonitor.h"
@implementation AQRecorder
void AudioQueueCallback(void * inUserData,
AudioQueueRef inAQ,
AudioQueueBufferRef inBuffer,
const AudioTimeStamp * inStartTime,
UInt32 inNumberPacketDescriptions,
const AudioStreamPacketDescription* inPacketDescs)
{
AQRecorder *aqr = (__bridge AQRecorder *)inUserData;
if ( [aqr isRunning] )
{
if ( inNumberPacketDescriptions > 0 )
{
AudioFileWritePackets(aqr->mAudioFile, FALSE, inBuffer->mAudioDataByteSize, inPacketDescs, aqr->mCurrentPacket, &inNumberPacketDescriptions, inBuffer->mAudioData);
aqr->mCurrentPacket += inNumberPacketDescriptions;
[aqr processSamplesForJS:inBuffer->mAudioDataByteSize audioData:inBuffer->mAudioData]; // pass the number of valid bytes, not the buffer capacity
}
AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);
}
}
- (void)debugDataFormat
{
NSLog(@"format=%i, sampleRate=%f, channels=%i, flags=%i, BPC=%i, BPF=%i", mDataFormat.mFormatID, mDataFormat.mSampleRate, (unsigned int)mDataFormat.mChannelsPerFrame, mDataFormat.mFormatFlags, mDataFormat.mBitsPerChannel, mDataFormat.mBytesPerFrame);
}
- (void)setupAudioFormat
{
memset(&mDataFormat, 0, sizeof(mDataFormat));
mDataFormat.mSampleRate = 44100.;
mDataFormat.mChannelsPerFrame = 1;
mDataFormat.mFormatID = kAudioFormatLinearPCM;
mDataFormat.mFormatFlags = kLinearPCMFormatFlagIsFloat | kLinearPCMFormatFlagIsPacked;
mDataFormat.mBitsPerChannel = 8 * sizeof(AUDIO_DATA_TYPE_FORMAT); // 32-bit float samples
mDataFormat.mBytesPerPacket = mDataFormat.mBytesPerFrame = (mDataFormat.mBitsPerChannel / 8) * mDataFormat.mChannelsPerFrame;
mDataFormat.mFramesPerPacket = 1;
mDataFormat.mReserved = 0;
[self debugDataFormat];
}
- (void)startRecording
{
[self setupAudioFormat];
mCurrentPacket = 0;
NSString *recordFile = [NSTemporaryDirectory() stringByAppendingPathComponent: @"AudioFile.wav"];
CFURLRef url = CFURLCreateWithFileSystemPath(kCFAllocatorDefault, (__bridge CFStringRef)recordFile, kCFURLPOSIXPathStyle, false);
OSStatus stat = AudioFileCreateWithURL(url, kAudioFileWAVEType, &mDataFormat, kAudioFileFlags_EraseFile, &mAudioFile);
NSError *error = [NSError errorWithDomain:NSOSStatusErrorDomain code:stat userInfo:nil];
NSLog(@"AudioFileCreateWithURL OSStatus :: %@", error);
CFRelease(url);
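// 896 frames per buffer is roughly 20 ms of audio at 44.1 kHz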
bufferByteSize = 896 * mDataFormat.mBytesPerFrame;
AudioQueueNewInput(&mDataFormat, AudioQueueCallback, (__bridge void *)(self), NULL, NULL, 0, &mQueue);
for ( int i = 0; i < NUM_BUFFERS; i++ )
{
AudioQueueAllocateBuffer(mQueue, bufferByteSize, &mBuffers[i]);
AudioQueueEnqueueBuffer(mQueue, mBuffers[i], 0, NULL);
}
mIsRunning = true;
AudioQueueStart(mQueue, NULL);
}
- (void)stopRecording
{
mIsRunning = false;
AudioQueueStop(mQueue, false);
AudioQueueDispose(mQueue, false);
AudioFileClose(mAudioFile);
}
- (void)processSamplesForJS:(UInt32)audioDataByteSize audioData:(void *)audioData
{
int sampleCount = (int)(audioDataByteSize / sizeof(AUDIO_DATA_TYPE_FORMAT));
AUDIO_DATA_TYPE_FORMAT *samples = (AUDIO_DATA_TYPE_FORMAT*)audioData;
NSMutableArray *audioDataBuffer = [[NSMutableArray alloc] initWithCapacity:JS_AUDIO_DATA_SIZE];
// FFT stuff taken mostly from Apple's aurioTouch example
const Float32 kAdjust0DB = 1.5849e-13;
int bufferFrames = sampleCount;
int bufferlog2 = (int)floor(log2((double)bufferFrames));
bufferFrames = 1 << bufferlog2; // vDSP_fft_zrip needs a power-of-two length, so drop any trailing samples
float fftNormFactor = (1.0/(2*bufferFrames));
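// Note: the FFT setup and scratch buffers below are created on every callback; they could be allocated once in startRecording and reused.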
FFTSetup fftSetup = vDSP_create_fftsetup(bufferlog2, kFFTRadix2);
Float32 *outReal = (Float32*) malloc((bufferFrames / 2)*sizeof(Float32));
Float32 *outImaginary = (Float32*) malloc((bufferFrames / 2)*sizeof(Float32));
COMPLEX_SPLIT mDspSplitComplex = { .realp = outReal, .imagp = outImaginary };
Float32 *outFFTData = (Float32*) malloc((bufferFrames / 2)*sizeof(Float32));
//Generate a split complex vector from the real data
vDSP_ctoz((COMPLEX *)samples, 2, &mDspSplitComplex, 1, bufferFrames / 2);
//Take the fft and scale appropriately
vDSP_fft_zrip(fftSetup, &mDspSplitComplex, 1, bufferlog2, kFFTDirection_Forward);
vDSP_vsmul(mDspSplitComplex.realp, 1, &fftNormFactor, mDspSplitComplex.realp, 1, bufferFrames / 2);
vDSP_vsmul(mDspSplitComplex.imagp, 1, &fftNormFactor, mDspSplitComplex.imagp, 1, bufferFrames / 2);
//Zero out the nyquist value
mDspSplitComplex.imagp[0] = 0.0;
//Convert the fft data to dB
vDSP_zvmags(&mDspSplitComplex, 1, outFFTData, 1, bufferFrames / 2);
//In order to avoid taking log10 of zero, an adjusting factor is added in to make the minimum value equal -128dB
vDSP_vsadd(outFFTData, 1, &kAdjust0DB, outFFTData, 1, bufferFrames / 2);
Float32 one = 1;
vDSP_vdbcon(outFFTData, 1, &one, outFFTData, 1, bufferFrames / 2, 0);
// Average the FFT dB values down to JS_AUDIO_DATA_SIZE buckets
int grpSize = (bufferFrames / 2) / JS_AUDIO_DATA_SIZE;
int c = 0;
Float32 avg = 0;
int d = 0;
for ( int i = 1; i < bufferFrames / 2; i++ )
{
if ( isnan(outFFTData[ i ]) || isinf(outFFTData[ i ]) )
{ // skip NaN / infinite bins
c++;
}
else
{
avg += outFFTData[ i ];
d++;
//NSLog(@"db = %f, avg = %f", outFFTData[ i ], avg);
if ( ++c >= grpSize )
{
uint8_t u = (uint8_t)((d > 0 ? avg / d : -128) + 128); //dB values seem to range from -128 to 0.
NSLog(@"%i = %i (%f)", i, u, avg);
[audioDataBuffer addObject:[NSNumber numberWithUnsignedInt:u]];
avg = 0;
c = 0;
d = 0;
}
}
}
[[JSMonitor shared] passAudioDataToJavascriptBridge:audioDataBuffer];
// Release the per-callback FFT resources so they do not leak on every buffer
free(outReal);
free(outImaginary);
free(outFFTData);
vDSP_destroy_fftsetup(fftSetup);
}
- (Boolean)isRunning
{
return mIsRunning;
}
@end
Audio playback and recording controller class. Audio.h
#ifndef Audio_h
#define Audio_h
#import <AVFoundation/AVFoundation.h>
#import "AQRecorder.h"
@interface Audio : NSObject <AVAudioPlayerDelegate> {
AQRecorder* recorder;
AVAudioPlayer* player;
bool mIsSetup;
bool mIsRecording;
bool mIsPlaying;
}
- (void)setupAudio;
- (void)startRecording;
- (void)stopRecording;
- (void)startPlaying;
- (void)stopPlaying;
- (Boolean)isRecording;
- (Boolean)isPlaying;
- (NSString *) getAudioDataBase64String;
@end
#endif
Audio.m
#import "Audio.h"
#import <AudioToolbox/AudioToolbox.h>
#import "JSMonitor.h"
@implementation Audio
- (void)setupAudio
{
NSLog(@"Audio->setupAudio");
AVAudioSession *session = [AVAudioSession sharedInstance];
NSError * error;
[session setCategory:AVAudioSessionCategoryPlayAndRecord error:&error];
[session setActive:YES error:nil];
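// Note: on iOS 8/9 the first recording also requires microphone permission; see -[AVAudioSession requestRecordPermission:].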
recorder = [[AQRecorder alloc] init];
mIsSetup = YES;
}
- (void)startRecording
{
NSLog(@"Audio->startRecording");
if ( !mIsSetup )
{
[self setupAudio];
}
if ( mIsRecording ) {
return;
}
if ( [recorder isRunning] == NO )
{
[recorder startRecording];
}
mIsRecording = [recorder isRunning];
}
- (void)stopRecording
{
NSLog(@"Audio->stopRecording");
[recorder stopRecording];
mIsRecording = [recorder isRunning];
[[JSMonitor shared] sendAudioInputStoppedEvent];
}
- (void)startPlaying
{
if ( mIsPlaying )
{
return;
}
mIsPlaying = YES;
NSLog(@"Audio->startPlaying");
NSError* error = nil;
NSString *recordFile = [NSTemporaryDirectory() stringByAppendingPathComponent: @"AudioFile.wav"];
player = [[AVAudioPlayer alloc] initWithContentsOfURL:[NSURL fileURLWithPath:recordFile] error:&error];
if ( error )
{
NSLog(@"AVAudioPlayer failed :: %@", error);
}
player.delegate = self;
[player play];
}
- (void)stopPlaying
{
NSLog(@"Audio->stopPlaying");
[player stop];
mIsPlaying = NO;
[[JSMonitor shared] sendAudioPlaybackCompleteEvent];
}
- (NSString *) getAudioDataBase64String
{
NSString *recordFile = [NSTemporaryDirectory() stringByAppendingPathComponent: @"AudioFile.wav"];
NSError* error = nil;
NSData *fileData = [NSData dataWithContentsOfFile:recordFile options: 0 error: &error];
if ( fileData == nil )
{
NSLog(@"Failed to read file, error %@", error);
return @"DATAENCODINGFAILED";
}
else
{
return [fileData base64EncodedStringWithOptions:0];
}
}
- (Boolean)isRecording { return mIsRecording; }
- (Boolean)isPlaying { return mIsPlaying; }
- (void)audioPlayerDidFinishPlaying:(AVAudioPlayer *)player successfully:(BOOL)flag
{
NSLog(@"Audio->audioPlayerDidFinishPlaying: %i", flag);
mIsPlaying = NO;
[[JSMonitor shared] sendAudioPlaybackCompleteEvent];
}
- (void)audioPlayerDecodeErrorDidOccur:(AVAudioPlayer *)player error:(NSError *)error
{
NSLog(@"Audio->audioPlayerDecodeErrorDidOccur: %@", error.localizedFailureReason);
mIsPlaying = NO;
[[JSMonitor shared] sendAudioPlaybackCompleteEvent];
}
@end
The JSMonitor class is a bridge between the UIWebView's JavaScriptCore and the native code. I'm not including it because it doesn't do anything for audio other than pass data and calls between these classes and JSCore.
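For context, the call that hands the averaged dB values to the page is roughly of this shape (a simplified sketch, not the actual JSMonitor source; the webView property and the JS-side visualizerCallback function name are assumptions):
// Hypothetical sketch of the bridge method: serialize the NSNumber array as
// JSON and evaluate a JS callback on the UIWebView from the main thread.
- (void)passAudioDataToJavascriptBridge:(NSArray *)audioDataBuffer
{
    NSData *json = [NSJSONSerialization dataWithJSONObject:audioDataBuffer options:0 error:nil];
    NSString *jsonString = [[NSString alloc] initWithData:json encoding:NSUTF8StringEncoding];
    NSString *js = [NSString stringWithFormat:@"visualizerCallback(%@);", jsonString];
    dispatch_async(dispatch_get_main_queue(), ^{
        [self.webView stringByEvaluatingJavaScriptFromString:js];
    });
}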
EDIT
The audio format has changed to LinearPCM 32-bit Float. Instead of sending the raw audio data to JS, it is run through an FFT function and the averaged dB values are sent instead.