Sampling audio in real time with Aubio without stopping audio AND video recording on iPhone/iPad

Swift 2.2
Xcode 7.3
Aubio 0.4.3 (aubio-0.4.3~const.iosuniversal_framework)
iOS 9.3 Target
Test Device - iPad Air
bufferSize: 2048
numSamplesInBuffer: 1024
Sample Rate: 44100
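
As a quick sanity check on the logged values above, the byte count and sample count imply the size of one sample and the duration of one delivered buffer. This is only a sketch in current Swift syntax (not the Swift 2.2 of the code further down), and the constant names are illustrative:

```swift
// Values logged from the capture callback (see the list above).
let bufferSizeInBytes = 2048
let numSamplesInBuffer = 1024
let sampleRate = 44100.0

// Bytes occupied by a single sample in the delivered buffer.
let bytesPerSample = bufferSizeInBytes / numSamplesInBuffer

// Duration of audio covered by one callback, in milliseconds.
let bufferDurationMs = Double(numSamplesInBuffer) / sampleRate * 1000.0

print("bytes per sample: \(bytesPerSample)")
print("buffer duration: \(bufferDurationMs) ms")
```

Two bytes per sample would be consistent with 16-bit integer PCM rather than 32-bit floats (assuming a single channel), and each callback covers roughly 23 ms of audio.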

Caveats:

I have intentionally left the AVCaptureVideo code in the example below so that anyone skimming my question will not forget that I am trying to capture audio AND video with the same AVCaptureSession and sample the audio in real time.
I have fully tested Aubio -> Onset, specifically with a sample.caf (Core Audio Format) sound file as well as with a recording saved to file (also a .caf) using AVAudioRecorder, and it works correctly on a real device (iPad Air). The key reason Aubio works in those tests is that I create a URI- or file-based source with new_aubio_source. In my "real" version I am attempting to sample the sound buffer without saving the audio data to file.
A possible alternative approach to using Aubio: if I could store the AudioBuffers as a valid Core Audio Format (.caf) file, Aubio would work. I am not sure sampling would be fast enough with a file-based solution, but after days of research I have not figured out how to store to file the CMSampleBufferRefs delivered to func captureOutput(captureOutput: AVCaptureOutput, didOutputSampleBuffer sampleBuffer: CMSampleBufferRef, fromConnection connection: AVCaptureConnection). That includes using NSData, which never writes a valid .caf file.
Related to the previous caveat, I have not found a way to use AVFoundation's very helpful objects such as AVAudioRecorder (which will store a nice .caf to file), because it depends on you stopping the recording/capture session.
If you remove all the video capture code you can run this on the simulator. If you do not have an Apple device handy, please comment below and I will prepare a simulator version of the code. Camera functionality must be tested on a real device.

The following code successfully starts an audio and video AVCaptureSession, and the AVCaptureSession delegate func captureOutput(captureOutput: AVCaptureOutput, didOutputSampleBuffer sampleBuffer: CMSampleBufferRef, fromConnection connection: AVCaptureConnection) is called for both audio and video. When an audio CMSampleBufferRef sample is provided, I try to convert that sample to an AudioBuffer and pass it to the Aubio function aubio_onset_do. I am using a singleton aubio_onset COpaquePointer.

In this code I attempt to call aubio_onset_do with audio buffer data two different ways.

Method 1 - The code below currently runs with let useTimerAndNSMutableData = false. This means that in my prepareAudioBuffer function I pass audioBuffer.mData straight to sampleAudioForOnsets. This method never fails, but no onsets are ever detected either, I suspect because the sample size is not large enough.
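
One thing worth ruling out for Method 1 is the sample format. The logged sizes (2048 bytes for 1024 samples) suggest 2-byte samples, whereas smpl_t is a Float. If the capture output is delivering 16-bit integer PCM (an assumption; the stream description's mFormatFlags and mBitsPerChannel would confirm it), the samples would need converting rather than pointer-casting. A minimal sketch in current Swift syntax, using a hypothetical helper name:

```swift
/// Converts signed 16-bit PCM samples to Floats in the range [-1.0, 1.0).
/// `convertInt16SamplesToFloat` is a hypothetical helper, not an Aubio or AVFoundation API.
func convertInt16SamplesToFloat(_ samples: [Int16]) -> [Float] {
    // 32768 is the magnitude of Int16.min, so results never exceed 1.0 in magnitude.
    let scale: Float = 32768.0
    return samples.map { Float($0) / scale }
}

let pcm: [Int16] = [0, 16384, -16384, Int16.max, Int16.min]
let floats = convertInt16SamplesToFloat(pcm)
print(floats)
```

The same reasoning applies to the length: an fvec_t length counts samples, so it would be the sample count (1024 here) rather than mDataByteSize.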

Method 2 - If let useTimerAndNSMutableData = true, I ultimately call sampleAudioForOnsets every 1 second, allowing time to build up an NSMutableData from the AudioBuffer.mData of each buffer. With this method I am attempting to give aubio_onset_do a large enough sample to detect onsets, using a timer and NSMutableData. This method causes aubio_onset_do to crash very quickly:

(EXC_BAD_ACCESS (code=1))
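
Two possible contributors to this crash, neither confirmed here: nsMutableData.length is a byte count while the length of an fvec_t counts samples, and the NSMutableData is appended to on the capture queue while the timer reads it on the main thread. As a sketch of an alternative to the timer, samples could be queued and handed out in exact hop-sized chunks under a lock. This uses current Swift syntax; HopAccumulator and its methods are hypothetical names, and 512 is the hop size passed to new_aubio_onset in the code below:

```swift
import Foundation

/// Queues incoming samples and releases them in fixed-size hops.
/// `HopAccumulator` is a hypothetical type for illustration only.
final class HopAccumulator {
    private var pending: [Float] = []
    private let hopSize: Int
    private let lock = NSLock()

    init(hopSize: Int) {
        self.hopSize = hopSize
    }

    /// Appends newly captured samples; the lock guards against concurrent readers.
    func append(_ samples: [Float]) {
        lock.lock()
        defer { lock.unlock() }
        pending.append(contentsOf: samples)
    }

    /// Removes and returns the next full hop, or nil if fewer than hopSize samples are queued.
    func nextHop() -> [Float]? {
        lock.lock()
        defer { lock.unlock() }
        guard pending.count >= hopSize else { return nil }
        let hop = Array(pending[0..<hopSize])
        pending.removeFirst(hopSize)
        return hop
    }
}

let accumulator = HopAccumulator(hopSize: 512)
accumulator.append([Float](repeating: 0.25, count: 1024)) // one captured buffer
accumulator.append([Float](repeating: 0.5, count: 300))   // a partial buffer

var hops: [[Float]] = []
while let hop = accumulator.nextHop() {
    // In the real code, each hop would be wrapped in an fvec_t of length 512
    // and passed to aubio_onset_do.
    hops.append(hop)
}
print("hops drained: \(hops.count)")
```

Leftover samples that do not fill a hop stay queued until the next buffer arrives, so no audio is dropped between callbacks.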

import UIKit
import AVFoundation

class AvRecorderViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate, AVCaptureAudioDataOutputSampleBufferDelegate, AVAudioRecorderDelegate, AVAudioPlayerDelegate {


    var captureSession: AVCaptureSession!
    var imageView:UIImageView!
    var customLayer:CALayer!
    var prevLayer:AVCaptureVideoPreviewLayer!

    let samplingFrequency = Int32(30)
    var aubioOnset:COpaquePointer? = nil
    let pathToSoundSample = FileUtility.getPathToAudioSampleFile()
    var onsetCount = 0
    let testThres:smpl_t = 0.03
    let nsMutableData: NSMutableData = NSMutableData()
    var sampleRate:UInt32!
    var bufferSize:UInt32!
    let useTimerAndNSMutableData = false

    override func viewDidLoad() {
        super.viewDidLoad()

        if FileUtility.fileExistsAtPath(pathToSoundSample) {
            print("sample file exists")
            FileUtility.deleteFileByNsurl(NSURL(fileURLWithPath: pathToSoundSample))
        }
        setupCapture()

        if useTimerAndNSMutableData {
            //create timer for sampling audio
            NSTimer.scheduledTimerWithTimeInterval(1, target: self, selector: #selector(timerFiredPrepareForAubioOnsetSample), userInfo: nil, repeats: true)
        }
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }

    override func viewWillTransitionToSize(size: CGSize, withTransitionCoordinator coordinator: UIViewControllerTransitionCoordinator) {
        super.viewWillTransitionToSize(size, withTransitionCoordinator: coordinator)

        coordinator.animateAlongsideTransition({ (context) -> Void in

            }, completion: { (context) -> Void in

        })
    }

    override func viewWillLayoutSubviews() {
        prevLayer.frame = self.view.bounds

        if prevLayer.connection.supportsVideoOrientation {
            prevLayer.connection.videoOrientation = MediaUtility.interfaceOrientationToVideoOrientation(UIApplication.sharedApplication().statusBarOrientation)
        }
    }

    func timerFiredPrepareForAubioOnsetSample() {
        if nsMutableData.length <= 0 {
            return
        }

        let data = UnsafeMutablePointer<smpl_t>(nsMutableData.bytes)
        sampleAudioForOnsets(data, length: UInt32(nsMutableData.length))
    }

    func setupCapture() {
        let captureDeviceVideo: AVCaptureDevice = AVCaptureDevice.defaultDeviceWithMediaType(AVMediaTypeVideo)
        let captureDeviceAudio: AVCaptureDevice = AVCaptureDevice.defaultDeviceWithMediaType(AVMediaTypeAudio)
        var captureVideoInput: AVCaptureDeviceInput
        var captureAudioInput: AVCaptureDeviceInput

        //video setup
        if captureDeviceVideo.isTorchModeSupported(.On) {
            try! captureDeviceVideo.lockForConfiguration()

            /*if captureDeviceVideo.position == AVCaptureDevicePosition.Front {
                captureDeviceVideo.position == AVCaptureDevicePosition.Back
            }*/

            //configure frame rate
            /*We specify a minimum duration for each frame (play with these settings to avoid having too many frames waiting
             in the queue because it can cause memory issues). It is similar to the inverse of the maximum framerate.
             Here we set a min frame duration of 1/samplingFrequency seconds (1/30 s), so a maximum framerate of 30fps. We say that
             we are not able to process more than 30 frames per second.*/
            captureDeviceVideo.activeVideoMaxFrameDuration = CMTimeMake(1, samplingFrequency)
            captureDeviceVideo.activeVideoMinFrameDuration = CMTimeMake(1, samplingFrequency)
            captureDeviceVideo.unlockForConfiguration()
        }

        //try and create audio and video inputs
        do {
            try captureVideoInput = AVCaptureDeviceInput(device: captureDeviceVideo)
            try captureAudioInput = AVCaptureDeviceInput(device: captureDeviceAudio)

        } catch {
            print("cannot record")
            return
        }

        /*setup the output*/
        let captureVideoDataOutput: AVCaptureVideoDataOutput = AVCaptureVideoDataOutput()
        let captureAudioDataOutput: AVCaptureAudioDataOutput = AVCaptureAudioDataOutput()

        /*While a frame is being processed in the -captureOutput:didOutputSampleBuffer:fromConnection: delegate method, no other frames are added to the queue.
         If you don't want this behaviour set the property to false */
        captureVideoDataOutput.alwaysDiscardsLateVideoFrames = true

        // Set the video output to store frame in BGRA (It is supposed to be faster)
        let videoSettings: [NSObject : AnyObject] = [kCVPixelBufferPixelFormatTypeKey:Int(kCVPixelFormatType_32BGRA)]

        captureVideoDataOutput.videoSettings = videoSettings

        /*And we create a capture session*/
        captureSession = AVCaptureSession()

        //and configure session
        captureSession.sessionPreset = AVCaptureSessionPresetHigh

        /*We add audio/video input and output to session*/
        captureSession.addInput(captureVideoInput)
        captureSession.addInput(captureAudioInput)
        captureSession.addOutput(captureVideoDataOutput)
        captureSession.addOutput(captureAudioDataOutput)

        //not sure if I need this or not, found on internet
        captureSession.commitConfiguration()


        /*We create a serial queue to handle the processing of our frames*/
        var queue: dispatch_queue_t
        queue = dispatch_queue_create("queue", DISPATCH_QUEUE_SERIAL)

        //setup delegate
        captureVideoDataOutput.setSampleBufferDelegate(self, queue: queue)
        captureAudioDataOutput.setSampleBufferDelegate(self, queue: queue)


        /*We add the Custom Layer (We need to change the orientation of the layer so that the video is displayed correctly)*/
        customLayer = CALayer()
        customLayer.frame = self.view.bounds
        customLayer.transform = CATransform3DRotate(CATransform3DIdentity, CGFloat(M_PI) / 2.0, 0, 0, 1)
        customLayer.contentsGravity = kCAGravityResizeAspectFill
        view.layer.addSublayer(self.customLayer)

        /*We add the imageView*/
        imageView = UIImageView()
        imageView.frame = CGRectMake(0, 0, 100, 100)
        view!.addSubview(self.imageView)

        /*We add the preview layer*/
        prevLayer = AVCaptureVideoPreviewLayer()
        prevLayer = AVCaptureVideoPreviewLayer(session: self.captureSession)
        prevLayer.frame = CGRectMake(100, 0, 100, 100)
        prevLayer.videoGravity = AVLayerVideoGravityResizeAspectFill
        view.layer.addSublayer(self.prevLayer)

        /*We start the capture*/
        captureSession.startRunning()

    }

    // MARK: AVCaptureSession delegates

    func captureOutput(captureOutput: AVCaptureOutput, didOutputSampleBuffer sampleBuffer: CMSampleBufferRef, fromConnection connection: AVCaptureConnection) {

        if (captureOutput is AVCaptureAudioDataOutput) {
            prepareAudioBuffer(sampleBuffer)
        }

        //not relevant to my Stack Overflow question
        /*if (captureOutput is AVCaptureVideoDataOutput) {
            displayVideo(sampleBuffer)
        }*/

    }

    func captureOutput(captureOutput: AVCaptureOutput!, didDropSampleBuffer sampleBuffer: CMSampleBuffer!, fromConnection connection: AVCaptureConnection!) {
        print("frame dropped")
    }

    private func sampleAudioForOnsets(data: UnsafeMutablePointer<smpl_t>, length: UInt32) {
        print("\(#function)")

        //let samples = new_fvec(512)
        var total_frames : uint_t = 0
        let out_onset = new_fvec (1)
        var read : uint_t = 0

        //singleton of aubio_onset
        if aubioOnset == nil {
            let method = ("default" as NSString).UTF8String
            aubioOnset = new_aubio_onset(UnsafeMutablePointer<Int8>(method), bufferSize, 512, UInt32(sampleRate))
            aubio_onset_set_threshold(aubioOnset!, testThres)
        }

        var sample: fvec_t = fvec_t(length: length, data: data)

        //do not need the while loop but I have left it in here because it will be quite familiar to people that have used Aubio before and may help jog their
        //memory, such as reminding people familiar with Aubio that the aubio_source_do is normally used to "seek" through a sample
        while true {
            //aubio_source_do(COpaquePointer(source), samples, &read)

            //new_aubio_onset hop_size is 512, will aubio_onset_do move through a fvec_t sample at a 512 hop without an aubio_source_do call?
            aubio_onset_do(aubioOnset!, &sample, out_onset)

            if (fvec_get_sample(out_onset, 0) != 0) {
                print(String(format: ">>> %.2f", aubio_onset_get_last_s(aubioOnset!)))
                onsetCount += 1
            }

            total_frames += read

            //will always break on the first iteration; the only reason for the while loop is to demonstrate the normal use of aubio using aubio_source_do to read
            if (read < 512) {
                break
            }
        }

        print("done, total onsetCount: \(onsetCount)")

        if onsetCount > 1 {
            print("we are getting onsets")
        }
    }

    // MARK: - Private Helpers

    private func prepareAudioBuffer(sampleBuffer: CMSampleBufferRef) {

        let numSamplesInBuffer = CMSampleBufferGetNumSamples(sampleBuffer)
        bufferSize = UInt32(CMSampleBufferGetTotalSampleSize(sampleBuffer))
        var blockBuffer:CMBlockBufferRef? = nil
        var audioBufferList = AudioBufferList(mNumberBuffers: 1, mBuffers: AudioBuffer(mNumberChannels: 0, mDataByteSize: 0, mData: nil))
        var status:OSStatus
        let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)!
        let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(formatDescription)
        sampleRate = UInt32(asbd.memory.mSampleRate)

        print("bufferSize: \(bufferSize)")
        print("numSamplesInBuffer: \(numSamplesInBuffer)")
        print("Sample Rate: \(sampleRate)")
        print("assetWriter.status: ")

        status = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
            sampleBuffer,
            nil,
            &audioBufferList,
            sizeof(audioBufferList.dynamicType), // instead of UInt(sizeof(audioBufferList.dynamicType))
            nil,
            nil,
            UInt32(kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment),
            &blockBuffer
        )


        let audioBuffers = UnsafeBufferPointer<AudioBuffer>(start: &audioBufferList.mBuffers, count: Int(audioBufferList.mNumberBuffers))

        for audioBuffer in audioBuffers {

            if useTimerAndNSMutableData {
                //NSDATA APPEND, NSMutableData is building and will be analyzed at the timer interval
                let frame = UnsafePointer<Float32>(audioBuffer.mData)
                nsMutableData.appendBytes(frame, length: Int(audioBuffer.mDataByteSize))
            }else{
                //this never fails but there are never any onsets either, cannot tell if the audio sampling is just not long enough
                //or if the data really isn't valid data
                //smpl_t is a Float
                let data = UnsafeMutablePointer<smpl_t>(audioBuffer.mData)
                sampleAudioForOnsets(data, length: audioBuffer.mDataByteSize)
            }

        }

    }

}
