iPhone: Mix two audio files programmatically?

Asked 26/12, 2011 at 19:55 Answered 11/4, 2013 at 9:51

I want to take two audio files, mix them, and play the result programmatically. While the first audio file is playing, after some (dynamic) amount of time, I need to mix a second, short audio file into the first one somewhere in its middle. Finally, I need to save the result as a single audio file on the device. Playing that file should give the first audio with the second one mixed in.

I have gone through many forums but couldn't find a clear explanation of how to achieve this.

Could someone please clarify the following questions?

In this case, which audio file format should I use? Can I use .avi files?
How do I programmatically mix the second audio into the first at a dynamically chosen time? For example, if the first audio is 2 minutes long, I might need to mix in the second audio file (3 seconds long) at 1 min, 1.5 min, or 55 seconds into the first file. It's dynamic.
How do I save the final output audio file on the device? If I save the audio file programmatically somewhere, can I play it back again?

I don't know how to achieve this. Please share your thoughts!

Spenser answered 26/12, 2011 at 19:55 Comment(2)

No, you can't save it as .avi, because AVI is only a container format (and can contain video as well). I'm not sure what you have to use on iOS, but I guess you will have to write a WAV file (in other words, pure audio samples). Combining them requires advanced knowledge of audio processing, which I don't have, so I can't say anything more informative about that. – Overdraft 30/12, 2011 at 12:5

developer.apple.com/library/ios/#codinghowtos/AudioAndVideo/… – Cunaxa 30/12, 2011 at 18:28

Open each audio file
Read the header info
Get raw uncompressed audio into memory as an array of ints for each file
Starting at the point in file 1's array where you want to mix in file 2, loop through file 2, adding each of its int values to the corresponding value in file 1's array, being sure to 'clip' any values above the maximum or below the minimum (this is how you mix audio... yes, it's that simple). If file 2 runs past the end of file 1, you'll have to make the first array long enough to hold the remainder of file 2.
Write new header info and then the audio from the array to which you added file2.
If there is compression involved or the files won't fit in memory, you may have to implement a more complex buffering scheme.
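A minimal C sketch of the mixing step above (plain C compiles unchanged as Objective-C), assuming both files have already been decoded into mono 16-bit PCM arrays at the same sample rate; `mix_into` and `clip_to_int16` are hypothetical names, not part of any Apple API. Summing into a 32-bit intermediate is what lets the code detect and clip overflow instead of letting the 16-bit value wrap around.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate sum to the 16-bit sample range ("clipping"). */
static int16_t clip_to_int16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Mix `b` (b_len samples) into `a`, starting at sample index `offset`.
   Assumption: `a` has capacity for at least offset + b_len samples and
   everything past its current length (*a_len) is zero-initialised.
   *a_len is extended if `b` runs past the end of `a`. */
void mix_into(int16_t *a, size_t *a_len,
              const int16_t *b, size_t b_len, size_t offset) {
    for (size_t i = 0; i < b_len; i++) {
        size_t j = offset + i;
        int32_t base = (j < *a_len) ? a[j] : 0;  /* silence past the end of a */
        a[j] = clip_to_int16(base + b[i]);
    }
    if (offset + b_len > *a_len) *a_len = offset + b_len;
}
```

For interleaved stereo the same loop works directly on the interleaved array, as long as `offset` is a multiple of the channel count.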

Jessalyn answered 3/1, 2012 at 21:34 Comment(5)

Simply adding the two streams together and clipping at extreme values doesn't sound (no pun intended) like it would result in very useful output. The two "inputs" should be scaled appropriately such that no truncation should need to happen. – Barhorst 3/1, 2012 at 21:41

Yep, that's essentially it. Hopefully the two files are in the same format, at the same sampling rate, and not compressed, so it's a "simple" matter of array addition (keeping in mind that there are likely two channels). A first scan over the data would reveal whether clipping would occur, and then scaling could be applied to maintain the optimal volume while avoiding clipping. – Reluctivity 3/1, 2012 at 21:42

@Sedate - You are absolutely correct! But if you think back to your garage band days with a used analog mixer, you'll remember the unfortunate truth: that is the way it is in the real world. Sounds are mixed without scaling; when the levels are too high, the resulting distortion is actually called 'clipping!' The technique Hot Licks mentions is called 'compression' (albeit a naive implementation), and for analog it's another box to throw in the rack. Usually, though, surprisingly, the result doesn't clip. Try it in Audacity (you do have a copy installed, don't you ;-) – Jessalyn 3/1, 2012 at 22:4

*'Compression' in my comment to @Sedate has no relation to eliminating redundant data in a file stream; rather, it means 'compressing' the audio (making the waves less tall) to fit into an 'envelope' (the min/max peaks the system is set to handle). – Jessalyn 3/1, 2012 at 22:7

Actually, the technique I was describing would be equivalent to simply adjusting the master level control. One could do it somewhat dynamically, and then it would be "compression", but that's unnecessary in this scenario. You are right, though, that probably the result wouldn't clip, even without any adjustments. – Reluctivity 4/1, 2012 at 1:49
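The two-pass approach from these comments (scan first to see whether the sum would clip, then apply one fixed gain, the equivalent of turning down the master level) might look like this in C. This is a sketch under the same assumptions as before (mono 16-bit PCM, same sample rate, overlapping region already aligned); `peak_of_sum` and `mix_scaled` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* First pass: the largest absolute value the raw sum a[i] + b[i] reaches. */
int32_t peak_of_sum(const int16_t *a, const int16_t *b, size_t n) {
    int32_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t s = (int32_t)a[i] + b[i];
        if (s < 0) s = -s;
        if (s > peak) peak = s;
    }
    return peak;
}

/* Second pass: mix, applying a single fixed gain only if the sum would clip.
   Scaling by INT16_MAX / peak keeps every output within [-32767, 32767]. */
void mix_scaled(const int16_t *a, const int16_t *b, int16_t *out, size_t n) {
    int32_t peak = peak_of_sum(a, b, n);
    for (size_t i = 0; i < n; i++) {
        int32_t s = (int32_t)a[i] + b[i];
        if (peak > INT16_MAX)
            s = (int32_t)((int64_t)s * INT16_MAX / peak);
        out[i] = (int16_t)s;
    }
}
```

Because the gain is constant over the whole mix, this is a level adjustment rather than dynamic compression; the trade-off is that one loud spot lowers the volume of everything.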

In this case, what audio file/format I should use? Can I use .avi files?

You can choose a compressed or non-compressed format. Common non-compressed formats include WAV and AIFF. CAF can represent both compressed and non-compressed data. .avi is not an option offered by the OS.

If the files are large and storage space (on disk) is a concern, you may consider AAC format saved in a CAF (or simply .m4a). For most applications, 16-bit samples will be enough, and you can also save space, memory, and CPU by saving these files at an appropriate sample rate (for reference, CDs use 44.1 kHz).

Since the ExtAudioFile interface abstracts the conversion process, you should not have to change your program to compare the size and speed differences of compressed and non-compressed formats for your distribution (AAC in CAF would be fine for normal applications).

Non-compressed CD-quality audio consumes about 5.3 MB per minute, per channel. So if you have two stereo audio files, each 3 minutes long, and a 3-minute stereo destination buffer, your memory requirement would be around 95 MB (3 buffers × 2 channels × 3 minutes × 5.3 MB).
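The per-buffer arithmetic can be checked with a small C helper, assuming CD-quality parameters (44,100 Hz, 16-bit samples, so 2 bytes per sample); `pcm_bytes` is a hypothetical name.

```c
#include <stdint.h>

/* Bytes needed to hold uncompressed PCM audio in memory:
   frames (sample_rate * seconds) * bytes per sample * channels. */
uint64_t pcm_bytes(uint32_t sample_rate, uint32_t bytes_per_sample,
                   uint32_t channels, double seconds) {
    return (uint64_t)(sample_rate * seconds) * bytes_per_sample * channels;
}
```

For example, `pcm_bytes(44100, 2, 1, 60.0)` gives 5,292,000 bytes (the ~5.3 MB per minute per channel figure), and a single 3-minute stereo buffer, `pcm_bytes(44100, 2, 2, 180.0)`, needs 31,752,000 bytes.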

Since you have minutes of audio, you may need to avoid loading all audio data into memory at once. To read, manipulate, and combine audio, you will need a non-compressed representation to work with in memory, so compressed formats would not help here. Also, converting a compressed representation to PCM takes a good amount of resources; reading a compressed file, although it involves fewer bytes, can take more (or less) time.

How do I programmatically mix the second audio into the first at a dynamically chosen time? For example, if the first audio is 2 minutes long, I might need to mix in the second audio file (3 seconds long) at 1 min, 1.5 min, or 55 seconds into the first file. It's dynamic.

To read the files and convert them to the format you want to use, use the ExtAudioFile APIs; they will convert to your destination sample format for you. Common PCM sample representations in memory include SInt32, SInt16, and float, but that can vary widely based on the application and the hardware (beyond iOS). The ExtAudioFile APIs also convert compressed formats to PCM, if needed.
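ExtAudioFile does this conversion for you once you set the client format, but for reference, here is a sketch of what the SInt16 to float mapping looks like, assuming the common 1/32768 scale convention (other conventions, such as 1/32767, also exist). The function names are hypothetical.

```c
#include <stdint.h>

/* SInt16 -> float in [-1.0, 1.0), using the 1/32768 scale. */
float int16_to_float(int16_t s) {
    return (float)s / 32768.0f;
}

/* float -> SInt16, clamping values outside the representable range first. */
int16_t float_to_int16(float f) {
    float scaled = f * 32768.0f;
    if (scaled >= 32767.0f) return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (int16_t)scaled;  /* truncates toward zero */
}
```

One practical consequence: sums computed in float do not wrap around when they exceed 1.0, so clipping (or scaling) can be postponed until the final conversion back to SInt16.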

Your input audio files should have the same sample rate. If they don't, you will have to resample the audio, a complex process that also takes a lot of resources (if done correctly and accurately). If you need to support resampling, double the time you've allocated to this task (I won't detail the process here).

To add the sounds, you would request PCM samples from the files, process them, and write the result to the output file (or to a buffer in memory).

To determine when to add the other sound, you will need to get the sample rates of the input files (via ExtAudioFileGetProperty). If you want to write the second sound to the destination buffer at 55 s, then you would start adding the sounds at sample number SampleRate * 55, where SampleRate is the sample rate of the files you are reading.
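The start position is a frame count (one sample per channel), so for an interleaved stereo buffer (L R L R ...) the array index is the frame count times the channel count. A small C sketch, assuming the sample rate has already been read from the file (for example, the `mSampleRate` field of the file's data format); the helper names are hypothetical.

```c
#include <stdint.h>

/* Frame at which a sound starting `seconds` into the file begins,
   rounded to the nearest frame. */
uint64_t frame_offset(double sample_rate, double seconds) {
    return (uint64_t)(sample_rate * seconds + 0.5);
}

/* Index of that frame's first sample in an interleaved buffer. */
uint64_t interleaved_index(uint64_t frame, uint32_t channels) {
    return frame * channels;
}
```

At 44,100 Hz, 55 s corresponds to frame 2,425,500, which is index 4,851,000 in an interleaved stereo buffer.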

To mix audio, you will just use this form (pseudocode):

mixed[i] = fileA[i] + fileB[i];

but you have to be sure to avoid overflow/underflow and other arithmetic errors. Typically, you will perform this process using integer values, because floating-point calculations can take a long time (when there are so many of them). For some applications, you could just shift and add with no worry of overflow: this effectively halves each input before adding them, so the amplitude of the result is also halved. If you have control over the files' content (e.g. they are all bundled as resources), you could simply ensure that no peak sample in the files exceeds one half of full scale (about -6 dBFS). Of course, saving as float would solve this issue at the expense of higher CPU, memory, and file I/O demands.
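The shift-and-add approach can be sketched for 16-bit samples like this (`mix_shift_add` is a hypothetical name). Each halved input lies in [-16384, 16383], so their sum always fits in an `int16_t`.

```c
#include <stdint.h>

/* Halve each input with a right shift, then add; the result can never
   overflow 16 bits, at the cost of halving the amplitude.
   Note: right-shifting a negative signed value is implementation-defined
   in C; mainstream compilers (clang, gcc) perform an arithmetic shift. */
static inline int16_t mix_shift_add(int16_t a, int16_t b) {
    return (int16_t)((a >> 1) + (b >> 1));
}
```

For example, mixing two full-scale samples (32767 and 32767) gives 32766 instead of overflowing.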

At this point, you'd have two files open for reading and one open for writing, plus a few small temporary buffers for processing and mixing the inputs before writing to the output file. You should perform these requests in blocks for efficiency (e.g. read 1024 samples from each file, process the samples, write 1024 samples). The APIs don't guarantee much regarding caching and buffering for efficiency.
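The block-based loop might be structured like this in C. To keep the sketch self-contained and testable, the input "files" are hypothetical in-memory `Source` structs and the output "file" is a plain array; in a real app each `source_read` would be an ExtAudioFileRead call and each block copy an ExtAudioFileWrite call. It assumes mono 16-bit PCM at a common sample rate, and it drops any part of the second sound that runs past the end of the first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SAMPLES 1024

/* Hypothetical stand-in for an input file: samples plus a read position. */
typedef struct {
    const int16_t *data;
    size_t length;
    size_t pos;
} Source;

/* Copy up to `max` samples from `s` into `out`; returns how many were read. */
static size_t source_read(Source *s, int16_t *out, size_t max) {
    size_t n = s->length - s->pos;
    if (n > max) n = max;
    memcpy(out, s->data + s->pos, n * sizeof *out);
    s->pos += n;
    return n;
}

static int16_t clip16(int32_t v) {
    return v > INT16_MAX ? INT16_MAX : v < INT16_MIN ? INT16_MIN : (int16_t)v;
}

/* Mix `b` into `a` starting at sample `offset`, one block at a time,
   appending each mixed block to `out`. Returns the samples written. */
size_t mix_in_blocks(Source *a, Source *b, size_t offset, int16_t *out) {
    int16_t block_a[BLOCK_SAMPLES], block_b[BLOCK_SAMPLES];
    size_t written = 0;
    size_t n_a;
    while ((n_a = source_read(a, block_a, BLOCK_SAMPLES)) > 0) {
        /* Position within this block where b starts contributing (if any). */
        size_t start = offset > written ? offset - written : 0;
        if (start < n_a) {
            size_t n_b = source_read(b, block_b, n_a - start);
            for (size_t i = 0; i < n_b; i++)
                block_a[start + i] =
                    clip16((int32_t)block_a[start + i] + block_b[i]);
        }
        memcpy(out + written, block_a, n_a * sizeof *out); /* "write" block */
        written += n_a;
    }
    return written;
}
```

Because `b` is read sequentially and only as far as the current block needs, the loop handles a start offset that falls in the middle of a block, or a second sound that spans a block boundary, without buffering either file completely.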

How do I save the final output audio file on the device? If I save the audio file programmatically somewhere, can I play it back again?

The ExtAudioFile APIs would cover both your reading and writing needs. Yes, you can read or play the file later.

Tunstall answered 3/1, 2012 at 22:24 Comment(0)

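You can do this with AV Foundation by putting each sound in its own track of an AVMutableComposition and exporting the composition with AVAssetExportSession: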

- (BOOL)combineVoices1
{
    NSError *error = nil;

    NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
    NSString *documentsDirectory = [paths objectAtIndex:0];

    // Create an AVMutableComposition object. It holds multiple AVMutableCompositionTracks,
    // and all of its tracks are played (and exported) together, i.e. mixed.
    AVMutableComposition *composition = [[AVMutableComposition alloc] init];

    // Track 1: test1.caf at 80% volume, starting at time zero.
    AVMutableCompositionTrack *compositionAudioTrack = [composition addMutableTrackWithMediaType:AVMediaTypeAudio preferredTrackID:kCMPersistentTrackID_Invalid];
    [compositionAudioTrack setPreferredVolume:0.8];
    NSString *soundOne = [[NSBundle mainBundle] pathForResource:@"test1" ofType:@"caf"];
    AVAsset *avAsset = [AVURLAsset URLAssetWithURL:[NSURL fileURLWithPath:soundOne] options:nil];
    AVAssetTrack *clipAudioTrack = [[avAsset tracksWithMediaType:AVMediaTypeAudio] firstObject];
    if (clipAudioTrack == nil ||
        ![compositionAudioTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, avAsset.duration) ofTrack:clipAudioTrack atTime:kCMTimeZero error:&error]) {
        NSLog(@"Could not add test1.caf: %@", error);
        return NO;
    }

    // Track 2: test.caf at 30% volume. To start it later in the mix (the "dynamic time"),
    // pass e.g. CMTimeMakeWithSeconds(55.0, 600) as atTime: instead of kCMTimeZero.
    AVMutableCompositionTrack *compositionAudioTrack1 = [composition addMutableTrackWithMediaType:AVMediaTypeAudio preferredTrackID:kCMPersistentTrackID_Invalid];
    [compositionAudioTrack1 setPreferredVolume:0.3];
    NSString *soundOne1 = [[NSBundle mainBundle] pathForResource:@"test" ofType:@"caf"];
    AVAsset *avAsset1 = [AVURLAsset URLAssetWithURL:[NSURL fileURLWithPath:soundOne1] options:nil];
    AVAssetTrack *clipAudioTrack1 = [[avAsset1 tracksWithMediaType:AVMediaTypeAudio] firstObject];
    if (clipAudioTrack1 == nil ||
        ![compositionAudioTrack1 insertTimeRange:CMTimeRangeMake(kCMTimeZero, avAsset1.duration) ofTrack:clipAudioTrack1 atTime:kCMTimeZero error:&error]) {
        NSLog(@"Could not add test.caf: %@", error);
        return NO;
    }

    // Track 3: song.caf at full volume, in its own composition track.
    AVMutableCompositionTrack *compositionAudioTrack2 = [composition addMutableTrackWithMediaType:AVMediaTypeAudio preferredTrackID:kCMPersistentTrackID_Invalid];
    [compositionAudioTrack2 setPreferredVolume:1.0];
    NSString *soundOne2 = [[NSBundle mainBundle] pathForResource:@"song" ofType:@"caf"];
    AVAsset *avAsset2 = [AVURLAsset URLAssetWithURL:[NSURL fileURLWithPath:soundOne2] options:nil];
    AVAssetTrack *clipAudioTrack2 = [[avAsset2 tracksWithMediaType:AVMediaTypeAudio] firstObject];
    if (clipAudioTrack2 == nil ||
        ![compositionAudioTrack2 insertTimeRange:CMTimeRangeMake(kCMTimeZero, avAsset2.duration) ofTrack:clipAudioTrack2 atTime:kCMTimeZero error:&error]) {
        NSLog(@"Could not add song.caf: %@", error);
        return NO;
    }

    AVAssetExportSession *exportSession = [AVAssetExportSession
                                           exportSessionWithAsset:composition
                                           presetName:AVAssetExportPresetAppleM4A];
    if (nil == exportSession) return NO;

    NSString *soundOneNew = [documentsDirectory stringByAppendingPathComponent:@"combined10.m4a"];
    // The export fails if a file already exists at the output path, so remove any old one.
    [[NSFileManager defaultManager] removeItemAtPath:soundOneNew error:nil];

    // configure export session output with all our parameters
    exportSession.outputURL = [NSURL fileURLWithPath:soundOneNew]; // output path
    exportSession.outputFileType = AVFileTypeAppleM4A; // output file type

    // perform the export; this returns immediately and the handler runs when it finishes
    [exportSession exportAsynchronouslyWithCompletionHandler:^{
        if (AVAssetExportSessionStatusCompleted == exportSession.status) {
            NSLog(@"AVAssetExportSessionStatusCompleted");
        } else if (AVAssetExportSessionStatusFailed == exportSession.status) {
            // a failure may happen because of an event out of your control,
            // for example an interruption like a phone call coming in;
            // make sure to handle this case appropriately
            NSLog(@"AVAssetExportSessionStatusFailed: %@", exportSession.error);
        } else {
            NSLog(@"Export Session Status: %ld", (long)exportSession.status);
        }
    }];

    // YES means the export was started, not that it has finished.
    return YES;
}

Intromission answered 11/4, 2013 at 9:51 Comment(0)

If you are going to play multiple sounds at once, definitely use the .caf format; Apple recommends it for playing multiple sounds simultaneously. In terms of mixing them programmatically, I am assuming you just want them to play at the same time. While one sound is playing, just tell the other sound to play at whatever time you like. To set a specific time, use NSTimer (NSTimer Class Reference) and create a method that plays the sound when the timer fires.

Hixon answered 30/12, 2011 at 21:11 Comment(0)
