In this case, what audio file format should I use? Can I use .avi files?
You can choose a compressed or non-compressed format. Common non-compressed formats include WAV and AIFF. CAF can represent both compressed and non-compressed data. .avi is not an option offered by the OS.
If the files are large and storage space (on disk) is a concern, you may consider AAC saved in a CAF (or simply .m4a). For most applications, 16-bit samples will be enough, and you can also save space, memory, and CPU by saving these files at an appropriate sample rate (for reference, CDs use 44.1 kHz).
Since the ExtAudioFile interface abstracts the conversion process, you should not have to change your program to compare the size and speed differences of compressed and non-compressed formats for your distribution (AAC in CAF would be fine for normal applications).
Non-compressed CD-quality audio (16-bit samples at 44.1 kHz) will consume about 5.3 MB per minute, per channel. So if you have 2 stereo audio files, each 3 minutes long, and a 3-minute stereo destination buffer, your memory requirement would be around 95 MB (3 buffers × 2 channels × 3 minutes × 5.3 MB).
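If you want to check the memory figures for your own files, a minimal sketch of the arithmetic follows. The function name pcm_bytes is made up for illustration, and it assumes plain linear PCM with no header or padding overhead.

```c
#include <stdint.h>

/* Bytes needed to hold non-compressed PCM audio in memory.
 * Hypothetical helper: assumes linear PCM with no per-buffer overhead. */
uint64_t pcm_bytes(uint32_t sample_rate, uint32_t bytes_per_sample,
                   uint32_t channels, uint32_t seconds)
{
    return (uint64_t)sample_rate * bytes_per_sample * channels * seconds;
}
```

For example, pcm_bytes(44100, 2, 1, 60) gives 5,292,000 bytes, which is the "about 5.3 MB per minute, per channel" figure. Counting both channels of each buffer, three 3-minute stereo buffers (two sources plus a stereo destination) come to 3 * pcm_bytes(44100, 2, 2, 180) = 95,256,000 bytes.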
Since you have minutes of audio, you should consider not loading all of the audio data into memory at once. To read, manipulate, and combine audio, you need a non-compressed representation to work with in memory, so compressed formats do not help here. Also, converting a compressed representation to PCM takes a good amount of resources; reading a compressed file, although it involves fewer bytes, can take more (or less) time than reading a non-compressed one.
How do I programmatically add the second audio file to the first at a dynamically chosen time? For example: if the first audio file is 2 minutes long, I might need to mix in the second audio file (3 seconds of audio) at 1 minute, 1.5 minutes, or 55 seconds into the first file. It's dynamic.
To read the files and convert them to the format you want to use, use ExtAudioFile APIs - this will convert to your destination sample format for you. Common PCM sample representations in memory include SInt32, SInt16, and float, but that can vary wildly based on the application and the hardware (beyond iOS). ExtAudioFile APIs would also convert compressed formats to PCM, if needed.
Your input audio files should have the same sample rate. If not, you will have to resample the audio, a complex process which also takes a lot of resources (if done correctly/accurately). If you need to support resampling, double the time you've allocated to completing this task (not detailing the process here).
To add the sounds, you would request PCM samples from the files, process, and write to the output file (or buffer in memory).
To determine when to add the other sounds, you will need to get the sample rates for the input files (via ExtAudioFileGetProperty). If you want to write the second sound to the destination buffer at 55s, then you would start adding the sounds at sample number SampleRate * 55, where SampleRate is the sample rate of the files you are reading.
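As a rough sketch of that offset calculation: the helper names below are hypothetical, and the sample rate is taken as a double on the assumption that you read it from the file's stream description, where Core Audio stores it as a floating point value.

```c
#include <stdint.h>
#include <stddef.h>

/* Frame (sample per channel) at which mixing should start.
 * Hypothetical helper: assumes a non-negative offset and rounds
 * to the nearest frame. */
int64_t start_frame(double sample_rate, double offset_seconds)
{
    return (int64_t)(sample_rate * offset_seconds + 0.5);
}

/* Array index of a frame's first sample in an interleaved buffer
 * (e.g. L R L R ... for stereo). */
size_t interleaved_index(int64_t frame, unsigned channels)
{
    return (size_t)frame * channels;
}
```

At 44.1 kHz, start_frame(44100.0, 55.0) is 2,425,500. If your buffers are interleaved stereo, the matching array index is twice that, since each frame holds one sample per channel.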
To mix audio, you will just use this form (pseudocode):
mixed[i] = fileA[i] + fileB[i];
but you have to be sure you avoid overflow, underflow, and other arithmetic errors. Typically, you will perform this process using some integer type, because floating point calculations can take a long time (when there are so many). For some applications, you could just shift and add with no worry of overflow - this effectively reduces each input by one half before adding them, so each source ends up at half its original amplitude in the result. If you have control over the files' content (e.g. they are all bundled as resources), then you could simply ensure that no peak sample in the files exceeds one half of the full scale value (about -6 dBFS). Of course, saving as float would solve this issue at the expense of higher CPU, memory, and file I/O demands.
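To make the overflow concern concrete, here is a minimal sketch of two ways to add 16-bit samples safely. The function names are made up for illustration. The halving variant divides the widened sum by two instead of shifting each input first, because right-shifting negative values is implementation-defined in C; the effect on level is the same.

```c
#include <stdint.h>
#include <stddef.h>

/* Add in a wider type, then clamp to the 16-bit range.
 * Keeps full level, but clips (distorts) when the sum is out of range. */
int16_t mix_saturate(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Halve the widened sum. The result always fits in 16 bits,
 * but each source ends up at half its original amplitude. */
int16_t mix_halved(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a + (int32_t)b) / 2);
}

/* Apply the saturating mix across two equally sized buffers. */
void mix_buffers(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = mix_saturate(a[i], b[i]);
}
```

Which one fits depends on your material: saturation is fine if peaks rarely coincide, while halving never clips but makes everything quieter.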
At this point, you'd have 2 files open for reading, and one open for writing, then a few small temporary buffers for processing and mixing the inputs before writing to the output file. You should perform these requests in blocks for efficiency (e.g. read 1024 samples from each file, process the samples, write 1024 samples). The APIs don't guarantee much regarding caching and buffering for efficiency.
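The block-based loop could look roughly like the sketch below. To keep it self-contained, the Source struct and source_read are hypothetical in-memory stand-ins for files you would open and read through ExtAudioFile, the audio is assumed to be mono 16-bit PCM, and the output is a plain array rather than a file. Any part of the second sound that runs past the end of the first is dropped.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

enum { BLOCK = 1024 };  /* samples processed per iteration */

/* Hypothetical stand-in for an open audio file: mono 16-bit PCM in memory. */
typedef struct {
    const int16_t *samples;
    size_t length;    /* total number of samples */
    size_t position;  /* next sample to read */
} Source;

/* Read `count` samples into dst, zero-filling once the source runs out.
 * Returns the number of real samples copied. */
size_t source_read(Source *src, int16_t *dst, size_t count)
{
    size_t available = src->length - src->position;
    size_t n = count < available ? count : available;
    memcpy(dst, src->samples + src->position, n * sizeof *dst);
    memset(dst + n, 0, (count - n) * sizeof *dst);
    src->position += n;
    return n;
}

int16_t clamp16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Mix b into a starting at sample b_start, one block at a time.
 * out must have room for a->length samples. */
void mix_in_blocks(Source *a, Source *b, size_t b_start, int16_t *out)
{
    int16_t buf_a[BLOCK], buf_b[BLOCK];
    size_t written = 0;

    while (written < a->length) {
        size_t remaining = a->length - written;
        size_t n = remaining < BLOCK ? remaining : BLOCK;
        source_read(a, buf_a, n);

        /* Silence until this block reaches b_start, then samples from b. */
        size_t lead = 0;
        if (b_start > written) {
            lead = b_start - written;
            if (lead > n) lead = n;
        }
        memset(buf_b, 0, lead * sizeof buf_b[0]);
        source_read(b, buf_b + lead, n - lead);

        for (size_t i = 0; i < n; ++i)
            out[written + i] = clamp16((int32_t)buf_a[i] + buf_b[i]);
        written += n;
    }
}
```

Only the two small block buffers are needed for processing, so memory use stays flat no matter how long the inputs are. In a real program you would write each mixed block to the output file instead of keeping the whole result in an array.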
How do I save the final output audio file on the device? If I save the audio file programmatically somewhere, can I play it back again?
The ExtAudioFile APIs will cover your reading and writing needs. Yes, you can read or play the file later.