audio latency issues

In the application I want to create, I face some technical obstacles. The app works with two music tracks. A user imports a music background as the first track. The second track is a voice recorded by the user to the rhythm of the first track, played through the speaker (or headphones). At this point we face latency: after recording and playing back in the app, the user hears a loss of synchronisation between the tracks, caused by the microphone and speaker latencies.

First, I tried to detect the delay by filtering the input sound. I use Android's AudioRecord class and its read() method, which fills my short array with audio data. I found that the initial values of this array are zeros, so I decided to cut them out before writing the data to the output stream; I treat those zeros as the „warm-up” latency of the microphone. Is this approach correct? The operation gives some results, but it doesn't resolve the problem, and at this stage I'm still far from a solution.
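
For illustration, a minimal Kotlin sketch of that trimming idea (all names are illustrative; in practice a small amplitude threshold is safer than testing for exact zeros, because real microphone input is rarely exactly zero):

import android.media.AudioRecord

// Copies recorded samples to a consumer, skipping the initial run of zero
// samples that represents the microphone "warm-up".
fun copySkippingWarmUp(recorder: AudioRecord, onSample: (Short) -> Unit, totalFrames: Int) {
    val buffer = ShortArray(1024)
    var soundStarted = false
    var framesWritten = 0

    while (framesWritten < totalFrames) {
        val n = recorder.read(buffer, 0, buffer.size)
        if (n <= 0) break
        var start = 0
        if (!soundStarted) {
            while (start < n && buffer[start].toInt() == 0) start++   // skip leading zeros
            if (start < n) soundStarted = true
        }
        for (i in start until n) {
            onSample(buffer[i])
            framesWritten++
        }
    }
}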

But the worse case is the delay between starting the speakers and playing the music. This delay I can neither filter nor detect. I tried to create a calibration feature which measures it: I play a „beep” sound through the speakers and start measuring time at the same moment, then I start recording and listen for that sound being picked up by the microphone. When I recognise the sound in the app, I stop measuring. I repeat this process several times and take the average of the results as the device latency (a rough sketch of this measurement follows the list below). With this value I can simply shift the second track backwards to synchronise both recordings (I lose a few initial milliseconds of the recording, but I'm skipping that issue for now; there are ways to fix it). I thought this approach would resolve the problem, but it turned out not to be that simple. I found two issues here:

1. Delay while playing two tracks simultaneously.
2. Non-deterministic device audio latency.
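
As referenced above, a rough Kotlin sketch of the beep calibration. The onset detector is a naive amplitude threshold, playBeep is a caller-supplied action that starts the beep through the speaker, and the recorded stream's own frame count is used as the clock rather than wall-clock timing of read() calls:

import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import kotlin.math.abs

// Returns a round-trip latency estimate in milliseconds, or -1.0 if the beep
// was not detected within two seconds. Requires the RECORD_AUDIO permission.
fun measureRoundTripLatencyMs(playBeep: () -> Unit, sampleRate: Int = 48_000): Double {
    val minBuf = AudioRecord.getMinBufferSize(
        sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT)
    val recorder = AudioRecord(
        MediaRecorder.AudioSource.MIC, sampleRate,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf)
    val buffer = ShortArray(minBuf / 2)

    recorder.startRecording()

    // Drain a little audio first so the recorder is past its own warm-up.
    var framesRead = 0L
    repeat(4) { framesRead += recorder.read(buffer, 0, buffer.size).coerceAtLeast(0) }

    val framesAtBeep = framesRead
    playBeep()   // start the calibration beep on the speaker

    // Scan the incoming audio for the first sample above an arbitrary threshold.
    var onsetFrame = -1L
    while (onsetFrame < 0 && framesRead - framesAtBeep < sampleRate * 2) {
        val n = recorder.read(buffer, 0, buffer.size)
        if (n <= 0) break
        for (i in 0 until n) {
            if (abs(buffer[i].toInt()) > 8000) {
                onsetFrame = framesRead + i
                break
            }
        }
        framesRead += n
    }

    recorder.stop()
    recorder.release()

    // Frames between issuing the beep and hearing it, converted to milliseconds.
    return if (onsetFrame >= 0) (onsetFrame - framesAtBeep) * 1000.0 / sampleRate else -1.0
}

Repeating this several times and averaging, as described above, smooths out some of the variance, but not the per-playback randomness discussed below.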

The first issue: I play two tracks using the AudioTrack class and call its play() method like this:

val firstTrack = //creating a track
val secondTrack = //creating a track

firstTrack.play()
secondTrack.play()

This code causes a delay at the playback stage. At this point I don't even have to think about latency while recording; I can't even play two tracks simultaneously without a delay. I tested this with an external audio file (not recorded in my app): starting the same file twice with the code above, I can observe the delay. I also tried the MediaPlayer class, with the same results. In that case I even tried starting the tracks from the OnPreparedListener callback:

val firstTrack = // MediaPlayer
val secondTrack = // MediaPlayer

secondTrack.setOnPreparedListener {
  firstTrack.start()
  secondTrack.start()
}

And it doesn't help. I know there is one more class provided by Android called SoundPool. According to the documentation, it is better at playing tracks simultaneously, but I can't use it because it only supports small audio files, and that limitation rules it out for me. How can I resolve this problem? How can I start playing two tracks precisely at the same time?

The second issue: audio latency is not deterministic. Sometimes it is small and sometimes it is huge, and it's out of my hands. So measuring the device latency can help, but again, it cannot fully resolve the problem.

To sum up: is there any solution that can give me the exact latency per device (or per app session), or any other way to detect the actual delay, so I can achieve the best possible synchronisation when playing back two tracks at the same time?

Thank you in advance!

Marna answered 8/2, 2018 at 14:9 Comment(1)
My app is similar to yours: one track is the music, the other is the user's singing recorded over that track. When I faced the same problem, the only reliable solution I found was to provide a slider that the user can adjust to change the delay between the two playbacks. This delay is usually constant on the same device, but it can change if the output method changes (e.g. Bluetooth headphones), so you should be prepared to detect these situations and store different delay values. – Flamboyant

Synchronising audio for karaoke apps is tough. The main issue you seem to be facing is variable latency in the output stream.

This is almost certainly caused by "warm up" latency: the time it takes from hitting "play" on your backing track to the first frame of audio data being rendered by the audio device (e.g. headphones). This can have large variance and is difficult to measure.

The first (and easiest) thing to try is to use MODE_STREAM when constructing your AudioTrack and prime it with bufferSizeInBytes of data prior to calling play (more here). This should result in lower, more consistent "warm up" latency.
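
A minimal Kotlin sketch of that priming idea (the format values are just examples, and readBackingTrackInto() is a hypothetical helper that fills the buffer from your backing track):

import android.media.AudioAttributes
import android.media.AudioFormat
import android.media.AudioManager
import android.media.AudioTrack

// Builds a streaming AudioTrack and primes it with one full buffer of audio
// before play(), so playback does not wait for the first post-play() write.
fun createPrimedTrack(): AudioTrack {
    val sampleRate = 48_000
    val channelMask = AudioFormat.CHANNEL_OUT_STEREO
    val encoding = AudioFormat.ENCODING_PCM_16BIT
    val bufferSizeInBytes = AudioTrack.getMinBufferSize(sampleRate, channelMask, encoding)

    val track = AudioTrack(
        AudioAttributes.Builder()
            .setUsage(AudioAttributes.USAGE_MEDIA)
            .setContentType(AudioAttributes.CONTENT_TYPE_MUSIC)
            .build(),
        AudioFormat.Builder()
            .setSampleRate(sampleRate)
            .setChannelMask(channelMask)
            .setEncoding(encoding)
            .build(),
        bufferSizeInBytes,
        AudioTrack.MODE_STREAM,
        AudioManager.AUDIO_SESSION_ID_GENERATE)

    // Prime with bufferSizeInBytes of audio data before calling play().
    val primeBuffer = ShortArray(bufferSizeInBytes / 2)   // 16-bit samples
    readBackingTrackInto(primeBuffer)                     // hypothetical helper
    track.write(primeBuffer, 0, primeBuffer.size)

    track.play()
    // Keep feeding subsequent buffers from a dedicated thread after play().
    return track
}

Priming means the first audio the hardware pulls is already sitting in the track's buffer, so playback does not stall waiting for the first write after play().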

A better way is to use the Android NDK to have a continuously running audio stream which is just outputting silence until the moment you hit play, then start sending audio frames immediately. The only latency you have here is the continuous output latency.

If you decide to go down this route I recommend taking a look at the Oboe library (full disclosure: I am one of the authors).
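
Even without the NDK, a rough Java-layer analogue of the same idea is possible: keep a streaming AudioTrack playing and feed it zeros until the user hits play, then switch to real audio, so the output path is already warm when playback begins. A hedged Kotlin sketch (this is not the Oboe/AAudio approach described above; audioSource is a caller-supplied function returning the next buffer of backing-track samples):

import android.media.AudioTrack

@Volatile var playing = false

// Runs on a dedicated audio thread. The blocking write() call paces the loop.
fun audioLoop(track: AudioTrack, framesPerBuffer: Int, channels: Int, audioSource: () -> ShortArray) {
    val silence = ShortArray(framesPerBuffer * channels)   // all zeros
    track.play()
    while (!Thread.currentThread().isInterrupted) {
        val buffer = if (playing) audioSource() else silence
        track.write(buffer, 0, buffer.size)
    }
}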

To answer one of your specific questions...

Is there a way to calculate the latency of the audio output stream programmatically?

Yes. The easiest way to explain this is with a code sample (this is C++ for the AAudio API but the principle is the same using Java AudioTrack):

// Get the index and time that a known audio frame was presented for playing
int64_t existingFrameIndex;
int64_t existingFramePresentationTime;
AAudioStream_getTimestamp(stream, CLOCK_MONOTONIC, &existingFrameIndex, &existingFramePresentationTime);

// Get the write index for the next audio frame
int64_t writeIndex = AAudioStream_getFramesWritten(stream);

// Calculate the number of frames between our known frame and the write index
int64_t frameIndexDelta = writeIndex - existingFrameIndex;

// Calculate the time which the next frame will be presented
int64_t frameTimeDelta = (frameIndexDelta * NANOS_PER_SECOND) / sampleRate_;
int64_t nextFramePresentationTime = existingFramePresentationTime + frameTimeDelta;

// Assume that the next frame will be written into the stream at the current time
int64_t nextFrameWriteTime = get_time_nanoseconds(CLOCK_MONOTONIC);

// Calculate the latency
*latencyMillis = (double) (nextFramePresentationTime - nextFrameWriteTime) / NANOS_PER_MILLISECOND;

A caveat: This method relies on accurate timestamps being reported by the audio hardware. I know this works on Google Pixel devices but have heard reports that it isn't so accurate on other devices so YMMV.

Petrick answered 13/2, 2018 at 18:16 Comment(4)
Thank you for your answer. I have considered using the Oboe library and the AAudio API for better performance, but the problem with this solution is that AAudio was introduced in Android 8.1. I would like to find a solution for more versions, let's say from API 21. These latency calculations look very good, but I don't think they resolve the issue on every device. Still, this is very interesting, thank you. – Marna
Oboe is backwards compatible to API 16. Latency calculation via AudioTrack.getTimestamp is available (not through Oboe) from API 23 onwards. – Petrick
@Petrick Even with Oboe I am unable to sync the recorded audio with the background track properly. Would you care to spare some time to look into what I am doing wrong? – Moonscape
I'm unable to provide 1:1 support, but if you post a question here on SO tagged with oboe I'll try to take a look. – Petrick

Following donturner's answer, here's a Java version (which also falls back to other methods depending on the SDK version):

/** The audio latency has not been estimated yet */
private static long AUDIO_LATENCY_NOT_ESTIMATED = Long.MIN_VALUE+1;

/** The audio latency default value if we cannot estimate it */
private static long DEFAULT_AUDIO_LATENCY = 100L * 1000L * 1000L; // 100ms

/**
 * Estimate the audio latency
 *
 * Not accurate at all, depends on SDK version, etc. But that's the best
 * we can do.
 *
 * @return the estimated output latency in nanoseconds
 */
private static long estimateAudioLatency(AudioTrack track, long audioFramesWritten) {

    long estimatedAudioLatency = AUDIO_LATENCY_NOT_ESTIMATED;

    // First method. SDK >= 19.
    if (Build.VERSION.SDK_INT >= 19 && track != null) {

        AudioTimestamp audioTimestamp = new AudioTimestamp();
        if (track.getTimestamp(audioTimestamp)) {

            // Calculate the number of frames between our known frame and the write index
            long frameIndexDelta = audioFramesWritten - audioTimestamp.framePosition;

            // Calculate the time which the next frame will be presented
            long frameTimeDelta = _framesToNanoSeconds(frameIndexDelta);
            long nextFramePresentationTime = audioTimestamp.nanoTime + frameTimeDelta;

            // Assume that the next frame will be written at the current time
            long nextFrameWriteTime = System.nanoTime();

            // Calculate the latency
            estimatedAudioLatency = nextFramePresentationTime - nextFrameWriteTime;

        }
    }

    // Second method. SDK >= 18.
    if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED && Build.VERSION.SDK_INT >= 18) {
        Method getLatencyMethod;
        try {
            getLatencyMethod = AudioTrack.class.getMethod("getLatency", (Class<?>[]) null);
            estimatedAudioLatency = (Integer) getLatencyMethod.invoke(track, (Object[]) null) * 1000000L;
        } catch (Exception ignored) {}
    }

    // If neither of those methods has given us a value, let's try a third one
    if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED) {
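        // Note: CRT.getInstance() is just this app's way of obtaining a Context; use your own Context here.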
        AudioManager audioManager = (AudioManager) CRT.getInstance().getSystemService(Context.AUDIO_SERVICE);
        try {
            Method getOutputLatencyMethod = audioManager.getClass().getMethod("getOutputLatency", int.class);
            estimatedAudioLatency = (Integer) getOutputLatencyMethod.invoke(audioManager, AudioManager.STREAM_MUSIC) * 1000000L;
        } catch (Exception ignored) {}
    }

    // No method gave us a value. Let's use a default value. Better than nothing.
    if (estimatedAudioLatency == AUDIO_LATENCY_NOT_ESTIMATED) {
        estimatedAudioLatency = DEFAULT_AUDIO_LATENCY;
    }

    return estimatedAudioLatency;
}

private static long _framesToNanoSeconds(long frames) {
    return frames * 1000000000L / SAMPLE_RATE;
}
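
A hedged usage sketch (Kotlin, assuming the method above is made accessible from your code): keep a running count of the frames written to the playing AudioTrack and pass it in whenever you want a fresh estimate.

import android.media.AudioTrack

var framesWritten = 0L

// Call this from the thread that feeds the AudioTrack.
fun pump(track: AudioTrack, buffer: ShortArray) {
    val written = track.write(buffer, 0, buffer.size)
    if (written > 0) {
        // One frame is one 16-bit sample per channel, so divide by the channel count.
        framesWritten += written / track.channelCount
    }
}

// After the track has been playing for a few seconds:
// val latencyNanos = estimateAudioLatency(track, framesWritten)
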
Sausauce answered 28/9, 2018 at 17:13 Comment(5)
Thanks for sharing this! I'd like to add a 'calibration' feature when our app starts up; would I be able to use a blank track or generate a track? If you're curious, here's the app I'm working on: github.com/hillelcoren/mudeo – Jeannettajeannette
I'm not sure I understand the question. Why do you contrast a blank track with a generated track? You can generate a blank track with only zeros in it, yes :) – Lawrenson
Sorry, another question... With a blank audio file, what would I set for audioFramesWritten? You can see the full code here: github.com/hillelcoren/mudeo/blob/master/android/app/src/main/… Note: I set the sample rate to 1600 as SAMPLE_RATE isn't defined. Thank you again for your help, your code is exactly what we were looking for. – Jeannettajeannette
I didn't mention a blank audio file :) This code is for an AudioTrack into which you write the bytes yourself (from one or several audio sources/files). So 1/ just write zeros into the byte stream, no need for a file; 2/ audioFramesWritten is the number of audio frames that you have already written to the AudioTrack that is playing and whose latency you are estimating. Just keep track of it. If the terminology is unclear: a frame is equal to one sample if you're in mono, and to 2 samples if you're in stereo. A sample can be a short (= 2 bytes) or a float depending on how you set up the AudioTrack. – Lawrenson
Oh, by the way, your AudioTrack must actually be playing for this to work, and it will only give good results after a few seconds. – Lawrenson

The Android MediaPlayer class is notoriously slow to begin audio playback. I experienced an issue in an app I was creating where there was a delay of more than one second before an audio clip began playing. I resolved it by switching to ExoPlayer, which got playback starting within 100 ms. I've also read that ffmpeg has an even faster audio start-up time than ExoPlayer, but I haven't used it, so I can't make any promises.
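
For reference, a minimal Kotlin sketch of the ExoPlayer path (API names here are from ExoPlayer 2.12+ and may differ in other versions; the URI is a placeholder):

import android.content.Context
import android.net.Uri
import com.google.android.exoplayer2.MediaItem
import com.google.android.exoplayer2.SimpleExoPlayer

// Creates a player for the backing track and starts playback as soon as
// enough data is buffered.
fun playBackingTrack(context: Context, backingTrackUri: Uri): SimpleExoPlayer {
    val player = SimpleExoPlayer.Builder(context).build()
    player.setMediaItem(MediaItem.fromUri(backingTrackUri))
    player.prepare()
    player.playWhenReady = true
    return player
}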

Gaslight answered 15/2, 2018 at 20:34 Comment(0)
