I'm implementing an application which records and analyzes audio in real time (or at least as close to real time as possible), using the JDK Version 8 Update 201. While performing a test which simulates typical use cases of the application, I noticed that after several hours of recording audio continuously, a sudden delay of somewhere between one and two seconds was introduced. Up until this point there was no noticeable delay. It was only after this critical point of recording for several hours when this delay started to occur.
What I've tried so far
To check if my code for timing the recording of the audio samples is wrong, I commented out everything related to timing. This left me essentially with this update loop which fetches audio samples as soon as they are ready (Note: Kotlin code):
while (!isInterrupted) {
val audioData = read(sampleSize, false)
listener.audioFrameCaptured(audioData)
}
This is my read method:
fun read(samples: Int, buffered: Boolean = true): AudioData {
//Allocate a byte array in which the read audio samples will be stored.
val bytesToRead = samples * format.frameSize
val data = ByteArray(bytesToRead)
//Calculate the maximum amount of bytes to read during each iteration.
val bufferSize = (line.bufferSize / BUFFER_SIZE_DIVIDEND / format.frameSize).roundToInt() * format.frameSize
val maxBytesPerCycle = if (buffered) bufferSize else bytesToRead
//Read the audio data in one or multiple iterations.
var bytesRead = 0
while (bytesRead < bytesToRead) {
bytesRead += (line as TargetDataLine).read(data, bytesRead, min(maxBytesPerCycle, bytesToRead - bytesRead))
}
return AudioData(data, format)
}
However, even without any timing from my side the problem was not resolved. Therefore, I went on to experiment a bit and let the application run using different audio formats, which lead to very confusing results (I'm going to use a PCM signed 16 bit stereo audio format with little endian and a sample rate of 44100.0 Hz as default, unless specified otherwise):
- The critical amount of time that has to pass before the delay appears seems to be different depending on the machine used. On my Windows 10 desktop PC it is somewhere between 6.5 and 7 hours. On my laptop (also using Windows 10) however, it is somewhere between 4 and 5 hours for the same audio format.
- The amount of audio channels used seems to have an effect. If I change the amount of channels from stereo to mono, the time before the delay starts to appear is doubled to somewhere between 13 and 13.5 hours on my desktop.
- Decreasing the sample size from 16 bits to 8 bits also results in a doubling of the time before the delay starts to appear. Somewhere between 13 and 13.5 hours on my desktop.
- Changing the byte order from little endian to big endian has no effect.
- Switching from stereomix to a physical microphone has no effect either.
- I tried opening the line using different buffer sizes (1024, 2048 and 3072 sample frames) as well as its default buffer size. This also didn't change anything.
- Flushing the TargetDataLine after the delay has started to occur results in all bytes being zero for approximately one to two seconds. After this I get non-zero values again. The delay, however, is still there. If I flush the line before the critical point, I don't get those zero-bytes.
- Stopping and restarting the TargetDataLine after the delay appeared also does not change anything.
- Closing and reopening the TargetDataLine, however, does get rid of the delay until it reappears after several hours from there on.
- Automatically flushing the TargetDataLines internal buffer every ten minutes does not help to resolve the issue. Therefore, a buffer overflow in the internal buffer does not seem to be the cause.
- Using a parallel garbage collector to avoid application freezes also does not help.
- The used sample rate seems to be important. If I double the sample rate to 88200 Hertz, the delay starts occurring somewhere between 3 and 3.5 hours of runtime.
- If I let it run under Linux using my "default" audio format, it still runs fine after about 9 hours of runtime.
Conclusions that I've drawn:
These results let me come to the conclusion that the time for which I can record audio before this issue starts to happen is dependent on the machine on which the application is run and dependent on the byte rate (i.e. frame size and sample rate) of the audio format. This seems to hold true (although I can't completely confirm this as of now) because if I combine the changes made in 2 and 3, I would assume that I can record audio samples for four times as long (which would be somewhere between 26 and 27 hours) as when using my "default" audio format before the delay starts to appear. As I didn't find the time to let the application run for this long yet, I can only tell that it did run fine for about 15 hours before I had to stop it due to time constraints on my side. So, this hypothesis is still to be confirmed or denied.
According to the result of bullet point 13, it seems like the whole issue only appears when using Windows. Therefore, I think that it might be a bug in the platform specific parts of the javax.sound.sampled API.
Even though I think I might have found a way to change when this issue starts to happen, I'm not satisfied with the result. I could periodically close and reopen the line to avoid the problem from starting to appear at all. However, doing this would result in some arbitrary small amount of time where I wouldn't be able to capture audio samples. Furthermore, the Javadoc states that some lines can't be reopened at all after being closed. Therefore, this is not a good solution in my case.
Ideally, this whole issue shouldn't be happening at all. Is there something I am completely missing or am I experiencing limitations of what is possible with the javax.sound.sampled API? How can I get rid of this issue at all?
Edit: By suggestion of Xtreme Biker and gidds I created a small example application. You can find it inside this Github repository.