Java Sound on Linux: how to capture from a TargetDataLine quickly enough to keep up?
I'm using the Java sound API and Java 1.7. I am having difficulty reading from a TargetDataLine quickly enough to keep up with what is being recorded when I run my application on Linux (java version "1.7.0_51", Java(TM) SE Runtime Environment (build 1.7.0_51-b13), Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode), Red Hat Enterprise Linux 5). I don't have this problem when running the same program on my Windows 7 laptop. I'm somewhat stumped.

To isolate the issue, I wrote a program that captures from a TargetDataLine for an interval of time (interactively determined) and records the amount of time spent in a blocking read of a fixed number of bytes each time, then prints these out along with mean read time, total time elapsed, and time worth of audio captured.

My test program is as follows:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.Mixer;
import javax.sound.sampled.TargetDataLine;

/**
 * This is a test of mic capture delay for given buffer and fetch settings.
 */
public class MicCaptureDelayTest {

   /**
    * the audio format used for capturing and transmitting
    */
   private static final AudioFormat format =
         new AudioFormat(8000, 16, 1, true, true);

   /**
    * This is the target data line buffer size to request, in bytes.
    */
   private static final int MIC_BUFFER_SIZE = 1000;

   /**
    * This is the number of bytes to try to fetch from the target data line at a
    * time.
    */
   private static final int MIC_FETCH_SIZE = 480;

   /**
    * Searches for available mixers on the system that have a microphone.
    * @return a list of matching mixers
    */
   private static List<Mixer.Info> findMicrophoneMixers() {
      Mixer.Info[] mixerInfos = AudioSystem.getMixerInfo();
      List<Mixer.Info> matches = new ArrayList<>();
      for (Mixer.Info mixerInfo : mixerInfos) {
         Mixer mixer = AudioSystem.getMixer(mixerInfo);
         DataLine.Info lineInfo = new DataLine.Info(TargetDataLine.class,
               format);
         boolean isSupported = mixer.isLineSupported(lineInfo);

         if (isSupported) {
            matches.add(mixerInfo);
         }
      }

      return matches;
   }

   /**
    * This is the test recording thread.
    */
   private static class MicFetcher extends Thread {

      /**
       * This is the requested recording state.
       */
      private boolean shouldRecord = false;

      /**
       * This is the current processed recording state of the thread.
       */
      private boolean isRecording = false;

      /**
       * This is the Java audio interface line microphone data is captured from.
       */
      private TargetDataLine lineFromMic;

      /**
       * Runs the test mic capture thread body.
       */
      @Override
      public void run() {

         List<Mixer.Info> matchingMixerInfo = findMicrophoneMixers();

         // Use the first matching mixer.
         Mixer mixerToUse = AudioSystem.getMixer(matchingMixerInfo.get(0));

         DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

         try {
            lineFromMic = (TargetDataLine) mixerToUse.getLine(info);
            lineFromMic.open(format, MIC_BUFFER_SIZE);
         } catch (LineUnavailableException e) {
            e.printStackTrace();
            return;
         }

         byte[] transferBuffer = new byte[MIC_FETCH_SIZE];
         List<Long> readTimesNanos = new LinkedList<>();
         int numFramesCaptured = 0;
         long startTimeNanos = 0;

         while (true) {
            boolean currentShouldRecord;
            synchronized(this) {
               currentShouldRecord = shouldRecord;
            }

            if (!isRecording && currentShouldRecord) {
               // Start recording.

               System.out.println("Starting.");
               lineFromMic.start();
               isRecording = true;
               startTimeNanos = System.nanoTime();

            } else if (isRecording && !currentShouldRecord) {
               // Stop recording.
               System.out.println("Stopping.");
               lineFromMic.stop();
               lineFromMic.flush();

               System.out.print("read times (ms): ");
               long sumReadTimesNanos = 0;
               int i = 0;
               for (Long sampleTimeNanos : readTimesNanos) {
                  if (i % 5 == 0) {
                     System.out.println();
                  }
                  System.out.printf("%.2f  ", sampleTimeNanos / 1.0e6);
                  sumReadTimesNanos += sampleTimeNanos;
                  ++i;
               }
               System.out.println();
               System.out.println(
                     "Mean read time (ms): "
                           + (sumReadTimesNanos / 1.0e6
                                 / readTimesNanos.size()));

               long stopTimeNanos = System.nanoTime();
               System.out.println("Time captured (s): "
                     + (numFramesCaptured / format.getFrameRate()));
               System.out.println("Time elapsed (s): "
                     + (stopTimeNanos - startTimeNanos) / 1.0e9);

               readTimesNanos.clear();
               numFramesCaptured = 0;
               isRecording = false;

            } else if (isRecording) {
               // Continue recording.

               long beforeTimeNanos = System.nanoTime();

               // Retrieve data from the line.  This blocks.
               int numBytesRead = lineFromMic.read(
                     transferBuffer, 0, MIC_FETCH_SIZE);
               numFramesCaptured += numBytesRead / format.getFrameSize();

               long afterTimeNanos = System.nanoTime();
               long timeElapsedNanos = afterTimeNanos - beforeTimeNanos;
               readTimesNanos.add(timeElapsedNanos);
            }
         }
      }

      /**
       * Requests to toggle the recording state of the test recording thread.
       */
      public synchronized void toggleState() {
         shouldRecord = ! shouldRecord;
      }
   }

   /**
    * Runs the test program.  Newline toggles state.
    * @param args command line args-- none needed
    * @throws IOException if thrown when trying to get console input
    */
   public static void main(String[] args) throws IOException {
      BufferedReader inputReader =
            new BufferedReader(new InputStreamReader(System.in));

      MicFetcher fetcher = new MicFetcher();
      fetcher.start();

      while (true) {
         // Toggle state for each line of input (ie, press enter to toggle).
         inputReader.readLine();
         fetcher.toggleState();
      }
   }
}

When I run this in my Linux environment, for a roughly 10-second recording, the output looks like:

Starting.

Stopping.
read times (ms): 
54.00  18.10  36.62  36.32  35.99  
18.10  18.25  54.26  18.30  35.56  
18.12  35.51  36.74  17.22  36.70  
35.29  18.33  35.60  18.23  54.72  
19.00  37.99  18.14  18.37  53.91  
18.37  35.34  36.00  18.00  36.00  
18.00  54.71  17.22  18.12  36.18  
36.64  36.08  18.00  54.34  18.26  
18.27  35.44  18.30  54.77  18.33  
18.24  36.51  35.47  36.52  18.35  
17.14  54.96  18.13  36.73  17.21  
54.95  18.28  18.37  36.54  36.72  
35.56  18.37  17.23  54.46  18.36  
35.53  18.08  36.00  36.00  17.99  
54.30  18.06  35.22  18.00  18.00  
53.93  18.32  35.63  36.64  18.16  
35.21  18.30  55.65  18.23  18.35  
35.55  36.32  35.60  18.30  36.33  
36.21  17.22  36.54  18.32  54.96  
17.19  18.36  35.62  36.67  35.25  
18.29  18.37  54.63  18.37  36.54  
18.35  53.91  18.37  17.23  36.70  
36.09  36.01  17.19  18.33  53.91  
18.37  36.56  18.36  35.53  36.58  
18.16  53.84  18.26  36.03  18.08
18.12  54.24  18.08  36.14  36.19
18.12  36.08  18.11  53.80  18.28
18.37  36.55  18.13  53.99  18.00
36.12  35.54  18.28  36.56  17.20
53.96  18.00  18.01  36.67  36.53
36.71  17.19  18.37  54.37  18.02
35.97  18.00  54.00  18.00  18.00
36.00  35.99  36.34  18.37  18.35
53.93  18.13  36.63  18.33  36.33
36.34  18.33  36.55  35.51  36.66
18.29  18.06  54.00  17.99  36.08
18.25  36.64  36.38  18.37  35.55
36.66  18.21  36.73  17.19  54.27
18.13  35.55  18.18  36.31  35.56
18.34  53.90  18.36  18.09  36.15
18.22  53.90  18.32  18.37  53.89
18.19  36.04  17.20  53.94  18.31
18.37  36.55  36.70  36.61  18.35
17.18  53.97  18.32  36.55  19.01
18.99  57.00  18.99  38.01  18.98
38.00  18.99  36.99  36.35  18.37
36.55  36.70  18.04  38.00  19.00
38.00  37.99  18.99  37.99  19.00
37.06  36.43  36.03  18.00  18.00
54.47  18.25  36.70  18.22  18.37
53.55  18.33  35.59  36.59  18.29
35.36  18.37  54.89  18.24  36.44
18.33  18.36  53.52  18.13  36.36
35.57  18.20  35.52  18.20  53.78
18.18  18.16  35.49  36.67  36.54
18.37  36.53  36.67  17.19  36.65
18.29  54.87  17.14  18.24  36.68
35.49  35.61  18.27  18.36  53.77
18.24  35.43  18.35  53.90  18.37
18.24  38.00  38.00  37.99  18.99
19.01  37.98  19.00  57.00  18.99
19.00  38.00  18.99  55.01  18.98
35.99  18.00  18.01  54.98  18.00
37.00  17.99  36.00  36.00  17.99
54.01  18.98  18.00  36.02  18.98
53.16  18.34  35.59  36.20  17.98
36.00  18.00  54.00  17.99  18.00
36.00  35.99  36.01  17.99  18.00
54.00  17.98  35.99  18.00  54.28
Mean read time (ms): 30.210176811594206
Time captured (s): 10.35
Time elapsed (s): 10.466399

The output for a similar, roughly 10-second recording in my Windows environment looks like:

Starting.

Stopping.
read times (ms):
44.96  30.13  29.97  29.97  30.04
29.96  29.96  30.00  29.99  30.00
29.92  30.01  30.02  30.01  29.99
29.85  45.12  30.03  29.92  29.96
29.98  30.00  29.98  30.00  0.24
44.73  29.94  30.04  29.96  29.86
29.96  30.05  29.85  30.17  30.02
30.00  29.94  29.99  29.99  30.04
29.97  44.99  29.99  30.08  29.88
30.05  29.95  29.97  29.87  0.15
44.95  29.98  29.91  30.08  29.98
30.00  30.01  29.96  29.94  30.04
30.01  29.96  29.88  30.00  29.95
30.04  44.99  29.99  29.96  30.03
30.00  30.07  29.94  30.01  0.21
44.77  29.95  30.02  30.01  30.00
29.96  29.98  30.00  30.00  29.94
29.99  30.04  29.93  29.99  30.02
29.98  44.99  29.99  29.96  30.01
30.03  29.95  30.00  29.97  0.21
44.81  29.88  30.05  29.99  29.99
30.01  29.97  29.99  29.99  29.98
29.99  30.00  29.97  29.98  29.97
30.01  44.95  29.97  30.03  30.00
30.00  30.00  29.99  29.97  0.21
44.79  29.95  30.00  29.99  29.95
29.98  29.93  30.06  29.94  30.08
29.97  30.00  29.97  29.99  29.98
29.94  45.05  30.04  29.91  30.00
29.99  29.97  30.01  29.98  0.21
44.79  29.94  29.99  29.89  30.06
30.03  29.96  30.04  29.98  29.90
30.04  30.00  29.98  30.00  29.97
30.07  44.96  29.98  29.93  30.07
29.98  29.90  30.00  29.94  0.13
44.97  29.98  29.99  29.94  30.02
30.00  29.93  29.99  30.02  30.01
29.99  29.96  30.02  29.90  29.93
30.01  45.04  30.06  29.99  29.98
29.94  30.04  30.00  29.92  0.20
44.83  29.94  29.99  30.00  30.01
30.02  29.87  30.03  29.94  30.03
29.99  30.00  30.07  29.90  29.95
30.05  44.97  30.01  29.98  29.97
30.01  29.99  30.00  29.97  0.21
44.77  29.96  30.00  30.03  29.91
30.00  30.01  30.03  29.93  29.98
29.99  29.99  29.93  30.04  30.04
30.01  44.92  30.04  29.97  29.91
30.08  29.89  29.97  29.88  0.15
45.01  30.09  29.89  30.01  30.01
29.97  29.95  29.96  30.05  30.04
29.88  30.00  29.99  29.94  30.05
29.98  44.99  30.01  30.00  29.99
29.95  30.00  29.88  30.11  0.21
44.78  30.01  29.96  29.99  29.98
29.98  29.99  30.01  29.91  29.82
30.10  29.99  30.15  29.96  29.93
29.98  45.05  29.97  29.99  30.02
29.96  29.98  29.95  30.04  0.21
44.74  30.02  29.97  29.97  30.03
29.99  29.93  29.94  30.07  29.99
29.99  29.94  30.02  29.97  29.90
30.01  45.12  29.91  30.03  29.95
30.03  29.97  29.87  30.09  0.20
44.79  29.98  29.97  29.99  30.01
30.01  29.97  29.99  29.99  30.01
29.99  29.94  30.01  30.00  29.98
29.98  45.02  29.97  29.91  30.06
29.99  29.96  30.02  29.98
Mean read time (ms): 30.073811959885386
Time captured (s): 10.47
Time elapsed (s): 10.777957116

Summary stats on the Linux environment for a roughly 30-second recording:

Mean read time (ms): 30.152922254616133
Time captured (s): 30.87
Time elapsed (s): 31.135111

Summary stats on the Windows environment for a roughly 30-second recording:

Mean read time (ms): 30.020078674852652
Time captured (s): 30.54
Time elapsed (s): 30.901762071

I'm noticing that the difference between time elapsed and time captured increases with increasing recording time on the Linux side. It also looks like the individual fetch times are less regular on the Linux side.
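
As a rough check on that observation, the gap between elapsed and captured time can be computed directly from the four summaries above. The sketch below only does that subtraction; the class and method names (`CaptureGap`, `gapSeconds`) are made up for illustration. One caveat from the test program itself: `stopTimeNanos` is sampled after the read times have been printed, so part of each gap may be printing time rather than capture lag, and that part would also grow with recording length.

```java
public class CaptureGap {

    /** Wall-clock seconds not accounted for by captured audio. */
    static double gapSeconds(double capturedSeconds, double elapsedSeconds) {
        return elapsedSeconds - capturedSeconds;
    }

    /** Prints one line per run: captured time, elapsed time, and their gap. */
    static void print(String label, double[][] runs) {
        for (double[] run : runs) {
            System.out.printf("%s: captured %.2f s, elapsed %.3f s, gap %.3f s%n",
                    label, run[0], run[1], gapSeconds(run[0], run[1]));
        }
    }

    public static void main(String[] args) {
        // {captured, elapsed} pairs copied from the summary stats in the question.
        double[][] linux = {{10.35, 10.466399}, {30.87, 31.135111}};
        double[][] windows = {{10.47, 10.777957116}, {30.54, 30.901762071}};
        print("Linux", linux);
        print("Windows", windows);
    }
}
```

With these figures the Linux gap goes from about 0.12 s to about 0.27 s, while the Windows gap goes from about 0.31 s to about 0.36 s: larger in absolute terms, but growing more slowly.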

I've tried adjusting the buffer and fetch sizes, but I haven't found a combination that allows for quick enough fetching from the line.

What could cause the slowness in fetching? How do I determine reasonable fetch and buffer sizes such that there is low latency but quick enough fetching to keep up with real time? Are there possible sound configuration issues on Linux that could affect this or that I should check?

Thanks!

Boomkin answered 24/3, 2014 at 19:24 Comment(6)
Do you match the bitrate of DataLine? DataLine getFormat() docs.oracle.com/javase/7/docs/api/javax/sound/sampled/… *edit or maybe AudioInputStream's getFormat() docs.oracle.com/javase/7/docs/api/javax/sound/sampled/…Dogoodism
I believe they're equally irregular on Windows. But Windows tends to time things in 15ms chunks, rather than in milliseconds.Masonite
@Dogoodism getFormat() on either the target data line or an AudioInputStream constructed with that line as a parameter gives me back the format I used to obtain the line with (8000 Hz, 16-bit, mono, signed PCM, big endian).Boomkin
A couple of thoughts: String I/O might be confusing the issue a bit. Perhaps write to an array instead and refrain from publishing the results until the test ends. Second, prefer timestamping with System.nanoTime(). Windows' system clock (used in currentTimeMillis()) updates every 15.5 milliseconds or something like that. nanoTime() uses a high-resolution time source instead. Interested in seeing if you get the same results. Also, there are other libraries, such as JAsioHost (I've heard about it but not used it), to look into.Autopilot
@PhilFreihofner Okay, I altered the test program to use nanoTime() and reran the tests. Interestingly, it seems that with the particular parameters I'm using, the times given are close to multiples of 15 ms. However, this might be an artifact of the implementation of nanoTime on windows, too? Also, I do keep a list of the individual times and wait until the end of a recording to print them out.Boomkin
Interesting. I don't know how to account for the multiples of 15 ms. When I've used nanoTime() on Windows it does not show that artifact. Having not done mic processing, I don't know the expected latency.Autopilot
private static final int MIC_FETCH_SIZE = 480; // 0.03 seconds (30 ms) of data

This is far too small a buffer size for reliable performance. At 16 bit mono, it represents just 240 sound samples. Make it something more like 16000 samples, or:

private static final int MIC_FETCH_SIZE = 32000; // 2 seconds of data

Note: Java Sound does not guarantee that this amount is read; read() returns the number of bytes actually read. The point is to allow the opportunity to read up to 2 seconds of data (if it is available).
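
To make the sizes above concrete, here is a minimal sketch that converts between byte counts and durations for the question's format (8000 Hz, 16-bit, mono). The class and helper names (`ChunkDurations`, `bytesToMillis`, `millisToBytes`) are hypothetical; only `AudioFormat`, `getFrameSize()` and `getFrameRate()` come from the Java Sound API.

```java
import javax.sound.sampled.AudioFormat;

public class ChunkDurations {

    /** Duration in milliseconds of numBytes of audio in the given format. */
    static double bytesToMillis(AudioFormat format, int numBytes) {
        double frames = (double) numBytes / format.getFrameSize();
        return 1000.0 * frames / format.getFrameRate();
    }

    /** Byte count for the given duration, rounded to a whole number of frames. */
    static int millisToBytes(AudioFormat format, double millis) {
        int frames = (int) Math.round(format.getFrameRate() * millis / 1000.0);
        return frames * format.getFrameSize();
    }

    public static void main(String[] args) {
        // Same format as in the question: 8000 Hz, 16-bit, mono, signed, big endian.
        AudioFormat format = new AudioFormat(8000, 16, 1, true, true);

        System.out.println(bytesToMillis(format, 480));   // original fetch size
        System.out.println(bytesToMillis(format, 1000));  // original buffer size
        System.out.println(bytesToMillis(format, 32000)); // suggested fetch size
        System.out.println(millisToBytes(format, 250.0)); // bytes in a quarter second
    }
}
```

Under these assumptions the original 480-byte fetch is 30 ms of audio and the 1000-byte line buffer is 62.5 ms, so the buffer holds only about two fetches' worth of data before it can overflow.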

I think this should solve most of the problems described above.

Lazo answered 25/3, 2014 at 8:3 Comment(7)
Thank you for your answer. How do you go from 480 bytes to 30 samples? Wouldn't each 16-bit sample be 2 bytes (resulting in 240 samples)?Boomkin
Oh sorry, that whole bit/byte thing. :P Still, try making it significantly larger.Lazo
Ah, ok. :) It does make sense to increase the amount that is read at once to reduce overhead. My worry about such a large fetch is that there would be at least as much latency as the time of the fetch in the application I'm working on (for communication). I guess I just need to trade off between reliability and latency.Boomkin
So, is it just not possible to reliably have <= 30 ms latency in sound processing from the microphone, using the Java sound API?Boomkin
It helped the skipping, but I think it means that the latency is at least as high as the amount of time corresponding to one microphone fetch. Is it just not possible to have latency as low as 30 ms, because of needing to fetch larger chunks than that?Boomkin
"It helped the skipping" OK now try 1 second, 1/2 a second and 1/4 of a second.Lazo
I'm noticing that even at a 2-second fetch size (4-second buffer size), the difference between elapsed time and captured time gradually creeps upwards... just more slowly than at lower fetch sizes. This is frustrating. It seems like something is wrong with the Linux sound system setup I'm working with that causes it to take more time to fetch the audio than the time worth fetched.Boomkin
