AudioTrack - short array to byte array distortion using jlayer(java mp3 decoder)

Asked 27/2, 2013 at 22:49 Answered 3/3, 2013 at 17:5

I'm using jLayer to decode MP3 data, with this call:

SampleBuffer output = (SampleBuffer) decoder.decodeFrame(frameHeader, bitstream);

The decoded data comes back as a short[] array, which I get from output.getBuffer().

When I call AudioTrack write() with that method, it plays fine as I loop through the file:

at.write(output.getBuffer(), 0, output.getBuffer().length);

However, when I convert the short[] array to byte[] array using any of the methods in this answer: https://mcmap.net/q/339075/-how-to-convert-short-array-to-byte-array the sound gets distorted and jittery:

at.write(output.getBuffer(), 0, output.getBuffer().length);

becomes:

byte[] array = ShortToByte_Twiddle_Method(output.getBuffer());
at.write(array,  0,  array.length);

Am I doing anything wrong and what can I do to fix it? Unfortunately I need the pcm data to be in a byte array for another 3rd party library I'm using. The file is 22kHz if that matters and this is how at is being instantiated:

at = new AudioTrack(AudioManager.STREAM_MUSIC, 22050, AudioFormat.CHANNEL_OUT_STEREO,
                AudioFormat.ENCODING_PCM_16BIT, 10000 /* 10 second buffer */,
                AudioTrack.MODE_STREAM);

Thank you so much in advance.

Edit: This is how I'm instantiating the AudioTrack variable now. So for 44kHz files, the value that is getting sent is 44100, while for 22kHz files, the value is 22050.

at = new AudioTrack(AudioManager.STREAM_MUSIC, decoder.getOutputFrequency(), 
                                  decoder.getOutputChannels() > 1 ? AudioFormat.CHANNEL_OUT_STEREO : AudioFormat.CHANNEL_OUT_MONO,
                                  AudioFormat.ENCODING_PCM_16BIT, 10000 /* 10 second buffer */,
                                  AudioTrack.MODE_STREAM);

This is the decode method:

public byte[] decode(InputStream inputStream, int startMs, int maxMs) throws IOException {
        ByteArrayOutputStream outStream = new ByteArrayOutputStream(1024);

        float totalMs = 0;
        boolean seeking = true;

        try {
            Bitstream bitstream = new Bitstream(inputStream);
            Decoder decoder = new Decoder();

            boolean done = false;
            while (!done) {
                Header frameHeader = bitstream.readFrame();
                if (frameHeader == null) {
                    done = true;
                } else {
                    totalMs += frameHeader.ms_per_frame();

                    if (totalMs >= startMs) {
                        seeking = false;
                    }

                    if (!seeking) {
                        // logger.debug("Handling header: " + frameHeader.layer_string());
                        SampleBuffer output = (SampleBuffer) decoder.decodeFrame(frameHeader, bitstream);                            

                        short[] pcm = output.getBuffer();
                        for (short s : pcm) {
                            outStream.write(s & 0xff);
                            outStream.write((s >> 8) & 0xff);
                        }
                    }

                    if (totalMs >= (startMs + maxMs)) {
                        done = true;
                    }
                }
                bitstream.closeFrame();
            }

            return outStream.toByteArray();
        } catch (BitstreamException e) {
            throw new IOException("Bitstream error: " + e);
        } catch (DecoderException e) {
            throw new IOException("Decoder error: " + e);
        }
    }

This is how it sounds (wait a few seconds): https://vimeo.com/60951237 (and this is the actual file: http://www.tonycuffe.com/mp3/tail%20toddle.mp3)

Edit: I would have loved to have split the bounty, but instead I have given the bounty to Bill and the accepted answer to Neil. Both were a tremendous help. For those wondering, I ended up rewriting the Sonic native code which helped me move along the process.

Narcho answered 27/2, 2013 at 22:49 Comment(12)

It appears that the decoding works when the file is 44kHz sample rate, but for 22kHz, it becomes completely choppy. – Narcho 3/3, 2013 at 9:14

Can you give an overview of the flow. What is the purpose of the 3rd party library. Is it a filter? Ie, Bytes in/Bytes out? Or you don't use the library and just decode a 22/44kHz mp3 and play it. One after converting to 8bits and another at 16bits? – Shirtwaist 3/3, 2013 at 18:8

It's Sonic, a rate modification engine. Basically, it allows me to modify the playback rate (speed up, slow down) while maintaining pitch (not sounding chipmunky, etc). – Narcho 3/3, 2013 at 18:9

So once I get a suitable amount of byte[]s back, I can send it to sonic and it spits back modified byte[]s, which I pass to AudioTrack. With 44kHz files it works beautifully (even with playback rate modification). Even if I take out the sonic conversion for 22 khz files, it sounds pretty bad. – Narcho 3/3, 2013 at 18:11

Yes :). I don't get anything near the sound that I'd be expecting, just a couple of "Thuds" (like someone hitting a microphone). – Narcho 3/3, 2013 at 18:31

It is indeed the library. If you notice, it takes: sonic.putBytes(in, in.length);. If it took a short[], I'd be done already :). – Narcho 3/3, 2013 at 18:46

The standard NDK version is here, github.com/waywardgeek/sonic-ndk/blob/master/src/org/… However, this library supports bytes, unsigned bytes, shorts and float? github.com/waywardgeek/sonic/blob/master/Sonic.java Also, you are free to use the Sonic.class unaltered in your own jar (or whatever the android thing is [apk?]). – Shirtwaist 3/3, 2013 at 19:20

I tried the Java version, but it keeps running into ArrayIndexOutOfBoundsExceptions. – Narcho 3/3, 2013 at 19:57

It works fine without compilation. The author states it's a pure Java implementation. The main sonic library is already compiled and that part is working fine. – Narcho 3/3, 2013 at 20:4

See: Sonic.java // Use this to write 16-bit data to be speed up or down into the stream. // Return false if memory realloc failed, otherwise true. public boolean putBytes(byte[] buffer, int lenBytes), so the interface is NOT bytes, it is 16bit! – Shirtwaist 3/3, 2013 at 21:49

let us continue this discussion in chat – Shirtwaist 4/3, 2013 at 0:50

This question got closed, but 5 people voted it up. – Narcho 16/9, 2013 at 19:15

As @Bill Pringlemeir says, the problem is that your conversion method doesn't actually convert. A short is a 16 bit number; a byte is an 8 bit number. The method you have chosen doesn't convert the contents of the shorts (ie go from 16 bits to 8 bits for the contents), it changes the way in which the same collection of bits is stored. As you say, you need something like this:

SampleBuffer output = (SampleBuffer) decoder.decodeFrame(frameHeader, bitstream);
byte[] array = MyShortToByte(output.getBuffer());
at.write(array,  0,  array.length);

@Bill Pringlemeir's approach is equivalent to dividing all the shorts by 256 to ensure they fit in the byte range:

byte[] MyShortToByte(short[] buffer) {
    int N = buffer.length;
    ByteBuffer byteBuf = ByteBuffer.allocate(N);  // java.nio.ByteBuffer
    for (int i = 0; i < N; i++) {
        byte b = (byte)(buffer[i]/256);  /* keep only the high byte: convert to byte. */
        byteBuf.put(b);
    }
    return byteBuf.array();
}

This will work, but will probably give you very quiet, edgy tones. If you can afford the processing time, a two pass approach will probably give better results:

byte[] MyShortToByte(short[] buffer) {
    int N = buffer.length;
    short min = 0;
    short max = 0;
    for (int i=0; i<N; i++) {
        if (buffer[i] > max) max = buffer[i];
        if (buffer[i] < min) min = buffer[i];
    }
    // int, not short: (max-min) is promoted to int and can exceed the short range.
    int scaling = 1+(max-min)/256; // 1+ ensures we stay within range and guarantee no divide by zero if sequence is pure silence ...

    ByteBuffer byteBuf = ByteBuffer.allocate(N);
    for (int i=0; i<N; i++) {
        byte b = (byte)(buffer[i]/scaling);  /*convert to byte. */
        byteBuf.put(b);
    }
    return byteBuf.array();
}

Beware of signed/unsigned issues. The above works signed->signed and unsigned->unsigned, but not between the two. It may be that you are reading signed shorts (-32768 to 32767) but need to output unsigned bytes (0 to 255), ...
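To make the signed-to-unsigned case concrete: Android documents AudioFormat.ENCODING_PCM_8BIT as unsigned samples with a 128 offset for zero, while the shorts coming out of the decoder are signed. The following is only a minimal sketch, assuming plain truncation of the low byte with no rescaling; the class and method names are made up for illustration.

```java
class Pcm16ToUnsigned8 {
    // Keep the high byte of each signed 16-bit sample and shift it into 0..255.
    static byte[] convert(short[] samples) {
        byte[] out = new byte[samples.length];
        for (int i = 0; i < samples.length; i++) {
            int high = samples[i] >> 8;    // -128..127; arithmetic shift keeps the sign
            out[i] = (byte) (high + 128);  // 0..255, stored in a Java (signed) byte
        }
        return out;
    }
}
```

Read a result back as unsigned with (out[i] & 0xff); silence (0) maps to 128.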

If you can afford the processing time, a more precise (smoother) approach would be to go via floats (this also gets round the signed/unsigned issue):

byte[] MyShortToByte(short[] buffer) {
    int N = buffer.length;
    float f[] = new float[N];
    float min = 0.0f;
    float max = 0.0f;
    for (int i=0; i<N; i++) {
         f[i] = (float)(buffer[i]);
         if (f[i] > max) max = f[i];
         if (f[i] < min) min = f[i];
         }
    float scaling = 1.0f+(max-min)/256.0f; // +1 ensures we stay within range and guarantee no divide by zero if sequence is pure silence ...

    ByteBuffer byteBuf = ByteBuffer.allocate(N);
    for (int i=0; i<N; i++) {
        byte b = (byte)(f[i]/scaling);  /*convert to byte. */
        byteBuf.put(b);
    }
    return byteBuf.array();
}

Hahnert answered 3/3, 2013 at 17:5 Comment(14)

I tried your conversion, it still sounds choppy. Maybe I should be padding the byte array with 0s at each conversion? – Narcho 3/3, 2013 at 17:57

A little bit, still choppy however. – Narcho 3/3, 2013 at 18:21

If you can bear the processing cost, or at least if you can for an experiment, try the alternative I'll put above in the next couple of minutes. If it's still choppy, then it is very unlikely that this conversion is the cause, and more likely that something else is the issue – Hahnert 3/3, 2013 at 19:21

The float alternative is now up to try, but I can't edit the previous comment to reflect this ... – Hahnert 3/3, 2013 at 19:28

I tried both, still choppy. You can see the edited question at the end for the intended sound as well as the outputted sound. – Narcho 3/3, 2013 at 19:58

+1 for suggesting re-normalizing. I think that the mp3 decoder may produce 16bit values that are swapped versus processor endian-ness. – Shirtwaist 3/3, 2013 at 21:18

I've just read your edits. You are setting the encoding to 16bits when you set up the AudioTrack, yet we are converting the data to 8 bit here. Did you change that to AudioFormat.ENCODING_PCM_8BIT as @BillPringlemeir says? I'm also a bit confused as the decoder you've posted appears to return a byte array yet earlier you say it returns a short array. Or have I missed something? – Hahnert 3/3, 2013 at 21:33

@Neil: I think there is more than one thing going on. Take a look at putBytesNative at github.com/waywardgeek/sonic-ndk/tree/master/jni If you follow it through, it seems that even though the top-level Java interface is putBytes(), it is actually treating the byte buffer as shorts. Also, it is not clear what the endian-ness of the mp3 decoder is. As soon as we try to munge the PCM, we have issues (except at 44k? Why?). The endian-ness/PCM size of the AudioTrack/AudioFormat also must be correct to get good sound. – Shirtwaist 3/3, 2013 at 21:42

I could well believe we have an endian issue here. Do you think an endian swap in the short would be a good idea, ie changing f[i] = (float)(buffer[i]); to f[i] = (float)((256*(buffer[i]%256) + buffer[i]/256)); would be worth trying out? – Hahnert 3/3, 2013 at 21:56

I must say I really appreciate how much you guys are trying to help out. Hopefully once this is settled I'll be able to compensate you guys better than in a way of bounty. Sonic is not a problem with regards to the distortion because even without sonic in the equation, once I convert from short[] to byte[], 22 kHz files no longer sound as originally encoded. The original code was from this article, and one person in the comments had the same problem: mindtherobot.com/blog/624/… – Narcho 3/3, 2013 at 21:58

The other person's problem "Great blog, I was just wondering about the check from lines 29-32. I do have a case where my generated mp3 is 22khz mono. I tried removing the check and running the conversion on audio track I get an alien sound. If I run the code as a desktop application and save the result in a wav file no editor can identify it. Thanks in advance " – Narcho 3/3, 2013 at 22:2

The code in that article specifically claims to only work for 44k, so I guess that at 22k there is something different about the data returned. I'm afraid I'm out of ideas for the night, but would be interested to know if the endian switch suggested above works ... – Hahnert 3/3, 2013 at 22:18

Thanks for trying, seriously. Like I said, the 22k file is being returned perfectly as a short[], but getting muddled when being converted to byte[]. So the decoding is actually going well, it's when I start to play with the audio data that things get screwy. – Narcho 4/3, 2013 at 1:59

Other idea: When you went from 44 to 22 kHz, did anything else change? Did it go from stereo to mono or mono to stereo? – Hahnert 4/3, 2013 at 19:46

The issue is with your short to byte conversion. The byte conversion link preserves all information including the high and low byte portions. When you are converting from 16bit to 8bit PCM samples, you must discard the lower byte. My Java skills are weak, so the following may not work verbatim. See also: short to byte conversion.

int N = buffer.length;  /* buffer is the short[] from output.getBuffer() */
ByteBuffer byteBuf = ByteBuffer.allocate(N);
for (int i = 0; i < N; i++) {
  /* byte b = (byte)((buffer[i]>>8)&0xff);  convert to byte. native endian */
  byte b = (byte)(buffer[i]&0xff);  /*convert to byte; swapped endian. */
  byteBuf.put(b);
}

That is the following conversion,

  AAAA AAAA SBBB BBBB  -> AAAA AAAA, +1 if S==1 and positive else -1 if S==1

A is a bit that is kept. B is a discarded bit and S is a bit that you may wish to use for rounding. The rounding is not needed, but it may sound a little better. Basically, 16 bit PCM is higher resolution than 8 bit PCM. You lose those bits when the conversion is done. The short to byte routine tries to preserve all information.
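As a rough illustration of the rounding idea, here is a sketch that rounds to the nearest 8-bit value by adding half an output step before dropping the low byte, then clamps so the top of the range cannot wrap. It assumes signed 8-bit output and uses round-half-up, which is not exactly the sign rule in the diagram above; the class and method names are hypothetical.

```java
class Pcm16ToSigned8 {
    // Reduce one signed 16-bit sample to a signed 8-bit sample, rounding to nearest.
    static byte round(short s) {
        int v = (s + 128) >> 8;  // add half a step (the S bit), then drop the low byte
        if (v > 127) {
            v = 127;             // clamp: 0x7F80..0x7FFF would otherwise wrap to -128
        }
        return (byte) v;
    }
}
```

Without the clamp, the loudest positive samples would flip to the most negative value, which is audible as a click.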

Of course, you must tell the sound library that you are using 8-bit PCM. My guess,

at = new AudioTrack(AudioManager.STREAM_MUSIC, 22050, AudioFormat.CHANNEL_OUT_STEREO,
            AudioFormat.ENCODING_PCM_8BIT, 10000 /* 10 second buffer */,
            AudioTrack.MODE_STREAM);

If you can only use 16bit PCM to play audio, then you have to do the inverse and convert the 8bit PCM from the library to 16bit PCM for playback. Also note that 8bit samples are often NOT straight PCM but μ-law or A-law encoded. If the 3rd party library uses these formats, the conversion is different but you should be able to code it from the wikipedia links.
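For the inverse direction, a minimal sketch might look like the following. It assumes the library hands back signed, linear 8-bit PCM (not μ-law or A-law, which need a table or formula instead); the class and method names are invented for the example.

```java
class Pcm8To16 {
    // Widen signed 8-bit linear PCM to signed 16-bit by moving each
    // sample into the high byte; the low byte is left as zero.
    static short[] widen(byte[] samples) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            out[i] = (short) (samples[i] << 8);  // byte is sign-extended before the shift
        }
        return out;
    }
}
```

The resolution lost in the 16-to-8 step does not come back; this only restores the sample width that a 16-bit AudioTrack expects.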

NOTE: I have not included the rounding code, as overflow and sign handling would complicate the answer. You must check for overflow (i.e., 0x7f + 1 gives 0x80, or 127 + 1 giving -128). However, I suspect the library is not straight 8bit PCM.

See Also: Alsa PCM overview, Multi-media wiki entry on PCM - Ultimately Android uses ALSA for sound.

Other factors that must be correct for a PCM raw buffer are sample rate, number of channels (stereo/mono), PCM format including bits, companding, little/big endian and sample interleaving.

EDIT: After some investigation, the JLayer decoder typically returns big endian 16bit values. The Sonic filter takes a byte array but treats it as 16bit little endian underneath. Finally, the AudioTrack class expects 16 bit little endian underneath. I believe that for some reason the JLayer mp3 decoder will sometimes return 16bit little endian values. The decode() method in the question does a byte swap of the 16 bit values. Also, the posted audio sounds as if the bytes are swapped.

public byte[] decode(InputStream inputStream, int startMs, int maxMs, boolean swap) throws IOException {
...
                    short[] pcm = output.getBuffer();
                    for (short s : pcm) {
                        if(swap) {
                          outStream.write(s & 0xff);
                          outStream.write((s >> 8) & 0xff);
                        } else {
                          outStream.write((s >> 8) & 0xff);
                          outStream.write(s & 0xff);
                        }
                    }
...

For 44k mp3s, you call the routine with swap = true. For the 22k mp3, swap = false. This explains all the reported phenomena. I don't know why the JLayer mp3 decoder would sometimes output big endian and other times little endian. I imagine it depends on the source mp3 and not the sample rate.
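The same choice of byte order can also be expressed with java.nio instead of manual shifting. This is only a sketch: the class and method names are invented here, and which order a given file actually needs is something to determine by experiment, as described above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PcmBytes {
    // Pack 16-bit samples into a byte[] (two bytes per sample) in the requested order.
    static byte[] fromShorts(short[] pcm, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(pcm.length * 2).order(order);
        buf.asShortBuffer().put(pcm);  // the short view inherits the buffer's byte order
        return buf.array();
    }
}
```

PcmBytes.fromShorts(pcm, ByteOrder.LITTLE_ENDIAN) produces the same bytes as writing s & 0xff followed by (s >> 8) & 0xff; ByteOrder.BIG_ENDIAN gives the other branch.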

Shirtwaist answered 3/3, 2013 at 16:19 Comment(5)

Thanks for your help so far, I'm in the process of trying it. I think I understand what you're saying, but why would a 44kHz file converted in the original method and set as 16 bit pcm work? – Narcho 3/3, 2013 at 17:41

Sorry if I wasn't clear, when I switch to the 22 kHz file, I switch the AudioTrack instantiation to be at 22050. In fact, it's now handled automatically. Please see the edit to see the new instantiation. – Narcho 3/3, 2013 at 17:54

0 [Tuna ]: OMAP4 - Tuna TI OMAP4 Board 1 [OMAP4HDMI ]: OMAP4HDMI - OMAP4HDMI OMAP4HDMI – Narcho 3/3, 2013 at 18:28

How would that be possible? The problem is, the act of converting the short[] to byte[] is causing the audio corruption, which happens before Sonic gets the data. If I send to the audiotrack the array as shorts, it sounds perfectly fine. If I send it as a byte[] array, for 44 kHz files it works just as well, for 22 kHz, well: vimeo.com/60951237 – Narcho 3/3, 2013 at 20:1

Ok. I didn't know that 22k works with 16bit. It still seems like the problem is the conversion OR specifying the PCM playback. It is NOT the driver. The 16bit sample may be big-endian or little-endian. Please look at my edit and try that without Sonic. It appears the buffers take a byte, but they are converted to short by the sonic JNI methods. – Shirtwaist 3/3, 2013 at 21:27
