Audio: Change Volume of samples in byte array

I'm reading a WAV file into a byte array using the method shown below. Now that the data is stored in my byte array, I want to change the sound's volume.

private byte[] getAudioFileData(final String filePath) {
    byte[] data = null;
    try {
        final ByteArrayOutputStream baout = new ByteArrayOutputStream();
        final File file = new File(filePath);
        final AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(file);

        byte[] buffer = new byte[4096];
        int c;
        while ((c = audioInputStream.read(buffer, 0, buffer.length)) != -1) {
            baout.write(buffer, 0, c);
        }
        audioInputStream.close();
        baout.close();
        data = baout.toByteArray();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return data;
}

Edit: Per request some info on the audio format:

PCM_SIGNED 44100.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian

From physics-class I remembered that you can change the amplitude of a sine-wave by multiplying the sine-value with a number between 0 and 1.

Edit: Updated code for 16-bit samples:

private byte[] adjustVolume(byte[] audioSamples, double volume) {
    byte[] array = new byte[audioSamples.length];
    for (int i = 0; i < array.length; i+=2) {
        // convert byte pair to int
        int audioSample = (int) ((audioSamples[i+1] & 0xff) << 8) | (audioSamples[i] & 0xff);

        audioSample = (int) (audioSample * volume);

        // convert back
        array[i] = (byte) audioSample;
        array[i+1] = (byte) (audioSample >> 8);

    }
    return array;
}

The sound is heavily distorted when I multiply audioSample by volume. If I skip the multiplication and compare the two arrays with Arrays.equals(array, audioSamples), they match, so the byte array seems to be converted to int and back correctly.

Can anybody help me out? What am I getting wrong here? Thank you! :)

Downandout answered 23/1, 2013 at 17:36 Comment(5)

You might get better answers on dsp.stackexchange.com – Gypsum 23/1, 2013 at 17:40

1) For better help sooner, post an SSCCE. 2) Report audioInputStream.getFormat(). – Mozell 23/1, 2013 at 17:43

@Gypsum Thank you! Can I just copy & paste it there or what are the rules for moving topics? – Downandout 23/1, 2013 at 17:44

@AndrewThompson Thank you for your tips. audioInputStream.getFormat() says: PCM_SIGNED 44100.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian – Downandout 23/1, 2013 at 17:51

Uh-huh. See the answer by @johusman. – Mozell 23/1, 2013 at 17:57

S

9

Are you sure you're reading 8-bit mono audio? Otherwise one byte does not equal one sample, and you cannot just scale each byte. E.g. if it is 16-bit data you have to parse every pair of bytes as a 16-bit integer, scale that, and then write it back as two bytes.
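
A minimal sketch of the parse, scale, write-back steps described above, using java.nio.ByteBuffer instead of manual bit shifting. It assumes 16-bit signed little-endian PCM (the format reported in the question) and a volume between 0 and 1, so the result cannot overflow. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmScaler {
    /** Scales 16-bit signed little-endian samples; volume is assumed to be in [0, 1]. */
    static byte[] scale16BitLittleEndian(byte[] audioSamples, double volume) {
        if (volume < 0.0 || volume > 1.0) {
            throw new IllegalArgumentException("volume must be between 0 and 1");
        }
        byte[] out = new byte[audioSamples.length];
        ByteBuffer src = ByteBuffer.wrap(audioSamples).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer dst = ByteBuffer.wrap(out).order(ByteOrder.LITTLE_ENDIAN);
        // Walk the data one 2-byte sample at a time; a trailing odd byte is ignored.
        for (int i = 0; i + 1 < audioSamples.length; i += 2) {
            short sample = src.getShort(i);              // signed 16-bit value
            short scaled = (short) Math.round(sample * volume);
            dst.putShort(i, scaled);                     // written back as two bytes
        }
        return out;
    }
}
```

Because getShort returns a signed short, negative samples keep their sign without any extra handling.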

Spiroid answered 23/1, 2013 at 17:45 Comment(3)

Thank you. My audio has a sample size of 16 bits. I'm going to read up on how to convert my byte array properly and give you my feedback once I managed to do that. – Downandout 23/1, 2013 at 17:57

Hey. :) I've finally managed to correctly convert my byte array to int and back. However, my sound is even more heavily distorted than before if I multiply my samples with volume. I've updated the code in the question. Could you have a look? That would be great. Thank you! :) – Downandout 24/1, 2013 at 11:13

Just gave it a quick glance, but I suspect you aren't dealing properly with negative values (you convert to an int, which is 32-bit, maybe you could use a short?). Remember that java's signed integers are two's complement. – Spiroid 24/1, 2013 at 13:28
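
To make the point about negative values concrete, here is a small sketch (names are hypothetical) that assembles the same byte pair once as an int, as in the question, and once as a short. The sample value -2 is used as an example.

```java
final class SignExtensionDemo {
    /** Assembles a little-endian byte pair into an int, as the question's code does. */
    static int asUnsignedInt(byte lo, byte hi) {
        return ((hi & 0xff) << 8) | (lo & 0xff);
    }

    /** Same bits, narrowed to short so that bit 15 is treated as the sign bit. */
    static short asSignedShort(byte lo, byte hi) {
        return (short) (((hi & 0xff) << 8) | (lo & 0xff));
    }

    public static void main(String[] args) {
        byte lo = (byte) 0xFE, hi = (byte) 0xFF;      // the sample -2, little-endian
        System.out.println(asUnsignedInt(lo, hi));    // 65534
        System.out.println(asSignedShort(lo, hi));    // -2
        // Halving the int value and writing it back as 16 bits gives 32767, not -1:
        System.out.println((short) (asUnsignedInt(lo, hi) * 0.5)); // 32767
    }
}
```

A quiet negative sample therefore turns into a near full-scale positive one after scaling, which would explain the heavy distortion.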

T

10

The problem is the int type: an int in Java is 4 bytes, but a sample is only 2 bytes, so the sign of negative samples is lost when the byte pair is assembled into an int.

This code works:

private byte[] adjustVolume(byte[] audioSamples, float volume) {
        byte[] array = new byte[audioSamples.length];
        for (int i = 0; i < array.length; i+=2) {
            // convert byte pair to int
            short buf1 = audioSamples[i+1];
            short buf2 = audioSamples[i];

            buf1 = (short) ((buf1 & 0xff) << 8);
            buf2 = (short) (buf2 & 0xff);

            short res= (short) (buf1 | buf2);
            res = (short) (res * volume);

            // convert back
            array[i] = (byte) res;
            array[i+1] = (byte) (res >> 8);

        }
        return array;
}

Tweeter answered 25/9, 2014 at 11:33 Comment(2)

Would it be possible to also control stereo volume ? – Impervious 9/10, 2017 at 18:18

YES and quite easily once you know PCM is 16-bit LL RR LL RR LL RR LL (where each character is 1 byte) so I just increment i by 4 and reuse your code. – Impervious 9/10, 2017 at 19:1
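
A sketch of the stereo idea from the comment above, assuming 16-bit signed little-endian PCM with interleaved frames (2 bytes left, then 2 bytes right). The method names and the separate per-channel volumes are my own additions, and so is the clamping, which is there so that a volume above 1 does not wrap around.

```java
final class StereoVolume {
    /** Scales left and right channels independently; a trailing partial frame is left as zeros. */
    static byte[] adjustStereoVolume(byte[] audioSamples, float leftVolume, float rightVolume) {
        byte[] out = new byte[audioSamples.length];
        // One stereo frame is 4 bytes: LL RR
        for (int i = 0; i + 3 < audioSamples.length; i += 4) {
            scaleSample(audioSamples, out, i, leftVolume);
            scaleSample(audioSamples, out, i + 2, rightVolume);
        }
        return out;
    }

    static void scaleSample(byte[] in, byte[] out, int offset, float volume) {
        // Assemble the little-endian byte pair as a signed 16-bit sample
        short sample = (short) (((in[offset + 1] & 0xff) << 8) | (in[offset] & 0xff));
        int scaled = Math.round(sample * volume);
        // Clamp to the 16-bit range instead of letting the cast wrap around
        scaled = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
        out[offset] = (byte) scaled;
        out[offset + 1] = (byte) (scaled >> 8);
    }
}
```

Passing the same value for both volumes gives the plain "increment by 4 and reuse the code" behaviour.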

C

8

The answer by Rodion was a good starting point, but it was not sufficient to give good results.

It introduced overflows and was not fast enough for real-time audio on Android.

TL;DR: My improved solution involving a LUT and gain compression

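// One LUT entry per possible 16-bit value (0x10000 = 65536).
// With 0xffff the sample 32767 would index past the end of the array.
private static final int N_SHORTS = 0x10000;
private static final short[] VOLUME_NORM_LUT = new short[N_SHORTS];
private static final int MAX_NEGATIVE_AMPLITUDE = 0x8000;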

static {
    precomputeVolumeNormLUT();
}    

private static void normalizeVolume(byte[] audioSamples, int start, int len) {
    for (int i = start; i < start+len; i+=2) {
        // convert byte pair to int
        short s1 = audioSamples[i+1];
        short s2 = audioSamples[i];

        s1 = (short) ((s1 & 0xff) << 8);
        s2 = (short) (s2 & 0xff);

        short res = (short) (s1 | s2);

        res = VOLUME_NORM_LUT[res+MAX_NEGATIVE_AMPLITUDE];
        audioSamples[i] = (byte) res;
        audioSamples[i+1] = (byte) (res >> 8);
    }
}

private static void precomputeVolumeNormLUT() {
    for(int s=0; s<N_SHORTS; s++) {
        double v = s-MAX_NEGATIVE_AMPLITUDE;
        double sign = Math.signum(v);
        // Non-linear volume boost function
        // fitted exponential through (0,0), (10000, 25000), (32767, 32767)
        VOLUME_NORM_LUT[s]=(short)(sign*(1.240769e-22 - (-4.66022/0.0001408133)*
                           (1 - Math.exp(-0.0001408133*v*sign))));
    }
}

This works very well, boosts audio nicely, does not have a problem with clipping and can run real-time on Android.

How I got there

My task was to wrap a proprietary closed-source TTS engine (supplied by customer) to make it work as a standard Android TextToSpeechService. The customer was complaining about the volume being too low, even though the stream volume was set to highest.

I had to find a way to boost the volume in Java in real-time while avoiding clipping and distortion.

There were two problems with Rodion's solution:

the code was running a bit too slow for real-time operation on a phone (float is slow)
it doesn't prevent overflow, which may cause bad and noticeable artifacts

I came to this solution:

Computation speed can be improved by trading RAM for CPU and using a look-up-table (LUT), i.e. pre-computing the volume-boost function value for every input short value out there.

This way you sacrifice 128K of RAM but get rid of the floating point and multiplication during sound processing completely, which in my case was a win.
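
For illustration, this is roughly what the plain multiplication LUT could look like before any non-linear function is involved. It is only a sketch: the class and method names are hypothetical, 16-bit signed little-endian samples are assumed, and out-of-range results are clamped as a simplification.

```java
final class GainLut {
    private static final int N_SHORTS = 1 << 16;  // 65536 entries, 2 bytes each = 128K
    private static final int OFFSET = 0x8000;     // maps -32768..32767 to 0..65535

    /** Pre-computes gain * sample for every possible 16-bit sample value. */
    static short[] buildLinearLut(double gain) {
        short[] lut = new short[N_SHORTS];
        for (int s = 0; s < N_SHORTS; s++) {
            long scaled = Math.round((s - OFFSET) * gain);
            long clamped = Math.max((long) Short.MIN_VALUE, Math.min((long) Short.MAX_VALUE, scaled));
            lut[s] = (short) clamped;
        }
        return lut;
    }

    /** Replaces each sample by its table entry; no floating point in this loop. */
    static void applyInPlace(byte[] audioSamples, short[] lut) {
        for (int i = 0; i + 1 < audioSamples.length; i += 2) {
            short sample = (short) (((audioSamples[i + 1] & 0xff) << 8) | (audioSamples[i] & 0xff));
            short res = lut[sample + OFFSET];
            audioSamples[i] = (byte) res;
            audioSamples[i + 1] = (byte) (res >> 8);
        }
    }
}
```

The table is built once per gain value, and the per-sample work is reduced to one array lookup.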

As for the overflow, there are two ways around this. The ugly one is to simply replace the values outside of the short range with Short.MIN_VALUE or Short.MAX_VALUE respectively. It does not prevent clipping, but at least it does not overflow and the artifacts are way less disturbing.
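
The difference between overflowing and clamping can be seen on a single sample. A small sketch with hypothetical method names:

```java
final class Saturation {
    /** Plain cast: values outside the short range wrap around. */
    static short scaleWrapping(short sample, float volume) {
        return (short) (sample * volume);
    }

    /** Values outside the short range are replaced by Short.MIN_VALUE or Short.MAX_VALUE. */
    static short scaleClamped(short sample, float volume) {
        int scaled = Math.round(sample * volume);
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
    }

    public static void main(String[] args) {
        System.out.println(scaleWrapping((short) 20000, 2.0f)); // -25536
        System.out.println(scaleClamped((short) 20000, 2.0f));  // 32767
    }
}
```

With the plain cast a loud positive sample flips to a large negative one, which is the kind of artifact described above. Clamping still flattens the peak, but the sign stays correct.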

But I found a better way, which is to apply a non-linear boost (also called gain compression). You can use an exponential function, and instead of pre-computing a plain multiplication LUT you pre-compute the non-linear boost. The function plays very well with the LUT, and any similar function can be pre-computed this way.

The best way to find a good boost function and optimal parameters for it is to experiment with different functions for a while. A simple but good tool is https://mycurvefit.com/

One of the functions seemed promising. I just had to make a small modification to make negative values work in a symmetrical fashion.

$y=\mathrm{sign}(x)\cdot \left[ y_0-\frac{v_0}{k}\left(1-e^{-k \cdot x \cdot \mathrm{sign}(x)}\right)\right]$

After playing with some parameters, I came to the conclusion that I'll get good results if the function passes through [0,0], [10000, 25000] and [32767, 32767].

I needed quite a big volume boost. You may want to be more subtle.

MyCurveFit gave me this set of parameters: y₀ = 1.240769e-22, v₀ = -4.66022, k = 0.0001408133
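
A quick way to sanity-check these parameters is to evaluate the function at the three points it was fitted through. The sketch below uses the same expression as the LUT code; the class and method names are made up for this example.

```java
final class BoostCurve {
    static final double Y0 = 1.240769e-22;
    static final double V0 = -4.66022;
    static final double K = 0.0001408133;

    /** Symmetric exponential boost: sign(x) * [y0 - (v0 / k) * (1 - e^(-k * |x|))]. */
    static double boost(double x) {
        double sign = Math.signum(x);
        return sign * (Y0 - (V0 / K) * (1 - Math.exp(-K * x * sign)));
    }

    public static void main(String[] args) {
        // Input amplitudes next to their boosted values
        for (double x : new double[] {0, 10000, 32767, -10000}) {
            System.out.printf("%8.0f -> %10.1f%n", x, boost(x));
        }
    }
}
```

The values for 10000 and 32767 should come out close to 25000 and 32767, and negative inputs mirror the positive ones.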

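The final boost function to be pre-computed in the LUT looks like this:

$y=\mathrm{sign}(x)\cdot \left[ 1.240769\cdot 10^{-22}-\frac{-4.66022}{0.0001408133}\left(1-e^{-0.0001408133 \cdot x \cdot \mathrm{sign}(x)}\right)\right]$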

Disclaimer: I'm not a DSP expert and I was warned that a boost like this is not suitable for Hi-Fi music and such, because it introduces changes in timbre, harmonics and other subtle artifacts. But it's fast and worked very well for my purpose and I think it will be acceptable for many uses involving speech and Lo-Fi stuff in general.

Cremate answered 6/10, 2018 at 22:35 Comment(2)

Hi @jan-hadáček this code is awesome, but i'm having a little bit of trouble with it. On this line im getting an outofbounds res = VOLUME_NORM_LUT[res+MAX_NEGATIVE_AMPLITUDE]; – Terrazzo 19/6, 2019 at 21:4

@RicardoRodrigues same, I changed it to: res = VOLUME_NORM_LUT[Math.min(res + MAX_NEGATIVE_AMPLITUDE, N_SHORTS - 1)]; – Demoralize 6/8, 2020 at 7:50

P

1

Are you sure that one byte is one sample? According to this format specification, a sample has 2 bytes. And do not forget to leave the header unchanged.

WAVE PCM soundfile format

Purkey answered 23/1, 2013 at 17:46 Comment(1)

Thank you for your help. I've updated my code in the question, but it's still not working. :( – Downandout 24/1, 2013 at 11:20
