Mixing 16 bit linear PCM streams and avoiding clipping/overflow

H

6

9

I'm trying to mix two 16-bit linear PCM audio streams and I can't seem to get rid of the noise. I think it comes from overflow when the samples are mixed together.

I have the following function ...

short int mix_sample(short int sample1, short int sample2)
{
    return #mixing_algorithm#;
}

... and here's what I have tried as #mixing_algorithm#

sample1/2 + sample2/2
2*(sample1 + sample2) - 2*(sample1*sample2) - 65535
(sample1 + sample2) - sample1*sample2
(sample1 + sample2) - sample1*sample2 - 65535
(sample1 + sample2) - ((sample1*sample2) >> 0x10) // same as divide by 65536

Some of them have produced better results than others but even the best result contained quite a lot of noise.

Any ideas how to solve it?

Heaviness answered 23/8, 2012 at 10:31 Comment(2)

Can you write the full algorithm? I can't see any assignments! – Cattegat 23/8, 2012 at 10:37

When you divide sample1 and sample2 by 2, you get an error range of 1. – Canvass 23/8, 2012 at 10:38

J

8

here's a descriptive implementation:

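#include <cstdint>
#include <limits>

short int mix_sample(short int sample1, short int sample2) {
    // Add in a wider type so the sum itself cannot overflow.
    const int32_t result(static_cast<int32_t>(sample1) + static_cast<int32_t>(sample2));
    typedef std::numeric_limits<short int> Range;
    // Saturate to the representable range instead of wrapping around.
    if (Range::max() < result)
        return Range::max();
    else if (Range::min() > result)
        return Range::min();
    else
        return static_cast<short int>(result);
}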

To mix, it's just add and clip!

To avoid clipping artifacts, you will want to use saturation or a limiter. Ideally, you will have a small int32_t buffer with a small amount of lookahead. This will introduce latency.

More common than limiting everywhere is to leave a few bits' worth of 'headroom' in your signal.
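As a rough illustration of the headroom idea, here is a minimal sketch that sums any number of streams into an int32_t accumulator, drops a fixed number of bits, and saturates whatever is still out of range. The function name `mix_with_headroom` and the `headroom_bits` parameter are made up for this example, and shorter streams are simply treated as silent past their end. It is not a limiter: there is no lookahead or gain smoothing.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: mix 16-bit streams with a fixed amount of headroom.
// headroom_bits (assumed to be < 31) is how many bits of level are given up;
// each bit halves the output level, which is roughly 6 dB.
std::vector<int16_t> mix_with_headroom(
        const std::vector<std::vector<int16_t>>& streams,
        unsigned headroom_bits) {
    // The output is as long as the longest input stream.
    std::size_t n = 0;
    for (const auto& s : streams) n = std::max(n, s.size());

    // Sum in 32 bits so the intermediate result cannot overflow.
    std::vector<int32_t> acc(n, 0);
    for (const auto& s : streams)
        for (std::size_t i = 0; i < s.size(); ++i)
            acc[i] += s[i];

    // Divide rather than shift so negative values behave the same everywhere.
    const int32_t divisor = int32_t(1) << headroom_bits;
    std::vector<int16_t> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const int32_t scaled = acc[i] / divisor;
        // Saturate anything the headroom did not absorb.
        out[i] = static_cast<int16_t>(
            std::min<int32_t>(32767, std::max<int32_t>(-32768, scaled)));
    }
    return out;
}
```

With `headroom_bits` set to 0 this reduces to plain add-and-clip; with 1 it behaves like the divide-by-two approach, so the parameter is really a trade-off between level and how often the clamp kicks in.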

Jennings answered 23/8, 2012 at 11:24 Comment(4)

The only "correct" way to avoid clipping is to divide by two. There is some illustrative code here in the "Distortion and Noise" Section: blog.bjornroche.com/2013/05/… – Cabalistic 23/8, 2013 at 12:40

Have to downvote this because it only solves the 'local' issue of mixing a single sample. If you look at a big soundwave, this is actually a horrible algorithm, since it will cut off high-amplitude waves and introduce clipping noise. One proper way is to use float samples and smoothly apply dynamic wave amplitude compression. This will ensure no artificial clipping occurs - the sound will just get quieter during high amplitudes. – Notary 4/10, 2016 at 8:0

@JormaRebane Do you systematically downvote answers to beginners' questions on every subject? – Jennings 13/10, 2016 at 7:17

The divide by two method halves the volume of the output when one signal is silent. Probably not what one wants. – Thaine 11/7, 2018 at 2:55

S

13

The best solution I have found is given by Viktor Toth. He provides a solution for 8-bit unsigned PCM; adapting it to 16-bit signed PCM produces this:

int a = 111; // first sample (-32768..32767)
int b = 222; // second sample
int m; // mixed result will go here

// Make both samples unsigned (0..65535)
a += 32768;
b += 32768;

// Pick the equation
if ((a < 32768) && (b < 32768)) {
    // Viktor's first equation when both sources are "quiet"
    // (i.e. less than middle of the dynamic range)
    m = (int)((long long)a * b / 32768);
} else {
    // Viktor's second equation when one or both sources are loud.
    // a * b can reach 65535 * 65535, which overflows a 32-bit int,
    // so the product is formed in 64 bits.
    m = (int)(2LL * (a + b) - (long long)a * b / 32768 - 65536);
}

// Output is unsigned (0..65536) so convert back to signed (-32768..32767)
if (m == 65536) m = 65535;
m -= 32768;

Using this algorithm means there is almost no need to clip the output, as the result can exceed the valid range by at most one value. Unlike straight averaging, the volume of one source is not reduced even when the other source is silent.
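To make the "not reduced when the other source is silent" claim easy to check, here is the same rule wrapped in a function. This is only a sketch: the name `mix_toth` is made up, the first equation is applied when both samples are below the midpoint (as the comments in the snippet describe), and the arithmetic is done in 64 bits so the product cannot overflow.

```cpp
#include <cstdint>

// Hypothetical wrapper around the piecewise rule for signed 16-bit samples.
int16_t mix_toth(int16_t s1, int16_t s2) {
    // Shift both samples to the unsigned range 0..65535.
    const int64_t a = static_cast<int64_t>(s1) + 32768;
    const int64_t b = static_cast<int64_t>(s2) + 32768;

    int64_t m;
    if (a < 32768 && b < 32768) {
        // Both samples below the midpoint.
        m = a * b / 32768;
    } else {
        // At least one sample at or above the midpoint.
        m = 2 * (a + b) - a * b / 32768 - 65536;
    }

    // The second equation can reach 65536, one past the top of the range.
    if (m > 65535) m = 65535;
    if (m < 0) m = 0;  // defensive only; not expected to trigger

    // Shift back to the signed range.
    return static_cast<int16_t>(m - 32768);
}
```

A silent input is the signed value 0, which becomes 32768 after the shift. Plugging that into the second equation gives 2 * (32768 + b) - b - 65536 = b, so the other sample passes through unchanged.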

Semipostal answered 3/8, 2014 at 6:44 Comment(5)

What do you mean by "quiet"? That would normally mean low magnitude (near the middle), but here you appear to mean negative (below the middle), whereas the "loud" equation is executed when one or both are positive (before shifting, i.e. adding a DC bias). Apart from that, volume is a perception of the signal, not of an individual sample - a "loud" sound will have samples across the entire range. – Killerdiller 3/8, 2014 at 7:28

@Clifford: Middle being the middle of the available range, so if the values are between 0 and 65535, then the middle is 32767. It is better explained at the link to Viktor Toth's page. – Semipostal 3/8, 2014 at 7:30

I realise that - my question was rhetorical - the terms "quiet" and "loud" are inaccurate and misleading in this context. – Killerdiller 3/8, 2014 at 7:32

Which is exactly why I put "quiet" in scare quotes, to hint that the meaning is a little different to what you might expect :-) Plus I then followed it with an explanation of what I meant... – Semipostal 3/8, 2014 at 7:36

In the original explanation it is about relationship to the mid-point; the term "quiet" is used differently and correctly there to mean "close to the midpoint". Although this is the best answer IMO (hence the up-vote), the comments are a misrepresentation of Victor Toth's explanation. – Killerdiller 3/8, 2014 at 7:57

N

2

Here is what I did on my recent synthesizer project.

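/* Needs <limits.h> for SHRT_MAX and <stdlib.h> for malloc and abs. */

/* Sum into ints so the intermediate mix cannot overflow. */
int *unfiltered = (int *)malloc(lengthOfLongPcmInShorts * sizeof(int));
int i;
for (i = 0; i < lengthOfShortPcmInShorts; i++) {
    unfiltered[i] = shortPcm[i] + longPcm[i];
}
for (; i < lengthOfLongPcmInShorts; i++) {
    unfiltered[i] = longPcm[i];
}

/* Find the largest absolute value in the mix. */
int max = 0;
for (i = 0; i < lengthOfLongPcmInShorts; i++) {
    int val = abs(unfiltered[i]);
    if (val > max)
        max = val;
}

/* Rescale so that the peak maps to SHRT_MAX. Multiply before dividing,
   otherwise the integer division truncates every sample to 0 or +/-1. */
short int *newPcm = (short int *)malloc(lengthOfLongPcmInShorts * sizeof(short int));
for (i = 0; i < lengthOfLongPcmInShorts; i++) {
    newPcm[i] = max ? (short int)((long long)unfiltered[i] * SHRT_MAX / max) : 0;
}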

I added all the PCM data into an integer array, so that I get all the data unfiltered.

After doing that I looked for the absolute max value in the integer array.

Finally, I took the integer array and put it into a short int array by taking each element dividing by that max value and then multiplying by the max short int value.

This way you get the minimum amount of 'headroom' needed to fit the data.

You might be able to do some statistics on the integer array and integrate some clipping, but for what I needed the minimum amount of headroom was good enough for me.

Nonlegal answered 18/11, 2014 at 17:28 Comment(0)

S

1

There's a discussion here: https://dsp.stackexchange.com/questions/3581/algorithms-to-mix-audio-signals-without-clipping about why the A+B - A*B solution is not ideal. Hidden down in one of the comments on this discussion is the suggestion to sum the values and divide by the square root of the number of signals. And an additional check for clipping couldn't hurt. This seems like a reasonable (simple and fast) middle ground.
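A minimal sketch of that suggestion, assuming `samples` holds one sample from each signal at the same instant. The function name `mix_sqrt_n` is made up for the example, and the final clamp is the "additional check for clipping":

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: sum N simultaneous samples, divide by sqrt(N), then clip.
int16_t mix_sqrt_n(const std::vector<int16_t>& samples) {
    if (samples.empty()) return 0;

    // Sum in floating point so the intermediate value cannot overflow.
    double sum = 0.0;
    for (int16_t s : samples) sum += s;

    // Scale by the square root of the number of signals.
    const double scaled = sum / std::sqrt(static_cast<double>(samples.size()));

    // Clip whatever still falls outside the 16-bit range.
    const double clipped = std::min(32767.0, std::max(-32768.0, scaled));
    return static_cast<int16_t>(std::lround(clipped));
}
```

Note that N here is the number of signals being mixed, not the number that happen to be non-silent, so a single active signal mixed with three silent ones still comes out at half level.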

Suh answered 2/4, 2020 at 13:50 Comment(0)

B

0

I think they should be functions mapping [MIN_SHORT, MAX_SHORT] -> [MIN_SHORT, MAX_SHORT], and they clearly are not (apart from the first one), so overflow occurs.

If unwind's suggestion doesn't work, you can also try:

((long int)(sample1) + sample2) / 2
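For comparison, here is that expression as a function next to the `sample1/2 + sample2/2` variant from the question. Both function names are made up for this sketch. Adding first in a wider type keeps the low bit that is lost when each sample is halved separately.

```cpp
#include <cstdint>

// Add in 32 bits, then halve: the sum cannot overflow and the result
// always fits back into 16 bits.
int16_t mix_average(int16_t sample1, int16_t sample2) {
    const int32_t sum = static_cast<int32_t>(sample1) + sample2;
    return static_cast<int16_t>(sum / 2);
}

// Halve each sample first, then add: also stays in range, but each
// division discards a low bit before the samples are combined.
int16_t mix_halve_first(int16_t sample1, int16_t sample2) {
    return static_cast<int16_t>(sample1 / 2 + sample2 / 2);
}
```

For example, mixing 1 and 1 gives 1 with `mix_average` but 0 with `mix_halve_first`. Either way, a signal mixed with silence comes out at half its level.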

Bacillus answered 23/8, 2012 at 10:46 Comment(2)

While adding the signals is correct; with simple normalisation to maintain range, one signal will affect the other undesirably. For example if sample1 is always zero (silent), you would want only sample2, but you get sample2 / 2 - i.e. the output is quieter. – Killerdiller 3/8, 2014 at 8:1

Yes, you are totally right. But it solves the problem of overflow and clipping. The best solution IMHO would be to scale the signals depending on their value, like w(s1,s2)*s1 + (1-w(s1,s2))*s2 where w(s1,s2) is some function where w(s1,0) = 1, w(0,s2) = 0 and 0 < w(s1,s2) < 1 when s1 != 0 && s2 != 0 – Bacillus 20/12, 2014 at 10:51
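The comment leaves w unspecified. Purely as an illustration, here is a sketch with one possible choice, w = |s1| / (|s1| + |s2|), which satisfies the three stated conditions. Both this weight and the name `mix_weighted` are assumptions made for the example, not something proposed in the thread, and no claim is made about how it sounds.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical weighted mix: w*s1 + (1-w)*s2 with w = |s1| / (|s1| + |s2|).
int16_t mix_weighted(int16_t s1, int16_t s2) {
    const double m1 = std::fabs(static_cast<double>(s1));
    const double m2 = std::fabs(static_cast<double>(s2));

    // Both inputs silent: w is undefined, and the mix is silence anyway.
    if (m1 + m2 == 0.0) return 0;

    // w is 1 when s2 is 0, 0 when s1 is 0, and strictly between otherwise.
    const double w = m1 / (m1 + m2);

    // A weighted average of s1 and s2 always lies between them,
    // so the result fits in 16 bits without clipping.
    const double mixed = w * s1 + (1.0 - w) * s2;
    return static_cast<int16_t>(std::lround(mixed));
}
```

Because the weight depends on the sample values, this is a non-linear operation on the waveform, so it trades overflow for some distortion.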

L

-1

Since you are in the time domain, the frequency information is in the difference between successive samples; when you divide by two, you damage that information. That's why adding and clipping works better. Clipping will of course add very high-frequency noise, which is probably filtered out.

Lyndy answered 23/4, 2013 at 0:43 Comment(1)

I expect the noise the OP is hearing is caused by the values wrapping, rather than anything as subtle as a single bit of lost resolution – Mordvin 3/6, 2014 at 12:42
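To see how large the wrapping error is compared with clipping, here is a small sketch. The names `wrap16` and `saturate16` are made up, and `wrap16` computes arithmetically what a two's-complement 16-bit store would keep of an out-of-range sum.

```cpp
// Value left after a sum is truncated to 16 bits in two's complement.
int wrap16(int sum) {
    int r = (sum + 32768) % 65536;
    if (r < 0) r += 65536;  // % can return a negative remainder in C++
    return r - 32768;
}

// Value left after a sum is clamped to the 16-bit range.
int saturate16(int sum) {
    if (sum > 32767) return 32767;
    if (sum < -32768) return -32768;
    return sum;
}
```

Mixing 30000 and 10000 gives 40000. Saturated, that becomes 32767, which is off by 7233. Wrapped, it becomes -25536, which is off by a full 65536 and flips the sign: a much harsher discontinuity than clipping.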
