Convert 16 bit stereo sound to 16 bit mono sound
I'm trying to convert 16-bit stereo sound from a WAVE file to 16-bit mono sound, but I'm struggling. Converting 8-bit stereo sound to mono works great. Here's the code for that:

if( bitsPerSample == 8 )
{
    dataSize /= 2;
    openALFormat = AL_FORMAT_MONO8;

    for( SizeType i = 0; i < dataSize; i++ )
    {
        pData[ i ] = static_cast<Uint8>(
                        (   static_cast<Uint16>( pData[ i * 2 ] ) +
                        static_cast<Uint16>( pData[ i * 2 + 1 ] ) ) / 2
        );
    }
}

Now I'm trying to do much the same with 16-bit audio, but I can't get it to work; all I hear is a weird noise. If I set monoSample to left (Uint16 monoSample = left;), the audio from that channel plays fine, and the same goes for the right channel. Can anyone see what I'm doing wrong? Here's the code (pData is an array of bytes):

if( bitsPerSample == 16 )
{
    dataSize /= 2;
    openALFormat = AL_FORMAT_MONO16;

    for( SizeType i = 0; i < dataSize / 2; i++ )
    {
        Uint16 left =   static_cast<Uint16>( pData[ i * 4 ] ) |
                        ( static_cast<Uint16>( pData[ i * 4 + 1 ] ) << 8 );

        Uint16 right =  static_cast<Uint16>( pData[ i * 4 + 2 ] ) |
                        ( static_cast<Uint16>( pData[ i * 4 + 3 ] ) << 8 );

        Uint16 monoSample = static_cast<Uint16>(
                                (   static_cast<Uint32>( left ) +
                                static_cast<Uint32>( right ) ) / 2
            );

        // Set the new mono sample.
        pData[ i * 2 ] =  static_cast<Uint8>( monoSample );
        pData[ i * 2 + 1 ] =  static_cast<Uint8>( monoSample >> 8 );
    }
}
Pyrargyrite answered 6/5, 2014 at 21:31 Comment(0)

In a 16-bit stereo WAV file, each sample is 16 bits, and the samples are interleaved (left, right, left, right, and so on). I'm not sure why you're using a bitwise OR, but you can just retrieve the data directly without having to shift. The non-portable code below (it assumes sizeof(short) == 2) illustrates this.

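unsigned size = header.data_size;
char *data = new char[size];

// Read the contents of the WAV file into data

// Each stereo frame is 4 bytes: 2 bytes left, then 2 bytes right
for (unsigned i = 0; i < size; i += 4)
{
  short left = *(short *)&data[i];
  short right = *(short *)&data[i + 2];
  // Promote to int before adding so the sum cannot overflow a short
  short monoSample = (int(left) + right) / 2;
  // Store monoSample in the output buffer here
}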

Also, while 8-bit WAV files are unsigned, 16-bit WAV files are signed. To average the samples, make sure you store them in an appropriately sized signed type. Note that one of the samples is temporarily promoted to an int to prevent overflow.
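To make the signed/unsigned point concrete, here is a minimal sketch comparing the two ways of averaging a pair of 16-bit samples. The function names averageUnsigned and averageSigned are hypothetical, chosen only for this illustration, and the standard fixed-width types stand in for the question's Uint16/Uint32.

```cpp
#include <cstdint>

// Averages two samples as unsigned values, as the code in the question does.
// Negative samples are misread as large positive numbers.
inline std::uint16_t averageUnsigned(std::uint16_t left, std::uint16_t right)
{
    return static_cast<std::uint16_t>(
        (std::uint32_t{left} + std::uint32_t{right}) / 2);
}

// Averages the same samples as signed values, widening to 32 bits first
// so the sum cannot overflow.
inline std::int16_t averageSigned(std::int16_t left, std::int16_t right)
{
    return static_cast<std::int16_t>(
        (std::int32_t{left} + std::int32_t{right}) / 2);
}
```

For example, a left sample of -1 is stored as the bit pattern 0xFFFF. Averaged with a right sample of 1 as unsigned values, the result is (65535 + 1) / 2 = 0x8000, which plays back as -32768; averaged as signed values, the result is 0. Two nearly silent samples turning into a full-scale one would be consistent with the noise described in the question.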

As has been pointed out in the comments below by Stix, simple averaging may not give the best results. Your mileage may vary.
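As a rough illustration of the trade-off, the sketch below puts averaging next to summing with saturation. The names mixAverage and mixSumClamped are hypothetical, and clamping is only one possible way to handle a sum that exceeds the 16-bit range.

```cpp
#include <algorithm>
#include <cstdint>

// Downmix by averaging: the result always fits in 16 bits, but each
// channel contributes at half its original amplitude.
inline std::int16_t mixAverage(std::int16_t left, std::int16_t right)
{
    return static_cast<std::int16_t>(
        (std::int32_t{left} + std::int32_t{right}) / 2);
}

// Downmix by summing: each channel keeps its original level, but the
// sum can exceed the 16-bit range, so it is clamped instead of wrapping.
inline std::int16_t mixSumClamped(std::int16_t left, std::int16_t right)
{
    const std::int32_t sum = std::int32_t{left} + std::int32_t{right};
    const std::int32_t clamped =
        std::max<std::int32_t>(-32768, std::min<std::int32_t>(32767, sum));
    return static_cast<std::int16_t>(clamped);
}
```

With samples of 1000 and 3000, averaging gives 2000 and summing gives 4000. With two samples of 30000, the sum saturates at 32767, which is audible as clipping on loud material.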

In addition, Greg Hewgill correctly noted that this assumes that the machine is little-endian.
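If portability matters, one way around both the sizeof(short) and the endianness assumptions is to assemble each sample from its bytes explicitly. The following is a sketch, not the method from the answer: the helper names readSampleLE, writeSampleLE and downmixStereo16ToMono are hypothetical, and it assumes the buffer holds interleaved 16-bit little-endian stereo frames.

```cpp
#include <cstddef>
#include <cstdint>

// Reads one signed 16-bit little-endian sample from two bytes.
inline std::int16_t readSampleLE(const std::uint8_t* bytes)
{
    const int raw = bytes[0] | (bytes[1] << 8);  // 0..65535
    // Map 0x8000..0xFFFF onto the negative range explicitly.
    return static_cast<std::int16_t>(raw >= 0x8000 ? raw - 0x10000 : raw);
}

// Writes one signed 16-bit sample as two little-endian bytes.
inline void writeSampleLE(std::uint8_t* bytes, std::int16_t sample)
{
    const std::uint16_t raw = static_cast<std::uint16_t>(sample);
    bytes[0] = static_cast<std::uint8_t>(raw & 0xFF);
    bytes[1] = static_cast<std::uint8_t>(raw >> 8);
}

// Downmixes in place by averaging and returns the new size in bytes.
// A trailing partial frame (fewer than 4 bytes) is ignored.
inline std::size_t downmixStereo16ToMono(std::uint8_t* data,
                                         std::size_t stereoBytes)
{
    const std::size_t frames = stereoBytes / 4;
    for (std::size_t i = 0; i < frames; ++i)
    {
        const std::int32_t left = readSampleLE(data + i * 4);
        const std::int32_t right = readSampleLE(data + i * 4 + 2);
        writeSampleLE(data + i * 2,
                      static_cast<std::int16_t>((left + right) / 2));
    }
    return frames * 2;
}
```

Working in place is safe here because frame i is read from offset i * 4 before its mono sample is written to offset i * 2, so the write position never overtakes unread data.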

Oocyte answered 6/5, 2014 at 21:41 Comment(12)
Should be i += 4, shouldn't it? Otherwise your left channel will just be whatever your right channel was in the last iteration. - Gossipmonger
Why are you dividing by 2? That will knock down the original signals by 3 dB. - Mortise
It is dividing by two because it is averaging two samples. - Oocyte
Depending on the requirements for data accuracy, you may not want to average the signals. If, for example, you have a 50 dB 200 Hz tone on the left channel, and a 50 dB 600 Hz tone on the right channel, you'll wind up with two 47 dB tones in the mono stream. In my opinion, it's better to simply add them together, since that will result in a replication of the original signal, just lacking directional information. - Mortise
Thank you, using signed short values for left, right and monoSample fixed it. How come 16-bit mono files are signed but 8-bit ones are not? - Pyrargyrite
For completeness, you should note that with respect to portability, you are also assuming a little-endian machine (since the samples are stored little-endian). - Liguria
@stix It's dividing by two because adding two 16-bit numbers results in a 17-bit number (with carry), which would overflow. The divide by two is the same as a >>1, which brings it back down to 16 bits. It is not averaging as @DavidS suggested, because if you were to sum 3 signals together then you would need to >>2, that is, divide by 4. - Defibrillator
Also @stix, in regard to your comment about the 200 Hz and 600 Hz tones: consider the case where both of the tones are at 0 dB. Without the division you'll likely clip. Dropping the LSB is not going to affect the quality of the signal. - Defibrillator
@Defibrillator 0 dB converts to an amplitude level of 1. In a 16-bit signed format, which the OP is using, you have an amplitude range of +- 32767, or a dynamic range of around 90 dB, so you're not going to clip with two 0 dB signals. My point is not that the quality of the signal will be affected (which it will, depending on the definition of quality), but the absolute level. If the OP needs good representation of the original signal's levels, then losing 3 dB could be a non-trivial amount, especially given that most filters are designed so that the 3 dB down point represents the cutoff. - Mortise
@Mortise The last time I checked, 32767 + 32767 was indeed greater than 32767. Consider the case of adding two full-scale sine waves together that have the same frequency and phase. With respect to your comment about the 3 dB down point of a filter, I'm not quite following. The level is being reduced uniformly across the spectrum and is not going to have any effect on the frequency response. The 3 dB point is still the 3 dB point. To use your example of a 200 Hz and a 600 Hz sine, both at 0 dB, and a filter with the high-pass cutoff frequency at 200 Hz: after summing and then filtering you'll have 200 Hz at -6 dB and 600 Hz at -3 dB. - Defibrillator
The last time I checked, 0 dB was not a full-scale waveform, but then my math could be off? - Mortise
0 dB is a full-scale waveform. 20*log10(1) = 0. BTW, regarding the -3 dB, each sine tone is actually reduced by 6 dB. I don't know why I didn't catch this earlier, but (x+y)/2 = x/2 + y/2. - Defibrillator
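The decibel figures in this thread can be checked with a couple of lines. This sketch uses the 20*log10 amplitude formula quoted in the last comment; the function names amplitudeRatioToDb and powerRatioToDb are hypothetical, and the power formula is included only for comparison.

```cpp
#include <cmath>

// Level change in dB for a given amplitude ratio (for example, 0.5 when
// a sample value is halved).
inline double amplitudeRatioToDb(double ratio)
{
    return 20.0 * std::log10(ratio);
}

// Level change in dB for a given power ratio.
inline double powerRatioToDb(double ratio)
{
    return 10.0 * std::log10(ratio);
}
```

Halving the amplitude, which is what (x+y)/2 does to each channel, gives about -6.02 dB. A change of about -3.01 dB corresponds to halving the power, not the amplitude.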
