How to convert PCM samples in a byte array to floating-point numbers in the range -1.0 to 1.0 and back?

The resampling algorithm I use expects a float array containing input samples in the range -1.0 to 1.0. The audio data is 16-bit PCM with a sample rate of 22 kHz.

I want to downsample the audio from 22 kHz to 8 kHz. How do I represent the samples in the byte array as floating-point numbers >= -1 and <= 1, and convert them back to a byte array?

Pelpel answered 26/2, 2013 at 11:16 Comment(2)
The poster is asking two questions: 1. how to convert from float to 16-bit int, and 2. how to downsample from 22kHz to 8kHz. The FFT is NOT the appropriate method for solving either of those problems.Comet
Just circling back to this question after writing a post that might help future readers: blog.bjornroche.com/2013/05/…Comet

You ask two questions:

  1. How to downsample from 22kHz to 8kHz?

  2. How to convert from float [-1,1] to 16-bit int and back?

Note that the question has been updated to indicate that #1 is taken care of elsewhere, but I'll leave that part of my answer in, in case it helps someone else.

1. How to downsample from 22kHz to 8kHz?

A commenter hinted that this can be solved with the FFT. This is incorrect. (One step in resampling is filtering. I explain why not to use the FFT for filtering here, in case you are interested: http://blog.bjornroche.com/2012/08/when-to-not-use-fft.html.)

One very good way to resample a signal is with a polyphase filter. However, this is quite complex, even for someone experienced in signal processing. You have several other options:

  • use a library that implements high quality resampling, like libsamplerate
  • do something quick and dirty

It sounds like you have already gone with the first approach, which is great.

A quick and dirty solution won't sound as good, but since you are going down to 8 kHz, I'm guessing sound quality isn't your first priority. One quick and dirty option is to:

  • Apply a low pass filter to the signal. Try to get rid of as much audio above 4 kHz as you can. You can use the filters described here (although ideally you want something much steeper than those filters, they are at least better than nothing).
  • Select every 2.75th sample (22 / 8 = 2.75) from the original signal to produce the new, resampled signal. When you need a non-integer sample, use linear interpolation. If you need help with linear interpolation, try here.

This technique should be more than good enough for voice applications. However, I haven't tried it myself, so I can't be sure, and I still strongly recommend using someone else's library.
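A rough sketch of the quick and dirty approach described above, under stated assumptions: `onePoleLowPass` and `decimateLinear` are hypothetical helper names, and the single-pole filter is a stand-in chosen for brevity, not the filters the answer links to. It is far gentler than a real anti-aliasing filter should be, so treat this as an illustration of the two steps and not as a recommended resampler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: single-pole low-pass filter.
// alpha in (0, 1]; smaller alpha means a lower cutoff.
// This is a stand-in: a real anti-aliasing filter should be much steeper.
std::vector<float> onePoleLowPass(const std::vector<float>& in, float alpha) {
    assert(alpha > 0.0f && alpha <= 1.0f);
    std::vector<float> out(in.size());
    float y = 0.0f;
    for (std::size_t n = 0; n < in.size(); ++n) {
        y += alpha * (in[n] - y);   // move y a fraction alpha toward the input
        out[n] = y;
    }
    return out;
}

// Hypothetical helper: read the input at positions 0, step, 2*step, ...
// and linearly interpolate between the two neighbouring samples.
// For 22 kHz -> 8 kHz, step would be 2.75.
std::vector<float> decimateLinear(const std::vector<float>& in, double step) {
    assert(step > 0.0);
    std::vector<float> out;
    if (in.empty()) return out;
    const double last = static_cast<double>(in.size() - 1);
    for (std::size_t k = 0; ; ++k) {
        const double pos = static_cast<double>(k) * step;  // avoids accumulated drift
        if (pos > last) break;
        const std::size_t i = static_cast<std::size_t>(pos);
        const double frac = pos - static_cast<double>(i);
        const float next = (i + 1 < in.size()) ? in[i + 1] : in[i];
        out.push_back(static_cast<float>((1.0 - frac) * in[i] + frac * next));
    }
    return out;
}
```

Under these assumptions, a call such as `decimateLinear(onePoleLowPass(samples, alpha), 2.75)` would produce the 8 kHz signal, where `alpha` has to be chosen so that content above 4 kHz is attenuated as much as the filter allows.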

If you really want to implement your own high quality sample rate conversion, such as a polyphase filter, you should research it, and then ask whatever questions you have on https://dsp.stackexchange.com/, not here.

2. How to convert from float [-1,1] to 16-bit int and back?

This was started by c.fogelklou already, but let me embellish.

To start with, the range of 16 bit integers is -32768 to 32767 (usually 16-bit audio is signed). To convert from int to float you do this:

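float f;
int16_t i = ...;              // signed 16-bit sample (int16_t is from <stdint.h>)
f = ((float) i) / 32768.0f;
if( f > 1 ) f = 1;            // only matters if i can exceed the 16-bit range
if( f < -1 ) f = -1;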

You usually do not need that extra "bounding" (in fact, you never do if you really are using a 16-bit integer), but it's there in case you have some >16-bit integers for some reason.

To convert back, you do this:

float f = ...;
int16_t i;
f = f * 32768;
if( f > 32767 ) f = 32767;    // f = 1.0 would map to 32768, which does not fit
if( f < -32768 ) f = -32768;
i = (int16_t) f;              // the cast truncates toward zero

In this case, it usually is necessary to watch out for out of range values, especially values greater than 32767. You might complain that this introduces some distortion for f = 1. This issue is hotly debated. For some (incomplete) discussion of this, see this blog post.

This is more than "good enough for government work". In other words, it will work fine except in the case where you are concerned about ultimate sound quality. Since you are going to 8kHz, I think we have established that's not the case, so this answer is fine.

However, for completeness, I must add this: if you are trying to keep things absolutely pristine, keep in mind that this conversion introduces distortion. Why? Because the error when converting from float to int is correlated with the signal. It turns out that this correlated error sounds terrible, and you can actually hear it even though it's very small. (Fortunately, it's small enough that for things like speech and low-dynamic-range music it doesn't matter much.) To eliminate this error, you must use something called dither in the conversion from float to int. Again, if that's something you care about, research it and ask relevant, specific questions on https://dsp.stackexchange.com/, not here.
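To make the idea of dither a little more concrete, here is a minimal sketch of one common variant, triangular (TPDF) dither, where two uniform random values are summed and added before rounding. The function name `floatToInt16Dithered` is hypothetical, and the choice of TPDF and of the Mersenne Twister generator are assumptions made for illustration; the answer itself does not prescribe a particular dither.

```cpp
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical helper: convert one float sample in [-1, 1] to int16_t,
// adding TPDF dither (assumed variant) before rounding.
int16_t floatToInt16Dithered(float f, std::mt19937& rng) {
    // Each draw is uniform over roughly half an LSB either side of zero;
    // the sum of two draws has a triangular distribution spanning about +/-1 LSB.
    std::uniform_real_distribution<float> uni(-0.5f, 0.5f);
    const float noise = uni(rng) + uni(rng);

    const float scaled = f * 32768.0f + noise;
    float rounded = std::floor(scaled + 0.5f);      // round to nearest integer

    // Saturate so that out-of-range input cannot wrap around.
    if (rounded > 32767.0f) rounded = 32767.0f;
    if (rounded < -32768.0f) rounded = -32768.0f;
    return static_cast<int16_t>(rounded);
}
```

With this sketch, a silent input no longer always maps to 0: it is randomly spread over -1, 0, and 1, which is the price paid for decorrelating the rounding error from the signal.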

You might also be interested in the slides from my talk on the basics of digital audio programming, which has a slide on this topic, although it basically says the same thing (maybe even less than what I just said): http://blog.bjornroche.com/2011/11/slides-from-fundamentals-of-audio.html

Comet answered 26/2, 2013 at 16:44 Comment(1)
@Reneez: If you want to save some performance, use multiplies with (1.0f/32768.0f) instead of dividing by 32768.0f, since divides are much more costly than multiplies regardless of whether or not you have a hardware multiplier on your platform. Also, the clipping Bjorn has when going int-->float is not necessary since it is impossible to divide an int16_t by 32768.0 and get an answer > 1. I will add some test code to my answer that shows that the conversion can go forward and back without any errors.Qp

16-bit PCM has a range of -32768 to 32767. So, multiply each of your PCM samples by (1.0f/32768.0f), store the results in a new array of floats, and pass that to your resampler.

Going back to 16-bit integers after resampling, multiply by 32768.0f, saturate (clip anything outside the range -32768 to 32767), round (or dither as Björn mentioned) and then cast back to short.
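Since the question starts from a byte array, here is a sketch of both directions at the byte level. It assumes the bytes hold signed 16-bit samples in little-endian order (as in typical WAV data); if your source is big-endian, the two bytes of each sample must be swapped. The names `pcm16leToFloat` and `floatToPcm16le` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: little-endian signed 16-bit PCM bytes -> floats in [-1, 1).
std::vector<float> pcm16leToFloat(const std::vector<uint8_t>& bytes) {
    std::vector<float> out(bytes.size() / 2);   // a trailing odd byte is ignored
    for (std::size_t n = 0; n < out.size(); ++n) {
        // Assemble the unsigned 16-bit value: low byte first (assumed byte order).
        int32_t v = bytes[2 * n] | (bytes[2 * n + 1] << 8);
        if (v >= 0x8000) v -= 0x10000;          // reinterpret as two's complement
        out[n] = static_cast<float>(v) * (1.0f / 32768.0f);
    }
    return out;
}

// Hypothetical helper: floats -> little-endian signed 16-bit PCM bytes.
std::vector<uint8_t> floatToPcm16le(const std::vector<float>& samples) {
    std::vector<uint8_t> out(samples.size() * 2);
    for (std::size_t n = 0; n < samples.size(); ++n) {
        float scaled = std::floor(samples[n] * 32768.0f + 0.5f);  // round to nearest
        if (scaled > 32767.0f) scaled = 32767.0f;                 // saturate
        if (scaled < -32768.0f) scaled = -32768.0f;
        const int32_t v = static_cast<int32_t>(scaled);
        const uint16_t u = static_cast<uint16_t>(v);              // modular conversion
        out[2 * n] = static_cast<uint8_t>(u & 0xFF);              // low byte
        out[2 * n + 1] = static_cast<uint8_t>(u >> 8);            // high byte
    }
    return out;
}
```

The float array returned by `pcm16leToFloat` is what would be handed to the resampler, and `floatToPcm16le` would be applied to the resampler's output.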

Test code that shows conversion forward and back using multiplies with no bit errors:

// PcmConvertTest.cpp : Defines the entry point for the console application.
//

#include <assert.h>
#include <string.h>
#include <stdint.h>
#define SZ 65536
#define MAX(x,y) (((x)>(y)) ? (x) : (y))
#define MIN(x,y) (((x)<(y)) ? (x) : (y))
int main(int argc, char* argv[])
{
  int16_t *pIntBuf1 = new int16_t[SZ];
  int16_t *pIntBuf2 = new int16_t[SZ];
  float   *pFloatBuf = new float[SZ];

  // Create an initial short buffer for testing
  for( int i = 0; i < SZ; i++) {
    pIntBuf1[i] = (int16_t)(-32768 + i);
  }

  // Convert the buffer to floats. (before resampling)
  const float div = (1.0f/32768.0f);
  for( int i = 0; i < SZ; i++) {
    pFloatBuf[i] = div * (float)pIntBuf1[i];
  }

  // Convert back to shorts
  const float mul = (32768.0f);
  for( int i = 0; i < SZ; i++) {
    int32_t tmp = (int32_t)(mul * pFloatBuf[i]);
    tmp = MAX( tmp, -32768 ); // CLIP < -32768
    tmp = MIN( tmp, 32767 );  // CLIP > 32767
    pIntBuf2[i] = tmp;
  }

  // Check that the conversion went int16_t to float and back to int for every PCM value without any errors.
  assert( 0 == memcmp( pIntBuf1, pIntBuf2, sizeof(int16_t) * SZ) );

  delete[] pIntBuf1;   // arrays allocated with new[] must be freed with delete[]
  delete[] pIntBuf2;
  delete[] pFloatBuf;
  return 0;
}
Qp answered 26/2, 2013 at 14:1 Comment(1)
Really glad I found this. Working with NAudio right now and was trying to figure out what the bloat was between AudioFileReader float[] and WaveFileReader byte[], and this StackOverflow thread pointed me to what I needed.Tasia
