Downsample PCM audio from 44100 to 8000

I've been working on an audio-recognition demo for some time, and the API needs me to pass a .wav file with a sample rate of 8000 or 16000, so I have to downsample it. I have tried the two algorithms below. Neither of them solves the problem, but their results differ, and I hope that makes things clearer.

This is my first try. It works fine when sampleRate % outputSampleRate = 0, but when outputSampleRate = 8000 or 16000, the output audio file is silent (that is, every element of the output array is 0):

function interleave(inputL){
  var compression = sampleRate / outputSampleRate;
  var length = inputL.length / compression;
  var result = new Float32Array(length);

  var index = 0,
  inputIndex = 0;

  while (index < length){
    result[index++] = inputL[inputIndex];
    inputIndex += compression;
  }
  return result;
}

Here's my second try, which comes from a large company, and it doesn't work either. What's more, even when I set the rates so that sampleRate % outputSampleRate = 0, it still outputs a silent file:

function interleave(e){
  var t = e.length;
  var n = new Float32Array(t),
    r = 0,
    i;
  for (i = 0; i < e.length; i++){
    n[r] = e[i];
    r += e[i].length;
  }
  sampleRate += 0.0;
  outputSampleRate += 0.0;
  var s = 0,
  o = sampleRate / outputSampleRate,
  u = Math.ceil(t * outputSampleRate / sampleRate),
  a = new Float32Array(u);
  for (i = 0; i < u; i++) {
    a[i] = n[Math.floor(s)];
    s += o;
  }

  return a
}

In case my settings were wrong, here's the encodeWAV function:

function encodeWAV(samples){
  var sampleBits = 16;
  var dataLength = samples.length*(sampleBits/8);

  var buffer = new ArrayBuffer(44 + dataLength);
  var view = new DataView(buffer);

  var offset = 0;

  /* RIFF identifier */
  writeString(view, offset, 'RIFF'); offset += 4;
  /* RIFF chunk size: total file length minus the 8 bytes of 'RIFF' and this field */
  view.setUint32(offset, 36 + dataLength, true); offset += 4;
  /* RIFF type */
  writeString(view, offset, 'WAVE'); offset += 4;
  /* format chunk identifier */
  writeString(view, offset, 'fmt '); offset += 4;
  /* format chunk length */
  view.setUint32(offset, 16, true); offset += 4;
  /* sample format (raw) */
  view.setUint16(offset, 1, true); offset += 2;
  /* channel count */
  view.setUint16(offset, outputChannels, true); offset += 2;
  /* sample rate */
  view.setUint32(offset, outputSampleRate, true); offset += 4;
  /* byte rate (sample rate * block align) */
  view.setUint32(offset, outputSampleRate*outputChannels*(sampleBits/8), true); offset += 4;
  /* block align (channel count * bytes per sample) */
  view.setUint16(offset, outputChannels*(sampleBits/8), true); offset += 2;
  /* bits per sample */
  view.setUint16(offset, sampleBits, true); offset += 2;
  /* data chunk identifier */
  writeString(view, offset, 'data'); offset += 4;
  /* data chunk length */
  view.setUint32(offset, dataLength, true); offset += 4;

  floatTo16BitPCM(view, offset, samples);

  return view;
}

This has confused me for a very long time. Please let me know what I missed.

-----------------------------AFTER IT'S SOLVED--------------------------------

I'm glad it's running well now. Here's the corrected version of function interleave():

    function interleave(e){
      var t = e.length;
      sampleRate += 0.0;
      outputSampleRate += 0.0;
      var s = 0,
      o = sampleRate / outputSampleRate,
      u = Math.ceil(t * outputSampleRate / sampleRate),
      a = new Float32Array(u);
      for (var i = 0; i < u; i++) {
        a[i] = e[Math.floor(s)];
        s += o;
      }

      return a;
    }

So, as you can see, the variable I passed to it was not of the proper type. Thanks again to @jaket and other friends. Although I figured it out myself in the end, they helped me understand the underlying concepts better. :)

Sadi answered 4/8, 2015 at 19:54 Comment(2)
When you say silent, you assume knowledge about the audio encoding and playback domain, which defines what characteristics make an audio file silent to the ears or to the speakers. It would probably be better to concentrate on the mathematical properties of the input and output sequences (can you describe them?), to peel away one layer of abstraction. - Nacelle
@Nacelle Hi! Thanks for your answer. "Silent" means the value of every element in the array is 0. I will add this to my description as soon as possible. - Sadi

There is a lot more to sample rate conversion than just simply throwing samples away or inserting them.

Let's take a simple case of downsampling by a factor of 2 (e.g. 44100->22050). A naive approach would be to just throw away every other sample. But imagine for a second that the original 44.1 kHz file contains a single sine wave at 20 kHz. That is well within Nyquist (fs/2 = 22050) for that sample rate. After you throw every other sample away, the 20 kHz tone is above the new Nyquist (fs/2 = 11025), so it aliases into your output signal. The final result is a big fat sine wave sitting at 2050 Hz (22050 - 20000)!
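As a rough illustration of the folding arithmetic, here is a minimal sketch. The helper name `aliasedFrequency` is hypothetical (it is not from any library), and it assumes an ideal pure tone sampled with no anti-aliasing filter:

```javascript
// Hypothetical helper: frequency (Hz) at which a pure tone of frequency f
// shows up after being sampled at rate fs with no anti-aliasing filter.
function aliasedFrequency(f, fs) {
  var folded = f % fs; // sampling cannot tell f apart from f + k*fs
  // Anything above Nyquist (fs/2) is mirrored back into the range 0..fs/2
  return folded > fs / 2 ? fs - folded : folded;
}

console.log(aliasedFrequency(20000, 44100)); // 20000, below Nyquist so unchanged
console.log(aliasedFrequency(20000, 22050)); // 2050, folded around 11025
```

A 20 kHz tone survives at 44100 Hz but lands at 22050 - 20000 = 2050 Hz once every other sample is dropped.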

In order to avoid this aliasing during downsampling, you need to first design a lowpass filter with a cutoff selected according to your decimation ratio. For the example above, you would cut off everything above 11025 Hz first and then decimate.
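A minimal sketch of "filter, then decimate" for an integer factor might look like the following. The function names (`designLowpass`, `decimate`), the Hamming-windowed sinc design, the default of 63 taps, and the cutoff at 90% of the new Nyquist are all assumptions chosen for illustration, not something prescribed by this answer; a production converter would typically use a carefully designed polyphase filter:

```javascript
// Hamming-windowed sinc lowpass. cutoff is a fraction of the INPUT sample
// rate (0..0.5). numTaps is assumed to be odd so the filter has a center tap.
function designLowpass(numTaps, cutoff) {
  var taps = new Float32Array(numTaps);
  var mid = (numTaps - 1) / 2;
  var sum = 0;
  for (var n = 0; n < numTaps; n++) {
    var x = n - mid;
    var sinc = x === 0 ? 2 * cutoff
                       : Math.sin(2 * Math.PI * cutoff * x) / (Math.PI * x);
    var hamming = 0.54 - 0.46 * Math.cos(2 * Math.PI * n / (numTaps - 1));
    taps[n] = sinc * hamming;
    sum += sinc * hamming;
  }
  for (var k = 0; k < numTaps; k++) taps[k] /= sum; // unity gain at DC
  return taps;
}

// Lowpass-filter `input` and keep every `factor`-th sample. Only the output
// samples that are kept get computed. Samples past the edges count as zero.
function decimate(input, factor, numTaps) {
  numTaps = numTaps || 63;
  if (numTaps % 2 === 0) numTaps += 1; // force an odd tap count
  // New Nyquist is 0.5 / factor of the input rate; back off a little to
  // leave room for the filter's transition band.
  var taps = designLowpass(numTaps, 0.45 / factor);
  var mid = (numTaps - 1) / 2;
  var output = new Float32Array(Math.floor(input.length / factor));
  for (var i = 0; i < output.length; i++) {
    var center = i * factor;
    var acc = 0;
    for (var k = 0; k < numTaps; k++) {
      var j = center + k - mid;
      if (j >= 0 && j < input.length) acc += taps[k] * input[j];
    }
    output[i] = acc;
  }
  return output;
}
```

Calling `decimate(samples, 2)` on 44100 Hz data would give 22050 Hz data in which content near 20 kHz has been strongly attenuated instead of folding back into the audible band.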

The flip side of the coin is called upsampling and interpolation. Say you want to increase the sample rate by a factor of 2. First you insert zeros between every input sample and then run an interpolation filter to compute values to replace the zeros using the surrounding samples.
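The two steps can be sketched as below. The function name `upsample` is hypothetical, and the interpolation filter used here is the simplest one that works, a triangular kernel, which amounts to linear interpolation. That choice is an assumption made to keep the example short; a real converter would normally use a longer lowpass filter:

```javascript
// Hypothetical sketch: upsample by an integer factor using zero insertion
// followed by a triangular (linear-interpolation) filter.
function upsample(input, factor) {
  // Step 1: insert (factor - 1) zeros after every input sample.
  var stuffed = new Float32Array(input.length * factor);
  for (var i = 0; i < input.length; i++) stuffed[i * factor] = input[i];

  // Step 2: convolve with a triangle of length 2*factor - 1 whose peak is 1.
  // Original samples pass through unchanged; the zeros are replaced by a
  // weighted mix of the two neighbouring input samples.
  var half = factor - 1;
  var output = new Float32Array(stuffed.length);
  for (var n = 0; n < stuffed.length; n++) {
    var acc = 0;
    for (var k = -half; k <= half; k++) {
      var j = n + k;
      if (j >= 0 && j < stuffed.length) {
        acc += stuffed[j] * (1 - Math.abs(k) / factor);
      }
    }
    output[n] = acc;
  }
  return output;
}

console.log(Array.from(upsample(Float32Array.of(0, 2, 4), 2)).slice(0, 5));
// [0, 1, 2, 3, 4]
```

The trailing samples after the last input sample fade toward zero, because there is no following sample to interpolate with.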

Rate changing usually involves some combination of decimation and interpolation, since each works by an integer factor. Take 48000->32000 as an example. The output/input ratio is 32000/48000, or 2/3, so you'd upsample 48000 by 2 to get 96000 and then downsample that by 3 to get 32000. You can also chain these stages together: to go from 48000->36000 (a ratio of 3/4) you'd go up 3, down 2, down 2. Rates based on 44100 are particularly difficult. For example, to move from 48000->44100 you need to go up 147 and down 160, and that ratio can't be reduced to smaller terms.
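The up/down factors for any pair of rates fall out of the greatest common divisor. A small sketch, with the hypothetical helper names `gcd` and `resampleRatio`:

```javascript
// Euclid's algorithm for the greatest common divisor of two positive integers.
function gcd(a, b) {
  while (b !== 0) {
    var t = b;
    b = a % b;
    a = t;
  }
  return a;
}

// Hypothetical helper: smallest integer factors such that
// inputRate * up / down === outputRate.
function resampleRatio(inputRate, outputRate) {
  var g = gcd(inputRate, outputRate);
  return { up: outputRate / g, down: inputRate / g };
}

console.log(resampleRatio(48000, 32000)); // { up: 2, down: 3 }
console.log(resampleRatio(48000, 44100)); // { up: 147, down: 160 }
console.log(resampleRatio(44100, 8000));  // { up: 80, down: 441 }
console.log(resampleRatio(44100, 16000)); // { up: 160, down: 441 }
```

For the rates in the question, 44100->8000 works out to up 80, down 441, which is why simply stepping through the input by 5.5125 samples is not a clean integer decimation.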

I'd suggest you find some code or a library to do this for you. What you need to look for is a polyphase filter or sample rate converter.

United answered 4/8, 2015 at 23:49 Comment(2)
Thanks for explaining~ that helps a lot~~~O(∩_∩)O~ - Sadi
#53810558 I tried this, but it is not working. I have also attached the file. - Nell

The problem is that you are trying to access an array using a floating point number. When you access inputL[5.5125] it's the same as inputL['5.5125'], i.e. you will try to read a property named 5.5125 from the array object, not an item from the array data.
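A short demonstration of that behaviour, using made-up sample values. The last part is an assumption about the rest of the pipeline: the question does not show floatTo16BitPCM, but if it writes samples with DataView.setInt16, a NaN is stored as 0, which would be consistent with the silent output described:

```javascript
var samples = Float32Array.of(0.25, 0.5, 0.75); // made-up input data

console.log(samples[1]);   // 0.5, a normal integer index
console.log(samples[1.5]); // undefined, there is no element at a fractional index

// Storing undefined into a Float32Array converts it to NaN
var out = new Float32Array(1);
out[0] = samples[1.5];
console.log(out[0]); // NaN

// Writing NaN as a 16-bit integer stores 0
var view = new DataView(new ArrayBuffer(2));
view.setInt16(0, out[0], true);
console.log(view.getInt16(0, true)); // 0

// Converting the position to an integer first reads a real sample
console.log(samples[Math.floor(1.5)]); // 0.5
```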

Round the number so that you get the closest integer index:

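function interleave(inputL){
  // Ratio of input rate to output rate, e.g. 44100 / 8000 = 5.5125
  var compression = sampleRate / outputSampleRate;
  // Use an integer length; when downsampling this also keeps the
  // rounded index below inside the input array
  var length = Math.floor(inputL.length / compression);
  var result = new Float32Array(length);

  var index = 0,
  inputIndex = 0;

  while (index < length){
    // Round the fractional position to the nearest integer index
    result[index++] = inputL[Math.round(inputIndex)];
    inputIndex += compression;
  }
  return result;
}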
Ogpu answered 5/8, 2015 at 0:1 Comment(0)

What @jacket said is true: you cannot down-sample the audio just by reducing the number of items in the array. I can think of two ways of doing it:

  1. If you are not particular about wav, which is an uncompressed format and is going to drain your bandwidth, you can try this small utility I wrote for recording as an mp3 file. Just modify this line in scripts/recorder.js:

     config: {
       sampleRate: this.context.sampleRate
     }

     to

     config: {
       sampleRate: 16000 // or any other sampling rate
     }

  2. Another option: if you are already doing some sort of audio processing on the back end and do not mind adding ffmpeg to the stack, you can send either the wav file (uncompressed format) or an ogg file (compressed format, code) to the server, where ffmpeg can convert it to whatever format and sample rate you want before the rest of the processing.

Hype answered 5/8, 2015 at 2:11 Comment(0)
