Convert audio file to Linear PCM 16-bit

I am trying to send an audio file through a WebSocket, and I realised that in order to do so I need to convert the MP3 file to Linear PCM 16-bit format, but I can't find a way to do it.

Here is what I want to do:

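 const fs = require('fs');
 const WebSocket = require('ws');

 const wss = new WebSocket.Server({ port: 8080 }); // port chosen for illustration
 let mp3File = fs.readFileSync('output.raw'); // the 16-bit PCM file (filename is hypothetical)
 let recognizeStream; // e.g. a speech-recognition stream, created elsewhere

 wss.on('connection', (ws) => {
     ws.on('message', async (msg) => {
         if (typeof msg === "string") {
             // handle text (JSON) messages here
         } else if (recognizeStream) {
             recognizeStream.write(msg);
         }
         ws.send(mp3File); // <== stream back the audio file
     });
 });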

Some background: the stream is a phone call (via the Vonage API), so my WebSocket is connected to the phone call and hears the user's input. After some logic on my server, I want to play the user an MP3 file that is stored locally on my server, via ws.send().

-----------update--------

Now, if I send back the PCM data from the stream (the raw audio from the phone call), it works (the server echoes the phone call). So I want to convert the MP3 file to the same format so that I can send it via ws.send().

-----------update 2--------

After converting my audio file to the right format, which is "Linear PCM 16-bit, with either a 8kHz or a 16kHz sample rate, and a 20ms frame size", I am trying to send the file through the WebSocket, but I don't know how. I have the file in the project folder, but I don't know how to send it via the WebSocket; I looked for how to do this but didn't find anything.

I am trying to do what is specified here: [image]

Adolescence answered 31/3, 2021 at 10:26
When you use its binary format, a WebSocket simply transfers bytes, so there is no need to convert to PCM first; just read the docs on the WebSocket binary format. WAV has a 44-byte header, which defines attributes like bit depth and sample rate, followed by the WAV payload, which is the raw audio in PCM format. - Paget
How can I do this? And by the way, if I need the audio format to be LPCM 16-bit, does that mean it's a WAV file? - Adolescence
Thank you, I will. - Adolescence
I don't understand how to convert my MP3 file to binary format. Can you tell me how to do this? - Adolescence
MP3 is already a binary file format, so just open the MP3 file as binary (not text) and read its bytes. That is OK if your WebSocket message sends the entire MP3 file in one message. As far as I know MP3 is not a streaming format (I could be wrong), so if the receiving side expects to render the audio while successive WebSocket messages are still arriving, this will not work. To stream audio you need a streaming audio codec on the sender side. - Paget
Update your question to say whether or not your source input audio is streaming. You can use MP3 if each MP3 file gets transferred in its entirety before being rendered into audio on the receiving side. - Paget
The other side is the Vonage API, and it looks like I need my audio file to be in a specific format. Here is a link to the API's WebSocket docs: developer.nexmo.com/voice/voice-api/guides/… and you can see the requirements under "Binary audio messages". I don't really understand what to do, but maybe I need to send a WAV file that is 16-bit LPCM at 16kHz. - Adolescence
You may want to read this #54913602 - Limbert
Thanks, this is exactly what I want to do, but there is no working solution there. - Adolescence
I think you have two fundamental problems to unpack: one is how to send data to Nexmo, the second is how to convert MP3 to PCM in JavaScript. If you approach these separately, you will find that solutions already exist. Your job is to join them together. - Limbert
For instance, have you got some test PCM data that you have successfully uploaded? If not, I'd recommend doing that first. - Limbert
Basically, if I take the PCM data I get from the stream (the audio from the call) and send it back (echoing the phone call), it works. So I just want to understand how to convert my MP3 file to the same format so I can send it instead. - Adolescence
Great! I think that might be worth stating in the question, as at that point it doesn't have much to do with Nexmo; it's just converting MP3 to PCM. One thing to bear in mind is that the container for PCM is typically .wav, so some people use WAV and PCM interchangeably. - Limbert
This question suggests a package already exists for MP3-to-WAV conversion: #53902496 - Limbert
Thanks. By the way, according to the Nexmo docs, I need to send the audio as messages, each with a sample rate of 16kHz and a 20ms frame size. How can I do that? - Adolescence
@Adolescence don't forget to @-mention people, or those in your comments won't be notified. Your last point again opens a separate set of questions. I'd look at how to change the sample rate and bit depth of a WAV file separately from your problem with Nexmo. - Limbert
@Adolescence - Playing the audio back over the WebSocket is certainly a valid way to handle this. But maybe for your use case you would be better off playing the file using the stream API: developer.nexmo.com/api/voice?theme=dark#startStream. This would save you the nuisance of converting the file: since the file sounds like a static resource, you would simply create a route to serve it and send the URI of that route to the stream endpoint. - Discordant
@Discordant - Hi, thanks, I did that, but the problem is that this way there is a 4-second delay between the moment the caller stops talking and the moment they hear the recording. I'm trying to reduce that to 2 seconds or less, which is why I want to send the audio via the WebSocket. - Adolescence

First let's understand what this means:

Linear PCM 16-bit, with either a 8kHz or a 16kHz sample rate, and a 20ms frame size

They are talking about 2 things here:

  1. The format of the audio data, which is "Linear PCM 16-bit, with either a 8kHz or a 16kHz sample rate"
  2. How you send this audio data to them and how they send it to you: in chunks, each holding a 20ms frame of audio

Based on the audio format, choosing "16-bit Linear PCM with a sample rate of 16K" implies:

  • samplerate = 16000
  • samplewidth = 16 bits = 2 bytes

So 1 second of mono audio contains 16000 * 2 = 32000 bytes, which means a 20ms (0.02s) frame of audio is 32000 * 0.02 = 640 bytes.
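The same arithmetic as a small JavaScript helper, a sketch for checking the chunk size at either sample rate (the name frameBytes is illustrative, not part of any API):

```javascript
// Bytes in one frame of linear PCM audio:
// samples per second * bytes per sample * channels * frame duration in seconds.
function frameBytes(sampleRate, bitsPerSample, frameMs, channels = 1) {
  const bytesPerSample = bitsPerSample / 8;
  return (sampleRate * bytesPerSample * channels * frameMs) / 1000;
}

console.log(frameBytes(16000, 16, 20)); // 640
console.log(frameBytes(8000, 16, 20)); // 320
```

The 320-byte result for 8K matches the message size the answer's code comment expects from Nexmo at that rate.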

There are 2 things needed:

  1. Convert the mp3 to wav. Install ffmpeg on your system and run this command:
    ffmpeg -i filename.mp3 -ar 16000 -ac 1 -sample_fmt s16 output.wav
    This converts filename.mp3 to output.wav, which will be Linear PCM 16-bit at a 16K sample rate. The -ac 1 option downmixes to a single (mono) channel, which the 640-byte calculation assumes.

  2. In your code, when you send audio back, you need to stream it as chunks of 640 bytes, not the entire file in one shot. There are 3 options:

    • Run a loop that writes all the audio to the websocket in chunks of 640 bytes. But this has an issue: Nexmo will buffer only the first 20s of audio, and anything beyond that will be discarded.
    • Start an async task that runs every 20ms and writes 640 bytes of data to the websocket.
    • Write when you get audio from Nexmo (this is the one I will show). Since Nexmo sends you 640 bytes every 20ms, you can send back 640 bytes at the same time.

I'm writing this example using the npm websocket package.

var fs = require('fs');
var binaryData = fs.readFileSync('output.wav');
var start = 44; // discard the 44-byte wav header
var chunkSize = 640; // 20ms of 16-bit mono audio at 16K

// ...

// connection is a websocket connection object (npm websocket package)
connection.on('message', function(message) {
  if (message.type === 'utf8') {
    // handle a text message here
  }
  else if (message.type === 'binary') {
    // print length of audio sent by nexmo: will be 640 for 16K and 320 for 8K
    console.log('Received Binary Message of ' + message.binaryData.length + ' bytes');

    // while there is still audio left, reply with the next chunk
    if (start < binaryData.length) {
      // slice a chunk and send
      var toSend = binaryData.slice(start, start + chunkSize);
      start = start + chunkSize;
      connection.sendBytes(toSend);
      console.log('Sent Binary Message of ' + toSend.length + ' bytes');
    }
  }
  // ...
});
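The code above assumes the PCM samples start exactly 44 bytes into the file. The RIFF/WAVE format allows extra chunks (for example LIST metadata, which some encoders write) before the data chunk, so a fixed offset can end up sending header bytes as audio. A minimal sketch of walking the chunks instead, assuming a plain integer-PCM WAV file; parseWav and loadPcm are hypothetical helper names:

```javascript
const fs = require('fs');

// Walk the RIFF chunks of a WAV buffer and return the format fields plus the
// byte range of the "data" chunk, instead of assuming a 44-byte header.
function parseWav(buf) {
  if (buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  let offset = 12; // first chunk follows "RIFF" + size + "WAVE"
  let fmt = null;
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === 'fmt ') {
      fmt = {
        audioFormat: buf.readUInt16LE(body), // 1 = integer PCM
        channels: buf.readUInt16LE(body + 2),
        sampleRate: buf.readUInt32LE(body + 4),
        bitsPerSample: buf.readUInt16LE(body + 14),
      };
    } else if (id === 'data') {
      if (!fmt) throw new Error('data chunk before fmt chunk');
      return { ...fmt, dataStart: body, dataEnd: Math.min(body + size, buf.length) };
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('no data chunk found');
}

// Read a WAV file, check it is 16-bit mono PCM at the expected rate,
// and return only the raw samples.
function loadPcm(path, expectedRate = 16000) {
  const buf = fs.readFileSync(path);
  const wav = parseWav(buf);
  if (wav.audioFormat !== 1 || wav.bitsPerSample !== 16 ||
      wav.channels !== 1 || wav.sampleRate !== expectedRate) {
    throw new Error('unexpected WAV format: ' + JSON.stringify(wav));
  }
  return buf.subarray(wav.dataStart, wav.dataEnd);
}
```

With these helpers, binaryData in the answer's code could come from loadPcm('output.wav') and start could begin at 0 instead of 44.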

Remember, there will be some delay between sending the audio from your server to Nexmo and hearing it on the other side. It can vary from half a second to more, depending on the location of Nexmo's data centre, the location of the server running your code, network speed, etc. I have observed it to be close to 0.5 sec.
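For the second option listed in the answer (an async task that writes 640 bytes every 20ms), a minimal sketch could use setInterval. The names toFrames and playPaced are hypothetical, and send stands for whatever writes one binary message on your socket, such as connection.sendBytes with the websocket package or ws.send with the ws package. Zero-padding the last frame to a full 640 bytes is an assumption made here, in case the receiver expects fixed-size frames.

```javascript
// Split a PCM buffer into fixed-size frames. A short final frame is
// zero-padded, which is silence for signed 16-bit PCM.
function toFrames(pcm, frameSize) {
  const frames = [];
  for (let i = 0; i < pcm.length; i += frameSize) {
    const frame = Buffer.alloc(frameSize); // zero-filled
    pcm.copy(frame, 0, i, Math.min(i + frameSize, pcm.length));
    frames.push(frame);
  }
  return frames;
}

// Send one frame every frameMs milliseconds until the buffer is exhausted.
// Returns a function that stops playback early (e.g. if the caller talks over it).
function playPaced(pcm, send, { frameSize = 640, frameMs = 20, onDone } = {}) {
  const frames = toFrames(pcm, frameSize);
  let next = 0;
  const timer = setInterval(() => {
    if (next >= frames.length) {
      clearInterval(timer);
      if (onDone) onDone();
      return;
    }
    send(frames[next]);
    next += 1;
  }, frameMs);
  return () => clearInterval(timer);
}
```

Node's timers are not exact, so the pacing is only approximate; for 8K audio, pass frameSize: 320.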

Overrun answered 28/4, 2021 at 4:37
