I am creating a Discord.js bot using Node.js that records the audio of users in a voice channel. It joins a channel and listens to each user separately, writing each user's audio to a .pcm file (raw samples only, no header).
This works, but the way Discord delivers audio causes a problem. The audio stream obtained from Discord's API only carries data while that user is speaking, not while they are silent. As a result, the user's speech segments are concatenated back to back, and the silence in between is lost.

For example, if I speak for 5 seconds, stay silent for 5 seconds, and repeat this for 1 minute, I get a file that is only 30 seconds long, because the 5-second silences are never sent on the stream.
The code looks something like this (`receiver` is what the Discord API provides for a voice connection; the stream ends whenever I issue a stop command):
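```javascript
const fs = require('fs');

// One PCM stream per user; end: 'manual' keeps it open through silences
// instead of ending it when the user stops speaking.
const audioStream = receiver.createStream(user, { mode: 'pcm', end: 'manual' });
const outputStream = fs.createWriteStream('SOME_PATH');

// Write the raw PCM bytes straight to disk.
audioStream.pipe(outputStream);

audioStream.on('end', () => {
  console.log('Ended stream');
});
```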
The `audioStream` output is signed 16-bit little-endian PCM (discord.js decodes Discord's Opus audio to 48 kHz stereo), and it only carries data while the user is speaking.
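To make "silent frames" concrete: in signed 16-bit PCM a sample value of 0 is silence, so a gap can be filled with a zero-filled `Buffer` of the right length. The sketch below assumes the 48 kHz, 2-channel, 16-bit format that discord.js's PCM mode produces (change the constants if your stream differs); `bytesForMs` and `silence` are hypothetical helper names.

```javascript
// Assumed stream format: signed 16-bit little-endian, 2 channels, 48 kHz.
const SAMPLE_RATE = 48000;
const CHANNELS = 2;
const BYTES_PER_SAMPLE = 2;
const BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE;

// Bytes covering `ms` milliseconds, rounded down to whole frames
// (one sample per channel) so the channels stay aligned.
function bytesForMs(ms) {
  const frameBytes = CHANNELS * BYTES_PER_SAMPLE;
  const frames = Math.floor((ms * SAMPLE_RATE) / 1000);
  return frames * frameBytes;
}

// Buffer.alloc zero-fills, and 0 is silence in signed PCM.
function silence(ms) {
  return Buffer.alloc(bytesForMs(ms));
}

console.log(BYTES_PER_SECOND);     // 192000
console.log(bytesForMs(20));       // 3840
console.log(silence(5000).length); // 960000
```

So each 5-second pause in the example above corresponds to 960,000 bytes of zeros that are missing from the file.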
Is there a way I can fill in the data gaps with silent frames of some sort? Or perhaps keep a stream of silence running and only write the user's data into it when it arrives?
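One way to approach the first idea is a hypothetical `SilenceFiller` Transform stream (not part of discord.js) placed between `audioStream` and `outputStream`. It records when recording started, and whenever a chunk arrives it compares the bytes written so far with the number of bytes the elapsed wall-clock time corresponds to, then writes zero-valued samples to make up the difference. This is a sketch under stated assumptions: 48 kHz stereo 16-bit PCM, chunks made of whole frames, and arrival time used as a stand-in for when the audio was spoken. Network jitter makes that approximate, which is why gaps shorter than the hypothetical `minGapMs` option are left alone.

```javascript
const { Transform } = require('stream');

// Hypothetical helper (not part of discord.js): pads a 16-bit PCM stream with
// zero-valued samples so its length follows wall-clock time.
class SilenceFiller extends Transform {
  constructor({ sampleRate = 48000, channels = 2, minGapMs = 100, now = Date.now } = {}) {
    super();
    this.frameBytes = channels * 2; // one 16-bit sample per channel
    this.framesPerMs = sampleRate / 1000;
    this.minGapBytes = Math.floor(minGapMs * this.framesPerMs) * this.frameBytes;
    this.now = now;
    this.start = now(); // recording start; leading silence is measured from here
    this.written = 0; // bytes pushed downstream so far
  }

  // Bytes the output should contain by now if the audio were continuous,
  // rounded down to whole frames.
  expectedBytes() {
    const frames = Math.floor((this.now() - this.start) * this.framesPerMs);
    return frames * this.frameBytes;
  }

  // Push zeros (silence) up to `target` bytes, if the gap is at least `minBytes`.
  padTo(target, minBytes) {
    const gap = Math.floor((target - this.written) / this.frameBytes) * this.frameBytes;
    if (gap > 0 && gap >= minBytes) {
      this.push(Buffer.alloc(gap));
      this.written += gap;
    }
  }

  _transform(chunk, _encoding, callback) {
    // The chunk has just arrived, so treat it as ending now: the silence
    // must cover everything before its start.
    this.padTo(this.expectedBytes() - chunk.length, this.minGapBytes);
    this.push(chunk);
    this.written += chunk.length;
    callback();
  }

  _flush(callback) {
    // Pad the trailing silence up to the moment the input ends.
    this.padTo(this.expectedBytes(), 1);
    callback();
  }
}
```

Usage would be `audioStream.pipe(new SilenceFiller()).pipe(outputStream);`, creating the filler when recording starts so the leading silence is included too. Because each gap is measured against the recording start rather than the previous chunk, small timing errors do not accumulate, and `_flush` pads the trailing silence when `audioStream` ends. The `now` option only exists so the clock can be replaced in tests.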