Is it really possible for webRTC to stream high quality audio without noise?

I have tested with the highest quality settings and multiple STUN/TURN servers with no luck in finding a real high quality stream.

In my experience WebRTC always has fluctuating, limited bandwidth and a high level of background noise, and it doesn't reach the quality of MP3/Shoutcast/Icecast radio streams.

Has anyone found a way to provide a real high-bandwidth audio stream with WebRTC, or is it not actually possible at this time?

Flask answered 5/9, 2017 at 21:8 Comment(1)
Yes. STUN has nothing to do with quality, though. - Incendiary
12

Firstly, it's worth saying that WebRTC builds on the underlying network connectivity, and if that is poor there is very little any higher layer can do about it.

Looking at the particular comparison you have highlighted, a few factors are key to VoIP voice quality (assuming from the question that you are focused on voice):

  • Latency: to avoid delay and echo, voice communication needs low end-to-end latency. The target for good-quality VoIP systems is usually under 200 ms.
  • Jitter: this is essentially the variation in latency, i.e. how the end-to-end delay varies over time.
  • Packet loss: voice is actually reasonably tolerant of packet loss compared to data. VoIP targets are typically 1% or less.
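
To make the three factors concrete, here is a minimal sketch that summarizes them for a list of received packets. The function name `summarizeLink` and the sample fields (`sentMs`, `receivedMs`) are hypothetical, made up for illustration. The jitter figure is a running estimate in the style of RFC 3550 (it moves 1/16 of the way toward each new change in transit time), and the 200 ms and 1% thresholds are simply the targets mentioned in the list.

```javascript
// Sketch only: `samples` holds one entry per packet that actually arrived,
// shaped like { sentMs, receivedMs } (hypothetical field names).
// `packetsSent` is how many packets the sender emitted in total.
function summarizeLink(samples, packetsSent) {
  let jitterMs = 0;
  let latencySum = 0;
  let prevTransit = null;

  for (const s of samples) {
    const transit = s.receivedMs - s.sentMs; // one-way delay of this packet
    latencySum += transit;
    if (prevTransit !== null) {
      // Smoothed jitter: move 1/16 of the way toward the latest
      // absolute change in transit time.
      jitterMs += (Math.abs(transit - prevTransit) - jitterMs) / 16;
    }
    prevTransit = transit;
  }

  const meanLatencyMs = samples.length ? latencySum / samples.length : 0;
  const lossPercent = packetsSent
    ? (100 * (packetsSent - samples.length)) / packetsSent
    : 0;

  return {
    meanLatencyMs,
    jitterMs,
    lossPercent,
    // Targets taken from the list: under 200 ms latency, 1% loss or less.
    meetsVoipTargets: meanLatencyMs < 200 && lossPercent <= 1,
  };
}
```

Mean latency computed this way assumes the sender and receiver clocks are comparable. The jitter estimate only uses differences between transit times, so a constant clock offset cancels out.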

Comparing this with streamed radio etc., the key point is the latency: it is not unusual to wait several seconds for a stream to start playing back.

This allows the receiver to fill a much bigger buffer of packets waiting to be decoded and played back, and makes it much more tolerant of variations in the latency (jitter).

Taking a simple example, if you had a brief half second interruption in your connection, this would immediately impact a two way VoIP call, but it might not impact streamed audio at all, assuming the network recovers fully and the buffer had several seconds worth of content in it at the time.
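
A toy simulation of that example, as a rough sketch rather than a model of any real player: `simulatePlayback` and its parameters are hypothetical names, and it simplifies by assuming audio that would have arrived during the outage is simply lost (no retransmission, no catch-up).

```javascript
// Every tick, one tick's worth of audio arrives unless the network is down,
// and one tick's worth is played if the buffer holds enough.
// Returns how many milliseconds of playback were missed (underrun).
function simulatePlayback({ bufferMs, outageStartMs, outageMs, totalMs, tickMs = 10 }) {
  let buffered = bufferMs; // audio already buffered when playback starts
  let underrunMs = 0;

  for (let t = 0; t < totalMs; t += tickMs) {
    const networkUp = t < outageStartMs || t >= outageStartMs + outageMs;
    if (networkUp) buffered += tickMs; // new audio arrives

    if (buffered >= tickMs) {
      buffered -= tickMs; // play one tick from the buffer
    } else {
      underrunMs += tickMs; // nothing to play: audible gap
    }
  }
  return underrunMs;
}

const outage = { outageStartMs: 1000, outageMs: 500, totalMs: 3000 };

// Radio-style stream with 5 seconds buffered: the outage is absorbed.
console.log(simulatePlayback({ bufferMs: 5000, ...outage })); // 0

// Real-time call with a 40 ms buffer: most of the outage is heard as a gap.
console.log(simulatePlayback({ bufferMs: 40, ...outage })); // 460
```

The large buffer costs several seconds of delay, which is fine for radio but unusable for a two-way conversation.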

So the quality difference you are seeing compared to streamed audio is most likely related to the real-time nature of the communication rather than to inherent WebRTC faults. Or, maybe more precisely, even if WebRTC were perfect, real-time two-way VoIP is very susceptible to network conditions.

As a note, video clearly needs much more bandwidth and is also impacted by the network, but people tend to be more tolerant of video 'stutters' than of voice quality issues in multimedia calls (at this time anyway).

Machmeter answered 7/9, 2017 at 19:42 Comment(0)
32

The default audio settings for WebRTC are pretty low. It defaults to mono audio around 42 kb/s as it seems to be designed for voice. I increased the quality by configuring a few settings.

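  1. Disable autoGainControl, echoCancellation and noiseSuppression in the getUserMedia() constraints:
// These are requests; the browser may not honor every one of them.
navigator.mediaDevices.getUserMedia({
  audio: {
    autoGainControl: false,   // no automatic level changes
    channelCount: 2,          // ask for stereo capture
    echoCancellation: false,  // voice processing off
    latency: 0,               // ask for the lowest latency available
    noiseSuppression: false,  // voice processing off
    sampleRate: 48000,
    sampleSize: 16,
    volume: 1.0
  }
});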
  2. Add the stereo and maxaveragebitrate attributes to the SDP:
let answer = await peer.conn.createAnswer(offerOptions);
answer.sdp = answer.sdp.replace('useinbandfec=1', 'useinbandfec=1; stereo=1; maxaveragebitrate=510000');
await peer.conn.setLocalDescription(answer);
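
The string replace above only works when the SDP happens to contain `useinbandfec=1`. As a hedged alternative, here is a sketch of a small helper that finds the Opus payload type and merges parameters into its `a=fmtp` line. `setOpusParams` is a hypothetical name of my own, not part of any WebRTC API, and it assumes the usual `a=rtpmap:<pt> opus/48000/2` line is present.

```javascript
// Sketch: add or override Opus parameters in an SDP string.
// Returns the SDP unchanged if no Opus rtpmap line is found.
function setOpusParams(sdp, params) {
  // Find the payload type mapped to Opus, e.g. "a=rtpmap:111 opus/48000/2".
  const rtpmap = sdp.match(/^a=rtpmap:(\d+) opus\/48000(?:\/2)?/im);
  if (!rtpmap) return sdp;
  const pt = rtpmap[1];

  // Existing format parameters for that payload type, if any.
  const fmtpRe = new RegExp(`^a=fmtp:${pt} ([^\\r\\n]*)`, "m");
  const fmtp = sdp.match(fmtpRe);

  // Merge existing key=value pairs with the requested ones.
  const merged = new Map();
  if (fmtp) {
    for (const pair of fmtp[1].split(";")) {
      const [key, value] = pair.trim().split("=");
      if (key) merged.set(key, value);
    }
  }
  for (const [key, value] of Object.entries(params)) {
    merged.set(key, String(value));
  }

  const line =
    `a=fmtp:${pt} ` +
    [...merged].map(([k, v]) => (v === undefined ? k : `${k}=${v}`)).join(";");

  // Replace the existing fmtp line, or add one right after the rtpmap line.
  if (fmtp) return sdp.replace(fmtpRe, line);
  return sdp.replace(rtpmap[0], `${rtpmap[0]}\r\n${line}`);
}
```

It would be used in place of the replace call, e.g. `answer.sdp = setOpusParams(answer.sdp, { stereo: 1, maxaveragebitrate: 510000 });` before `setLocalDescription`.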

This gives a potential maximum bitrate of 510 kb/s for stereo (the maxaveragebitrate value of 510000 bits per second), which is roughly 255 kb/s per channel!

Actual bitrate depends on the speed of your network and strength of your signal.
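
To see what bitrate you are actually getting, one option is to sample `getStats()` twice and divide the byte difference by the elapsed time. The sketch below is untested against any particular browser; `readOutboundAudio` and `bitrateKbps` are hypothetical helper names, and `pc` stands for your own RTCPeerConnection.

```javascript
// Kilobits per second between two samples of { bytesSent, timestamp },
// where timestamp is in milliseconds. Bits per millisecond equals kbit/s.
function bitrateKbps(prev, curr) {
  const bits = (curr.bytesSent - prev.bytesSent) * 8;
  const elapsedMs = curr.timestamp - prev.timestamp;
  return elapsedMs > 0 ? bits / elapsedMs : 0;
}

// Browser only: pull the outbound audio counters from a peer connection.
// Returns null if no outbound audio stream is reported yet.
async function readOutboundAudio(pc) {
  const stats = await pc.getStats();
  for (const report of stats.values()) {
    if (report.type === "outbound-rtp" && report.kind === "audio") {
      return { bytesSent: report.bytesSent, timestamp: report.timestamp };
    }
  }
  return null;
}
```

Calling `readOutboundAudio(pc)` once per second and passing consecutive results to `bitrateKbps` gives a rough live reading to compare against the maximum configured in the SDP.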

More information about the SDP:

The Session Description Protocol (SDP) [RFC4566] describes various aspects of multimedia session such as media capabilities, transport addresses and related metadata in a transport agnostic manner, for the purposes of session announcement, session invitation and parameter negotiation.

https://tools.ietf.org/id/draft-nandakumar-rtcweb-sdp-01.html#rfc.section.3

Check out my project which implements these features: https://github.com/kmturley/webrtc-radio

Camilacamile answered 17/11, 2019 at 6:43 Comment(6)
This is the answer I came looking for as I was brainstorming setting up some PiSounds for geographically local music jamming over ethernet. Similar to JamKazam but all open source and inexpensive hardware. My first thought was WebRTC and wanted to know if it could actually do HD audio. Definitely going to test your code soon as it looks promising! - Kinnon
This answer can be improved by linking to relevant sections of the WebRTC RFC specification. - Kinnon
I think SDP means Session Description Protocol and that this answer could elaborate a bit on what this is and I think this means on the WebRTC STUN server. - Kinnon
Thanks for the feedback, have added a description of the SDP. I had a similar idea but for a localhost radio. You might want to look at my project as I already had to overcome some hurdles! github.com/kmturley/webrtc-radio - Camilacamile
Nice, thanks, and I'll check out your project soon when I start going down this path. - Kinnon
Thanks! Works for my setup perfectly! - Stealth
