Firstly, its worth saying that Web RTC builds on the underlying network connectivity and if it is poor then there is very little any higher layers can do to avoid this.
Looking at the particular comparison you have highlighted, there are a couple of factors which are key to VoIP voice quality (assuming you are focused on voice from the question):
- Latency: to avoid delay and echo, voice communication needs a low end to end latency. The target for good quality VoIP systems is usually sub 200 ms latency.
- Jitter - this is essentially the variance in the latency one time, i.e. how the end to end delay varies over time.
- Packet loss - voice is actually reasonably tolerant to packet loss compared to data. VoIp targets are typically in the 1% or less range.
Comparing this with steamed radio etc, the key point is the latency - it is not unusual to wait several seconds for a stream to start playing back.
This allows the receiver to fill a much bigger buffer of packets waiting to be decoded and played back, and makes it much more tolerant of variations in the latency (jitter).
Taking a simple example, if you had a brief half second interruption in your connection, this would immediately impact a two way VoIP call, but it might not impact streamed audio at all, assuming the network recovers fully and the buffer had several seconds worth of content in it at the time.
So the quality difference you are seeing compared to streamed audio are most likely related to the real tine nature of the communication, rather than with inherent WebRTC faults - or maybe more precisely, even if WebRTC was perfect, real time two way VoIP is very susceptible to network conditions.
As. a note, video cleary needs much more bandwidth, and is also impacted by the network but people tend to be more tolerant of video 'stutters' than voice quality issues in multimedia calls (at this time amyay).