Implementing web-based real-time video chat using HTML5 WebSockets

Asked 18/11, 2010 at 23:19 Answered 8/6, 2013 at 16:5

Tags: python, video, html, audio, websocket

Does anyone know how to implement voice/video over IP in a web application using HTML5 WebSockets?

It would be nice if I could implement this with PHP or Python, since I (unfortunately) don't know any other programming language at the moment.

A good tutorial would help, as would an already-built solution, even one I have to pay for.

Update 1:
Added video because the question is not only about audio/VoIP.

Update 2:
The first HTML5 video conference app has already been created. See my own answer.

Cham asked 18/11, 2010 at 23:19

Are you going to release it under an open source license? I bet once you get it mostly working, you could leverage a good amount of community help if you did. – Zubkoff 19/11, 2010 at 15:1

Yes, that's the idea :-) I have to do some research and planning first. After that I will update this question with further news. – Cham 19/11, 2010 at 15:13

getUserMedia() is the key here. – Sulky 10/8, 2013 at 23:15

A suggestion for video: twilio.com/docs/api/video, and for VoIP: twilio.com/docs/api/client – Palmar 13/9, 2016 at 22:52

If you want to go with HTML5 only, you will need a browser that implements the HTML Media Capture draft in order to access the raw data from the microphone.

Once you have this data in hand, you need to send it over the network. WebSockets would be the HTML5 option for fast enough round trips with the server (sending local audio data and receiving remote audio data at the same time).

Since you mention Python, I would recommend looking at the Twisted implementation of WebSockets.

You can have all your clients "register" on the WebSocket server with a callerID, so the server knows where to find a given callerID.
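A minimal sketch of that registration step, assuming each WebSocket connection can be wrapped in a plain "send" callback. The class name `CallerRegistry` and its methods are hypothetical, not part of Twisted; in a real server the callback would wrap the connection's send method.

```python
from typing import Callable, Dict

# Hypothetical: a callback that pushes one message to one connected client.
SendFn = Callable[[dict], None]


class CallerRegistry:
    """Maps each registered callerID to the callback used to reach that client."""

    def __init__(self) -> None:
        self._clients: Dict[str, SendFn] = {}

    def register(self, caller_id: str, send: SendFn) -> None:
        # Reject duplicate IDs so one client cannot take over another's messages.
        if caller_id in self._clients:
            raise ValueError(f"callerID already registered: {caller_id}")
        self._clients[caller_id] = send

    def unregister(self, caller_id: str) -> None:
        # Call this when the WebSocket closes; unknown IDs are ignored.
        self._clients.pop(caller_id, None)

    def lookup(self, caller_id: str) -> SendFn:
        # Raises KeyError if that caller is not currently online.
        return self._clients[caller_id]
```

Keeping the lookup strict (a `KeyError` for offline callers) lets the invite logic decide how to report "not available" back to the caller.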

Then your server will need an "invite" API where caller1 "invites" caller2.
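The invite step could look roughly like this, as an in-memory sketch. Per-client outboxes (lists) stand in for WebSocket sends, and the message types `invite`, `unavailable`, `busy` and `call_started` are made-up names for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class InviteServer:
    """Hypothetical invite/accept flow; outboxes stand in for WebSocket sends."""

    outboxes: Dict[str, List[dict]] = field(default_factory=dict)
    pending: Set[Tuple[str, str]] = field(default_factory=set)  # (caller, callee)
    peers: Dict[str, str] = field(default_factory=dict)  # active calls, both directions

    def register(self, caller_id: str) -> None:
        self.outboxes.setdefault(caller_id, [])

    def invite(self, caller: str, callee: str) -> None:
        # The caller must be registered; otherwise this raises KeyError.
        if callee not in self.outboxes:
            self.outboxes[caller].append({"type": "unavailable", "callee": callee})
            return
        if caller in self.peers or callee in self.peers:
            self.outboxes[caller].append({"type": "busy", "callee": callee})
            return
        self.pending.add((caller, callee))
        self.outboxes[callee].append({"type": "invite", "from": caller})

    def accept(self, callee: str, caller: str) -> None:
        if (caller, callee) not in self.pending:
            raise KeyError(f"no pending invite from {caller} to {callee}")
        self.pending.remove((caller, callee))
        # Record the call in both directions so either side can send audio.
        self.peers[caller] = callee
        self.peers[callee] = caller
        for who, other in ((caller, callee), (callee, caller)):
            self.outboxes[who].append({"type": "call_started", "peer": other})
```

The `call_started` message is how each browser would learn that the call is set up and that it can start sending audio.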

Once the call is set up and each client starts sending its audio data, the server will be able to forward this audio data to the other party.
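The forwarding step itself is small. This sketch assumes the server keeps an active-call map in both directions (`peers`) and a per-client outbox standing in for the WebSocket connection; the function name `relay_audio` is hypothetical.

```python
from typing import Dict, List


def relay_audio(peers: Dict[str, str], outboxes: Dict[str, List[bytes]],
                sender: str, chunk: bytes) -> bool:
    """Forward one chunk of raw audio from `sender` to the other party of its call.

    `peers` maps each caller in an active call to the other caller.
    Returns False (and drops the chunk) if the sender is not in a call.
    """
    peer = peers.get(sender)
    if peer is None:
        return False
    outboxes[peer].append(chunk)
    return True
```

Both directions use the same function, which matches the full-duplex picture above: caller1's chunks land in caller2's outbox and vice versa. Dropping chunks from callers who are not in a call (rather than buffering them) avoids delivering stale audio later.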

Upon receiving audio data, the browser will need to play it on the speakers, probably using the HTML5 audio tag.

To do this, you may be forced to use a "trick": instead of having the WebSocket server forward the raw audio data to the client, you may need to simulate two "infinite" files:

caller1.wav : sound captured on caller1 mic
caller2.wav : sound captured on caller2 mic

Caller1's browser would set caller2.wav as the audio.src attribute once the call is set up (caller1 would be informed of this event via WebSocket), and if the Python server appends the raw audio data to caller2.wav as it receives it, it would hopefully start playing.
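One way to sketch such an "infinite" file on the server side, assuming 16-bit mono PCM at 8 kHz: write a standard 44-byte WAV header whose size fields are set to the maximum value (a common streaming workaround, since the real length is unknown), then keep appending raw audio. The names `streaming_wav_header` and `GrowingWav` are hypothetical, and whether a given browser's audio tag keeps playing such a response is an assumption you would need to test.

```python
import struct


def streaming_wav_header(sample_rate: int = 8000, channels: int = 1,
                         bits_per_sample: int = 16) -> bytes:
    """44-byte PCM WAV header with both size fields set to 0xFFFFFFFF ("unknown")."""
    block_align = channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align
    unknown = 0xFFFFFFFF
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", unknown, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", unknown,
    )


class GrowingWav:
    """One simulated 'infinite' file per caller, e.g. caller2.wav."""

    def __init__(self, sample_rate: int = 8000) -> None:
        self._buf = bytearray(streaming_wav_header(sample_rate))

    def append(self, pcm_chunk: bytes) -> None:
        # Called by the WebSocket handler each time raw audio arrives from the caller.
        self._buf.extend(pcm_chunk)

    def read_from(self, offset: int) -> bytes:
        # An HTTP handler serving caller2.wav would keep the response open and
        # repeatedly send whatever has been appended past `offset`.
        return bytes(self._buf[offset:])
```

As written the buffer grows for the whole call; a real server would discard bytes every listener has already received.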

This sounds like a cool prototype you're going to hack up!

Good luck on your journey,

Jerome Wagner

Fra answered 18/11, 2010 at 23:49

Thanks a lot! It seems there's also an HTML5 video tag, so I could add streaming video as well. One question about "you may be forced to use a 'trick'": could you explain why? – Cham 19/11, 2010 at 5:24

The "trick" is needed because (1) I am not sure there is an API to play raw audio data, so you will need to use the audio tag, and (2) I think you will get poor quality if you create one audio tag per chunk of raw data, so you need to trick the browser into thinking there is only one audio file. Keep me informed if you get something working! – Fra 19/11, 2010 at 8:23

Hey Jerome, thank you for your awesome contributions :-) Really appreciated! I am planning to make it an open source project. First I'm going to start with the planning and the usual stuff. If you are interested in its progress, please visit this question once in a while. I'll post updates here. – Cham 19/11, 2010 at 15:10

Hi, I am also looking into this. Did you have any luck getting it to work? – Chinchin 30/3, 2012 at 17:20

@JeromeWAGNER, to improve latency, how can we implement it so that the video is streamed directly from client A to client B instead of going through the server as an intermediary? – Tillio 31/1, 2015 at 8:10

It seems Ericsson created the first HTML5 video conference app.

The technique they used:

Implemented the device element and the Stream API (device element GUI is currently written in JavaScript/CSS)
Added MediaStreamManager to map Stream URLs to the corresponding pipeline in the media backend
Added MediaStreamTransceiver to control the related media processing and transport
Added support for binary data in the WebSocket protocol
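For context on the last point: binary WebSocket messages let raw audio/video chunks travel without text encoding. The sketch below builds and parses a single binary frame as later standardized in RFC 6455 (opcode 0x2, FIN bit set, client-to-server frames masked with a 4-byte key). It is not Ericsson's code, and their 2010 patch predates RFC 6455, so its wire format may have differed; the function names are hypothetical.

```python
import struct
from typing import Optional

OPCODE_BINARY = 0x2  # RFC 6455 opcode for a binary data frame


def encode_binary_frame(payload: bytes, mask_key: Optional[bytes] = None) -> bytes:
    """Build one unfragmented binary frame (unmasked if mask_key is None)."""
    header = bytearray([0x80 | OPCODE_BINARY])  # FIN bit + binary opcode
    mask_bit = 0x80 if mask_key is not None else 0x00
    n = len(payload)
    if n < 126:
        header.append(mask_bit | n)
    elif n < 1 << 16:
        header.append(mask_bit | 126)
        header += struct.pack("!H", n)  # 16-bit extended length, network byte order
    else:
        header.append(mask_bit | 127)
        header += struct.pack("!Q", n)  # 64-bit extended length
    if mask_key is None:
        return bytes(header) + payload
    if len(mask_key) != 4:
        raise ValueError("mask key must be 4 bytes")
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + mask_key + masked


def decode_frame(frame: bytes) -> bytes:
    """Parse one complete, unfragmented binary frame and return its payload."""
    opcode = frame[0] & 0x0F
    if opcode != OPCODE_BINARY:
        raise ValueError(f"expected binary frame, got opcode {opcode:#x}")
    masked = frame[1] & 0x80
    n = frame[1] & 0x7F
    pos = 2
    if n == 126:
        n = struct.unpack("!H", frame[2:4])[0]
        pos = 4
    elif n == 127:
        n = struct.unpack("!Q", frame[2:10])[0]
        pos = 10
    if masked:
        key = frame[pos:pos + 4]
        pos += 4
        return bytes(b ^ key[i % 4] for i, b in enumerate(frame[pos:pos + n]))
    return frame[pos:pos + n]
```

A browser client would send masked frames (with a fresh random key per frame, e.g. from `os.urandom(4)`), and the server would reply with unmasked ones.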

See labs.ericsson.com.

Video on YouTube: Beyond HTML5: Conversational Voice and Video demo | Ericsson Labs

Unfortunately Ericsson doesn't want to share device_dialog.js (yet).

Cham answered 21/11, 2010 at 6:16

The problem is that the Ericsson demo only works with a patched WebKit that they "plan" to release. You may as well do this with Flash and still call it HTML5 ;-) – Fra 22/11, 2010 at 23:22

Very true, and that's why we are going to build it ourselves, our way :-) – Cham 22/11, 2010 at 23:38

WebRTC might be an answer: http://www.webrtc.org/running-the-demos (currently only in Chrome Canary with the MediaStream flag enabled).

See the demo at https://apprtc.appspot.com (make sure you view it in a supported browser) and the code at http://code.google.com/p/webrtc-samples/source/browse/trunk/apprtc/

The reason I'm writing: I got a really cheap Android tablet and cannot install Skype or Vtok, and Google Voice is not available outside the US. I need to find an HTML5-based solution, since I'm able to run Opera Mobile 12 and got http://html5demos.com/ working properly.

Redfin answered 3/5, 2012 at 8:54

@work/gotta be quick

Check out the JavaScript getUserMedia API (see Can I Use for browser support and the W3C specification).

Dyeing answered 19/9, 2012 at 22:17

WebRTC is the answer now.

For a Node.js stack, you can look at http://www.easyrtc.com/. Note that IE has not yet implemented the APIs that make WebRTC work.

Aixlachapelle answered 8/6, 2013 at 16:5
