WebSockets is protocol that relies on TCP streamed connection. Although WebSockets is Message based protocol.
If you want to implement your own protocol then I recommend to use latest and stable specification (for 18/04/12) RFC 6455.
This specification contains all necessary information regarding handshake and framing. As well most of description on scenarios of behaving from browser side as well as from server side.
It is highly recommended to follow what recommendations tells regarding server side during implementing of your code.
In few words, I would describe working with WebSockets like this:
- Create server Socket (System.Net.Sockets) bind it to specific port, and keep listening with asynchronous accepting of connections. Something like that:
Socket serverSocket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.IP);
serverSocket.Bind(new IPEndPoint(IPAddress.Any, 8080));
serverSocket.Listen(128);
serverSocket.BeginAccept(null, 0, OnAccept, null);
- You should have accepting function "OnAccept" that will implement handshake. In future it has to be in another thread if system is meant to handle huge amount of connections per second.
private void OnAccept(IAsyncResult result) {
try {
Socket client = null;
if (serverSocket != null && serverSocket.IsBound) {
client = serverSocket.EndAccept(result);
}
if (client != null) {
/* Handshaking and managing ClientSocket */
}
} catch(SocketException exception) {
} finally {
if (serverSocket != null && serverSocket.IsBound) {
serverSocket.BeginAccept(null, 0, OnAccept, null);
}
}
}
- After connection established, you have to do handshake. Based on specification 1.3 Opening Handshake, after connection established you will receive basic HTTP request with some information. Example:
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
This example is based on version of protocol 13. Bear in mind that older versions have some differences but mostly latest versions are cross-compatible. Different browsers may send you some additional data. For example Browser and OS details, cache and others.
Based on provided handshake details, you have to generate answer lines, they are mostly same, but will contain Accept-Key, that is based on provided Sec-WebSocket-Key. In specification 1.3 it is described clearly how to generate response key.
Here is my function I've been using for V13:
static private string guid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
private string AcceptKey(ref string key) {
string longKey = key + guid;
SHA1 sha1 = SHA1CryptoServiceProvider.Create();
byte[] hashBytes = sha1.ComputeHash(System.Text.Encoding.ASCII.GetBytes(longKey));
return Convert.ToBase64String(hashBytes);
}
Handshake answer looks like that:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
But accept key have to be the generated one based on provided key from client and method AcceptKey I provided before. As well, make sure after last character of accept key you put two new lines "\r\n\r\n".
After handshake answer is sent from server, client should trigger "onopen" function, that means you can send messages after.
Messages are not sent in raw format, but they have Data Framing. And from client to server as well implement masking for data based on provided 4 bytes in message header. Although from server to client you don't need to apply masking over data. Read section 5. Data Framing in specification.
Here is copy-paste from my own implementation. It is not ready-to-use code, and have to be modified, I am posting it just to give an idea and overall logic of read/write with WebSocket framing. Go to this link.
After framing is implemented, make sure that you receive data right way using sockets. For example to prevent that some messages get merged into one, because TCP is still stream based protocol. That means you have to read ONLY specific amount of bytes. Length of message is always based on header and provided data length details in header it self. So when you receiving data from Socket, first receive 2 bytes, get details from header based on Framing specification, then if mask provided another 4 bytes, and then length that might be 1, 4 or 8 bytes based on length of data. And after data it self. After you read it, apply demasking and your message data is ready to use.
You might want to use some Data Protocol, I recommend to use JSON due traffic economy and easy to use on client side in JavaScript. For server side you might want to check some of parsers. There is lots of them, google can be really helpful.
Implementing own WebSockets protocol definitely have some benefits and great experience you get as well as control over protocol it self. But you have to spend some time doing it, and make sure that implementation is highly reliable.
In same time you might have a look in ready to use solutions that google (again) have enough.