Scaling WebSockets with a Message Queue

I have built a WebSocket server that acts as a chat message router (i.e. it receives messages from clients and pushes them to other clients according to a client ID).

It is a requirement that the service be able to scale to handle many millions of concurrent open socket connections, and I wish to be able to horizontally scale the server.

The architecture I have in mind is to put the WebSocket server nodes behind a load balancer, which creates a problem because clients connected to different nodes won't know about each other. While both clients A and B enter via the load balancer, client A might have an open connection with node 1 while client B is connected to node 2, and each node holds its own dictionary of open socket connections.

To solve this problem, I was thinking of using an MQ system like ZeroMQ or RabbitMQ. All of the WebSocket server nodes will be subscribers of the MQ server, and when a node gets a request to route a message to a client that is not in its local connections dictionary, it will publish a message to the MQ server, which will tell all the subscriber nodes to look for this client and deliver the message if the client is connected to that node.
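To make this concrete, here is a rough sketch of what I imagine each node doing, assuming the Node.js ws package and a generic broker interface (the `Broker` type, the `chat` topic and the `clientId` query parameter below are illustrative only, not tied to a specific MQ product):

```typescript
import WebSocket, { WebSocketServer } from "ws";

// Illustrative broker interface - would be backed by ZeroMQ, RabbitMQ, etc.
interface Broker {
  publish(topic: string, payload: string): void;
  subscribe(topic: string, handler: (payload: string) => void): void;
}

// Each node keeps its own dictionary of locally connected clients.
const localClients = new Map<string, WebSocket>();

function startNode(broker: Broker, port: number): void {
  const wss = new WebSocketServer({ port });

  wss.on("connection", (socket, req) => {
    // Assume the client identifies itself via a query parameter, e.g. ?clientId=abc
    const clientId = new URL(req.url ?? "/", "http://localhost").searchParams.get("clientId");
    if (!clientId) {
      socket.close();
      return;
    }
    localClients.set(clientId, socket);
    socket.on("close", () => localClients.delete(clientId));

    socket.on("message", (data) => {
      const { to, text } = JSON.parse(data.toString());
      const target = localClients.get(to);
      if (target) {
        // Recipient is connected to this node: deliver directly.
        target.send(JSON.stringify({ text }));
      } else {
        // Recipient is elsewhere: fan the message out to the other nodes.
        broker.publish("chat", JSON.stringify({ to, text }));
      }
    });
  });

  // Every node also listens for fanned-out messages and delivers
  // them only if the recipient happens to be connected here.
  broker.subscribe("chat", (payload) => {
    const { to, text } = JSON.parse(payload);
    localClients.get(to)?.send(JSON.stringify({ text }));
  });
}
```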

Q1: Does this architecture make sense?

Q2: Is the pub-sub pattern described here really what I am looking for?

Haifa answered 6/9, 2014 at 14:5 Comment(4)
Are you building a chat? Do you also need to redirect the messages to a mobile device? – Exeter
Yes (the system is chat-like and it should work on Chrome for Android). – Haifa
It is just OT, but have you considered using XMPP? – Exeter
@Gas I want to be able to connect directly from my HTML client to the service using WebSockets (clientA -> Load Balancer -> WS_nodeA -> MQ -> WS_nodeB -> clientB), so XMPP is not an option as far as I can gather. – Haifa

ZeroMQ would be my option - both architecture-wise & performance-wise:

-- fast & low latency ( you can measure your implementation's performance & overheads down to the sub-[usec] scale )

-- broker-less ( does not introduce another point of failure, while it can itself have an { N+1 | N+M } self-healing architecture )

-- smart Formal Communication Pattern primitives ready to be used ( PUB / SUB is the least cardinal one; see the sketch after this list )

-- fair-queueing & load-balancing architectures built-in ( invisible to an external observer )

-- many transport Classes for server-side internal multi-process / multi-threading distributed / parallel processing

-- ready for almost linear scalability
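To give a feel for how little code the PUB / SUB primitive needs, a minimal sketch using the Node.js zeromq bindings ( v6 API ); the endpoint, topic and handler names are placeholders only, not a prescription:

```typescript
import * as zmq from "zeromq";

// PUB side - e.g. run by whichever node (or a thin forwarder) fans out cross-node messages.
async function runPublisher(): Promise<void> {
  const pub = new zmq.Publisher();
  await pub.bind("tcp://127.0.0.1:5556");      // placeholder endpoint
  // First frame carries the topic, second frame the payload.
  await pub.send(["chat", JSON.stringify({ to: "client-42", text: "hello" })]);
}

// SUB side - every WebSocket node subscribes and delivers locally
// only if it holds the target connection.
async function runSubscriber(deliverLocally: (to: string, text: string) => void): Promise<void> {
  const sub = new zmq.Subscriber();
  sub.connect("tcp://127.0.0.1:5556");
  sub.subscribe("chat");                        // topic filtering
  for await (const [, payload] of sub) {
    const { to, text } = JSON.parse(payload.toString());
    deliverLocally(to, text);
  }
}
```

In a real many-to-many deployment you would typically place an XPUB / XSUB proxy ( or a full mesh of PUB / SUB sockets ) between the nodes, but the wire pattern stays the same.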

Adaptive node re-discovery

This is a bit more complex subject. To arrive at a feasible architecture, you will have to drill down into more details, such as:

  • Node authentication vs. peer-to-peer messaging
  • Node (re)-discovery vs. legal & privacy issues
  • Node based autonomous self-organising Agents vs. needs for central policy enforcement
Kenney answered 6/9, 2014 at 16:11 Comment(0)

To update this for 2021: we just solved this problem when we needed to design a system that could handle millions of simultaneous WS connections from IoT devices. The WS server just relays messages to our serverless API backend, which handles the actual logic. We chose Docker and the Node.js ws package, running on an auto-scaling AWS ECS Fargate cluster with an ALB in front of it.

This solved the main problem of routing inbound messages, but then we had the same issue in reverse: how do we route response messages from the server back to the right client? We initially thought of just keeping a central DB of connections, but routing messages to a specific Fargate instance behind an ALB didn't seem feasible.

Instead, we set up a simple pub/sub pattern using AWS SNS (https://aws.amazon.com/pub-sub-messaging/). Every WS server receives the response and then searches its own WS connections for the target device. Since each Fargate instance handles just routing (no logic), it can handle a lot of connections when we scale it vertically.
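To give a rough idea of what that looks like (a sketch only: the topic ARN, connection map and handler names are made up, and the SNS-to-instance subscription plumbing, e.g. an SQS queue or HTTPS endpoint per instance, is omitted):

```typescript
import { SNSClient, PublishCommand } from "@aws-sdk/client-sns";
import type WebSocket from "ws";

// Backend side: publish the response to the shared topic (the ARN is a placeholder).
const sns = new SNSClient({});
async function publishResponse(deviceId: string, body: unknown): Promise<void> {
  await sns.send(new PublishCommand({
    TopicArn: "arn:aws:sns:us-east-1:123456789012:ws-responses",
    Message: JSON.stringify({ deviceId, body }),
  }));
}

// WS-server side: every instance receives the fan-out and delivers
// only if the device is connected locally; otherwise it simply ignores the message.
const localConnections = new Map<string, WebSocket>();
function onFanOutMessage(raw: string): void {
  const { deviceId, body } = JSON.parse(raw);
  localConnections.get(deviceId)?.send(JSON.stringify(body));
}
```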

Update: To make this even more performant, you can use a persistent connection such as Redis Pub/Sub so that the response message goes to only a single server instead of every server.
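A rough sketch of that per-device-channel idea with the ioredis client (the channel naming scheme and helper names are illustrative, not our exact code):

```typescript
import Redis from "ioredis";
import type WebSocket from "ws";

const localConnections = new Map<string, WebSocket>();

// One subscriber connection per WS-server instance; a separate connection is used for
// publishing, since a Redis connection in subscribe mode cannot issue other commands.
const sub = new Redis();
const pub = new Redis();

// When a device connects to this instance, subscribe to its own channel,
// so responses for it reach only this server.
async function registerDevice(deviceId: string, socket: WebSocket): Promise<void> {
  localConnections.set(deviceId, socket);
  await sub.subscribe(`device:${deviceId}`);
  socket.on("close", async () => {
    localConnections.delete(deviceId);
    await sub.unsubscribe(`device:${deviceId}`);
  });
}

// Deliver any message published on a device channel to the local socket.
sub.on("message", (channel, message) => {
  const deviceId = channel.replace("device:", "");
  localConnections.get(deviceId)?.send(message);
});

// Backend side: publish the response to the one channel the hosting server is subscribed to.
async function publishResponse(deviceId: string, body: unknown): Promise<void> {
  await pub.publish(`device:${deviceId}`, JSON.stringify(body));
}
```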

Jasonjasper answered 14/4, 2021 at 12:10 Comment(3)
I have a doubt: with the Redis pub/sub mechanism, how will you decide which server to send the message to? With SNS, we were sending the message to all the servers because we didn't know which server the client was connected to... – Humber
We set it up with each device being a topic, and the server holding that device's connection subscribed to that topic. Redis Pub/Sub runs over a persistent connection, so there is no per-message setup. Since we are using it with IoT devices that can scale quickly, we monitor the maximum number of connections (redis.io/docs/reference/clients). If you were dealing with millions of potential connections, it might be better just to publish a message to every server, but our initial thought was to reduce the extra noise this way. – Jasonjasper
As another update, we recently migrated to Kafka for this and went back to pushing messages to all servers. It's a little noisier, but it eliminates the technical complexity of needing a persistent connection. – Jasonjasper
