Why do websockets stop broadcasting after some time? (implementation uses ReactPHP, Ratchet & ZeroMQ)

I have a small websocket server running on top of a set of libraries:

  • ReactPHP,
  • Ratchet,
  • ZeroMQ, using a php-zmq wrapper.

The code is basically the same as in the tutorials.

The event loop starts correctly, users are able to connect to the server, and they receive the correct messages when the other side pushes something. After a while, though, usually a few days (depending on usage), the messages stop arriving.

The usage is not overwhelming at all - only one or two frontend developers connect at the moment, as this is a development stage.

The loop is still running and correctly returns HTTP 101 Switching Protocols on connect, but it no longer broadcasts messages that were broadcast correctly before. There are no errors anywhere. Restarting the event loop helps.

My questions are:

1) What can cause this? Has anyone encountered similar behaviour?

2) Can you recommend a way to debug this in a long-running event loop process?

Currently, I must stop the loop, change the code (add logging calls), restart the loop and wait for it to go wrong again, which is tedious, to say the least.

Any help greatly appreciated.

Atal answered 8/2, 2017 at 17:21 Comment(4)
Welcome to the wild worlds of distributed computing. There can be many reasons for lost or blocked functionality. Could you post an MCVE, as a minimum for the pair of (event-loop + client), and the full details of the ZeroMQ infrastructure settings and exception handling (parameters, both transport classes, and the Formal Communication Patterns / behaviour archetypes) used for this project? – Temuco
Thinking outside the box, it looks like some resource pool is exhausted, e.g. memory, so new objects cannot be allocated; the number of network connections, so new connections cannot be established; or the maximum number of running PHP workers (maybe they are sleeping and waiting for input). Consider listing all hypotheses like these and eliminating them one by one. – Jordans
Without showing us any code, your best bet is to check any and all error logs, and if your logging isn't aggressive enough you'll have to fix that first. – Shoreward
Are you running through any type of proxy? – Melisma

Well, I guess ZMQ was the culprit.

When multiple applications were using ZMQ on the same machine, messages sometimes ended up at the wrong consumer, even though every application had a different port specified for its ZMQ socket connections.

So users were sometimes getting websocket frames from a completely different application, and when there was no corresponding user for a message, the frame vanished on the way. The websockets didn't stop broadcasting; the messages were just routed incorrectly.
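One way to make this kind of misrouting visible, instead of letting frames vanish silently, would be to tag every message with the application it belongs to and check that tag on the consuming side. This is only a minimal sketch, assuming JSON payloads; the `app` field, both function names and the application ids are hypothetical and are not part of Ratchet, ReactPHP or php-zmq:

```php
<?php
// Hypothetical envelope helpers; not part of Ratchet, ReactPHP or php-zmq.

// Producer side: wrap the payload together with the id of the sending application.
function wrapMessage(string $appId, array $payload): string
{
    return json_encode(['app' => $appId, 'payload' => $payload]);
}

// Consumer side: return the payload if the message belongs to this application,
// otherwise log the problem and return null so the caller can skip the message.
function unwrapMessage(string $expectedAppId, string $raw): ?array
{
    $message = json_decode($raw, true);

    // Reject anything that is not a well-formed envelope.
    if (
        !is_array($message)
        || !is_string($message['app'] ?? null)
        || !is_array($message['payload'] ?? null)
    ) {
        error_log('Dropping malformed message: ' . $raw);
        return null;
    }

    // A message addressed to another application is exactly the misrouting case.
    if ($message['app'] !== $expectedAppId) {
        error_log(sprintf(
            'Dropping message for "%s" received by "%s"',
            $message['app'],
            $expectedAppId
        ));
        return null;
    }

    return $message['payload'];
}
```

The consumer would call `unwrapMessage()` inside whatever callback receives the ZMQ message and only broadcast when the result is not `null`. With a check like this, a misrouted frame shows up in the error log instead of disappearing without a trace.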

I don't know ZMQ well enough to say whether this is documented or otherwise known behaviour.

I solved the problem by rewriting the backend to use RabbitMQ, with a separate vhost and channel for every application. The problems are gone now; every frame ends up where it should.
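The isolation idea can be sketched as a small settings helper that gives every application its own virtual host and queue. Everything here is an assumption made for illustration: the function name, the host, the naming scheme and the application ids are hypothetical, and the returned array is not tied to any particular AMQP client library.

```php
<?php
// Hypothetical helper: derives isolated RabbitMQ settings for one application.
// The naming scheme (vhost and queue named after the application) is an assumption.
function brokerSettingsFor(string $appId): array
{
    // Keep ids simple so they can be used verbatim as vhost and queue names.
    if (!preg_match('/^[a-z0-9_-]+$/', $appId)) {
        throw new InvalidArgumentException('Unexpected application id: ' . $appId);
    }

    return [
        'host'  => '127.0.0.1',            // assumed: broker runs on the same machine
        'port'  => 5672,                   // default AMQP port
        'vhost' => $appId,                 // one virtual host per application
        'queue' => $appId . '.websocket',  // queue the websocket server consumes from
    ];
}
```

Because each application connects to its own vhost, a queue declared by one application cannot be seen by another, even if both happen to use the same queue name.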

Atal answered 21/3, 2017 at 13:22 Comment(0)
