How to protect ZeroMQ Request Reply pattern against potential drops of messages?

I'm trying to implement a ZeroMQ pattern on the TCP layer between a C# application and distributed Python servers. I've gotten a version working with the request-reply REQ/REP pattern and it seems relatively stable when testing on localhost. However, in testing, I've debugged a few situations where I accidentally send multiple requests before receiving a reply, which apparently is not acceptable.

In practice the network will likely have lots of dropped packets and I suspect that I'll be dropping lots of replies and/or unable to send requests.

1) Is there a way to reset the connection between REQ/REP request-reply sockets?
Would a ROUTER/DEALER pattern make more sense instead? As this is my first application with ZeroMQ, I was hoping to keep it simple.

2) Is there a good ZeroMQ mechanism for handling connectivity events? I've been reading "the guide" and there are a few mentions of monitoring connections, but no examples. I found the ZMonitor, but can't get the events to trigger in C#.

Heterochromous answered 4/11, 2016 at 13:24 Comment(3)
Why not use a poller on the REQ side and wait for the response for a limited time? If there is none you should close/destroy the socket, create a new one and try again. - Murguia
The guide said that creating and destroying sockets multiple times was bad practice. Apparently there's no guarantee that the underlying socket will close until the context is terminated... Which shouldn't happen until I'm disposing the owner. - Heterochromous
That depends on how often sockets are closed and reopened. The guide suggests closing and reopening sockets in the "Client-Side Reliability (Lazy Pirate Pattern)" section (zguide.zeromq.org/…) - Murguia

Ad 1) No,
there is no socket link-management interface exposed to the user for testing or resetting the state of the FSA-to-FSA link in the ZeroMQ framework.

Yes, XREQ/XREP (named DEALER/ROUTER in current ZeroMQ versions) may help you overcome the deadlocks that can and do happen in the REQ/REP Scalable Formal Communication Pattern:

Ref.: REQ/REP Deadlocks >>> https://mcmap.net/q/590377/-quot-server-quot-to-quot-server-quot-zeromq-communication
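To make the lockstep concrete, here is a minimal sketch in Python with pyzmq (an assumption on my side, chosen because the servers in the question are Python). It shows a REQ socket refusing a second send before a reply arrives, while a DEALER socket accepts it. The function name and the inproc endpoint are made up for the illustration.

```python
import zmq

def demo_lockstep(endpoint="inproc://lockstep-demo"):
    """Show that REQ rejects a second send while DEALER accepts it."""
    ctx = zmq.Context()
    router = ctx.socket(zmq.ROUTER)    # stands in for the server side
    router.bind(endpoint)

    req = ctx.socket(zmq.REQ)
    req.connect(endpoint)
    req.send(b"first")
    try:
        req.send(b"second")            # breaks the send/recv lockstep
        req_errno = None
    except zmq.ZMQError as exc:
        req_errno = exc.errno          # EFSM: wrong state for this operation

    dealer = ctx.socket(zmq.DEALER)
    dealer.connect(endpoint)
    dealer.send(b"first")
    dealer.send(b"second")             # DEALER has no such state machine
    dealer_ok = True

    for sock in (req, dealer, router):
        sock.close(linger=0)           # discard anything still queued
    ctx.term()
    return req_errno, dealer_ok
```

The price of DEALER is that matching replies to requests becomes the application's job, so this is a trade-off rather than a free fix.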

Fig.1: Why it is wrong to use a naive REQ/REP
All cases where both [1] and [2] sit in in_WaitToRecvSTATE_W2R
are in principle an unsalvageable mutual deadlock of the REQ-FSA/REP-FSA Finite-State Automata, and neither side will ever reach the "next" in_WaitToSendSTATE_W2S internal state.

               XTRN_RISK_OF_FSA_DEADLOCKED ~ {  NETWORK_LoS
                                         :   || NETWORK_LoM
                                         :   || SIG_KILL( App2 )
                                         :   || ...
                                         :      }
                                         :
[App1]      ![ZeroMQ]                    :    [ZeroMQ]              ![App2] 
code-control! code-control               :    [code-control         ! code-control
+===========!=======================+    :    +=====================!===========+
|           ! ZMQ                   |    :    |              ZMQ    !           |
|           ! REQ-FSA               |    :    |              REP-FSA!           |
|           !+------+BUF> .connect()|    v    |.bind()  +BUF>------+!           |
|           !|W2S   |___|>tcp:>---------[*]-----(tcp:)--|___|W2R   |!           |
|     .send()>-o--->|___|           |         |         |___|-o---->.recv()     |
| ___/      !| ^  | |___|           |         |         |___| ^  | |!      \___ |
| REQ       !| |  v |___|           |         |         |___| |  v |!       REP |
| \___.recv()<----o-|___|           |         |         |___|<---o-<.send()___/ |
|           !|   W2R|___|           |         |         |___|   W2S|!           |
|           !+------<BUF+           |         |         <BUF+------+!           |
|           !                       |         |                     !           |
|           ! ZMQ                   |         |   ZMQ               !           |
|           ! REQ-FSA               |         |   REP-FSA           !           |
~~~~~~~~~~~~~ DEADLOCKED in W2R ~~~~~~~~ * ~~~~~~ DEADLOCKED in W2R ~~~~~~~~~~~~~
|           ! /\/\/\/\/\/\/\/\/\/\/\|         |/\/\/\/\/\/\/\/\/\/\/!           |
|           ! \/\/\/\/\/\/\/\/\/\/\/|         |\/\/\/\/\/\/\/\/\/\/\!           |
+===========!=======================+         +=====================!===========+
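One way out of the W2R trap on the REQ side is the Lazy Pirate approach mentioned in the comments: wait for the reply only for a limited time, then throw the socket away and retry with a fresh one. Below is a rough sketch with pyzmq, not the guide's exact code. The helper names, timeouts and the inproc endpoint are assumptions for the illustration, and the ROUTER "server" is only a test rig that deliberately loses the first request.

```python
import threading
import zmq

def lazy_pirate_request(ctx, endpoint, payload, timeout_ms=500, retries=3):
    """Try a REQ round-trip; on timeout discard the socket and start over."""
    for _ in range(retries):
        req = ctx.socket(zmq.REQ)
        req.setsockopt(zmq.LINGER, 0)   # close() must not wait on unsent data
        req.connect(endpoint)
        try:
            req.send(payload)
            if req.poll(timeout_ms, zmq.POLLIN):
                return req.recv()       # reply arrived in time
        finally:
            req.close()                 # a fresh socket means a fresh FSA
    return None                         # peer never answered

def run_lazy_pirate_demo(endpoint="inproc://lazy-pirate-demo"):
    """Hypothetical test rig: a ROUTER that 'loses' the first request."""
    ctx = zmq.Context()
    server = ctx.socket(zmq.ROUTER)
    server.bind(endpoint)

    def serve():
        server.recv_multipart()                       # request 1: dropped
        ident, empty, body = server.recv_multipart()  # request 2: answered
        server.send_multipart([ident, empty, b"re:" + body])

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    reply = lazy_pirate_request(ctx, endpoint, b"ping", timeout_ms=200)
    worker.join(timeout=2)
    server.close(linger=0)
    ctx.term()
    return reply
```

Setting LINGER to 0 before closing is what keeps the discarded sockets from piling up behind the context, which was the worry raised in the comments.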

Fig.2: One may implement a free-stepping transmission layer using several pure ZeroMQ builtins and add some SIG-layer tools to gain full control of all possible distributed system states.

App1.PULL.recv( ZMQ.NOBLOCK ) and App1.PULL.poll( 0 ) are the obvious non-blocking calls on the receiving side.

[App1]      ![ZeroMQ]
code-control! code-control           
+===========!=======================+
|           !                       |
|           !+----------+           |         
|     .poll()|   W2R ___|.bind()    |         
| ____.recv()<----o-|___|-(tcp:)--------O     
| PULL      !|      |___|           |   :   
|           !|      |___|           |   :   
|           !|      |___|           |   :   
|           !+------<BUF+           |   :     
|           !                       |   :                           ![App2]
|           !                       |   :     [ZeroMQ]              ! code-control
|           !                       |   :     [code-control         ! once gets started ...
|           !                       |   :     +=====================!===========+
|           !                       |   :     |                     !           |
|           !                       |   :     |         +----------+!           |
|           !                       |   :     |         |___       |!           |
|           !                       |   :     |         |___| <--o-<.send()____ |
|           !                       |   :<<-------<tcp:<|___|   W2S|!      PUSH |
|           !                       |   :    .connect() <BUF+------+!           |
|           !                       |   :     |                     !           |
|           !                       |   :     |                     !           |
+===========!=======================+   :     +=====================!===========+
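The free-stepping idea from Fig.2 can be sketched with pyzmq roughly as follows. This is an illustration under assumptions (the helper names and the inproc endpoint are mine), showing only the PULL side polling and reading without ever blocking.

```python
import zmq

def drain(pull, wait_ms=0):
    """Return every message currently available on a PULL socket."""
    received = []
    while pull.poll(wait_ms, zmq.POLLIN):        # 0 ms means "just look"
        received.append(pull.recv(zmq.NOBLOCK))  # never blocks
    return received

def run_push_pull_demo(endpoint="inproc://free-stepping-demo"):
    ctx = zmq.Context()
    pull = ctx.socket(zmq.PULL)                  # App1 side, .bind()
    pull.bind(endpoint)
    push = ctx.socket(zmq.PUSH)                  # App2 side, .connect()
    push.connect(endpoint)

    before = drain(pull)                         # nothing has been sent yet
    try:
        pull.recv(zmq.NOBLOCK)                   # empty queue raises Again
        empty_errno = None
    except zmq.Again as exc:
        empty_errno = exc.errno

    push.send(b"a")
    push.send(b"b")
    after = drain(pull, wait_ms=100)             # allow a moment for delivery

    push.close(linger=0)
    pull.close(linger=0)
    ctx.term()
    return before, empty_errno, after
```

Because neither side is forced into a send/recv sequence, a lost message here costs one message and not the whole link.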

Ad 2) No,
but one may create one's own "ZeroMQ-consumables" to test the distributed system's ability to set up a new transport/signalling socket, and be ready to dispose of it if the RTO-test fails to prove that both (or multiple) sides are ready to set up and communicate over the ZeroMQ infrastructure. Notice that the problems are not only in the ZeroMQ layer: the App side may also not be ready, or not be in a state to handle the expected communication interactions, and may itself cause soft-locks / dead-locks.
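One possible shape for such a disposable readiness test, sketched with pyzmq: a one-shot probe on a throwaway DEALER socket that counts the peer as ready only if an application-level answer comes back within the timeout. The "READY?" / "READY" tokens, the function names and the endpoint are all hypothetical, and the responder thread is just a rig that ignores the first probe.

```python
import threading
import zmq

def is_peer_ready(ctx, endpoint, timeout_ms=200):
    """One-shot probe on a disposable DEALER; True only if the peer answers."""
    probe = ctx.socket(zmq.DEALER)
    probe.setsockopt(zmq.LINGER, 0)        # disposing must never block
    probe.connect(endpoint)
    try:
        probe.send(b"READY?")
        if not probe.poll(timeout_ms, zmq.POLLIN):
            return False                   # RTO test failed
        return probe.recv() == b"READY"
    finally:
        probe.close()

def run_probe_demo(endpoint="inproc://readiness-demo"):
    """Hypothetical rig: the app side is 'not ready' for the first probe."""
    ctx = zmq.Context()
    responder = ctx.socket(zmq.ROUTER)
    responder.bind(endpoint)

    def app_side():
        responder.recv_multipart()                 # probe 1: app not ready
        ident, _question = responder.recv_multipart()
        responder.send_multipart([ident, b"READY"])

    worker = threading.Thread(target=app_side, daemon=True)
    worker.start()
    first = is_peer_ready(ctx, endpoint)
    second = is_peer_ready(ctx, endpoint)
    worker.join(timeout=2)
    responder.close(linger=0)
    ctx.term()
    return first, second
```

The point of answering at the application level is that a live TCP link alone says nothing about whether the App side is in a state to talk.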


The best next step?

What I can do for your further questions right now is direct you to a bigger picture on this subject >>> with more arguments, a simple signalling-plane / messaging-plane illustration and a direct link to a must-read book from Pieter HINTJENS.

Roundhead answered 4/11, 2016 at 18:14 Comment(2)
Oh my. Didn't know about deadlocks. I have some more reading to do. Thanks for the push. - Heterochromous
Why did they ban you, again? Checked the chats, found nothing, curious :) - Dacosta
