Sure, it takes a while.
Sure, it happens asynchronously.
Let's first polish the terminology a bit.
ZeroMQ is a great framework. Each client of a distributed-system that wants to use it ( except when using just the inproc:// transport class ) first instantiates an async data-pumping engine -- the Context() instance(s), as needed.
Each Scalable Formal Communication Pattern archetype { PUSH | PULL | ... | XSUB | SUB | PAIR } does not create a socket, but rather instantiates an access-point, which may later .connect() or .bind() to some counterparty ( another access-point of a suitable type, living in some Context() instance, be it local or not; again, the local-inproc://-only infrastructures being the known exception to this rule ).
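Just to make that terminology tangible, here is a minimal pyzmq sketch, assuming a tcp:// transport class and an illustration-only endpoint / port number:

import zmq

ctx = zmq.Context()                           # the async data-pumping engine

push = ctx.socket( zmq.PUSH )                 # an access-point of the PUSH archetype
push.bind( "tcp://127.0.0.1:5557" )           # .bind() creates the O/S-level interface

pull = ctx.socket( zmq.PULL )                 # a counterparty access-point ( PULL ),
pull.connect( "tcp://127.0.0.1:5557" )        # which could as well live in another Context() / host

push.send( b"hello" )                         # may block briefly until the connection is accepted
print( pull.recv() )

pull.close(); push.close(); ctx.term()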
In this sense, an answer to the question "When is the socket ready?" requires an end-to-end investigation "across" the distributed-system, covering all the elements that participate in implementing the socket-alike behaviour.
Testing a "local"-end access-point RTO-state:
For this, your agent may self-connect a receiving access-point ( working as a PULL archetype ), so as to "sniff" when the local-end Context() instance has reached an RTO-state and the .bind()-created O/S L3+ interface starts distributing the intended agent's PUSH-ed messages.
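A minimal sketch of this self-test, assuming illustration-only names ( ENDPOINT, TIMEOUT_MS ) and run before any remote peer .connect()-s, otherwise the Round-robin may deliver the probe elsewhere:

import zmq

ENDPOINT   = "tcp://127.0.0.1:5557"           # illustration-only endpoint
TIMEOUT_MS = 1000                             # illustration-only timeout

ctx   = zmq.Context()

push  = ctx.socket( zmq.PUSH )                # the agent's own sending access-point
push.setsockopt( zmq.LINGER, 0 )
push.bind( ENDPOINT )                         # .bind() creates the O/S-level interface

sniff = ctx.socket( zmq.PULL )                # the self-connected "sniffer" access-point
sniff.setsockopt( zmq.LINGER,   0 )
sniff.setsockopt( zmq.RCVTIMEO, TIMEOUT_MS )
sniff.connect( ENDPOINT )

push.send( b"probe" )                         # pump one probe message

try:                                          # if the probe arrives, the local end is in an RTO-state
    print( "local-end RTO-state reached:", sniff.recv() )
except zmq.Again:
    print( "local-end not ready within", TIMEOUT_MS, "ms" )
finally:
    sniff.close(); push.close(); ctx.term()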
Testing a "remote"-agent's RTO-state:
This part can be tested either indirectly or explicitly. An indirect way may use a message-embedded index, carrying a rising number ( an ordinal ) that bears weak information about ordering. Given the PUSH-side message-routing strategy is Round-robin, the local agent can be sure that as long as its local PULL-access-point receives a contiguous sequence of ordinals, there is no other "remote" PULL-ing agent in an RTO-state. Once the "local" PULL-access-point observes a "gap" in the stream of ordinals, that means ( sure, only in the case all the PUSH's .setsockopt()-s were set up properly ) that there is another, non-local, PULL-ing agent in an RTO-state.
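A minimal sketch of this gap-detection principle, assuming the same self-connected PULL "sniffer" as above, an illustration-only endpoint and an illustration-only message framing ( a network-order unsigned int as the ordinal ):

import zmq, struct

ENDPOINT = "tcp://127.0.0.1:5557"             # illustration-only endpoint

ctx   = zmq.Context()
push  = ctx.socket( zmq.PUSH ); push.bind( ENDPOINT )
sniff = ctx.socket( zmq.PULL ); sniff.connect( ENDPOINT )
sniff.setsockopt( zmq.RCVTIMEO, 1000 )        # illustration-only timeout

for ordinal in range( 100 ):                  # PUSH-side: embed a rising ordinal into each message
    push.send( struct.pack( "!I", ordinal ) )

expected             = 0
remote_peer_detected = False
while True:
    try:
        ordinal = struct.unpack( "!I", sniff.recv() )[0]
    except zmq.Again:
        break                                 # stream drained within the timeout
    if ordinal != expected:                   # a "gap" => Round-robin delivered some ordinals
        remote_peer_detected = True           #            to a non-local PULL-ing peer
        break
    expected += 1

print( "another PULL-ing agent in an RTO-state:", remote_peer_detected )

sniff.close(); push.close(); ctx.term()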
Is this useful?
Maybe yes, maybe not. The point was to better understand the new challenges that any distributed-system has to somehow cope with.
The nature of multi-stage message queuing, the multi-layered implementation ( local PUSH-agent's code, local Context()-thread(s), local O/S, local kernel, LAN/WAN, remote kernel, remote O/S, remote Context()-thread(s), remote PULL-agent's code, to name just a few ) and the multi-agent behaviour simply introduce many places where an operation may gain latency, block, deadlock or fail in some other manner.
Yes, a walk on the wild side.
Nevertheless, one may opt to use much richer, explicit signalling ( besides the initially intended raw-data transport ) and so implement a context-specific, signalling-RTO-aware behaviour inside the multi-agent world -- one that better reflects the actual situation and also survives the other issues that start to appear in the non-monolithic world of distributed-systems.
Explicit signalling is one way to cope with this.
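One possible ( illustration-only ) layout of such explicit signalling: a small REQ/REP control side-channel next to the PUSH/PULL data transport, so that no data is pumped before the counterparty has explicitly reported its RTO-state. The endpoints, messages and the in-process "remote" side shown here are assumptions for the sketch only:

import zmq

DATA_ENDPOINT = "tcp://127.0.0.1:5557"        # raw-data transport ( PUSH/PULL )
CTRL_ENDPOINT = "tcp://127.0.0.1:5558"        # explicit signalling ( REQ/REP )

ctx = zmq.Context()

# local agent: owns the data PUSH and a REP control access-point
push = ctx.socket( zmq.PUSH ); push.bind( DATA_ENDPOINT )
ctrl = ctx.socket( zmq.REP );  ctrl.bind( CTRL_ENDPOINT )

# --- the "remote" agent's side ( shown in-process just for the sketch ) ---
pull  = ctx.socket( zmq.PULL ); pull.connect( DATA_ENDPOINT )
ready = ctx.socket( zmq.REQ );  ready.connect( CTRL_ENDPOINT )
ready.send( b"READY" )                        # explicit "I am in an RTO-state" signal

# local agent waits for the explicit signal before pumping any data
assert ctrl.recv() == b"READY"
ctrl.send( b"GO" )                            # acknowledge; the remote may start .recv()-ing
assert ready.recv() == b"GO"

push.send( b"payload-0" )                     # data flows only after the handshake
print( pull.recv() )

for s in ( push, pull, ctrl, ready ): s.close()
ctx.term()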
Fine-tune the ZeroMQ infrastructure. Forget about using defaults. Always!
Recent API versions started to add more options for fine-tuning the ZeroMQ behaviour for particular use-cases. Be sure to read carefully all the details available, so as to set up the Context()-instance and to tweak the socket-instance access-point behaviour, so that it best matches your distributed-system's signalling + transport needs:
.setsockopt( ZMQ_LINGER, 0 ) # always, indeed ALWAYS
.setsockopt( ZMQ_SNDBUF, .. ) # always, additional O/S + kernel rules apply ( read more about proper sizing )
.setsockopt( ZMQ_SNDHWM, .. ) # always, problem-specific data-engineered sizing
.setsockopt( ZMQ_TOS, .. ) # always, indeed ALWAYS for critical systems
.setsockopt( ZMQ_IMMEDIATE, .. ) # prevents "losing" messages pumped into incomplete connections
and many more. Without these, the design would remain nailed into a coffin in the real-world transaction's jungle.
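A minimal pyzmq sketch of applying a few of these; the concrete values below are illustration-only, and the proper sizing of SNDBUF / SNDHWM / TOS remains problem-specific:

import zmq

ctx  = zmq.Context()
push = ctx.socket( zmq.PUSH )

push.setsockopt( zmq.LINGER,    0 )             # never block on close with undelivered messages
push.setsockopt( zmq.SNDBUF,    1 * 1024**2 )   # illustration-only; O/S + kernel rules apply
push.setsockopt( zmq.SNDHWM,    1000 )          # illustration-only; data-engineer the HWM per problem
push.setsockopt( zmq.TOS,       0x28 )          # illustration-only ToS/DSCP marking for critical paths
push.setsockopt( zmq.IMMEDIATE, 1 )             # queue only onto completed connections

push.bind( "tcp://127.0.0.1:5557" )             # options above are set before .bind()/.connect()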