Disruptor pattern - how are master and slave nodes kept in sync?

Asked 17/7, 2013 at 10:41 Answered 17/7, 2013 at 11:42

In the LMAX Disruptor pattern, the replicator is used to replicate the input events from a master node to slave node. So the setup would probably look like the following:

enter image description here

The replicator of master node writes the events to a DB (all though we can think better mechanisms than writing to a DB- it is not very important to the problem statement). The Receiver of the slave node reads from the DB and puts the events onto the slave node's ring buffer.

The output events of the slave node are ignored.

Now there is chance that the Business Logic Processor of Master node will be slower than the Business Logic Processor of the Slave Node. For Example BL of Master node could be at slot 102, where as Slave node could be at 106. (This can occur because the replicator reads the event from the ring buffer before the business logic processor).

In such a scenario if master node fails and slave node now becomes the master node, a few crucial events could be missed by the external systems. This could happen because Node 2 when it was acting as slave node has its output ignored.

Martin Fowler does state that the job of the replicator is to keep the nodes in sync: "Earlier on I mentioned that LMAX runs multiple copies of its system in a cluster to support rapid failover. The replicator keeps these nodes in sync"

But I am not sure how it can keep Business Logic Processor in sync? Any ideas?

Literary answered 17/7, 2013 at 10:41 Comment(0)

Replication is directly from master to slave node and not via a database. The replication gates on acknowledgement from the slave.

http://www.infoq.com/presentations/LMAX

The link above goes into more detail and it is worth reading the comments discussion on the presentation.

Archaeology answered 17/7, 2013 at 11:39 Comment(3)

Thanks Martin for your reply. If the replicator of master "gates" on the ACK from slave, then in the case of slave being down, the Business Logic Processor of of the master cannot proceed? How is this situation handled? – Literary 17/7, 2013 at 12:0

The Disruptor is a pattern for inter-thread messaging. Your questions are about how high-availability is implemented in an event sourced system and therefore beyond the scope of the Disruptor. Martin Folwer's article is just one illustration of how the Disruptor can be employed. – Archaeology 17/7, 2013 at 12:8

An example of an consensus algorithm for replication can be found here ramcloud.stanford.edu/wiki/download/attachments/11370504/… – Archaeology 17/7, 2013 at 12:22

If it is low cost to have dropped events, then you can just ignore it (?).

As a simple implementation, you could have the output disruptor on the primary notify the slave that it has completed sending a packet. Think of it as a two-stage replicator - one to replicate the event, and a second replicator to confirm that the event has been sent.

In a real world implementation you may need additional downstream support in your architecture (especially replay / retry). Depending on your application requirements, you need a capability to detect that there is a gap in the output events and fetch them as necessary. Assuming your events are idempotent, there should be no issues to go back in time and replay the events.

Suppose a single packet is lost on your outbound channel or your internet line goes down? Even if it is successfully sent out from the disruptor, it can still be lost. This depends on your specific application and requires a lot more thought than can go here as to what failure scenarios you can tolerate.

Iorgos answered 17/7, 2013 at 11:42 Comment(1)

I think the idea of playing back the events if we do recognize a gap is important one. Thanks – Literary 19/7, 2013 at 10:20

Recommended topics

Hot tags