How should one use Disruptor (Disruptor Pattern) to build real-world message systems? [closed]

As the RingBuffer allocates objects of a given type up front, how can you use a single ring buffer to process messages of various different types?

You can't create new object instances to insert into the RingBuffer, as that would defeat the purpose of up-front allocation.

So you could have 3 message types in an async messaging pattern:

  1. NewOrderRequest
  2. NewOrderCreated
  3. NewOrderRejected

So my question is: how are you meant to use the Disruptor pattern for real-world messaging systems?

Thanks

Links: http://code.google.com/p/disruptor-net/wiki/CodeExamples

http://code.google.com/p/disruptor-net

http://code.google.com/p/disruptor

Contrapuntist answered 3/8, 2011 at 21:12 Comment(0)

One approach (our most common pattern) is to store the message in its marshalled form, i.e. as a byte array. For incoming requests, e.g. FIX messages, the binary message is quickly pulled off the network and placed in the ring buffer. The unmarshalling and dispatch of the different message types are handled by EventProcessors (consumers) on that ring buffer. For outbound requests, the message is serialised into the preallocated byte array that forms the entry in the ring buffer.

If you are using a fixed-size byte array as the preallocated entry, some additional logic is required to handle overflow for larger messages: pick a reasonable default size and, if it is exceeded, allocate a temporary array that is bigger. Then discard it when the entry is reused or consumed (depending on your use case), reverting back to the original preallocated byte array.
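
A minimal sketch of the kind of entry described above, with the overflow fallback. The class and method names here are illustrative assumptions, not part of the Disruptor API: the entry owns a fixed-size pre-allocated array, switches to a temporary larger array for oversized messages, and reverts to the pre-allocated one on reset.

```java
// Sketch only: an entry type with a pre-allocated byte array and a
// slow-path fallback for messages that exceed the default size.
// All names (MessageEntry, DEFAULT_SIZE, write, reset) are hypothetical.
class MessageEntry {
    static final int DEFAULT_SIZE = 1024; // a "really good guess" at typical message size

    private final byte[] preallocated = new byte[DEFAULT_SIZE];
    private byte[] buffer = preallocated; // current backing storage
    private int length;                   // bytes actually in use

    /** Copy a marshalled message into the entry. */
    void write(byte[] marshalled) {
        if (marshalled.length > buffer.length) {
            // Slow path: message too big for the pre-allocated array,
            // so allocate a temporary larger one.
            buffer = new byte[marshalled.length];
        }
        System.arraycopy(marshalled, 0, buffer, 0, marshalled.length);
        length = marshalled.length;
    }

    /** Called when the entry is reused: drop any temporary oversized array. */
    void reset() {
        buffer = preallocated;
        length = 0;
    }

    byte[] buffer() { return buffer; }
    int length()    { return length; }
}
```

The common case never allocates; only the exceptional oversized message pays for a temporary array, which is garbage-collected once the entry is reused.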

If you have different consumers for different message types, you can quickly determine whether a consumer is interested in a specific message, either by knowing an offset into the byte array that carries the type information, or by passing a discriminator value through on the entry.
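
A hedged sketch of the discriminator approach, using the question's three message types. The enum, entry fields, and filter method are assumed names for illustration: the point is that a consumer checks a cheap type code before doing any unmarshalling work.

```java
// Sketch only: discriminator-based dispatch. Each entry carries a small
// type code next to its payload; consumers test the code first.
class TypedDispatch {
    enum MessageType { NEW_ORDER_REQUEST, NEW_ORDER_CREATED, NEW_ORDER_REJECTED }

    static class Entry {
        MessageType type; // the discriminator passed through on the entry
        byte[] payload;   // marshalled message bytes
    }

    /** Example consumer filter: only cares about rejections. */
    static boolean interestedInRejections(Entry entry) {
        // Checking the discriminator is cheap; unmarshalling the
        // payload only happens for messages this consumer wants.
        return entry.type == MessageType.NEW_ORDER_REJECTED;
    }
}
```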

Also, there is no rule against creating object instances and passing references (we do this in a couple of places too). You do lose the benefits of object preallocation; however, one of the design goals of the Disruptor was to allow the user the choice of the most appropriate form of storage.

Sodomy answered 4/8, 2011 at 9:40 Comment(4)
Regarding your comment "Some additional logic is required to handle overflow ..." - this is incorrect. Watch the LMAX video at infoq.com/presentations/LMAX: the ring buffer array size is always constant (he used 20 million entries), and in practice it is either practically empty (the system is chugging along nicely) or full (there is something wrong, and no amount of expanding the array size will fix it). Martin did remark that the Disruptor pattern is so damn fast that the bottleneck becomes the I/O from either the network or the disk, so it will never fill up in practice. – Indelicacy
I know, I was in that presentation; I'm the one on the left. I realise that my answer is not particularly clear: I was talking about the size of the individual entries, not the number of entries. It's not possible to know what size all of your messages will be, but you can take a really good guess and handle the exceptional case with a slower code path. – Sodomy
You said the most common approach is to use a byte array as the storage type for the Disruptor. My question is: is there any disadvantage to using non-POD datatypes as the storage for the Disruptor? If I use a structure which contains a java.lang.String, for example, will it lead to cache misses, as the String member variable might be at some location far from the array in the RingBuffer? Thanks. – Boyle
It's hard to be sure. If it's a new string allocated with each event, you'll probably get a miss; if it's one shared across multiple events, you may benefit from temporal locality. Remember, in any performance conversation, testing and numbers are king. – Sodomy

There is a library called Javolution (http://javolution.org/) that lets you define objects as structs with fixed-length fields like string[40] etc. that rely on byte buffers internally instead of variable-size objects. That allows the ring buffer to be initialized with fixed-size objects and thus (hopefully) contiguous blocks of memory, which lets the cache work more efficiently.

We are using that for passing events/messages, and standard strings etc. for our business logic.

Groome answered 4/4, 2012 at 19:49 Comment(0)

Back to object pools.

The following is a hypothesis.

If you have 3 types of messages (A, B, C), you can pre-allocate 3 arrays, one per type. That creates 3 memory zones: A, B, and C.

It's not as if there is only one cache line; there are many, and they don't have to be contiguous. Some cache lines will refer to something in zone A, others to zone B, and others to zone C.

So the ring buffer entry can hold a single reference to a common ancestor or interface of A, B, and C.

The problem is then selecting the instance from the pools. The simplest approach is to make each pool array the same length as the ring buffer. This implies a lot of wasted pooled objects, since only one of the 3 is ever used at any entry; e.g. ring buffer entry 1234 might be using message B[1234], while A[1234] and C[1234] are unused and unusable by anyone.
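
A sketch of that parallel-pool layout, under the hypothesis above. The class, field names, and ring size are all illustrative: one pre-allocated pool per message type, each the same length as the ring buffer, indexed by the entry's sequence.

```java
// Sketch only: a per-type pool the same length as the ring buffer.
// For any sequence, only one of the parallel pools' slots is live;
// the matching slots in the other pools are wasted.
class ParallelPools {
    static class MessageB { long orderId; }

    static final int RING_SIZE = 2048; // assumed to match the ring buffer size

    final MessageB[] poolB = new MessageB[RING_SIZE];

    ParallelPools() {
        // Pre-allocate every slot up front, mirroring the ring buffer.
        for (int i = 0; i < RING_SIZE; i++) poolB[i] = new MessageB();
    }

    /** The pooled B instance for a given ring buffer sequence. */
    MessageB forSequence(long sequence) {
        return poolB[(int) (sequence % RING_SIZE)];
    }
}
```

When the ring buffer wraps, the same pooled instance is reused, so no allocation happens on the hot path.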

You could also make a super-entry with all 3 A+B+C instances inlined and indicate the live type with some byte or enum. Just as wasteful in memory size, but it looks a bit worse because of the fatness of the entry; for example, a reader working only on C messages will get less cache locality.
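
The super-entry alternative could look like this sketch; again, all the type and field names are hypothetical. Each slot pre-allocates one instance of every message type plus a discriminator saying which one is in use.

```java
// Sketch only: a "super-entry" inlining one instance of each message
// type per ring buffer slot, with an enum discriminator.
class SuperEntry {
    enum Kind { A, B, C }

    static class MessageA { long orderId; }
    static class MessageB { long orderId; }
    static class MessageC { long orderId; }

    final MessageA a = new MessageA(); // all three pre-allocated per slot,
    final MessageB b = new MessageB(); // so two of them are always dead
    final MessageC c = new MessageC(); // weight in the entry
    Kind kind;                         // which instance is currently live
}
```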

I hope I'm not too wrong with this hypothesis.

Ilyse answered 17/3, 2018 at 1:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.