Should one use Disruptor (LMAX) with a big model in memory and CQRS?

Our system has a structured model (about 30 different entities with several kinds of relations between them) kept entirely in memory (about 10 GB) for performance reasons. On this model we have to do three kinds of operations:

  1. update one or a few entities
  2. query for particular data (this usually requires reading thousands of entities)
  3. get statistical data (how much memory is used, how many queries per kind, etc.)

Currently the architecture is a fairly standard one, with a pool of servlet threads that use a shared model. Inside the model there are a lot of concurrent collections, but there are still many waits because some entities are "hotter" than others and most threads want to read/write them. Note also that queries are usually much more CPU- and time-consuming than writes.

I'm studying the possibility of switching to a Disruptor architecture, keeping the model in a single thread and moving everything possible (validity checks, auditing, etc.) out of the model into separate consumers.

The first question, of course, is: does it make sense?

The second question is: ideally, write requests should take precedence over read ones. What is the best way to implement priority with the Disruptor? I was thinking about two ring buffers and then trying to read from the high-priority one more often than from the low-priority one.
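
A rough sketch of the polling loop I have in mind, just to illustrate the idea (the Request type is a placeholder for our commands/queries, and it uses the EventPoller API, so it assumes a reasonably recent Disruptor release):

    import com.lmax.disruptor.EventPoller;
    import com.lmax.disruptor.RingBuffer;

    // Placeholder event type: a write command or a read query against the model.
    class Request {
        Runnable work;                         // set by the producer via publishEvent(...)
        void run() { if (work != null) work.run(); }
    }

    public class PriorityModelThread implements Runnable {

        private final RingBuffer<Request> writes = RingBuffer.createMultiProducer(Request::new, 1024);
        private final RingBuffer<Request> reads  = RingBuffer.createMultiProducer(Request::new, 64 * 1024);

        @Override
        public void run() {
            EventPoller<Request> writePoller = writes.newPoller();
            EventPoller<Request> readPoller  = reads.newPoller();
            writes.addGatingSequences(writePoller.getSequence());
            reads.addGatingSequences(readPoller.getSequence());

            EventPoller.Handler<Request> handler = (event, sequence, endOfBatch) -> {
                event.run();
                return true;                   // keep draining the current batch
            };

            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // Drain all pending writes first, then process one batch of reads.
                    while (writePoller.poll(handler) == EventPoller.PollState.PROCESSING) {
                        // loop until the write buffer is empty
                    }
                    readPoller.poll(handler);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }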

To clarify, the question is more architectural than about the actual code of the LMAX Disruptor.

Update with more details

The data is a complex domain, with many entities (>100k) of many different types (~20), linked to each other in a tree structure with many different collections.

Queries usually involve traversing thousands of entities to find the correct data. Updates are frequent but quite limited, like 10 entities at a time, so overall the data is not changing very much (around 20% per hour).

I did some preliminary tests and it appears the speed advantage of querying the model in parallel outweighs the occasional write-lock delays.

Irradiate answered 17/10, 2012 at 8:34 Comment(2)
Hi Uberto - Can you add some detail? What sort of queries are you running? And is the update of entities happening on the same few entities or on lots of different entities? Are the entities linked to each other or mostly independent, and how are they related?Alkaloid
Regarding question 2 (precedence of writes over reads): LMAX is naturally suited to event sourcing, which says that you keep the events, not the models; your current models (or more optimized ones that are super fast on read operations) will still be there, but you never ever change which event happened when. If you got a read operation before the write, you should process them in the order you got them, so that the same state is reproducible if you replay the chain of events... So I think this priority is wrong here; you do that when two threads write to a collection, maybe...Exciseman

"ideally write requests should take precedence over read ones."

Why? Most fast locks, like C#'s ReaderWriterLockSlim, do the opposite. A write needs to block all reads to avoid partial reads, so such locks allow many concurrent reads, hoping things get "quiet", and then do the write. (The write does run at its place in the queue, but it's very likely that many reads which came after it were processed before it takes the lock.)
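
The Java counterpart, ReentrantReadWriteLock, works along the same lines; a minimal illustration (the model and entity names are made up):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Minimal sketch of the read/write-lock behaviour described above:
    // many readers may hold the lock at once, a writer runs exclusively.
    class LockedModel {
        private final Map<Long, String> entities = new HashMap<>();
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Many reader threads can hold the read lock at the same time.
        String query(long id) {
            lock.readLock().lock();
            try {
                return entities.get(id);
            } finally {
                lock.readLock().unlock();
            }
        }

        // A writer waits until all readers have released, then runs exclusively,
        // so no query ever sees a half-applied update.
        void update(long id, String value) {
            lock.writeLock().lock();
            try {
                entities.put(id, value);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }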

Prioritizing writes is a good way to kill concurrency.

Is eventual consistency / CQRS an option?

Selectivity answered 3/11, 2012 at 12:14 Comment(3)
assuming the flow of read queries is more or less constant, there is no gain in delaying writes. Updating the model before serving reads has some business advantage.Irradiate
Yes - the reason locks do it is they hope the read burst dies down, which it often does; a continuous flow to a single entity is pretty rare (it could be you have a table lock, which is not a good idea for high performance). If the flow is continuous then you have major congestion at this point and should look at other things, such as serving reads off copies, spinlocks, etc. We are talking milliseconds to 1/10 of a second; does it matter that much to the business? Most businesses even live with partial-write issues, when an entity from an ORM layer is modified and served at the same time.Selectivity
Easiest way to do it is probably just to have an in-memory command queue and every 1-10 ms put the commands into the ring buffer, with all the writes first. Disruptors are not many-to-many (well, I have never seen anyone use them like that, so I'd want to see benchmarks), nor do they read from multiple ring buffers; each consumer (domain) reads from one ring buffer, though you can have multiple domains reading from a single buffer.Selectivity

LMAX may be appropriate.

The LMAX people first implemented a traditional architecture, then they implemented actors (with queues) and found the actors spent most of their time in the queues. Then they went to the single-threaded architecture. Now, the Disruptor is not the key to the architecture; the key is a single-threaded business layer (BL). With one writer (a single thread) and small objects you are going to get a high cache hit rate and no contention. To do this they had to move all long-running code (which includes IO) out of the business layer. To do this they used the Disruptor; it's basically just a ring buffer with a single writer, as has been used in device-driver code for a while, but at a huge message scale.
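
For reference, a minimal sketch of that wiring with the Disruptor DSL: one ring buffer, one event handler that is the single-threaded business layer (the event type and field names are invented, and the constructor used here is from the 3.x DSL):

    import com.lmax.disruptor.EventHandler;
    import com.lmax.disruptor.dsl.Disruptor;
    import com.lmax.disruptor.util.DaemonThreadFactory;

    // Hypothetical event carrying a command for the single-threaded business layer.
    class CommandEvent {
        long entityId;
        String payload;
    }

    public class SingleThreadedBl {
        public static void main(String[] args) {
            int bufferSize = 1024; // must be a power of two

            Disruptor<CommandEvent> disruptor =
                    new Disruptor<>(CommandEvent::new, bufferSize, DaemonThreadFactory.INSTANCE);

            // The single consumer thread: all business logic runs here, with no locks.
            EventHandler<CommandEvent> businessLogic = (event, sequence, endOfBatch) ->
                    System.out.println("Applying " + event.payload + " to entity " + event.entityId);

            disruptor.handleEventsWith(businessLogic);
            disruptor.start();

            // Producer threads (e.g. servlets) publish commands into the ring buffer.
            disruptor.getRingBuffer().publishEvent((event, seq, id, data) -> {
                event.entityId = id;
                event.payload = data;
            }, 42L, "update-price");
        }
    }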

First, I have one disagreement with this: LMAX is an actor system, where you have one actor for all the BL (and the Disruptors connect the other actors). They could have improved their actor system significantly instead of jumping to one actor for the BL, namely:

  1. Don't have lots of services / actors; try to have commonly used components in one service. (This comes up time and time again in SOA / distributed systems too.)
  2. When communicating between actors, use point-to-point queues, not many-to-one (like all the service buses!).
  3. When you have point-to-point queues, ensure the tail is a pointer to a separate memory area. With 2 and 3 you can now use lockless queues, and the queues / threads only have one writer (and you can even use non-temporal 256-bit YMM writes into the queue). However, the system now has more threads (and, if you have done 1 correctly, a relatively small number of messages between actors). The queues are similar to Disruptors: they can batch-process many entries and can use a ring-buffer style. (A rough sketch of such a single-writer queue follows after this list.)
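
A rough sketch of the kind of single-writer, lockless point-to-point queue described in points 2 and 3 (simplified: no cache-line padding or non-temporal writes, just the single-producer / single-consumer discipline):

    import java.util.concurrent.atomic.AtomicLong;

    // Lock-free single-producer / single-consumer ring: one thread calls offer(),
    // one thread calls poll(); head and tail each have only one writer.
    final class SpscQueue<T> {
        private final Object[] buffer;
        private final int mask;                            // capacity must be a power of two
        private final AtomicLong head = new AtomicLong();  // next slot to read (consumer-owned)
        private final AtomicLong tail = new AtomicLong();  // next slot to write (producer-owned)

        SpscQueue(int capacityPowerOfTwo) {
            buffer = new Object[capacityPowerOfTwo];
            mask = capacityPowerOfTwo - 1;
        }

        // Called only by the single producer thread.
        boolean offer(T value) {
            long t = tail.get();
            if (t - head.get() == buffer.length) {
                return false;                              // queue full
            }
            buffer[(int) (t & mask)] = value;
            tail.lazySet(t + 1);                           // ordered write publishes the slot
            return true;
        }

        // Called only by the single consumer thread.
        @SuppressWarnings("unchecked")
        T poll() {
            long h = head.get();
            if (h == tail.get()) {
                return null;                               // queue empty
            }
            T value = (T) buffer[(int) (h & mask)];
            buffer[(int) (h & mask)] = null;
            head.lazySet(h + 1);
            return value;
        }
    }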

With these actors you have a more modular (and hence maintainable) system (and the system could launch more actors to process the queues - note: one writer!).

Re your case: I think 20% of changes in an hour is huge... Are the queries always on in-memory objects? Do you have in-memory hash tables / indexes? Can you use read-only collections? Does it matter if your data is old? E.g. eBay uses a one-hour refresh on its items collection, so the item collection itself is static. With a static collection and static item briefs, they have a static index, and you can search and find items fast, all in memory. Every hour it gets rebuilt, and when complete (it could take minutes to rebuild) the system switches to the new data. Note the items themselves are not static.
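
A minimal sketch of that snapshot-and-switch pattern (the index shape and the rebuild step are placeholders): readers always query an immutable snapshot held in an AtomicReference, and a background job periodically rebuilds it and swaps the reference.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicReference;

    // Queries run lock-free against a read-only snapshot; the rebuild may take
    // minutes, but only the final reference swap is ever visible to readers.
    public class SnapshotIndex {
        // Hypothetical read-only index: keyword -> item ids.
        private final AtomicReference<Map<String, List<Long>>> current =
                new AtomicReference<>(Map.of());

        public List<Long> search(String keyword) {
            return current.get().getOrDefault(keyword, List.of());
        }

        public void startHourlyRebuild(ScheduledExecutorService scheduler) {
            scheduler.scheduleAtFixedRate(
                    () -> current.set(rebuildFromSourceOfTruth()), 0, 1, TimeUnit.HOURS);
        }

        // Placeholder for re-indexing the items collection.
        private Map<String, List<Long>> rebuildFromSourceOfTruth() {
            return Map.of("example", List.of(1L, 2L, 3L));
        }
    }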

In your case, with a huge domain, the single thread may get a lowish cache hit rate, which is different from LMAX, who have a smaller domain for each message to pass over.

An agent-based system may be the best bet, namely because a bunch of entities can be grouped together and hence get a high cache hit rate. But I'd need to know more. E.g. moving validity checks, auditing, logging etc. out is probably a good plan. Less code = smaller objects = higher cache hit rate, and LMAX objects were small.

Hope this quick dump helps, but it's hard to say from only a glance.

Selectivity answered 3/11, 2012 at 11:58 Comment(0)