EventStore partial ordering of events and other features

2

5

I'm trying to evaluate EventStore as a reliable queuing mechanism internal to a piece of server software.

MSMQ fails as an alternative because it cannot support partial ordering, that is, ordered messages within "conversations" of messages, and because of its 4MB message size limit (which could be overcome with partial ordering). SQL Service Broker does support partial ordering, but is a pain in the butt to set up and manage programmatically.

As documentation on EventStore is admittedly sparse, can someone with experience with EventStore help with the following?

  • Does EventStore support transactional processing of events - that is, if processing fails, can the dequeue be rolled back?
  • With multiple readers in various threads, processes, or machines, does EventStore enforce that each event is dispatched(?) to only one reader (at a time, perhaps during a transaction)?
  • Assuming the above are possible, can events in different "conversations" be read simultaneously in any order, while messages in the same conversation are read one at a time and in order?
  • I read that EventStore is basically "At-least-once" delivery. Is it possible, using certain storage providers, to ensure "exactly-once" delivery?
  • How are "poison" events handled? Events that error-out during processing. Perhaps the error is temporary in nature and can be retried. Perhaps it is permanent in nature and requires administrative intervention.
  • Can EventStore storage be manipulated by hand if necessary? Can it be done while other readers continue to read?

(I read that transactions at the storage engine aren't required, but I still use the language of transactions to mean whatever replaces transactions at the EventStore level. If there are crucial functional consequences in the switch from transactions to whatever, please comment on them. I don't need to understand every aspect right away, just need hope with which to buy more time to experiment.)

Snodgrass answered 21/11, 2011 at 14:11 Comment(0)
4

While the EventStore could potentially be used to build a full-blown queue, it was never designed with that in mind. This means that there are a lot of opinionated decisions that went into building the library that go against the very requirements imposed by your question.

For example, the concept of exactly-once delivery is something that messaging systems don't really support. Other things mentioned above, like poison messages, aren't really an issue because the EventStore isn't hooked into the message pipeline in that way.

The problem that you're trying to solve doesn't seem to be one where the EventStore can help you. Therefore, I would recommend evaluating a full-blown message queue, such as RabbitMQ.

Also, what do you have on your messages that makes them larger than 4MB? If you're pushing around files or large binary streams, why not push those to some kind of highly available "global" storage (like Amazon S3) and then have a pointer to those on the message?
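The pointer-on-the-message idea can be sketched as follows. This is only an illustration under assumptions: `BlobStore` is a hypothetical in-memory stand-in for S3 or any shared storage, and `make_message`, `read_body`, and the `inline_limit` parameter are made-up names, not part of EventStore, MSMQ, or RabbitMQ.

```python
import base64
import hashlib
import json


class BlobStore:
    """Hypothetical stand-in for shared storage: an in-memory dict keyed by content hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def make_message(store: BlobStore, conversation: str, body: bytes,
                 inline_limit: int = 4 * 1024 * 1024) -> str:
    """Build a JSON message; bodies over inline_limit are stored and referenced by key."""
    if len(body) > inline_limit:
        envelope = {"conversation": conversation, "blob_key": store.put(body)}
    else:
        envelope = {"conversation": conversation,
                    "inline": base64.b64encode(body).decode("ascii")}
    return json.dumps(envelope)


def read_body(store: BlobStore, message: str) -> bytes:
    """Return the body, fetching it from the store when the message only holds a pointer."""
    envelope = json.loads(message)
    if "blob_key" in envelope:
        return store.get(envelope["blob_key"])
    return base64.b64decode(envelope["inline"])
```

The queue then only ever carries small envelopes, so its size limit stops mattering; the cost is that the storage has to be at least as available as the queue.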

Chlortetracycline answered 22/11, 2011 at 4:6 Comment(2)
Thanks! The application is a server dedicated to guaranteed delivery of messages between protocols for services; the queue is internal to the application, yet it can route arbitrary messages. S3 is too much. External storage is too complex. Are you saying that RabbitMQ DOES meet all these requirements? Thanks! - Snodgrass
Rabbit is definitely worth looking at for your situation, but I can't guarantee it fits everything you mention. - Chlortetracycline
4

Some thoughts, even though you seem content with the first answer:

  • Partial Ordering is what happens if you track causality on messages. There are ways of doing this. The naïve way of doing it would be to simply keep a list of all nodes in a distributed network that a given message has seen and append to that list as you shove the message around. I say message, but I really refer to the messages in that conversation.

    Now, that might work well with simple systems, but as you start getting more complex systems you might want a Saga to keep track of what message means exactly what.

    Still, you might have your requirement of partial ordering on a single recipient node - and then sagas, being in a hub-and-spoke pattern with regard to the message flow, won't help you. Then perhaps you actually need to add some logic to the transport that you're using.

    One algorithm is called vector clocks; another, similar one is called version vectors. Here's a sample implementation of a vector clock in Go - if you want to spend a few hours pairing, I'm working on a tiny vector clock lib for F# - because the algo is really simple, actually. If you want to actually read something that makes sense on this, I recommend this book - Elements of Distributed Systems. Chapters 2-3 and 5 are good.

    Then, you get partial ordering of your conversations in a distributed system. The reason you can't get this from a queue alone is that there is a gulf of network between your queue and your node: if either the node, or the consuming process on the same node as the queue, goes down while it has a message in transit, that message will be requeued and thereby reordered. The same thing happens if you NACK the message. You can get around this reordering problem by using 2PC across the queue and the consuming client, or you can sort the messages by their vector clocks, or by a sequence id assigned in the publishing/sending application, or by some piece of data that makes semantic sense from your consumer's perspective. The choice is yours.

  • As for the other requirements, such as poison messages, you should look at what a service bus gives you. I use MassTransit, personally, and it handles poison messages well. These are some failure modes on consuming messages:

    • Serialization error - you made a programming mistake. Fix your development process, because these shouldn't happen. If they still happen, just move them to the poison queue - these messages are likely corrupted.
    • Unhandled exception from your code - you made a programming mistake. Again, your dev process is up for inspection, unless you threw it deliberately to let your service bus runtime move the message to the poison queue.
    • You might experience problems writing to your local consumer's DB - this is an operations problem and should not be handled in the code. Kill your process, because you can't actually do anything from code right now. Nagios or some other monitor should tell your operations guys that something is down and needs to be fixed urgently - because if you can't write to your read model, then probably your read model can't service the requests meant for it. Puppet or some other process monitor would probably restart your process a while later and then you can go through the same steps, assuming everything is OK. This time, though, don't start consuming off of your queue until you have a connection to the database (this is what NHibernate does with its static initialization when it starts, for example) - and implement a retry policy such as a circuit breaker on top of that retry logic.
  • Large events - make sure that your queue API chunks byte arrays that are too long. ZeroMQ has multipart messages. AMQP/RabbitMQ doesn't, so you'd have to chunk them yourself, which of course forces you to re-order them again. Or you can just pass a handle to a binary blob stored somewhere you can read it back from, like the rest of us.
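Two of the ordering options above, comparing vector clocks and sorting on a sender-assigned sequence id, can be sketched roughly as follows. This is a minimal illustration and not the Go or F# code mentioned above; the names `tick`, `merge`, `happened_before`, `concurrent`, and `Resequencer` are hypothetical, and the sketch assumes sequence ids start at 0 within each conversation.

```python
from collections import defaultdict
from typing import Dict, List

VectorClock = Dict[str, int]  # node name -> counter; missing nodes count as 0


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of clock with node's counter incremented (local event or send)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum, applied when a node receives a message."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a causally precedes b: a <= b everywhere and a < b somewhere."""
    nodes = a.keys() | b.keys()
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if the clocks differ and neither precedes the other."""
    nodes = a.keys() | b.keys()
    differ = any(a.get(n, 0) != b.get(n, 0) for n in nodes)
    return differ and not happened_before(a, b) and not happened_before(b, a)


class Resequencer:
    """Releases messages per conversation in sequence order, buffering across gaps."""

    def __init__(self):
        self._next = defaultdict(int)      # next expected sequence id per conversation
        self._pending = defaultdict(dict)  # conversation -> {seq: payload}

    def accept(self, conversation: str, seq: int, payload) -> List:
        """Take one delivered message; return the payloads that are now in order."""
        if seq < self._next[conversation]:
            return []  # redelivered duplicate (at-least-once delivery): drop it
        pending = self._pending[conversation]
        pending[seq] = payload
        ready = []
        while self._next[conversation] in pending:
            ready.append(pending.pop(self._next[conversation]))
            self._next[conversation] += 1
        return ready
```

A consumer would call `accept` for every delivery and process only what it returns, so different conversations proceed independently while each one stays in order. The buffer grows without bound if a sequence id never arrives, so a real implementation would need a timeout or dead-letter policy on top.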

Manichaeism answered 20/1, 2012 at 9:19 Comment(1)
This is an excellent post. Thank you. I'm reading some of the initial research papers on these topics now. For this project, I ended up with a custom-built SQL implementation using NOLOCK, READPAST, and compare-and-swap techniques to achieve the low-contention transaction-based partial-order delivery that I needed. (Two tables, dbo.Sequences and dbo.Messages). Perhaps Vector Clocks can help in a future project and allow me to avoid the minor garbage collection problem of abandoned Sequences that exists in my implementation. - Snodgrass
