Using Kafka as a (CQRS) Eventstore. Good idea?
Asked Answered
N

7

319

Although I've come across Kafka before, I just recently realized Kafka may perhaps be used as (the basis of) a CQRS event store.

Some of the main points that Kafka supports:

  • Event capturing/storing, all HA of course.
  • Pub/sub architecture
  • Ability to replay the event log, which allows new subscribers to register with the system after the fact.

Admittedly I'm not 100% versed in CQRS / Event Sourcing, but this seems pretty close to what an event store should be. The funny thing is: I really can't find that much about Kafka being used as an event store, so perhaps I am missing something.

So, is anything missing from Kafka for it to be a good event store? Would it work? Is anyone using it in production? Interested in insight, links, etc.

Basically, the state of the system is saved based on the transactions/events the system has ever received, instead of just saving the current state/snapshot of the system, which is what is usually done. (Think of it as a general ledger in accounting: all transactions ultimately add up to the final state.) This allows all kinds of cool things, but just read up on the links provided.

Newsome answered 17/7, 2013 at 19:22 Comment(2)
Hi Geert-Jan. In retrospect, how did you deal with this problem? I have a related question (exposed here: #58764227). Most people suggesting Kafka's adoption seem to rely on the points of append-log immutability, high throughput, and partition order guarantee. I see problems related to fast searches within topics (for entity "reconstruction"), no transactional atomicity, and no ordering across partitions (a 100% order guarantee implies using only 1 partition, killing concurrency).Tillich
Didn't pursue it in the end because I ended that side project. So no clear answer I'm afraid.Newsome
B
162

Kafka is meant to be a messaging system which has many similarities to an event store; however, to quote their intro:

The Kafka cluster retains all published messages—whether or not they have been consumed—for a configurable period of time. For example if the retention is set for two days, then for the two days after a message is published it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so retaining lots of data is not a problem.

So while messages can potentially be retained indefinitely, the expectation is that they will be deleted. This doesn't mean you can't use this as an event store, but it may be better to use something else. Take a look at EventStoreDB for an alternative.
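
Retention is a per-topic setting, so a topic meant to hold events forever can be created with deletion disabled. A minimal sketch using the Java AdminClient, with the topic name and sizing purely illustrative:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateEventTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // retention.ms / retention.bytes = -1 disables time- and size-based deletion,
            // so records stay in the log until you remove them yourself.
            NewTopic topic = new NewTopic("order-events", 3, (short) 1)
                    .configs(Map.of(
                            "cleanup.policy", "delete",
                            "retention.ms", "-1",
                            "retention.bytes", "-1"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```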

UPDATE

Kafka documentation:

Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.

UPDATE 2

One concern with using Kafka for event sourcing is the number of required topics. Typically in event sourcing, there is a stream (topic) of events per entity (such as user, product, etc.). This way, the current state of an entity can be reconstituted by re-applying all events in the stream. Each Kafka topic consists of one or more partitions, and each partition is stored as a directory on the file system. There will also be increasing pressure on ZooKeeper as the number of znodes grows.

Balinese answered 23/7, 2013 at 15:3 Comment(14)
I was looking at Kafka and had another concern: I didn't notice anything about optimistic-concurrency. Ideally I could say: "Add this event as item N+1 only if the object's most recent event is still N."Malina
@Darien: I'm probably going with a setup where Redis feeds Kafka (using Redis Notifications). Since Redis allows for optimistic concurrency (using Watch/multi-exec), this should workNewsome
@Malina I'm not an expert on event sourcing, but my understanding was that generally speaking you would not need optimistic concurrency because events are by definition records of things that have already happened historically.Lancastrian
@Lancastrian I think if you already have an authoritative ordering of non-conflicting events, that implies wherever they live is your actual event-store technology, and Kafka is just being used as a secondary system to distribute them.Malina
There is also valuable information here : groups.google.com/forum/#!topic/dddcqrs/rm02iCfffUYKillen
I don't understand how the aggregate would remain consistent given publishing to kafka is an async operation.Anyaanyah
The producer of a single event stream (everything that has happened to a single entity) exists in exactly one instance. There is no concurrency within that stream. One thing happens and then another. They can't happen at the same time. You gain concurrency in the system by handling changes to multiple entities at the same time, but those events end up in different streams.Krill
I think Kafka Streams API matured over the years. We can use it for Event sourcing with CQRSLancastrian
@PerWiklander This is not always the case. We take advantage of a mixture of optimistic concurrency controls and parallelized aggregate command handlers in order to increase the availability of our services (in case an ec2 goes down).Jury
Re: Update 2; Is it really common to have a separate stream per entity? I thought it was (up to) a stream per aggregate, each of which consists of multiple entities.Mcclurg
@MihailMalostanidis What's the difference between an aggregate compared to an entity. I thought they were the same.Warder
@TedHenry an example: The aggregate is an order from a shop, the entities are the individual lines of the order.Mcclurg
@MihailMalostanidis Ahh. Thanks. We usually call those the "line items".Warder
So if I understand correctly if you use an aggregate ID as your partition key Zookeeper will have trouble keeping track of consumer instances for each partition? Do we know if this improves with the removal of Zookeeper from Kafka? I know membership management is tricky so I can see how this would easily become a bottleneck.Hopefully
D
356

I am one of the original authors of Kafka. Kafka will work very well as a log for event sourcing. It is fault-tolerant, scales to enormous data sizes, and has a built-in partitioning model.

We use it for several use cases of this form at LinkedIn. For example our open source stream processing system, Apache Samza, comes with built-in support for event sourcing.

I think you don't hear much about using Kafka for event sourcing primarily because the event sourcing terminology doesn't seem to be very prevalent in the consumer web space where Kafka is most popular.

I have written a bit about this style of Kafka usage here.

Denominational answered 23/3, 2014 at 21:55 Comment(17)
Was going to post that link :) Awesome blog post. It would have been good to be able to comment on it because I have many questions. @Newsome also take a look at "Lambda architecture"; this is quite similar, and the name was coined by the author of Storm, mostly using some kind of Hadoop-based event log in many examples.Orin
@Jay: Since I have renewed interest in this topic, could you please elaborate a bit on the fact that Kafka seems to be designed to have its published messages expire after a set period of time? If using Kafka as an event source, messages should be stored indefinitely. It's probably configurable, but would this pose a problem?Newsome
Are there any comparisons between Kafka and EventStore? Specifically I like the focus on FRP in EventStore, called Projections. Is there anything like that in Kafka/Samza?Elgon
@CMCDragonkai: EventStore Projections are simply a combination of functions to transform the event stream, correct? That is exactly what Samza (or Storm for that matter) is about. Kafka would provide the event stream, and Samza/Storm is used to do counts, aggregates, etc., and whatever you want to change, modify, or fork in the stream. Admittedly I've never used EventStore, but I'm fairly certain that Samza/Storm's capabilities in this regard far surpass EventStore Projections.Newsome
@CMCDragonkai, btw, imho although related, FRP is different from an event-sourced setup. In FRP a signal, on which all kinds of downstream stuff can be based, may be changed, which causes the updated signal to be passed through the pipeline. Like a cell in a spreadsheet updating all other cells that reference the changed cell. With CQRS an event cannot change. In fact, the immutability of events is one of the guiding principles behind CQRS.Newsome
I'm sure in FRP you can have immutable events as well. But that's an interesting way of thinking about Samza/Storm/Projections. They are all essentially functions that operate on an event stream.Elgon
Also check this out: samza.incubator.apache.org/learn/documentation/0.7.0/… I would be interested to hear how you're making use of Kafka now. I'm planning to use it as well.Elgon
I also am interested in @Geert-Jan's question to Jay. Kafka is not suitable for the actual event sourcing transactional side, due to needing a stream of events (topic) per domain aggregate (think millions). However, it is ideally suited to having events fed into it from e.g. GetEventStore. But this will only work with infinitely retained events (in our case), and aside from a few brief comments, this does not seem to be a supported use case of Kafka? Am I mistaken here? Samza, for example, assumes there are only two scenarios: time-based retention or key-based retention. There are others..Apteral
Dear Jay, thank you very much for Kafka. I have a related new question on Stack Overflow. I am implementing an aggregate in Flink using event sourcing. I put the link here: #39764739Cohort
Kafka is a huge step forward for DDD and CQRS architectures. A lot of concepts really play well together with Kafka as an event store.Felipa
Event sourcing requires the log entries to be ordered. As far as I understand it, Kafka does not support message ordering across partitions and partitions are the method for scaling a topic. So, how is message ordering in a Kafka-based event sourcing system best handled alongside an event log scaled by partitions?Strang
@eulerfx Assuming we would like to use Kafka as storage for event sourced system how should optimistic locking/concurrency be implemented?Ize
@KrzysztofBranicki Who ever said you had to use optimistic concurrency or application-level locking to achieve an event store? There are other ways to ensure consistency that aren't nearly as contentious. Kafka has a built-in concurrency guarantee in the form of partitions - use them to your advantage.Lowson
@AndrewLarsson I was talking about optimistic concurrency that protects domain invariants in DDD aggregates. E.g. how do you ensure the rule that an order can't have more than 10 items inside (disregard the business sense of this) assuming multiple users may work on the order in parallel? This is why we keep our event store on Postgres and then forward events to Kafka.Ize
@KrzysztofBranicki There is more than one way to achieve that goal. Kafka achieves it through partitions. It guarantees that only one consumer in the group operates on any given partition at any given moment - take advantage of that. You could simply partition them such that all commands addressed to the same aggregate go to the same partition.Lowson
@AndrewLarsson We don't use Kafka to deliver commands. We use it only for events which are outcome of successfully processed commands. There is expectation in our system that all commands (excluding ones that participate in sagas) will finish synchronously - so you get synchronous response with either success or failure. You can't achieve that when you are queueing commands. Those are not what Greg Young calls "One way command", we need to have synchronous response about failures too.Ize
@KrzysztofBranicki If you're at a scale where processing commands synchronously is still an option, then you're not at the scale where anything I'm talking about makes any sense at all.Lowson
P
118

I keep coming back to this QA. And I did not find the existing answers nuanced enough, so I am adding this one.

TL;DR. Yes or No, depending on your event sourcing usage.

There are two primary kinds of event sourced systems of which I am aware.

Downstream event processors = Yes

In this kind of system, events happen in the real world and are recorded as facts. Such as a warehouse system to keep track of pallets of products. There are basically no conflicting events. Everything has already happened, even if it was wrong. (I.e. pallet 123456 put on truck A, but was scheduled for truck B.) Then later the facts are checked for exceptions via reporting mechanisms. Kafka seems well-suited for this kind of downstream event-processing application.

In this context, it is understandable why Kafka folks are advocating it as an Event Sourcing solution. Because it is quite similar to how it is already used in, for example, click streams. However, people using the term Event Sourcing (as opposed to Stream Processing) are likely referring to the second usage...

Application-controlled source of truth = No

This kind of application declares its own events as a result of user requests passing through business logic. Kafka does not work well in this case for two primary reasons.

Lack of entity isolation

This scenario needs the ability to load the event stream for a specific entity. The common reason for this is to build a transient write model for the business logic to use to process the request. Doing this is impractical in Kafka. Using topic-per-entity could allow this, except this is a non-starter when there may be thousands or millions of entities. This is due to technical limits in Kafka/Zookeeper.

One of the main reasons to use a transient write model in this way is to make business logic changes cheap and easy to deploy.

Using topic-per-type is recommended instead for Kafka, but this would require loading events for every entity of that type just to get events for a single entity, since you cannot tell by log position which events belong to which entity. Even using snapshots to start from a known log position, this could be a significant number of events to churn through if structural changes to the snapshot are needed to support logic changes.
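
To make that cost concrete, here is a rough sketch (topic name and serialization are illustrative) of what loading a single entity looks like with topic-per-type: the consumer starts at the beginning of the log and discards every record keyed to other entities.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class LoadSingleEntity {
    // Rough sketch: rebuild one entity's event stream from a shared "orders" topic.
    static List<String> eventsFor(String entityId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "entity-loader-" + entityId);
        props.put("auto.offset.reset", "earliest"); // start from the beginning of the log
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        List<String> events = new ArrayList<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            ConsumerRecords<String, String> records;
            // Stop when a poll comes back empty (a simplification; a real loader
            // would compare against the end offsets instead).
            while (!(records = consumer.poll(Duration.ofSeconds(1))).isEmpty()) {
                for (ConsumerRecord<String, String> record : records) {
                    // No index by key: every record is read, most are thrown away.
                    if (entityId.equals(record.key())) {
                        events.add(record.value());
                    }
                }
            }
        }
        return events;
    }
}
```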

Lack of conflict detection

Secondly, users can create race conditions due to concurrent requests against the same entity. It may be quite undesirable to save conflicting events and resolve them after the fact. So it is important to be able to prevent conflicting events. To scale request load, it is common to use stateless services while preventing write conflicts using conditional writes (only write if the last entity event was #x), a.k.a. optimistic concurrency. Kafka does not support optimistic concurrency. Even if it supported it at the topic level, it would need to be all the way down to the entity level to be effective. To use Kafka and prevent conflicting events, you would need to use a stateful, serialized writer (per "shard" or whatever Kafka's equivalent is) at the application level. This is a significant architectural requirement/restriction.
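
For contrast, a minimal sketch of that conditional write as it is typically done against a relational store (table and column names are illustrative): a unique constraint on (stream_id, version) is what turns a concurrent write into a detectable conflict.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class SqlEventAppender {
    // Assumed schema (illustrative):
    //   CREATE TABLE events (
    //     stream_id text NOT NULL,
    //     version   bigint NOT NULL,
    //     payload   text NOT NULL,
    //     PRIMARY KEY (stream_id, version)
    //   );

    /**
     * Append an event only if the caller saw the latest version of the stream.
     * A concurrent writer inserting the same version violates the primary key,
     * which surfaces as an exception instead of a silently conflicting history.
     */
    void append(Connection conn, String streamId, long expectedVersion, String payload)
            throws SQLException {
        String sql = "INSERT INTO events (stream_id, version, payload) VALUES (?, ?, ?)";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, streamId);
            stmt.setLong(2, expectedVersion + 1); // next position in this entity's stream
            stmt.setString(3, payload);
            stmt.executeUpdate(); // constraint violation here = concurrency conflict
        }
    }
}
```

On a conflict, the command handler reloads the stream, re-checks its invariants, and retries or rejects the request.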

Main reason: fit for the problem

added 2021/09/29

Kafka is designed to solve giant-scale data problems. An app-controlled source of truth is a smaller scale, in-depth solution. Using event sourcing to good effect requires crafting events and streams to match the business processes. This usually has a much higher level of detail than would be generally useful to at-scale consumers. Consider if your bank statement contained an entry for every step of a bank's internal transaction processes. A single deposit or withdrawal could have many entries before it is confirmed to your account. The bank needs that level of detail to process transactions. But it's mostly inscrutable bank jargon (domain-specific language) to you, unusable for reconciling your account. Instead, the bank publishes separate events for consumers. These are coarse-grained summaries of each completed transaction. These summary events are what consumers know as "transactions" on their bank statement.

When I asked myself the same question as the OP, I wanted to know if Kafka was a scaling option for event sourcing. But perhaps a better question is whether it makes sense for my event sourced solution to operate at a giant scale. I can't speak to every case, but I think often it does not. When this scale enters the picture, like with the bank statement example, the granularity of events tends to be different. My event sourced system should probably publish coarse-grained events to the Kafka cluster to feed at-scale consumers rather than use Kafka as internal storage.

Scale can still be needed for event sourcing. Strategies differ depending on why. Often event streams have a "done" or "no-longer-useful" state. Archiving those streams is a good answer if event size/volume is a problem. Sharding is another option -- a perfect fit for regional- or tenant-isolated scenarios. In less siloed scenarios, when streams are arbitrarily related in a way that can cross shard boundaries, sharding is still the move (partition by stream ID). But there are no order guarantees across streams, which can make the event consumer's job harder. For example, the consumer may receive transaction events before it receives events describing the accounts involved. The first instinct is to "just use timestamps" to order received events. But it is still not possible to guarantee perfect occurrence order. Too many uncontrollable factors. Network hiccups, clock drift, cosmic rays, etc. Ideally you design the consumer to not require cross-stream dependencies. Have a strategy for temporarily missing data. Like progressive enhancement for data. If you really need the data to be unavailable instead of incomplete, use the same tactic. But keep the incomplete data in a separate area or marked unavailable until it's all filled in. You can also just attempt to process each event, knowing it may fail due to missing prerequisites. Put failed events in a retry queue, processing next events, and retry failed events later. But watch out for poison messages (events).
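
A rough sketch of that last retry-queue idea (the topic names, the attempt-count helper, and the exception type are all made up for illustration): try to apply each event, and park the ones whose prerequisites haven't arrived yet on a retry topic to be re-consumed later.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.List;

public class RetryingProjector {
    private static final int MAX_ATTEMPTS = 5;              // guard against poison messages
    private final KafkaConsumer<String, String> consumer;
    private final KafkaProducer<String, String> producer;

    RetryingProjector(KafkaConsumer<String, String> consumer, KafkaProducer<String, String> producer) {
        this.consumer = consumer;
        this.producer = producer;
        consumer.subscribe(List.of("events", "events-retry"));
    }

    void run() {
        while (true) {
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                try {
                    apply(record.key(), record.value());     // project the event into the read model
                } catch (MissingPrerequisiteException e) {
                    // Park it and keep going; it comes back later via the retry topic.
                    String target = attemptCount(record) < MAX_ATTEMPTS ? "events-retry" : "events-dead-letter";
                    producer.send(new ProducerRecord<>(target, record.key(), record.value()));
                }
            }
        }
    }

    // Hypothetical helpers, not part of any library:
    int attemptCount(ConsumerRecord<String, String> record) { return 0; } // e.g. read from a header
    void apply(String key, String value) throws MissingPrerequisiteException { }
    static class MissingPrerequisiteException extends Exception { }
}
```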

Summary

Can you force Kafka to work for an app-controlled source of truth? Sure if you try hard enough and integrate deeply enough. But is it a good idea? No.


Update per comment

The comment has been deleted, but the question was something like: what do people use for event storage then?

It seems that most people roll their own event storage implementation on top of an existing database. For non-distributed scenarios, like internal back-ends or stand-alone products, it is well-documented how to create a SQL-based event store. And there are libraries available on top of various kinds of databases. There is also EventStoreDB, which is built for this purpose.

In distributed scenarios, I've seen a couple of different implementations. Jet's Panther project uses Azure CosmosDB, with the Change Feed feature to notify listeners. Another similar implementation I've heard about on AWS is using DynamoDB with its Streams feature to notify listeners. The partition key probably should be the stream id for best data distribution (to lessen the amount of over-provisioning). However, a full replay across streams in Dynamo is expensive (read- and cost-wise). So this impl was also set up for Dynamo Streams to dump events to S3. When a new listener comes online, or an existing listener wants a full replay, it would read S3 to catch up first.

My current project is a multi-tenant scenario, and I rolled my own on top of Postgres. Something like Citus seems appropriate for scalability, partitioning by tenant+stream.

Kafka is still very useful in distributed scenarios. It is a non-trivial problem to expose each service's key events to other services. An event store is typically not built for that, but it's precisely what Kafka does well. Each service has its own internal source of truth (could be events, BNF, graph, etc), then listens to Kafka to know what is happening "outside". The service posts public events to Kafka to inform the outside of interesting things it encountered.

Pruchno answered 17/4, 2018 at 2:19 Comment(21)
thanks for your answer. may i ask why you dont mention greg youngs eventstore? just not tried?Ciliata
@Ciliata I mentioned EventStore in the Update section (2nd paragraph). I will go back and link it. I have tried it, and it has impressive perf. For our small team, not introducing another database was deemed more important for the time being, hence Postgres (which is also used for views). It is possible that we move to EventStore in the future or in future products.Pruchno
@KaseySpeakman Who ever said you had to use optimistic concurrency or application-level locking to achieve an event store? There are other ways to ensure consistency that aren't nearly as contentious. Kafka has a built-in concurrency guarantee in the form of partitions - use them to your advantage.Lowson
@AndrewLarsson If you don't care about conflicts, sure, you don't have to worry about any concurrency control. That's perfectly fine for stream processing since you are simply responding to things that have already happened. But it's not great for event sourcing where you are using an event stream (for a single entity) as a source of truth, then using that to control what/whether new events get saved to that stream. Kafka topics don't work as concurrency controls here since design limitations prevent me from creating a topic for every entity (could be millions), as mentioned in the answer.Pruchno
@KaseySpeakman Topics are not the same as partitions. A topic has one or more partitions. Partitions are guaranteed to only have one consumer per group at any given moment. Partition your entities in such a way as to take advantage of that. You don't need a topic per entity or even a partition per entity. You simply need to partition them in such a way as to guarantee that all commands addressed to the same entity go to the same partition.Lowson
@AndrewLarsson Partition/ZK limitations still apply as stated, making partition per entity impractical. Kafka was designed for processing massive numbers of messages, which it does well, not for being able to continually reread and then append to a handful of events for a specific entity. Which is what is needed by event sourcing.Pruchno
@KaseySpeakman Many entities can share a single partition. Who said you always have to load the state of the entity directly from the event store by replaying the events? There are other ways to achieve the same concept without strictly following line-by-line Greg Young's implementation.Lowson
@KaseySpeakman Kafka was not only designed for mass messaging. It was also designed for deterministic, in-order consumption of messages by multiple consumers without conflicts - which is what is required to implement an event source system. Greg Young achieved those requirements by choosing optimistic concurrency and loading entities directly from the event store and replaying their events in-memory. But that is not the only way to achieve the same idea.Lowson
@AndrewLarsson If you don't partition per entity, then how are you going to prevent conflicting events at the entity level? Since we've come full circle back to concurrency conflicts, then perhaps you should post your own article on medium or something on how you have used Kafka for event sourcing (not stream processing) in production. How you accomplish it with partition by type and without entity-level concurrency control. I would read it, and I wouldn't even troll you in comments if I disagreed.Pruchno
You use a Kafka stream per aggregate type for commands. You partition this stream using aggregate ID as the partition key. If you have one partition, you can only process one command at a time (no concurrency issues, but limited throughput). To scale, increase the number of partitions (say to 10 or more). All commands for a particular aggregate will go to the same partition, hence are processed serially (avoiding concurrency issues when writing events). Finally, you use up-to-date snapshots to process commands, avoiding any need to load events. Kafka Streams makes all this pretty simple.Spa
@Spa Inserting Kafka between your clients and command processing is a completely different animal from event sourcing. It adds a lot of complexity, especially for your clients. For example, they need to poll and/or listen for changes to find out if/when a command got handled. And the dependence on only snapshots to process commands has other trade-offs. Rather than putting quick blurbs in comments here, it would probably be more helpful to onlookers if a full description of how you used Kafka for everything in your architecture were put in a blog post and then linked here.Pruchno
@KaseySpeakman I'm not necessarily advocating it as an approach (and I haven't done it - I use MongoDB at the moment as a store), just pointing out that it's possible to use Kafka, as Andrew Larsson was saying. You could hide the complexity from your clients if you use a reply queue and block the command request until you've got a response. Kafka is a better choice as an event store if you aren't command processing, but are just receiving events from clients (IoT etc.).Spa
@Spa What happens when the response never comes? (Have to add a timeout.) You're basically reinventing a request/reply procedure that already exists in HTTP and other protocols. Just to use Kafka as a command queue. And that, just to save events to Kafka without concurrency issues. Whereas you could just use something else and avoid all that accidental complexity. Your last sentence, I agree with - Kafka makes a lot of sense in IoT and other stream processing scenarios.Pruchno
@KaseySpeakman It's not really reinventing, given that reply queues are widely used with message buses (so this was invented long ago), but I agree it's more complex than HTTP and loses some of the benefits of an async model. There are alternatives to avoid the concurrency issue, such as an actor-based model (one actor per aggregate, using a framework). Any partitioned approach scales well, since you can cache aggregates in RAM much better since a fixed subset of aggregates is tied to each process.Spa
@Spa Actor approach has its own complexities and tradeoffs. It's worthwhile if your problem space requires very chatty comms with the back-end (e.g. games). But using a stateless service approach is much easier to impl/scale and fits many common scenarios. This is why I keep saying: comments are not the place to do design. Choosing actors, for example has consequences to your time and code base. It is misleading to readers to just blurt it out as a solution without weighing the trade-offs (no space to do that here). The original question was about using Kafka for specific purpose.Pruchno
I just found a talk that describes basically the approach discussed above (with Kafka Streams), summarised and linked from here: infoq.com/news/2018/07/event-sourcing-kafka-streams . The talk indicates they used websockets to deal with the async command handling.Spa
@KaseySpeakman Using Kafka this way is not easy by any means. But if you're at the scale where you've seriously considered CQRS and Event Sourcing, then you are at the scale where you cannot afford to do things the easy way. Your concurrency model has a direct impact on your scale - don't choose one arbitrarily. Also, HTTP is not a reliable transport, and again, if you're at that scale, you can't afford to spend time solving lost and/or duplicate message problems. This can all be solved by using Kafka between the client and the command processor, but yes, it comes at the cost of complexity.Lowson
@AndrewLarsson Are we theory crafting here or do you have specific experience with what you are espousing?Pruchno
@KaseySpeakman I'm not theory crafting, I have experience in all of this. I hope to soon be able to write up an article on exactly how to do this. Although I would like to note that all good architectures started out as just an idea and a hypothesis. Just because something is new or radical doesn't make it any less true. It may make it more risky, sure, but not invalid. And most esteemed architects made their names by inventing a radical architecture.Lowson
@AndrewLarsson interesting discussion, any news on that article?Witling
@YassinHajaj I haven't gotten permission to write it yet, sorry.Lowson
C
31

All the existing answers seem to be quite comprehensive, but there's a terminology issue, which I'd like to resolve in my answer.

What's Event Sourcing?

It seems like if you look at five different places, you get five different answers to that question.

However, if you look at Greg Young's paper from 2010, it summarises the idea quite nicely, from page 32 onwards, but it doesn't contain the ultimate definition, so I dare formulate it myself.

Event Sourcing is a way to persist state. Instead of replacing one state with another as a result of a state mutation, you persist an event that represents that mutation. Therefore, you can always get the current state of the entity by reading all the entity's events and applying those state mutations in sequence. By doing that, the current entity state becomes a left fold of all the events for that entity.
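
In code, that left fold is just replaying the events through an apply function; the account and event types below are made up purely for illustration.

```java
import java.util.List;

public class FoldExample {
    // Illustrative domain types; not from any particular library.
    record Deposited(long amountCents) {}
    record Withdrawn(long amountCents) {}

    record Account(long balanceCents) {
        Account apply(Object event) {
            if (event instanceof Deposited d) return new Account(balanceCents + d.amountCents());
            if (event instanceof Withdrawn w) return new Account(balanceCents - w.amountCents());
            return this; // unknown events leave the state untouched
        }
    }

    static Account currentState(List<Object> events) {
        Account state = new Account(0);
        for (Object event : events) {
            state = state.apply(event); // left fold: state_n = apply(state_{n-1}, event_n)
        }
        return state;
    }
}
```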

What makes a "good" event store (database)?

Any persistence mechanism needs to perform two basic operations:

  • Save the new entity state to the database
  • Retrieve the entity state from the database

That's where Greg talks about the concept of entity streams, where each entity has its own stream of events, uniquely identified by the entity id. When you have a database, which is capable of reading all the entity events by the entity id (read the stream), using Event Sourcing is not a hard problem.

As Greg's paper mentions Event Sourcing in the context of CQRS, he explains why those two concepts play nicely with each other. Although you have a database full of atomic state mutations for a bunch of entities, querying across the current state of multiple entities is hard work. The issue is solved by separating the transactional (event-sourced) store that is used as the source of truth, and the reporting (query, read) store, which is used for reports and queries of the current system state across multiple entities. The query store doesn't contain any events; it contains the projected state of multiple entities, composed based on the needs for querying data. It doesn't necessarily need to contain snapshots of each entity; you are free to choose the shape and form of the query model, as long as you can project your events to that model.

For that reason, a "proper" event database would need to support what we call real-time subscriptions, which deliver new (and historical, if we need to replay) events to the query model to project.

We also know that we need the entity state in hand when making decisions about its allowed state transitions. For example, a money transfer that has already been executed should not be executed twice. As the query model is by definition stale (even for milliseconds), it becomes dangerous when you make decisions on stale data. Therefore, we use the most recent, and totally consistent, state from the transactional (event) store to reconstruct the entity state when executing operations on the entity.

Sometimes, you also want to remove the whole entity from the database, meaning deleting all its events. That could be a requirement, for example, to be GDPR-compliant.

So, what attributes would then be needed for a database used as an event store to get a decent event-sourced system working? Just a few:

  • Append events to the ordered, append-only log, using entity id as a key
  • Load all the events for a single entity, in an ordered sequence, using the entity id as a key
  • Delete all the events for a given entity, using the entity id as a key
  • Support real-time subscriptions to project events to query models
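
Expressed as an interface (purely illustrative, not any specific product's API), that contract looks roughly like this:

```java
import java.util.List;
import java.util.function.Consumer;

/** Illustrative event-store contract; the names are not from any specific product. */
public interface EventStore {
    /** Append events to the entity's ordered, append-only stream (expectedVersion gives the optimistic-concurrency check discussed below). */
    void append(String entityId, long expectedVersion, List<Object> events);

    /** Load all events for a single entity, in order. */
    List<Object> load(String entityId);

    /** Delete all events for a given entity (e.g. for GDPR erasure). */
    void delete(String entityId);

    /** Subscribe to new (and optionally historical) events to project them into query models. */
    AutoCloseable subscribe(Consumer<Object> handler);
}
```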

What is Kafka?

Kafka is a highly-scalable message broker, based on an append-only log. Messages in Kafka are produced to topics, and one topic nowadays often contains a single message type to play nicely with the schema registry. A topic could be something like cpu-load where we produce time-series measurements of the CPU load for many servers.

Kafka topics can be partitioned. Partitioning allows you to produce and consume messages in parallel. Messages are ordered only within a single partition, and you'd normally need to use a predictable partition key, so Kafka can distribute messages across the partitions.
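
For example, producing with the entity id as the record key (topic and ids are illustrative) keeps all events of that entity on the same partition with the default partitioner, and therefore in order, but the key remains a routing hint rather than an index:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProduceWithEntityKey {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("order-42") is hashed to pick a partition, so all events for
            // this entity preserve their relative order within that partition.
            producer.send(new ProducerRecord<>("order-events", "order-42", "{\"type\":\"OrderPlaced\"}"));
            producer.send(new ProducerRecord<>("order-events", "order-42", "{\"type\":\"OrderPaid\"}"));
        }
    }
}
```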

Now, let's go through the checklist:

  • Can you append events to Kafka? Yes, it's called produce. Can you append events with the entity id as a key? Not really, as the partition key is used to distribute messages across partitions, so it's really just a partition key. One thing mentioned in another answer is optimistic concurrency. If you worked with a relational database, you probably used the Version column. For NoSQL databases you might have used the document eTag. Both allow you to ensure that you update the entity that is in the state that you know about, and it hasn't been mutated during your operation. Kafka does not provide you with anything to support optimistic concurrency for such state transitions.
  • Can you read all the events for a single entity from a Kafka topic, using the entity id as a key? No, you can't. As Kafka is not a database, it has no index on its topics, so the only way to retrieve messages from a topic is to consume them.
  • Can you delete events from Kafka using the entity id as a key? No, it's impossible. Messages get removed from the topic only after their retention period expires.
  • Can you subscribe to a Kafka topic to receive live (and historical) events in order, so you can project them to your query models? Yes, and because topics are partitioned, you can scale out your projections to increase performance.

What about Kafka and Samza (or Kafka Streams)?

The ability to fold events to some representation of state, and store this state in another database, asynchronously, is a side feature of Event Sourcing. We usually call these operations "projections", as you can fold events to state in many different ways. It is a useful feature as you can build use-case-specific query models at will, and rebuild them from the beginning of time or from a certain point in time, as you have the full history of events in the log at your disposal.
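
As an illustration, such a projection can be written with the Kafka Streams DSL as a fold over the grouped event stream (the topic names and the aggregation logic are made up):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class BalanceProjection {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Fold "account-events" (keyed by account id, values are signed amounts)
        // into a running balance per account, published to "account-balances".
        builder.<String, Long>stream("account-events")
               .groupByKey()
               .aggregate(
                       () -> 0L,                                     // initial state
                       (accountId, amount, balance) -> balance + amount,
                       Materialized.with(Serdes.String(), Serdes.Long()))
               .toStream()
               .to("account-balances", Produced.with(Serdes.String(), Serdes.Long()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "balance-projection");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}
```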

However, this is not what Event Sourcing is about as you can do exactly the same using queues, and no one ever said that passing messages through a queue and updating database records in the message consumer is "Event Sourcing".

Other things to consider

  • Projecting events to persisted state is a one-way operation. If you have made a mistake, you can't revert it, as the state is already persisted. You have to stop the system, re-project everything, and start the system again. It can take hours, or even weeks.
  • You cannot see the history of a single entity simply because all the events for, potentially, millions of entities are stored in a single topic. You'd need to scan the whole topic to figure out what happened with the entity.
  • When you receive a request from the user to delete their data, you won't be able to do it without re-shovelling events between topics. You basically must think about it upfront and apply rather complex patterns like crypto-shredding in advance, or risk being non-compliant with local privacy regulations. Ignoring these regulations might eventually drive your company out of business.

So, why do people keep doing it?

I believe that the reason why a lot of people claim that Kafka is a good choice to be an event store for event-sourced systems is that they confuse Event Sourcing with simple pub-sub (you can use the hype word "EDA", or Event-Driven Architecture, instead). Using message brokers to fan out events to other system components is a pattern known for decades. The issue with "classic" brokers is that messages are gone as soon as they are consumed, so you cannot build something like a query model that would be built from history. Another issue is that when projecting events, you want them to be consumed in the same order as they are produced, and "classic" brokers normally aim to support the competing consumers pattern, which doesn't support ordered message processing by definition. Make no mistake, Kafka does not support competing consumers; it has a limitation of one consumer per one or more partitions, but not the other way around.

Kafka solved the ordering issue and the historical message retention issue quite nicely. So, you can now build query models from events you push through Kafka. But that's not what the original idea of Event Sourcing is about; it's what we today call EDA. As soon as this separation is clear, we, hopefully, stop seeing claims that any append-only event log is a good candidate to be an event store database for event-sourced systems.

Cordate answered 21/4, 2022 at 13:29 Comment(3)
Kafka is not ideal for event sourcing since you can't query the inside of a Kafka topic. The solution can be: yes, use Kafka for the write side, but for the read side transfer the data to Elasticsearch or Apache Solr. There are frameworks like Akka Projections (doc.akka.io/docs/akka-projection/current/kafka.html) to do that, to project to Elasticsearch. I developed a module and explain it in the following blog (mehmetsalgar.wordpress.com/2022/04/18/…)Crossbill
or this blog (mehmetsalgar.wordpress.com/2022/05/17/…)Crossbill
If you read my answer again, maybe it will be clearer. You cannot use stale data for making decisions in the domain. It can and it will go wrong, and it will go wrong badly. Using Elasticsearch to keep a projection of state will make things worse, a lot worse.Cordate
G
26

You can use Kafka as an event store, but I do not recommend doing so, although it might look like a good choice:

  • Kafka only guarantees at-least-once delivery, and there are duplicates in the event store that cannot be removed. Update: Here you can read why it is so hard with Kafka and some latest news about how to finally achieve this behavior: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
  • Due to immutability, there is no way to manipulate the event store when the application evolves and events need to be transformed (there are of course methods like upcasting, but...). One might say you never need to transform events, but that is not a correct assumption; there could be situations where you keep a backup of the originals but upgrade them to the latest versions. That is a valid requirement in event-driven architectures.
  • There is no place to persist snapshots of entities/aggregates, so replay will become slower and slower. Creating snapshots is a must-have feature for an event store from a long-term perspective.
  • Kafka partitions are distributed, and they are hard to manage and back up compared with databases. Databases are simply simpler :-)

So, before you make your choice, think twice. An event store built as a combination of application-layer interfaces (monitoring and management), a SQL/NoSQL store, and Kafka as the broker is a better choice than letting Kafka handle both roles to create a complete, feature-full solution.
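
A rough sketch of that split, with hypothetical names (note that a production setup would typically use an outbox table or CDC rather than the naive dual write shown here): the database is the source of truth and enforces consistency, while Kafka only fans the committed events out.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.List;

public class StoreThenPublish {
    // Minimal illustrative contract for the store; not any specific product's API.
    interface EventStore {
        void append(String entityId, long expectedVersion, List<String> events);
    }

    private final EventStore eventStore;                   // e.g. a SQL/NoSQL-backed event store
    private final KafkaProducer<String, String> producer;  // Kafka used purely as the broker

    StoreThenPublish(EventStore eventStore, KafkaProducer<String, String> producer) {
        this.eventStore = eventStore;
        this.producer = producer;
    }

    void handle(String entityId, long expectedVersion, List<String> newEvents) {
        // 1. The database is the source of truth and enforces the concurrency check.
        eventStore.append(entityId, expectedVersion, newEvents);
        // 2. Kafka fans the committed events out to other services and read models.
        //    (This naive dual write can lose events if the process dies between the
        //    two steps; an outbox table polled by a relay, or CDC, closes that gap.)
        for (String event : newEvents) {
            producer.send(new ProducerRecord<>("public-events", entityId, event));
        }
    }
}
```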

An event store is a complex service which requires more than what Kafka can offer if you are serious about applying Event Sourcing, CQRS, sagas, and other patterns in an event-driven architecture while staying high-performance.

Feel free to challenge my answer! You might not like what I say about your favorite broker with lots of overlapping capabilities, but still, Kafka wasn't designed as an event store, but more as a high-performance broker and, at the same time, a buffer to handle fast-producer-versus-slow-consumer scenarios, for example.

Please look at the eventuate.io microservices open-source framework to discover more about the potential problems: http://eventuate.io/

Update as of 8th Feb 2018

I am not incorporating new info from the comments, but I agree with some of those aspects. This update is more about some recommendations for a microservice event-driven platform. If you are serious about robust microservice design and the highest possible performance in general, here are a few hints you might be interested in.

  1. Don't use Spring - it is great (I use it myself a lot), but it is heavy and slow at the same time. And it is not a microservice platform at all. It's "just" a framework to help you implement one (there's a lot of work behind this...). Other frameworks are "just" lightweight REST or JPA or differently focused frameworks. I recommend probably the best-in-class open-source complete microservice platform available, which is coming back to pure Java roots: https://github.com/networknt

If you wonder about performance, you can compare yourself with existing benchmark suite. https://github.com/networknt/microservices-framework-benchmark

  2. Don't use Kafka at all :-)) That is half a joke. I mean, while Kafka is great, it is another broker-centric system. I think the future is in broker-less messaging systems. You might be surprised, but there are systems faster than Kafka :-); of course, you must get down to a lower level. Look at Chronicle.

  3. For the event store I recommend the superior PostgreSQL extension TimescaleDB, which focuses on high-performance time-series data processing (events are time series) in large volumes. Of course CQRS and Event Sourcing (replay, etc.) features are built into the light-4j framework out of the box, which uses Postgres as low-level storage.

  4. For messaging, try to look at Chronicle Queue, Map, Engine, and Network. I mean, get rid of these old-fashioned broker-centric solutions and go with a micro-messaging system (an embedded one). Chronicle Queue is actually even faster than Kafka. But I agree it is not an all-in-one solution, and you need to do some development, otherwise you go and buy the Enterprise version (the paid one). In the end, the effort to build your own messaging layer from Chronicle will be paid back by removing the burden of maintaining the Kafka cluster.

Grillwork answered 18/11, 2017 at 20:22 Comment(7)
Interesting view. Care to elaborate on a few points? > Kafka only guarantees at-least-once delivery and there are duplicates in the event store that cannot be removed. You seem to imply that there is such a thing as exactly-once delivery. afaik (and I'm pretty sure about that) there's no such thing in a distributed system. 2) As to your point 2: the classical school of (event sourcing / DDDD) thought is that events are inherently immutable. I.e.: they happened; there's no way to change the past. What's the actual use case of changing them in retrospect? Thanks!Newsome
1.) Hazelcast to ensure each message will be processed once and only once. 2.) I don't like anything like _V2 in service code, so either you will back up to an archive and recreate old events to their new versions (you still have the original truth), or you can hide/build this functionality directly into the Event Store snapshot functionality, so there is a single point of upcasting -> the event store. What are your solutions to this?Grillwork
1) at-least-once + idempotence on the consumer. I.e.: check if event already seen. If so skip. Or better yet, have idempotent actions. Of course, this is not always possible. 2) I've never encountered needing to version events. I always treat the events themselves as the source of truth and include all info I would ever need on them. Doing this, I've never encountered a situation where I needed a different event-structure and/or data about an event. But perhaps ymmv. Interested in hearing in what situations you would actually need to have updated events.Newsome
1.) can be a way of choice... 2.) then your data structures were perfect from the beginning :-) lucky you, haha. I might not need it on my current project, but I am building a whole platform on forks of eventuate.io merged with some high-performance JEE-only approaches taken from light eventuate 4j... this whole discussion is not a place for Stack Overflow comments, but if you are interested in diving deeper I recommend this article: leanpub.com/esversioning/readGrillwork
Kafka supports exactly once delivery now, by the way. Update bullet 1Westwardly
I know, I was independently checking now :-)Grillwork
But upcasting remains a problem then; I've needed upcasting in the past. Is this totally impossible, or is it possible but not easy? Are there any other caveats to watch out for?V
R
0

I think you should look at the Axon Framework along with its support for Kafka.

Rios answered 21/4, 2020 at 19:32 Comment(0)
R
-4

Yes, Kafka works well in the event sourcing model, especially CQRS; however, you have to take care when setting TTLs for topics, and always keep in mind that Kafka was not designed for this model. However, we can very well use it.

Rebba answered 16/7, 2019 at 23:22 Comment(1)
Kafka was actually "designed for this type of usage", as stated here: confluent.io/blog/okay-store-data-apache-kafka; Using Kafka as an event store for event sourcing is the first use case in this article. They also say that NYT does it for their article data.Riddle
