Aggregates in Event Sourcing Pattern

Asked 23/4, 2018 at 15:50 Answered 26/6, 2020 at 11:43

java aggregate microservices event-sourcing

I am dipping my feet into the event sourcing pattern and trying to make sense of aggregates. I have read a few blogs and now I am more confused than ever.

From what I inferred, aggregates should somehow enable the user to run different queries on the event store to retrieve different streams of events.

Use cases:

I want to replay the events on an invoice to see all actions done by a specific employee on the balance.
I want to replay all events on an invoice.

I hope these are valid use cases.

Event Store:

| event_id | invoice_id | EmployeeId | Event            | Payload |
|----------|------------|------------|------------------|---------|
| 1        | 12345      | 12345      | Invoice_InReview | JSON    |
| 2        | 12345      | 12345      | Invoice_Billed   | JSON    |
| 3        | 12345      | 45567      | Invoice_Paid     | JSON    |
| 4        | 12345      | 77341      | Invoice_Reversed | JSON    |
| 5        | 12345      | 98421      | Invoice_Paid     | JSON    |

The JSON payload contains information about changes to the payment, adjustment, and status of the invoice. Status is one of Review, Billed, or Paid.

So from my understanding, there need to be 5 components:

Event - A specific event.
Event Source - The service that calls the repo to get related events.
Event Stream - A list of events.
Command - A request for an operation on an invoice.
Aggregate - An API to decide on inputs to load events.

I understand how the other pieces fit together, but I am having a hard time wrapping my head around the aggregate. What is it?

Will I have two aggregate classes?

AggregateEventsByInvoice
AggregateEventsByInvoiceEmployee

I really am having a hard time figuring out the need for and use of aggregates. All the examples I have seen use a UUID, which does not make sense to me at all. Any help will be greatly appreciated.

Isogamy asked 23/4, 2018 at 15:50 (4 comments)

Was the term "aggregate" used in the context of Domain Driven Design (DDD)? – Egor 23/4, 2018 at 15:58

@ConstantinGalbenu yes, aggregate here is in reference to DDD. – Isogamy 23/4, 2018 at 16:32

martinfowler.com/bliki/DDD_Aggregate.html – Egor 23/4, 2018 at 16:41

lostechies.com/gabrielschenker/2015/05/25/ddd-the-aggregate – Egor 23/4, 2018 at 16:42

> I am dipping my feet into the event sourcing pattern and trying to make sense of aggregates. I have read a few blogs and now I am more confused than ever.

That is not your fault.

The concept of aggregates comes from Eric Evans's description of domain modeling in Domain-Driven Design.

In a typical deployment, we have a database full of facts that we want to track. We have a model in which those facts change over time. We want to ensure that we track those changes correctly, meaning without introducing inconsistencies.

And the coarse answer to that is that we place our database "behind" a domain model that includes all of the rules for how the data in the database is allowed to change. In Evans's time, the domain model was a tier that lay between the application tier and the persistence tier. These days you are more likely to hear "component" or "module" than "tier", but the role hasn't changed much: protect the database from incorrect changes.

If we examine the domain carefully, we will often find within the model clusters of data that exhibit an interesting property: the rules for changing the state of the cluster don't depend on any information outside of the cluster.

Example: in a trading application, bids and offers for some commodity are matched and processed. But the rules for matching one commodity (gold) are completely independent of the data associated with a different commodity (frozen concentrated orange juice). You don't need to know anything about what's going on in FCOJ to correctly process the activity in the gold trade, and vice versa.

These clusters, which can be considered in isolation, are aggregates.

The two key properties of that isolation are:

changes to state inside the aggregate don't depend on changes made outside the aggregate
changes made outside the aggregate don't depend on changes made inside the aggregate

So in this example, we might have a TradeBook "aggregate" for Gold, and a TradeBook "aggregate" for FCOJ. To process an order, you would load the aggregate you need, apply the change to it, and save it, without ever needing to touch the other.
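To make "load, apply, save" concrete, here is a minimal Java sketch. `TradeBook`, `Order`, and `TradeBookRepository` are hypothetical names, the repository is an in-memory map standing in for real persistence, and the actual matching rules are left out. The only point is that the gold book's checks read nothing but the gold book's own orders.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo: processing a gold order never loads or touches the FCOJ book.
class TradeBookDemo {
    public static void main(String[] args) {
        TradeBookRepository repo = new TradeBookRepository();
        TradeBook gold = repo.load("GOLD");           // load the one aggregate we need
        gold.placeOrder(new Order("BID", 1900, 10));  // apply the change
        repo.save(gold);                              // save it
        System.out.println(repo.load("GOLD").orders().size()); // 1
        System.out.println(repo.load("FCOJ").orders().size()); // 0
    }
}

record Order(String side, long price, long quantity) {}

// Hypothetical aggregate: one book per commodity, validated only against its own data.
class TradeBook {
    private final String commodity;
    private final List<Order> orders = new ArrayList<>();

    TradeBook(String commodity) {
        this.commodity = commodity;
    }

    String commodity() {
        return commodity;
    }

    List<Order> orders() {
        return List.copyOf(orders);
    }

    void placeOrder(Order order) {
        // Invariant checked using nothing outside this aggregate (matching omitted).
        if (order.quantity() <= 0 || order.price() <= 0) {
            throw new IllegalArgumentException("price and quantity must be positive");
        }
        orders.add(order);
    }
}

// Hypothetical in-memory repository; load() returns a copy so unsaved changes stay local.
class TradeBookRepository {
    private final Map<String, TradeBook> books = new HashMap<>();

    TradeBook load(String commodity) {
        TradeBook copy = new TradeBook(commodity);
        TradeBook stored = books.get(commodity);
        if (stored != null) {
            for (Order o : stored.orders()) {
                copy.placeOrder(o);
            }
        }
        return copy;
    }

    void save(TradeBook book) {
        books.put(book.commodity(), book);
    }
}
```

Because each book is loaded and saved by its own key, two orders for different commodities never contend for the same data.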

> Will I have two aggregate classes?
>
> AggregateEventsByInvoice
>
> AggregateEventsByInvoiceEmployee

No, you will probably have two views or projections based on the same event history.

More precisely, in the architecture described by Evans, there would be one "aggregate root", and each of your use cases would be a different query in the API for that aggregate.

But more recently, the practice is to recognize that the use cases for reads don't need the same constraints as the use cases for writes. So today you are more likely to see a view (or projection) for each of your use cases, where the in-memory representation of each is built from the events recorded in your data store.
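As a sketch of "one projection per use case", the Java below folds the same five events from the question's table into two independent read models for invoice 12345. `EventRow`, `InvoiceHistoryView`, and `EmployeeActivityView` are hypothetical names, and the JSON payload is left out. Neither view enforces any business rule; that remains the aggregate's job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ProjectionDemo {
    public static void main(String[] args) {
        // The event history from the question's table, payload omitted.
        List<EventRow> history = List.of(
                new EventRow(1, "12345", "12345", "Invoice_InReview"),
                new EventRow(2, "12345", "12345", "Invoice_Billed"),
                new EventRow(3, "12345", "45567", "Invoice_Paid"),
                new EventRow(4, "12345", "77341", "Invoice_Reversed"),
                new EventRow(5, "12345", "98421", "Invoice_Paid"));

        InvoiceHistoryView historyView = new InvoiceHistoryView();
        EmployeeActivityView activityView = new EmployeeActivityView();
        for (EventRow e : history) { // both views consume the same events
            historyView.on(e);
            activityView.on(e);
        }
        System.out.println(historyView.entries().size());    // 5
        System.out.println(activityView.actionsOf("45567")); // [Invoice_Paid]
    }
}

record EventRow(long eventId, String invoiceId, String employeeId, String type) {}

// Use case 2: every event on the invoice, in recorded order.
class InvoiceHistoryView {
    private final List<String> entries = new ArrayList<>();

    void on(EventRow e) {
        entries.add(e.eventId() + " " + e.type());
    }

    List<String> entries() {
        return List.copyOf(entries);
    }
}

// Use case 1: what each employee did on the invoice.
class EmployeeActivityView {
    private final Map<String, List<String>> byEmployee = new LinkedHashMap<>();

    void on(EventRow e) {
        byEmployee.computeIfAbsent(e.employeeId(), k -> new ArrayList<>()).add(e.type());
    }

    List<String> actionsOf(String employeeId) {
        return List.copyOf(byEmployee.getOrDefault(employeeId, List.of()));
    }
}
```

Adding a third use case means adding a third view class, not a new aggregate.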

> So what I understand is that an aggregate is essentially anything that can uniquely identify all events related to a single instance (in my case, an invoice). So in my case, can the invoiceId be considered an aggregate?

No. In your case, the invoice is likely to be the aggregate.

To be more precise, your domain model is presumably coordinating changes between the balance, adjustment, status, and payment of each invoice; these values are an example of the sort of cluster I was talking about before. You can make changes to these values without having to consider, for example, the adjustment of Invoice[67890].
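A rough Java sketch of such an invoice aggregate follows. The event and command names, the amounts in cents, and the rules themselves (only a billed invoice can be paid or adjusted; a payment cannot exceed the balance) are assumptions for illustration, not taken from the question. What matters is that every check reads only this invoice's own fields.

```java
import java.util.ArrayList;
import java.util.List;

class InvoiceDemo {
    public static void main(String[] args) {
        Invoice invoice = new Invoice();
        List<InvoiceEvent> toAppend = new ArrayList<>(); // events for this invoice's stream
        toAppend.add(invoice.bill(10_000));   // 100.00 billed
        toAppend.add(invoice.adjust(-2_000)); // 20.00 discount
        toAppend.add(invoice.pay(8_000));     // paid in full
        System.out.println(invoice.status() + " " + invoice.balanceCents()); // PAID 0
    }
}

enum Status { REVIEW, BILLED, PAID }

interface InvoiceEvent {}
record InvoiceBilled(long amountCents) implements InvoiceEvent {}
record InvoiceAdjusted(long deltaCents) implements InvoiceEvent {}
record PaymentReceived(long amountCents) implements InvoiceEvent {}

// Hypothetical aggregate: status, balance, adjustments and payments change together.
class Invoice {
    private Status status = Status.REVIEW;
    private long balanceCents = 0;

    InvoiceEvent bill(long amountCents) {
        require(status == Status.REVIEW, "only an invoice in review can be billed");
        require(amountCents > 0, "billed amount must be positive");
        return emit(new InvoiceBilled(amountCents));
    }

    InvoiceEvent adjust(long deltaCents) {
        require(status == Status.BILLED, "only a billed invoice can be adjusted");
        require(balanceCents + deltaCents > 0, "adjustment would leave nothing to pay");
        return emit(new InvoiceAdjusted(deltaCents));
    }

    InvoiceEvent pay(long amountCents) {
        require(status == Status.BILLED, "invoice is not awaiting payment");
        require(amountCents > 0 && amountCents <= balanceCents, "invalid payment amount");
        return emit(new PaymentReceived(amountCents));
    }

    // State transitions live here, so commands and replay share one code path.
    void apply(InvoiceEvent event) {
        if (event instanceof InvoiceBilled b) {
            balanceCents = b.amountCents();
            status = Status.BILLED;
        } else if (event instanceof InvoiceAdjusted a) {
            balanceCents += a.deltaCents();
        } else if (event instanceof PaymentReceived p) {
            balanceCents -= p.amountCents();
            if (balanceCents == 0) {
                status = Status.PAID;
            }
        }
    }

    Status status() { return status; }

    long balanceCents() { return balanceCents; }

    private InvoiceEvent emit(InvoiceEvent event) {
        apply(event);
        return event;
    }

    private static void require(boolean condition, String message) {
        if (!condition) throw new IllegalStateException(message);
    }
}
```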

> So what I understand is that an aggregate is essentially anything that can uniquely identify all events related to a single instance (in my case, an invoice).

A problem is that this understanding doesn't align well with the existing literature, and is likely to lead to confused communications.

In a document store or key-value store, the aggregate is analogous to the document, not the key you use to look up the document. In an RDBMS, the aggregate would be the related entities, and the id would be the primary key you use to load them. In an event store, the contents of the stream describe the changes to the entities in the aggregate; the id is just the key you use to find the correct events.
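To make the key/document distinction concrete for an event store, here is a minimal in-memory sketch in Java. `InMemoryEventStore`, the `invoice-12345` stream name, and the status rules in `InvoiceStatus.replay` are hypothetical (in particular, treating a reversal as returning the invoice to Billed is an assumption). The expected-version check is a common optimistic-concurrency technique, not something the question requires. The invoice id only selects a stream; the invoice's state comes from replaying what is in it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EventStoreDemo {
    public static void main(String[] args) {
        InMemoryEventStore store = new InMemoryEventStore();
        store.append("invoice-12345", 0, List.of("Invoice_InReview", "Invoice_Billed"));
        store.append("invoice-12345", 2, List.of("Invoice_Paid", "Invoice_Reversed", "Invoice_Paid"));
        // The key finds the events; replaying them produces the aggregate's state.
        System.out.println(InvoiceStatus.replay(store.load("invoice-12345"))); // Paid
    }
}

// Hypothetical store: stream id -> ordered list of event types (payloads omitted).
class InMemoryEventStore {
    private final Map<String, List<String>> streams = new HashMap<>();

    void append(String streamId, int expectedVersion, List<String> events) {
        List<String> stream = streams.computeIfAbsent(streamId, k -> new ArrayList<>());
        // Reject the write if someone else appended since the caller loaded the stream.
        if (stream.size() != expectedVersion) {
            throw new IllegalStateException("concurrency conflict on " + streamId);
        }
        stream.addAll(events);
    }

    List<String> load(String streamId) {
        return List.copyOf(streams.getOrDefault(streamId, List.of()));
    }
}

class InvoiceStatus {
    // Fold the whole stream, in order, into the current status.
    static String replay(List<String> events) {
        String status = "None";
        for (String event : events) {
            status = switch (event) {
                case "Invoice_InReview" -> "Review";
                case "Invoice_Billed", "Invoice_Reversed" -> "Billed";
                case "Invoice_Paid" -> "Paid";
                default -> throw new IllegalArgumentException("unknown event " + event);
            };
        }
        return status;
    }
}
```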

> Is it okay for the event store to have additional columns that are not the aggregate id?

Sure - you can store whatever metadata you like with the event. Creating additional columns can improve your query performance, make it easier to shard your data, and so on.

> Is it okay for us to load events from the event store by querying on columns in addition to the aggregate id (invoice id, employee id in this case)?

Sure, you can query on the events any way you like.
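As a small illustration, the Java below models an event row that carries the employee id as an extra metadata column next to the invoice id, and filters on it. `StoredEvent` and `EventQueries` are hypothetical; in a relational store the same filter would typically be a `WHERE invoice_id = ? AND employee_id = ?` clause, possibly backed by an index on those columns.

```java
import java.util.List;

class QueryDemo {
    public static void main(String[] args) {
        List<StoredEvent> events = List.of(
                new StoredEvent(1, "12345", "12345", "Invoice_InReview", "{}"),
                new StoredEvent(2, "12345", "12345", "Invoice_Billed", "{}"),
                new StoredEvent(3, "12345", "45567", "Invoice_Paid", "{}"),
                new StoredEvent(4, "12345", "77341", "Invoice_Reversed", "{}"),
                new StoredEvent(5, "12345", "98421", "Invoice_Paid", "{}"));

        System.out.println(EventQueries.byInvoiceAndEmployee(events, "12345", "12345")
                .stream().map(StoredEvent::eventId).toList()); // [1, 2]
        System.out.println(EventQueries.byInvoiceAndType(events, "12345", "Invoice_Paid")
                .stream().map(StoredEvent::eventId).toList()); // [3, 5]
    }
}

// Hypothetical row: aggregate id plus extra metadata columns and the payload.
record StoredEvent(long eventId, String invoiceId, String employeeId, String type, String payloadJson) {}

class EventQueries {
    // Read-side query on the aggregate id plus an extra metadata column.
    static List<StoredEvent> byInvoiceAndEmployee(List<StoredEvent> all, String invoiceId, String employeeId) {
        return all.stream()
                .filter(e -> e.invoiceId().equals(invoiceId))
                .filter(e -> e.employeeId().equals(employeeId))
                .toList();
    }

    // Another read-side query: only events of a given type, e.g. all payments.
    static List<StoredEvent> byInvoiceAndType(List<StoredEvent> all, String invoiceId, String type) {
        return all.stream()
                .filter(e -> e.invoiceId().equals(invoiceId))
                .filter(e -> e.type().equals(type))
                .toList();
    }
}
```

Results like these suit reports and audits; they are a subset of the stream, not something to rebuild the invoice from.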

What's maybe not a good idea is trying to recover the current state of your domain model by replaying an arbitrary set of your events.

In your example, events [1,2,3,4,5] taken together tell a coherent story about the invoice. But trying to create an understanding of the invoice from event [4] by itself may not get you anywhere.

Remember, an event isn't usually a complete representation of the state of the model after the change, but rather a description of the things that changed. Think "patch", rather than "snapshot".
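A tiny Java illustration of "patch, not snapshot": each hypothetical `BalanceChange` event below records only a change to the invoice balance, using made-up amounts in cents for the question's five events. Replaying the whole stream gives the right balance; replaying event 4 on its own gives a misleading one.

```java
import java.util.List;

class PatchDemo {
    public static void main(String[] args) {
        List<BalanceChange> stream = List.of(
                new BalanceChange(1, "Invoice_InReview", 0),
                new BalanceChange(2, "Invoice_Billed", 10_000),
                new BalanceChange(3, "Invoice_Paid", -10_000),
                new BalanceChange(4, "Invoice_Reversed", 10_000),
                new BalanceChange(5, "Invoice_Paid", -10_000));

        System.out.println(BalanceChange.replay(stream));                 // 0: paid in full
        System.out.println(BalanceChange.replay(List.of(stream.get(3)))); // 10000: misleading
    }
}

// Hypothetical event: a delta applied to the balance, not the resulting balance.
record BalanceChange(long eventId, String type, long deltaCents) {
    static long replay(List<BalanceChange> events) {
        long balance = 0;
        for (BalanceChange e : events) {
            balance += e.deltaCents(); // each event patches the state left by the previous ones
        }
        return balance;
    }
}
```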

Selassie answered 23/4, 2018 at 17:07 (3 comments)

so what I understand is that an aggregate is essentially anything that can uniquely identify all events related to a single instance (in my case, an invoice). So in my case, can the invoiceId be considered an aggregate? – Isogamy 23/4, 2018 at 20:57

Question: is invoiceId the aggregate, according to the use case? Two questions: 1) Is it okay for the event store to have additional columns that are not the aggregate id (employee id here)? 2) Is it okay for us to load events from the event store by querying on columns in addition to the aggregate id (invoice id, employee id in this case)? – Isogamy 25/4, 2018 at 19:38

It is ok to have additional metadata columns - audit information (who made the change, timestamp, etc.) is commonly added, and often a correlation id and/or request id (linking the event to the command or process manager that caused it). When building a projection or serving a read query directly from events, you could query for just a sub-set of events (sometimes you only care about particular event types). Obviously, this isn't useful when processing commands, where you need the whole stream for the aggregate. – Conny 2/5, 2018 at 10:41

In event sourcing, an aggregate is an object whose state (fields) is not mapped to a single record in a database, as we are used to thinking in the SQL/JPA world.

It is not a group of related entities.

Rather, it is a group of related records, as in a history table.

GiftCard.amount is one field in a GiftCard aggregate, but its value is derived from all the events ever created for that card, such as card-redeemed (money taken from the card).

The source of data for your aggregate is not a record in a database but the complete list of events ever created for that specific aggregate. We say the aggregate is event sourced.

Now we can ask ourselves: how is this done? Who aggregates these events so that we still operate on a single field, e.g. GiftCard.amount? We might expect amount to be a collection rather than a BigDecimal.

It is the event sourcing engine that does the work: it might simply replay all the events in the order they were created.
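A minimal Java sketch of that replay, assuming hypothetical `CardIssued` and `CardRedeemed` events (the latter standing in for the card-redeemed event mentioned above). `amount` stays a single `BigDecimal` field, and the "aggregation" is just applying each event in order when the card is loaded. An event sourcing framework would normally run this loop for you; the code only shows the idea.

```java
import java.math.BigDecimal;
import java.util.List;

class GiftCardDemo {
    public static void main(String[] args) {
        List<GiftCardEvent> history = List.of(
                new CardIssued(new BigDecimal("50.00")),
                new CardRedeemed(new BigDecimal("20.00")),
                new CardRedeemed(new BigDecimal("5.50")));
        System.out.println(GiftCard.replay(history).amount()); // 24.50
    }
}

interface GiftCardEvent {}
record CardIssued(BigDecimal amount) implements GiftCardEvent {}
record CardRedeemed(BigDecimal amount) implements GiftCardEvent {}

class GiftCard {
    private BigDecimal amount = BigDecimal.ZERO; // one field, derived from many events

    // Rebuild the aggregate by applying its events in creation order.
    static GiftCard replay(List<GiftCardEvent> history) {
        GiftCard card = new GiftCard();
        for (GiftCardEvent event : history) {
            card.apply(event);
        }
        return card;
    }

    void apply(GiftCardEvent event) {
        if (event instanceof CardIssued issued) {
            amount = issued.amount();
        } else if (event instanceof CardRedeemed redeemed) {
            amount = amount.subtract(redeemed.amount());
        }
    }

    BigDecimal amount() {
        return amount;
    }
}
```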

With the events (the state of the aggregate) stored in a database, event store, etc., we then ask ourselves how to get any meaningful or specific insight from them.

How do we find all the gift cards for which the first redeemed amount was the entire amount? How do we answer all kinds of questions? We can imagine that the (SQL) query would be quite complex and slow.

The event store is not optimized for queries (reading data); it is optimized for writing data.

So for reading/querying the data you will have a second database/schema/Elasticsearch index, etc., optimized for fast reads.

The aggregate holds the whole state, accepts commands, and emits events when something changes; these events are then used to update the read model (the second database/schema/Elasticsearch index, etc.).

Commands go to the aggregate and produce events; queries read data from the read model. Command and query responsibilities are segregated (CQRS).
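The flow described above (command in, event out, read model updated, queries served from the read model) can be sketched in Java as follows. Everything here is hypothetical and in-memory: `RedeemCard` is an invented command, a plain list stands in for the event store, and a map stands in for the second database. Real systems often update the read model asynchronously rather than inline as shown.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CqrsDemo {
    public static void main(String[] args) {
        List<CardRedeemed> eventStore = new ArrayList<>();          // write side
        RedemptionReadModel readModel = new RedemptionReadModel(); // read side

        GiftCard card = new GiftCard("card-1", new BigDecimal("50.00"));
        // Command side: the aggregate validates the command and emits events.
        List<CardRedeemed> events = card.handle(new RedeemCard("card-1", new BigDecimal("20.00")));
        eventStore.addAll(events);
        events.forEach(readModel::on); // the events update the read model

        // Query side: read from the read model, not from the aggregate.
        System.out.println(readModel.totalRedeemed("card-1")); // 20.00
    }
}

record RedeemCard(String cardId, BigDecimal amount) {}
record CardRedeemed(String cardId, BigDecimal amount) {}

// Hypothetical aggregate holding the full write-side state.
class GiftCard {
    private final String id;
    private BigDecimal balance;

    GiftCard(String id, BigDecimal initialBalance) {
        this.id = id;
        this.balance = initialBalance;
    }

    List<CardRedeemed> handle(RedeemCard command) {
        if (!command.cardId().equals(id)) {
            throw new IllegalArgumentException("command addressed to another card");
        }
        if (command.amount().signum() <= 0 || command.amount().compareTo(balance) > 0) {
            throw new IllegalStateException("invalid redemption amount");
        }
        CardRedeemed event = new CardRedeemed(id, command.amount());
        balance = balance.subtract(event.amount()); // apply the event to our own state
        return List.of(event);
    }
}

// Hypothetical read model optimized for one query: total redeemed per card.
class RedemptionReadModel {
    private final Map<String, BigDecimal> totals = new HashMap<>();

    void on(CardRedeemed event) {
        totals.merge(event.cardId(), event.amount(), BigDecimal::add);
    }

    BigDecimal totalRedeemed(String cardId) {
        return totals.getOrDefault(cardId, BigDecimal.ZERO);
    }
}
```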

Graphophone answered 26/6, 2020 at 11:43
