I am dipping my feet into the event sourcing pattern and trying to make sense of aggregates. I have read a few blogs and now I am more confused than ever before.
That is not your fault.
The concept of aggregates comes from the description of domain modeling by Eric Evans.
In a typical deployment, we have a database full of facts that we want to track. We have a model in which those facts change over time. We want to ensure that we track those changes correctly, meaning without introducing inconsistencies.
And the coarse answer to that is that we place our database "behind" a domain model which includes all of the rules for how the data in the database should be allowed to change. In the time of Evans, the domain model was a tier that lay between the application tier and the persistence tier. These days, you are more likely to hear "component" or "module" rather than tier, but the role isn't much changed: protect the database from incorrect changes.
If we examine the domain carefully, we will often find within the model clusters of data that exhibit an interesting property: the rules for changing the state of the cluster don't depend on any information outside of the cluster.
Example: in a trading application, bids and offers for some commodity are matched and processed. But the rules for matching one commodity (gold) are completely independent of the data associated with a different commodity (frozen concentrated orange juice). You don't need to know anything about what's going on in FCOJ to correctly process the activity in the gold trade, and vice versa.
These clusters, which can be considered in isolation, are aggregates.
The two key properties of that isolation are
- changes to state inside the aggregate don't depend on changes made outside the aggregate
- changes made outside the aggregate don't depend on changes made inside the aggregate
So in this example, we might have a TradeBook "aggregate" for Gold, and a TradeBook "aggregate" for FCOJ. To process an order, you would load the aggregate you need, apply the change to it, and save it, without ever needing to touch the other.
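As a rough illustration of that load / change / save cycle, here is a minimal sketch. The `TradeBook` fields, the naive matching rule, and the in-memory `TradeBookRepository` are all assumptions made up for the example, not a real trading design.

```python
from dataclasses import dataclass, field


@dataclass
class TradeBook:
    """Hypothetical aggregate: the open orders and trades for ONE commodity."""
    commodity: str
    bids: list = field(default_factory=list)    # prices buyers will pay
    offers: list = field(default_factory=list)  # prices sellers will accept
    trades: list = field(default_factory=list)  # matched (bid, offer) pairs

    def place_offer(self, price):
        # The rule reads only this book's own state, never another book's.
        matching = [b for b in self.bids if b >= price]
        if matching:
            best = max(matching)
            self.bids.remove(best)
            self.trades.append((best, price))
        else:
            self.offers.append(price)

    def place_bid(self, price):
        matching = [o for o in self.offers if o <= price]
        if matching:
            best = min(matching)
            self.offers.remove(best)
            self.trades.append((price, best))
        else:
            self.bids.append(price)


class TradeBookRepository:
    """In-memory stand-in for the database (an assumption for this sketch)."""

    def __init__(self):
        self._books = {}

    def load(self, commodity):
        return self._books.get(commodity, TradeBook(commodity))

    def save(self, book):
        self._books[book.commodity] = book


repo = TradeBookRepository()

# Process gold orders: load the one aggregate, change it, save it.
gold = repo.load("gold")
gold.place_offer(1900)
gold.place_bid(1905)
repo.save(gold)
```

Nothing in that sequence ever loads or locks the FCOJ book, which is the practical payoff of the isolation.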
Will I have two aggregate classes
- AggregateEventsByInvoice
- AggregateEventsByInvoiceEmployee
No, you will probably have two views or projections based on the same event history.
More precisely, in the architecture described by Evans, there would be one "aggregate root", and each of your use cases would be a different query in the API for that aggregate.
But more recently, the practice is to recognize that the use cases for reads don't need the same constraints as the use cases for writes. So today you are more likely to see a view (or projection) for each of your use cases, where the in memory representation of each is built from the events recorded in your data store.
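A minimal sketch of that idea, assuming a made-up event shape (the `invoice_id`, `employee_id`, and `type` fields and the sample events are hypothetical): two read-side views built from the same recorded history, with no second aggregate involved.

```python
from collections import defaultdict

# Hypothetical event history, in the order it was recorded.
events = [
    {"invoice_id": "INV-1", "employee_id": "E-1", "type": "InvoiceCreated"},
    {"invoice_id": "INV-1", "employee_id": "E-2", "type": "PaymentReceived"},
    {"invoice_id": "INV-2", "employee_id": "E-1", "type": "InvoiceCreated"},
]


def events_by_invoice(history):
    """View 1: event types grouped by invoice."""
    view = defaultdict(list)
    for event in history:
        view[event["invoice_id"]].append(event["type"])
    return dict(view)


def events_by_invoice_and_employee(history):
    """View 2: event types grouped by (invoice, employee)."""
    view = defaultdict(list)
    for event in history:
        key = (event["invoice_id"], event["employee_id"])
        view[key].append(event["type"])
    return dict(view)


by_invoice = events_by_invoice(events)
by_invoice_and_employee = events_by_invoice_and_employee(events)
```

Both functions only read the history; neither enforces any rule about how an invoice is allowed to change, which is why they are views rather than aggregates.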
so what I understand is an aggregate is essentially anything that can uniquely identify all events related to single instance (in my case invoice). So in my case can the invoiceId be considered as an aggregate?
No. In your case, the invoice is likely to be the aggregate.
To be more precise, your domain model is presumably coordinating changes between the balance, adjustment, status, and payment of each invoice; these values are an example of the sort of cluster I was talking about before. You can make changes to these values without having to consider, for example, the adjustment of Invoice[67890].
so what I understand is an aggregate is essentially anything that can uniquely identify all events related to single instance (in my case invoice).
A problem is that this understanding doesn't align well with the existing literature, and is likely to lead to confused communications.
In a document store, or key value store, the aggregate is analogous to the document, not the key that you use to look up the document. In an RDBMS, the aggregate would be the related entities, and the id would be the primary key that you use to load the entities. In an event store, the contents of the stream describe the changes to the entities in the aggregate; the id is just the key you use to find the correct events.
Is it okay for event store to have additional columns that are not aggregate id
Sure - you can store whatever metadata you like with the event. Creating additional columns can improve your query performance, make it easier to shard your data, and so on.
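For instance, here is a sketch of an events table with one extra metadata column, using Python's built-in `sqlite3` module. The table layout, column names, and sample rows are assumptions for illustration only; your event store will have its own schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# aggregate_id + sequence identify an event within its stream;
# employee_id is extra metadata stored alongside the event.
conn.execute("""
    CREATE TABLE events (
        aggregate_id TEXT NOT NULL,
        sequence     INTEGER NOT NULL,
        employee_id  TEXT,
        event_type   TEXT NOT NULL,
        payload      TEXT NOT NULL,
        PRIMARY KEY (aggregate_id, sequence)
    )
""")
# An index on the extra column keeps queries on it cheap.
conn.execute("CREATE INDEX events_by_employee ON events (employee_id)")

rows = [
    ("INV-1", 1, "E-1", "InvoiceCreated", json.dumps({"amount": 100})),
    ("INV-1", 2, "E-2", "PaymentReceived", json.dumps({"amount": 40})),
    ("INV-2", 1, "E-1", "InvoiceCreated", json.dumps({"amount": 50})),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)

# The full stream for one aggregate, in order.
stream = conn.execute(
    "SELECT event_type FROM events WHERE aggregate_id = ? ORDER BY sequence",
    ("INV-1",),
).fetchall()

# A narrower query that also filters on the metadata column.
by_invoice_and_employee = conn.execute(
    "SELECT sequence, event_type FROM events "
    "WHERE aggregate_id = ? AND employee_id = ? ORDER BY sequence",
    ("INV-1", "E-2"),
).fetchall()
```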
Is it okay for us to try to load events from the event store with a query on columns in addition to the aggregate id (invoice id, employee id in this case)?
Sure, you can query on the events any way you like.
What's maybe not a good idea is trying to recover the current state of your domain model by replaying an arbitrary set of your events.
In your example, events [1,2,3,4,5] taken together tell a coherent story about the invoice. But trying to create an understanding of the invoice from event [4] by itself may not get you anywhere.
Remember, an event isn't usually a complete representation of the state of the model after the change, but rather a description of the things that changed. Think "patch", rather than "snapshot".
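To make the "patch" idea concrete, here is a small sketch. The event types, field names, and the five sample events are invented for the example; they stand in for whatever your events [1,2,3,4,5] actually contain.

```python
def apply(state, event):
    """Apply one event to the current state, like applying a patch."""
    kind = event["type"]
    if kind == "InvoiceCreated":
        return {"balance": event["amount"], "status": "open"}
    if kind == "AdjustmentApplied":
        return {**state, "balance": state["balance"] + event["delta"]}
    if kind == "PaymentReceived":
        balance = state["balance"] - event["amount"]
        status = "paid" if balance == 0 else "open"
        return {**state, "balance": balance, "status": status}
    raise ValueError(f"unknown event type: {kind}")


def replay(events):
    """Rebuild state by folding the events, in order, from an empty state."""
    state = {}
    for event in events:
        state = apply(state, event)
    return state


# Hypothetical history for one invoice.
history = [
    {"type": "InvoiceCreated", "amount": 100},
    {"type": "AdjustmentApplied", "delta": -10},
    {"type": "PaymentReceived", "amount": 50},
    {"type": "PaymentReceived", "amount": 30},
    {"type": "PaymentReceived", "amount": 10},
]

full_state = replay(history)

# Replaying only the fourth event fails: a payment describes a change to a
# balance, and there is no earlier event to say what that balance was.
try:
    replay([history[3]])
    partial_replay_failed = False
except KeyError:
    partial_replay_failed = True
```

In this sketch the partial replay fails loudly. With a more forgiving `apply` it could instead produce a plausible-looking but wrong balance, which is arguably worse.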