Event Sourcing: proper way of rolling back aggregate state

Asked 30/1, 2018 at 22:58 Answered 9/5, 2020 at 3:57

Solved domain-driven-design cqrs event-sourcing

I'm looking for advice on the proper way to implement a rollback feature in a CQRS/event-sourcing application.

This application allows a group of editors to edit and update editorial content, for instance an editorial news item. We implemented the user interface so that each field is saved automatically, and now we would like to let our users undo the operations they performed, so that the editorial news can be rolled back to a previous known state.
Basically we would like to implement something like the undo command in Microsoft Word and similar text editors. In the backend, the editorial news is an instance of an aggregate defined in our domain and called Story.

We have discussed some ideas for implementing the rollback and we are looking for advice based on real-world experience in similar projects. Here are our considerations about this feature.

How rollback works in real world business domains

First of all, we all know that in real world business domains what we are calling rollback is obtained via some form of compensation event.

Imagine a domain related to some sort of service for which it is possible to buy a subscription: we could have an aggregate representing a user subscription and an event describing that a charge has been associated to an instance of the aggregate (the particular subscription of one of the customers). A possible implementation of the event is as follows:

public class ChargeAssociatedToSubscriptionEvent: DomainEvent
{
  public Guid SubscriptionId {get; set;}
  public decimal Amount {get; set;}
  public string Description {get; set;}
  public DateTime DueDate {get; set;}
}

If a charge is wrongly associated with a subscription, the error can be fixed by means of an accreditation associated with the same subscription and having the same amount, so that the effect of the charge is completely balanced and the user gets their money back. In other words, we could define the following compensation event:

public class AccreditationAssociatedToSubscription: DomainEvent
{
  public Guid SubscriptionId {get; set;}
  public decimal Amount {get; set;}
  public string Description {get; set;}
  public DateTime AccreditationDate {get; set;}
}

So if a user is wrongly charged for an amount of 50 dollars, we can compensate the error by means of an accreditation of 50 dollars to the user subscription: this way the state of the aggregate has been rolled back to the previous state.
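As a minimal sketch of that compensation, the accreditation can be derived mechanically from the charge because the charge event carries everything needed (subscription id and amount). The `DomainEvent` base class body, the `SubscriptionCompensation` helper and the description prefix are assumptions made for illustration; they are not part of our code base.

```csharp
using System;

// Hypothetical base class: the real DomainEvent definition is not shown in this post.
public abstract class DomainEvent { }

public class ChargeAssociatedToSubscriptionEvent : DomainEvent
{
    public Guid SubscriptionId { get; set; }
    public decimal Amount { get; set; }
    public string Description { get; set; }
    public DateTime DueDate { get; set; }
}

public class AccreditationAssociatedToSubscription : DomainEvent
{
    public Guid SubscriptionId { get; set; }
    public decimal Amount { get; set; }
    public string Description { get; set; }
    public DateTime AccreditationDate { get; set; }
}

// Hypothetical helper, for illustration only.
public static class SubscriptionCompensation
{
    // Builds the accreditation that balances a wrongly applied charge:
    // same subscription, same amount.
    public static AccreditationAssociatedToSubscription Compensate(
        ChargeAssociatedToSubscriptionEvent charge, DateTime accreditationDate)
    {
        if (charge == null) throw new ArgumentNullException(nameof(charge));

        return new AccreditationAssociatedToSubscription
        {
            SubscriptionId = charge.SubscriptionId,
            Amount = charge.Amount,
            Description = "Compensation for: " + charge.Description,
            AccreditationDate = accreditationDate
        };
    }
}
```

This only works because the charge event is self-describing; events that merely overwrite a value do not carry the previous value and cannot be inverted this way.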

Why things are not as easy as they seem

Based on the previous discussion, the rollback seems quite easy to implement. If you have an instance of the Story aggregate at revision B and you want to roll it back to a previous revision, say A (with A < B), you just have to do the following steps:

1. check the event store and get all the events between revisions A and B
2. compute the compensation event for each of the occurred events
3. apply the compensation events to the aggregate in the reverse order

Unfortunately, the second step of the previous procedure is not always possible: given a generic domain event, its compensation event cannot always be computed, because the event might not contain enough information. It may be possible to define all the events carefully so that they contain enough information to compute the corresponding compensation event, but in the current state of our application there are several events for which computing the compensation event is not possible, and we would prefer to avoid changing the shape of our events.

A possible solution based on state comparison

The first idea to overcome the issues with compensation events is to compute the minimum set of events needed to roll back the aggregate by comparing the current state of the aggregate with the target state. The algorithm is basically the following:

1. get an instance of the aggregate at the current state (call it B)
2. get an instance of the aggregate at the target state (call it A) by applying only the first n events persisted in the event store (our repository allows this by specifying the aggregate id and the point in time at which to materialize the aggregate)
3. compare the two instances and compute the minimum set of events to be applied to the aggregate in state B in order to change its state to A
4. apply the computed events to the aggregate
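The comparison step could look roughly like the sketch below. It assumes, purely for illustration, that the Story state can be flattened into a dictionary of field names and values and that two generic events exist (`StoryFieldChangedEvent`, `StoryFieldClearedEvent`); our real aggregate and events are shaped differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class and events, used only for this sketch.
public abstract class DomainEvent { }

public class StoryFieldChangedEvent : DomainEvent
{
    public string Field { get; set; }
    public string NewValue { get; set; }
}

public class StoryFieldClearedEvent : DomainEvent
{
    public string Field { get; set; }
}

public static class StoryStateDiff
{
    // Returns the events that move the aggregate from "current" (state B)
    // to "target" (state A). Unchanged fields produce no event.
    public static List<DomainEvent> ComputeRollbackEvents(
        IDictionary<string, string> current,
        IDictionary<string, string> target)
    {
        var events = new List<DomainEvent>();

        // Fields present in the target that are missing or different now.
        foreach (var pair in target)
        {
            string currentValue;
            if (!current.TryGetValue(pair.Key, out currentValue) || currentValue != pair.Value)
            {
                events.Add(new StoryFieldChangedEvent { Field = pair.Key, NewValue = pair.Value });
            }
        }

        // Fields that were introduced after the target revision.
        foreach (var key in current.Keys)
        {
            if (!target.ContainsKey(key))
            {
                events.Add(new StoryFieldClearedEvent { Field = key });
            }
        }

        return events;
    }
}
```

The number of events produced depends only on how much the two states differ, not on how many events occurred between the two revisions.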

A smarter approach based on event replay

Another way to roll back to a previous state of the aggregate could be to do the same thing the aggregate repository does when an aggregate is materialized at a specific point in time. In order to do that we would define an event, say StoryResettedEvent, whose effect is to reset the state of the aggregate by completely emptying it, and then do the following steps:

1. apply the StoryResettedEvent to our aggregate so that its state is emptied
2. get the first n events for the aggregate we are working on (all the events from the first saved event up to the target state A)
3. apply all the events to the aggregate instance
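A rough sketch of these steps, assuming a simplified Story with only a title and a body. `StoryTitleChangedEvent`, `StoryBodyChangedEvent` and the `ReplayRollback` helper are invented for the example; only StoryResettedEvent comes from the idea above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical base class and events, used only for this sketch.
public abstract class DomainEvent { }
public class StoryTitleChangedEvent : DomainEvent { public string Title { get; set; } }
public class StoryBodyChangedEvent : DomainEvent { public string Body { get; set; } }
public class StoryResettedEvent : DomainEvent { }

public class Story
{
    public string Title { get; private set; }
    public string Body { get; private set; }

    public void Apply(DomainEvent e)
    {
        if (e is StoryTitleChangedEvent t) Title = t.Title;
        else if (e is StoryBodyChangedEvent b) Body = b.Body;
        else if (e is StoryResettedEvent) { Title = null; Body = null; } // empty the state
    }
}

public static class ReplayRollback
{
    // Builds the events to append to the stream: one reset,
    // followed by a copy of the first targetRevision events.
    public static List<DomainEvent> BuildRollbackEvents(
        IReadOnlyList<DomainEvent> history, int targetRevision)
    {
        if (targetRevision < 0 || targetRevision > history.Count)
            throw new ArgumentOutOfRangeException(nameof(targetRevision));

        var result = new List<DomainEvent> { new StoryResettedEvent() };
        result.AddRange(history.Take(targetRevision));
        return result;
    }
}
```

Note that in this sketch every rollback appends n + 1 events to the stream, so the stream grows with the distance of the target revision from the start, unlike the state-comparison approach.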

The main problem I see with this approach is the event to empty the state of the aggregate: it seems somewhat artificial, not a real domain event with a business meaning, but rather a trick to implement the rollback functionality.

The third way: persisting the compensation event each time an event is saved inside the event store

The third way we figured out to get what we need is based again on the concept of compensation event. The basic idea is that each event of the application could be enriched with a property containing the corresponding compensation event.

At the point in the code where an event is raised, it is possible to immediately compute its compensation event (based on the current state of the aggregate and the shape of the event), so that the event can be enriched with this information, which is then saved in the event store. By doing so the compensation events are always available, ready to be used in case of a rollback request. The downside of this solution is that each domain event must be modified, and only a small fraction of the compensation events we compute and save in the event store will ever be used for an actual rollback.
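A minimal sketch of this idea, assuming a `Compensation` property on a hypothetical `DomainEvent` base class and a Story reduced to a single title field. All names except Story are invented for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class: every event carries the event that would undo it.
public abstract class DomainEvent
{
    public DomainEvent Compensation { get; set; }
}

public class StoryTitleChangedEvent : DomainEvent
{
    public string Title { get; set; }
}

public class Story
{
    private readonly List<DomainEvent> _uncommitted = new List<DomainEvent>();

    public string Title { get; private set; }
    public IReadOnlyList<DomainEvent> UncommittedEvents { get { return _uncommitted; } }

    public void ChangeTitle(string newTitle)
    {
        // The compensation is computed from the state *before* the change.
        var evt = new StoryTitleChangedEvent
        {
            Title = newTitle,
            Compensation = new StoryTitleChangedEvent { Title = Title }
        };
        Apply(evt);
        _uncommitted.Add(evt);
    }

    public void Apply(DomainEvent e)
    {
        if (e is StoryTitleChangedEvent t) Title = t.Title;
    }
}

public static class CompensationRollback
{
    // Collects the stored compensations of every event after targetRevision,
    // newest first, so they can be applied in reverse order.
    public static List<DomainEvent> BuildRollbackEvents(
        IReadOnlyList<DomainEvent> history, int targetRevision)
    {
        if (targetRevision < 0 || targetRevision > history.Count)
            throw new ArgumentOutOfRangeException(nameof(targetRevision));

        var result = new List<DomainEvent>();
        for (int i = history.Count - 1; i >= targetRevision; i--)
        {
            if (history[i].Compensation == null)
                throw new InvalidOperationException("Event " + i + " has no stored compensation.");
            result.Add(history[i].Compensation);
        }
        return result;
    }
}
```

In this sketch the compensation events themselves carry no compensation, so rolling back across an earlier rollback would fail; a real implementation would have to raise them through the aggregate so that they get enriched too.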

Conclusions

In my opinion the best option to solve the problem is using the algorithm based on state comparison (the first proposed solution), but we are still evaluating what to do.

Has anyone already had a similar requirement? Is there any other way to implement a rollback? Are we completely missing the point and following bad approaches to the problem?

Thanks for helping, any advice will be appreciated.

Algophobia answered 30/1, 2018 at 22:58 Comment(5)

Are you sure ES is the way to go in this case? The Memento pattern comes to mind when you're dealing with undo. Furthermore, with ES the compensation is defined by the domain, and it's another domain event that gets added. Undo is mainly a UI feature here; from the domain's point of view it is just another text change. – Whitcomb 31/1, 2018 at 16:56

@Whitcomb thanks for replying to my post. We cannot handle the undo as a pure UI feature, because our purpose is not only to allow undoing the last operation done on an entity: we would like to allow an editor to look at the history of all the changes made to an entity and decide to roll back to any previous state, simply by clicking on a particular entry in the history of changes. This means sending a command to the backend and saving new events, so that the next time the aggregate is materialized from the event store its state is the changed one. – Algophobia 31/1, 2018 at 21:31

@EnricoMassone: Then you can't use Event Sourcing. Event sourcing is implemented as an immutable event store. An event written to the event store may never again be changed or removed. Changes are always made by compensating actions, which revert the changes as a new immutable event – Sunda 1/2, 2018 at 8:4

You can still revert it to the state it was originally in by creating compensation events from the existing story, if applicable (this may require a system where each event has a corresponding compensating event and/or some computation: compare state A with state A-n, then calculate the difference). Allowing the event history to be changed would defeat the purpose of it. – Sunda 1/2, 2018 at 8:8

Also please remember, DDD is about capturing the real world process of your company or customer. In reality, an event that happened can't be undone (sorry, time machines just don't exist!). Why should your software allow that? It means you did not fully understand what DDD is about. You make software that represents existing processes of a company and not change the real world processes for a piece of software – Sunda 1/2, 2018 at 8:11

How the compensation events are generated should be the concern of the Story aggregate (after all, that's the point of an aggregate in event sourcing - it's just the validator of commands and generator of events for a particular stream).

Presumably you are following something like a typical CQRS/ES flow:

1. The client sends an Undo command, which presumably says what version it wants to undo back to and which story it is targeting.
2. The Undo command handler loads the Story aggregate in the usual way, possibly from a snapshot and/or by applying the aggregate's events to the aggregate.
3. In some way, the command is passed to the aggregate (possibly a method call with args extracted from the command, or just passing the command directly to the aggregate).
4. The aggregate "returns" in some way the events to persist, assuming the undo command is valid. These are the compensating events.

compute the compensation event for each of the occurred events

...

Unfortunately, the second step of the previous procedure is not always possible

Why not? The aggregate has been passed all previous events, so what does it need that it doesn't have? The aggregate doesn't just see the events you want to roll back, it necessarily processes all events for that aggregate ever.

You have two options really - reduce the book-keeping that the aggregate needs to do by having the command handler help out in some way, or the whole process is managed internally by the aggregate.

Command handler helps out: The command handler extracts from the command the version the user wants to roll back to, and then recreates the aggregate as-of that version (applying events in the usual way), in addition to creating the current aggregate. Then the old aggregate gets passed to the aggregate's undo method along with the command, so that the aggregate can then do state comparison more easily.

You might consider this to be a bit hacky, but it seems moderately harmless, and could significantly simplify the aggregate code.
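A rough sketch of the "command handler helps out" variant. Everything here is an assumption made for illustration: the `UndoStoryCommand` shape, the two Story fields, the event types, and the event store reduced to a delegate that returns the full event list for a story id.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical base class and events, used only for this sketch.
public abstract class DomainEvent { }
public class StoryTitleChangedEvent : DomainEvent { public string Title { get; set; } }
public class StoryBodyChangedEvent : DomainEvent { public string Body { get; set; } }

public class UndoStoryCommand
{
    public Guid StoryId { get; set; }
    public int TargetRevision { get; set; }
}

public class Story
{
    public string Title { get; private set; }
    public string Body { get; private set; }

    public static Story FromHistory(IEnumerable<DomainEvent> history)
    {
        var story = new Story();
        foreach (var e in history) story.Apply(e);
        return story;
    }

    public void Apply(DomainEvent e)
    {
        if (e is StoryTitleChangedEvent t) Title = t.Title;
        else if (e is StoryBodyChangedEvent b) Body = b.Body;
    }

    // The aggregate still decides which compensating events to emit;
    // the handler only supplies the old state to compare against.
    public List<DomainEvent> Undo(Story target)
    {
        var events = new List<DomainEvent>();
        if (Title != target.Title) events.Add(new StoryTitleChangedEvent { Title = target.Title });
        if (Body != target.Body) events.Add(new StoryBodyChangedEvent { Body = target.Body });
        return events;
    }
}

public class UndoStoryCommandHandler
{
    // Stand-in for the event store read side (hypothetical).
    private readonly Func<Guid, IReadOnlyList<DomainEvent>> _loadEvents;

    public UndoStoryCommandHandler(Func<Guid, IReadOnlyList<DomainEvent>> loadEvents)
    {
        _loadEvents = loadEvents;
    }

    // Returns the compensating events; the caller would persist them.
    public List<DomainEvent> Handle(UndoStoryCommand command)
    {
        var history = _loadEvents(command.StoryId);
        if (command.TargetRevision < 0 || command.TargetRevision > history.Count)
            throw new ArgumentOutOfRangeException(nameof(command));

        var current = Story.FromHistory(history);
        var target = Story.FromHistory(history.Take(command.TargetRevision));
        return current.Undo(target);
    }
}
```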

Aggregate is on its own: As events are applied to the aggregate, it adds to its state whatever book-keeping it needs to be able to compute the compensating events if it receives an undo command. This could be a map of compensating events, pre-computed, a list of every previous state that can potentially be reverted to (to allow state comparison), the list of events the aggregate has processed (so it can compute the previous state itself in the undo method), or whatever it needs, and it just stores it in its in-memory state (and snapshot state, if applicable).

The main concern with the aggregate doing it on its own is performance - if the size of the book-keeping state is large, the simplification of allowing the command handler to pass the previous state would be worthwhile. In any case, you should be able to switch between the approaches at any time in the future without any issues (except possibly needing to rebuild your snapshots, if you have them).
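For comparison, here is a minimal sketch of the "aggregate is on its own" variant, using the "list of every previous state" form of book-keeping. The Story is reduced to a single title field and all names are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class and event, used only for this sketch.
public abstract class DomainEvent { }
public class StoryTitleChangedEvent : DomainEvent { public string Title { get; set; } }

public class Story
{
    // Book-keeping: the title as of each revision.
    // Index 0 is the empty initial state, index n the state after n events.
    private readonly List<string> _titleByRevision = new List<string> { null };

    public string Title { get; private set; }
    public int Revision { get { return _titleByRevision.Count - 1; } }

    public void Apply(DomainEvent e)
    {
        if (e is StoryTitleChangedEvent t) Title = t.Title;
        _titleByRevision.Add(Title); // remember the state at this revision
    }

    // Computes the compensating events from the aggregate's own book-keeping.
    public List<DomainEvent> Undo(int targetRevision)
    {
        if (targetRevision < 0 || targetRevision > Revision)
            throw new ArgumentOutOfRangeException(nameof(targetRevision));

        var events = new List<DomainEvent>();
        var targetTitle = _titleByRevision[targetRevision];
        if (Title != targetTitle)
        {
            events.Add(new StoryTitleChangedEvent { Title = targetTitle });
        }
        return events;
    }
}
```

The book-keeping list grows with every applied event, which is exactly the memory cost discussed above; with a large state per revision, passing the old aggregate in from the command handler avoids it.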

Government answered 1/2, 2018 at 14:5 Comment(1)

Thanks for helping. I agree with you, going with the idea of comparing states inside the aggregate seems to be the best way to manage rollback in an ES scenario – Algophobia 4/2, 2018 at 21:9

My 2 cents.

For the rollback operation, an orchestration class is responsible for handling it. It publishes an aggregate_modify_generated event, and a projection subscribed to this event fetches the current state of the aggregates when it receives it. When any of the aggregates fails, it should generate a failure event; on receiving it, the orchestration class generates an aggregate_modify_rollback event, which is received by that projection and sets the aggregate state back to the previously fetched state.
One common projector can do the task, because the events will have aggregate id.

Abstractionist answered 9/5, 2020 at 3:57 Comment(0)