DDD: many-to-many relationship between two different aggregates

Asked 20/2, 2020 at 17:17 Answered 27/3, 2020 at 5:32

Solved architecture domain-driven-design

I have two "big" entities, or aggregates, each with its own business logic. They are saved, updated and destroyed in separate transactions, and each has its own child entities, which are manipulated through the aggregate root. The problem is that these two aggregates must be in a many-to-many relationship with each other. From the user interface point of view, there is a screen where an already existing instance of the second aggregate is added to the first aggregate. In database terms, there is a table that holds foreign keys to the tables of the first and the second aggregate:

entity_one_id | entity_two_id
1             | 2
1             | 3
1             | 4

In the example above, an instance of first aggregate holds references to the second aggregate.

My question is: is it OK from a Domain-Driven Design perspective if, when saving the first aggregate, I load an instance of the second aggregate and add it to the first aggregate? In pseudo-code it may look like:

aggregateOne = aggregateOneRepository->getById(1);
....
aggregate2 = aggregateTwoRepository->getById(2);
aggregate3 = aggregateTwoRepository->getById(3);
aggregate4 = aggregateTwoRepository->getById(4);
aggregateOne->addChildAggregate(aggregate2);
aggregateOne->addChildAggregate(aggregate3);
aggregateOne->addChildAggregate(aggregate4);
aggregateOneRepository->update(aggregateOne);

It seems that in this transaction I do not change the second aggregate; I change just one single aggregate. But I'm not sure whether DDD theory allows loading multiple different aggregates when saving one aggregate. So, does this kind of code break the theory or not?

Hausa answered 20/2, 2020 at 17:17 Comment(0)

An aggregate root should not contain instances of other aggregate roots. An aggregate may be passed a transient reference to another when, for instance, invoking a method but it does not hold onto that reference. It is only used in the call.

Your example is actually more common than you may realise. If we change the example to Order and Product aggregates, we also have a many-to-many relationship. An OrderItem represents that relationship and is best defined as a value object.

When you find that you need to "reference" another aggregate then rather use either only the id or some value object that contains at a minimum the other aggregate's id.
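As a rough illustration of referencing by id, the sketch below models the Order and Product example with an OrderItem value object that carries only the product's id. The class layout, the `quantity` field, the `add_product` method and the duplicate check are assumptions made for this example; they are not part of the answer above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class OrderItem:
    """Value object: links an order to a product by id only (hypothetical shape)."""
    product_id: int
    quantity: int


@dataclass
class Order:
    """Aggregate root that owns its OrderItem value objects."""
    order_id: int
    items: List[OrderItem] = field(default_factory=list)

    def add_product(self, product_id: int, quantity: int = 1) -> None:
        # Assumed invariant for the example: a product appears at most once per order.
        if any(item.product_id == product_id for item in self.items):
            raise ValueError(f"product {product_id} is already on the order")
        self.items.append(OrderItem(product_id, quantity))


# Mirrors the link table from the question: order 1 references products 2, 3 and 4.
order = Order(order_id=1)
for product_id in (2, 3, 4):
    order.add_product(product_id)
```

The Order never holds a Product instance, so its repository only has to hydrate the order and its items.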

I have a slightly different view on transactions. An aggregate root is a consistency boundary and, as such, fits quite well into a transaction boundary. Every attempt should be made to keep to a single aggregate within a transaction but you also need to be pragmatic about it. If you need a high level of consistency and eventual consistency may not be an option then that is a "rule" that I am willing to bend and include more than one aggregate in a transaction. An example may be processing a journal transaction where an amount is transferred from one account within my system to another. When you have different systems then eventual consistency will have to do and "rolling" back would require compensating operations.

Gies answered 21/2, 2020 at 5:13 Comment(3)

Do you mean to say, that instead of referencing the second aggregate directly in the repository of the first aggregate, it is better to use an "intermediate" entity like OrderItem which holds references to both aggregates? – Hausa 21/2, 2020 at 6:48

Not quite. In ER terms the OrderItem is called an associative entity, which implements quite nicely as a VO in DDD. You typically find that one side of the relationship is a more natural fit, and that AR would be the owner. In the Order<->Product relationship the link table OrderItem fits closer to the Order, which is why it forms part of that aggregate, probably because an order contains relatively few products whereas a product would be on many more orders. The OrderItem may be stored with the OrderId, but the object does not need it since it is contained in an order. – Gies 21/2, 2020 at 7:4

You are also going to find that it is a lot easier working with referenced ids as opposed to trying to hydrate aggregates that don't belong to the repository of the aggregate you are after. Having OrderRepository try to return an Order that has a reference to a Customer is problematic. The Order should be constructed with only the CustomerId. – Gies 21/2, 2020 at 7:6

First of all, I agree with Eben: don't hold object references to one aggregate within another; use a value object that just holds the other aggregate's id instead. In the database this id is simply a string or integer (or whatever you are using as the id type in the database) instead of a foreign key.

And always ask yourself what data of the other aggregate do you really need and for what operations of your new aggregate do you need what kind of data at all?

In most cases it turns out simply passing the required data gathered from the first aggregate to a method called on the new aggregate is enough.

If that happens within the same bounded context, I tend to be pragmatic about it. I fetch the aggregate I need the data from via its repository and then pass it, or only some part of it, as a parameter to the new aggregate's method. I usually do this inside an application service.

That way you do not need to hold any information about the old aggregate in the new one other than its id, but you always have an up-to-date state of the old aggregate wherever you need it. This concept is not even specific to domain-driven design; it is best practice in general: only use a dependency where you really need it.
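A minimal sketch of that application-service approach could look like the following. All names here (`InMemoryRepository`, `AddProductToOrderService`, the `active` flag and the rule that only active products may be added) are hypothetical and exist only to make the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Product:
    product_id: int
    active: bool


@dataclass
class Order:
    order_id: int
    product_ids: List[int] = field(default_factory=list)

    def add_product(self, product_id: int, is_active: bool) -> None:
        # The order receives only the data it needs, not the Product itself.
        if not is_active:
            raise ValueError(f"product {product_id} is not active")
        if product_id not in self.product_ids:
            self.product_ids.append(product_id)


class InMemoryRepository:
    """Stand-in for a real repository (assumption for the example)."""

    def __init__(self) -> None:
        self._items: Dict[int, Any] = {}

    def save(self, key: int, item: Any) -> None:
        self._items[key] = item

    def get_by_id(self, key: int) -> Any:
        return self._items[key]


class AddProductToOrderService:
    """Application service: reads one aggregate, modifies and saves the other."""

    def __init__(self, orders: InMemoryRepository, products: InMemoryRepository) -> None:
        self._orders = orders
        self._products = products

    def execute(self, order_id: int, product_id: int) -> None:
        order = self._orders.get_by_id(order_id)
        product = self._products.get_by_id(product_id)  # read only, never saved
        order.add_product(product.product_id, product.active)
        self._orders.save(order_id, order)  # only the order is written


orders, products = InMemoryRepository(), InMemoryRepository()
orders.save(1, Order(order_id=1))
products.save(2, Product(product_id=2, active=True))
AddProductToOrderService(orders, products).execute(order_id=1, product_id=2)
```

After the call, the stored order holds only the id `2`; the Product was read for validation and then discarded.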

And if you don't want to rely on the old aggregate's structure, simply create a new value object that you populate with the old aggregate's data in the application layer. In that case you do not even need to fetch the data through the old aggregate's repository; you can simply have a service that reads only the required data directly from storage. But I would only recommend this if performance is your issue here...

And just one last comment about using foreign keys in databases in monolithic applications:

Don't use foreign keys when you reference something from another bounded context if you ever plan to split up the monolith at some point. Use logical references instead, which you treat as a kind of remote id and resolve at the application layer. Otherwise, separating the database for the different services you want to extract from the monolith can become a nightmare.

Agony answered 27/3, 2020 at 5:32 Comment(0)

Holding reference to another aggregate (be it many-to-many or something else) and updating it in the same transaction, in fact, violates the fundamental principle of aggregate design. An aggregate is a unit of consistency, conforming to its own consistency boundary. A transaction is supposed to update and thereby ensure consistency of only one aggregate.

Updates across aggregates, that is, across consistency boundaries, are naturally necessary. The DDD-recommended way to handle those kinds of updates is eventual consistency: updating the other aggregates later, asynchronously, in a different transaction. An aggregate refers to another aggregate by holding its identifier, rather than through a field with a relationship (many-to-many in your case). Whenever updating the other aggregate is necessary, publish a domain event containing the other aggregate's id before committing your current transaction. A domain event subscriber picks up the event asynchronously, retrieves the aggregate by that id, makes the necessary update, and stores it. That is roughly the basic idea.
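The flow described above can be sketched with an in-memory queue standing in for a real message broker. The event name `ProductAddedToOrder`, the `times_ordered` counter and the helper functions are invented for illustration; a real system would deliver events asynchronously and wrap each step in its own database transaction.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass(frozen=True)
class ProductAddedToOrder:
    """Domain event carrying only identifiers (hypothetical name)."""
    order_id: int
    product_id: int


@dataclass
class Order:
    order_id: int
    product_ids: List[int] = field(default_factory=list)
    pending_events: List[ProductAddedToOrder] = field(default_factory=list)

    def add_product(self, product_id: int) -> None:
        self.product_ids.append(product_id)
        self.pending_events.append(ProductAddedToOrder(self.order_id, product_id))


@dataclass
class Product:
    product_id: int
    times_ordered: int = 0  # assumed field that must follow order changes


event_queue: Deque[ProductAddedToOrder] = deque()  # stand-in for a broker
products: Dict[int, Product] = {2: Product(product_id=2)}


def commit_order(order: Order) -> None:
    # First transaction: persisting the order is omitted here;
    # its recorded events are handed over for later delivery.
    event_queue.extend(order.pending_events)
    order.pending_events.clear()


def handle_next_event() -> None:
    # Second transaction: load the other aggregate by id, update it, store it.
    event = event_queue.popleft()
    product = products[event.product_id]
    product.times_ordered += 1


order = Order(order_id=1)
order.add_product(2)
commit_order(order)   # the product is still unchanged at this point
handle_next_event()   # the product catches up later
```

Between `commit_order` and `handle_next_event` the two aggregates are temporarily out of step, which is the trade-off that eventual consistency accepts.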

Tonietonight answered 20/2, 2020 at 22:23 Comment(4)

You may find this question relevant. – Tonietonight 20/2, 2020 at 22:25

In my example, the first aggregate holds references to another aggregate, but only the first aggregate is updated. The second aggregate is left untouched. Does it still violate the theory? – Hausa 21/2, 2020 at 6:49

Yes, it violates the DDD recommendation, because an aggregate reference that has no use in your transaction has performance and scalability implications. I have outlined a few in my answer to the question mentioned in my first comment. – Tonietonight 21/2, 2020 at 7:17

These are the implications again: "... References to aggregates unnecessarily increases application's memory footprint (retrieving an entity that is not going to be used in a transaction); in highly concurrent use cases, where locks are in effect, degrades application performance (unnecessarily locking an entity); hampers data partitioning (both aggregates need to be processed in the same data node)." – Tonietonight 21/2, 2020 at 7:18
