DDD: How to handle large collections

Asked 18/9, 2014 at 14:27 Answered 28/6, 2018 at 17:18

Solved domain-driven-design ddd-repositories

I'm currently designing a REST backend for a social-networking application, and I'm very intrigued by DDD principles. Let's assume I have a User object that has a collection of Friends. There could be thousands of them if the app and the user become very successful. Every Friend has some properties as well; it is basically a User. In the DDD Cargo sample application, the fully expanded Cargo object is stored in and retrieved from the CargoRepository. If there is a large list inside the aggregate root, loading it this way would eventually trigger an OOM. From a data-centric point of view, this is why pagination and lazy loading exist. But how do you cope with such large collections in a persistence-unaware DDD model?

Wimple answered 18/9, 2014 at 14:27 Comment(4)

Why are the friends in the aggregate root, are they part of an invariant you need to protect? – Sander 18/9, 2014 at 20:17

They are part of User because a User has friends, but also a user has posted messages, is part of groups, has pets, etc (user is almost the monolithic root of everything). – Wimple 19/9, 2014 at 6:45

That sounds more like a relational model than an aggregate root. 'Effective Aggregate Design' has some great advice: dddcommunity.org/library/vernon_2011 – Sander 19/9, 2014 at 6:48

Thanks @JefClaes, that really clarified it for me. – Wimple 19/9, 2014 at 15:36

As @JefClaes mentioned in the comments: You need to determine whether your User AR indeed requires a collection of Friends.

Ownership does not necessarily imply that a collection is necessary.

Take an Order / OrderLine example. An OrderLine has no meaning without being part of an Order. However, the Customer that an Order belongs to does not have a collection of Orders. It may, possibly, have a collection of ActiveOrders if a customer is limited to a maximum number (or amount) of active orders. Keeping a collection of historical orders would be unnecessary.

I suspect the large collection problem is not limited to DDD. If one were to receive an Order with many thousands of lines there may be design trade-offs, but the order would much more likely simply be split into smaller orders.

In your case I would assert that the inclusion / exclusion of a Friend has very little to do with the consistency of the User AR.

Something to keep in mind is that as soon as you start using your domain model for querying, you start running into all sorts of weird problems. So always try to think in terms of a read/query model with a simple query interface that can access your data directly, without going through your domain model. This may simplify things.

So perhaps a Relationship AR may assist in this regard.
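
A minimal sketch of that split in Python, under stated assumptions: the names `Friendship`, `FriendshipRepository` and `FriendsQuery` are hypothetical, and a shared in-memory set stands in for the datastore. User carries no friends collection, each friendship is its own small aggregate, and listing friends goes through a paged query object that reads the stored rows directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Aggregate root without a friends collection."""
    user_id: int
    name: str


@dataclass(frozen=True)
class Friendship:
    """Separate aggregate, one instance per (user, friend) pair (hypothetical name)."""
    user_id: int
    friend_id: int


class FriendshipRepository:
    """Write side: stores and removes Friendship aggregates."""

    def __init__(self, rows):
        self._rows = rows  # shared set of (user_id, friend_id) tuples

    def add(self, friendship):
        self._rows.add((friendship.user_id, friendship.friend_id))

    def remove(self, friendship):
        self._rows.discard((friendship.user_id, friendship.friend_id))


class FriendsQuery:
    """Read side: pages over the stored rows without building domain objects."""

    def __init__(self, rows):
        self._rows = rows

    def friend_ids(self, user_id, offset=0, limit=20):
        ids = sorted(fid for uid, fid in self._rows if uid == user_id)
        return ids[offset:offset + limit]


rows = set()                       # stand-in for a real table
repo = FriendshipRepository(rows)
for friend_id in range(2, 52):     # user 1 gets 50 friends
    repo.add(Friendship(1, friend_id))

query = FriendsQuery(rows)
first_page = query.friend_ids(1, offset=0, limit=10)
```

Adding or removing a friend touches one small aggregate, so the size of a user's friend list never affects how much is loaded for a write.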

Canner answered 19/9, 2014 at 4:29 Comment(2)

Your proposed "Relationship" is exactly what I came up with after reading the PDF's from the website @Sander mentioned in the comments. Great :) – Wimple 19/9, 2014 at 15:38

I don't really like Relationship since it has little meaning in the UL, at least from my point of view. UserFriends might be a better name. But certainly, a separate aggregate. – Josiahjosias 4/3, 2016 at 19:54

If paging or other optimization techniques are part of your domain, there is nothing wrong with designing domain classes that support them.

Some solutions I've thought about

If User is the aggregate root, you can give your UserRepository a method such as GetUserWithFriends(int userId, int firstFriendNo, int lastFriendNo) that encapsulates constructing that specific user object. In the same way you can also populate the user model with counters and the like.
Alternatively, you can implement lazy loading for the User instance's _friends field, so that the User instance itself decides which part of the friends list to load.
Finally, you can use UserRepository to get all friends of a certain user, subject to paging or other filtering conditions. This doesn't violate any DDD principles.
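
As a rough illustration of the repository-based options, here is a sketch in Python. `InMemoryUserRepository`, `friends_of` and `friend_count` are hypothetical names, and the dictionaries stand in for a datastore; a real repository would turn the paging arguments into a limited query rather than slicing a list in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: int
    name: str


class InMemoryUserRepository:
    """Hypothetical repository exposing paged access to a user's friends."""

    def __init__(self):
        self._users = {}
        self._friend_ids = {}   # user_id -> ordered list of friend IDs

    def add(self, user):
        self._users[user.user_id] = user
        self._friend_ids.setdefault(user.user_id, [])

    def add_friend(self, user_id, friend_id):
        self._friend_ids[user_id].append(friend_id)

    def get(self, user_id):
        """Load the user alone, without any friends."""
        return self._users[user_id]

    def friends_of(self, user_id, offset=0, limit=20):
        """Return one page of friends instead of the whole collection."""
        page = self._friend_ids[user_id][offset:offset + limit]
        return [self._users[fid] for fid in page]

    def friend_count(self, user_id):
        """Counter that can be shown without loading any friend."""
        return len(self._friend_ids[user_id])


repo = InMemoryUserRepository()
for i in range(1, 1002):
    repo.add(User(i, f"user{i}"))
for i in range(2, 1002):          # user 1 is friends with users 2..1001
    repo.add_friend(1, i)

page = repo.friends_of(1, offset=20, limit=5)
```

The caller only ever holds one page of User objects, regardless of how many friends are stored.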

DDD is too broad to claim that it's not for CRUD. When programming in a DDD way you should always take technical limitations into account and adapt your domain to satisfy them.

Bagman answered 18/9, 2014 at 15:24 Comment(8)

Would it be okay to call the UserRepository from the User Domain-object? – Wimple 18/9, 2014 at 15:39

I didn't say that it is needed. Lazy loading is better implemented via a Proxy object (#25829672). In general it's rare to use a repository inside a domain object, but I don't think there is anything wrong with it. – Bagman 18/9, 2014 at 15:52

A few more thoughts. If User is the aggregate root, you have a Repository that lists User instances. But a User's friend is a User too. So you can get John's friends by getting all users whose friends list contains John, rather than getting John together with all his friends. Do you see the difference? The former approach lets you use UserRepository from your controller like "UserRepository.GetAllUser(hasFriend="John", some_paging_params)" – Bagman 18/9, 2014 at 15:57

But this will not work if, instead of friends, you want to get a user's posts, which aren't aggregate roots (if that is the case). Following the first sentence of my answer, you may also define something like a UserSlice domain class that encapsulates the data for proper presentation. – Bagman 18/9, 2014 at 16:3

Also, using a repository inside the User object would require an injection (dependency) which should be avoided I think. – Wimple 18/9, 2014 at 16:9

Would it make sense to implement adding/removing friends through a Proxy list object? Then I can handle all related logic in the User domain without the need for a repository. – Wimple 18/9, 2014 at 16:18

@Pepster, why do you think that dependencies inside domain objects should be avoided? A poor domain without logic and dependencies leads to martinfowler.com/bliki/AnemicDomainModel.html. Lazy loading is a last resort. I don't think you should, or can, avoid a Repository implementation. – Bagman 18/9, 2014 at 17:22

Let us continue this discussion in chat. – Wimple 18/9, 2014 at 18:22

Do not optimize prematurely. If you are worried about heavy load, benchmark your application and run stress tests.

You need to have a table like so:

friends
id, user_id1, user_id2

to handle the n-m relation. Index your fields there.

Also, you need to be aware of whether friendship is symmetrical. If so, then you need a single row for two people if they are friends. If not, then you might have one row, showing that a user is friends with the other user. If the other person considers the first a friend as well, you need another row.
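
A small sketch of the symmetrical case, using Python's built-in sqlite3 module with an in-memory database. The helper names `add_symmetric_friendship` and `friend_ids`, the index name, and the convention of always storing the smaller ID in user_id1 are assumptions made for illustration, not part of the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE friends (
        id       INTEGER PRIMARY KEY,
        user_id1 INTEGER NOT NULL,
        user_id2 INTEGER NOT NULL,
        UNIQUE (user_id1, user_id2)   -- also serves as an index on user_id1
    )
    """
)
conn.execute("CREATE INDEX idx_friends_user_id2 ON friends (user_id2)")


def add_symmetric_friendship(conn, a, b):
    """Store one row per pair, always with the smaller ID first."""
    low, high = sorted((a, b))
    conn.execute(
        "INSERT OR IGNORE INTO friends (user_id1, user_id2) VALUES (?, ?)",
        (low, high),
    )


def friend_ids(conn, user_id, limit=20, offset=0):
    """Return one page of friend IDs, whichever column holds the user."""
    rows = conn.execute(
        """
        SELECT CASE WHEN user_id1 = ? THEN user_id2 ELSE user_id1 END AS friend_id
        FROM friends
        WHERE user_id1 = ? OR user_id2 = ?
        ORDER BY friend_id
        LIMIT ? OFFSET ?
        """,
        (user_id, user_id, user_id, limit, offset),
    ).fetchall()
    return [row[0] for row in rows]


add_symmetric_friendship(conn, 1, 2)
add_symmetric_friendship(conn, 2, 1)   # same pair, ignored by the UNIQUE constraint
add_symmetric_friendship(conn, 3, 1)
```

For the asymmetrical case the ordering step would be dropped, so that (1, 2) and (2, 1) are stored as two distinct rows.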

Lazy-loading can be achieved by hidden (AJAX) requests so users will have the impression that it is faster than it really is. However, I would not worry about such problems for now, as later you can migrate the content of the tables to a new structure which is unknown now due to the infinite possible evolutions of your project.

Poaceous answered 18/9, 2014 at 14:46 Comment(2)

Thanks, but I don't agree that this is early optimisation; storing/loading your 1000+ Friends in a Collection every time you go to the datastore is a design flaw IMHO. Also, the question is not about my user<-friend->user use case, but more generally about how DDD models big data sets. DDD seems to be viable for relatively small domain objects, but if you are more into CRUD (like in my case) with large lists, there seems to be something missing. – Wimple 18/9, 2014 at 14:55

You are saying DDD is useless for most REST Apis? – Wimple 18/9, 2014 at 15:24

Your aggregate root can hold a collection of lightweight objects that each contain only a small subset of the information and act as references to the actual business objects. When needed, these items can be used to fetch the full information from the underlying repository.
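
One possible reading of this, sketched in Python with hypothetical names (`FriendRef`, `InMemoryUserRepository`): the User keeps only small reference objects, and the full friend is fetched from the repository on demand.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FriendRef:
    """Lightweight reference: just enough to identify and display a friend."""
    user_id: int
    display_name: str


@dataclass
class User:
    user_id: int
    display_name: str
    bio: str = ""
    friends: list = field(default_factory=list)   # holds FriendRef, not User

    def befriend(self, other):
        self.friends.append(FriendRef(other.user_id, other.display_name))


class InMemoryUserRepository:
    """Stand-in for a persistence-backed repository."""

    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user.user_id] = user

    def get(self, user_id):
        return self._users[user_id]


repo = InMemoryUserRepository()
alice = User(1, "Alice", bio="Likes DDD")
bob = User(2, "Bob", bio="Likes CRUD")
repo.save(alice)
repo.save(bob)

alice.befriend(bob)
ref = alice.friends[0]                 # small object held by the aggregate
full_friend = repo.get(ref.user_id)    # full object fetched only when needed
```

The collection still grows with the number of friends, so this reduces the cost per item rather than removing the large list.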

Fieldwork answered 28/6, 2018 at 17:18 Comment(0)
