GraphQL Dataloader not knowing keys in advance
Dataloader is able to batch and cache requests, but it can only be used by either calling load(key) or loadMany(keys).

The problem I am having is that sometimes I do not know the keys of the items I want to load in advance.

I am using an SQL database, and this works fine when the current object has a foreign key from a belongsTo relation with another model.

For example, a user belongs to a group and so has a groupId. To resolve the group you would just call groupLoader.load(groupId).

On the other hand, if I wanted to resolve the users within a group, of which there could be many, I would want a query such as

SELECT * FROM users WHERE users.groupId = theParticularGroupId

but a query such as this doesn't use the keys of the users, so I am not sure how to make use of dataloader.

I could do another request to get the keys like

SELECT id FROM users WHERE users.groupId = theParticularGroupId

and then call loadMany with those keys... But I could have just requested the data directly instead.

I noticed that dataloader has a prime(key, value) function which can be used to prime the cache. However, that can only be done once the data has already been fetched, at which point many queries would already have been sent and duplicate data could have been fetched.


Another example would be the following query

query {
  groups(limit: 10) {
    id
    ...
    users {
      id
      name
      ...
    }
  }
}

I cannot know the keys if I am searching for, say, the first or last 10 groups. Then, once I have these 10 groups, I cannot know the keys of their users, and if each resolver resolves the users using a query such as

SELECT * FROM users WHERE users.groupId = theParticularGroupId

that query will be executed 10 times. Once the data is loaded I could now prime the cache, but the 10 requests have already been made.

Is there any way around this issue? Perhaps a different pattern or database structure or maybe dataloader isn't even the right solution.

Franks answered 18/2, 2018 at 12:41 Comment(0)

You'll want a dataloader instance for the lookup you can actually do. In this case you have a group ID and you want the users:

import DataLoader from 'dataloader';

const userIdsForGroupLoader = new DataLoader(groupIds => batchGetUsersIdsForGroups(groupIds));

Now your batchGetUsersIdsForGroups function essentially has to convert an array of group IDs into an array of arrays of user IDs (one array of user IDs for each group).

You'd start off with an IN query:

SELECT id, groupId FROM users WHERE groupId IN (...groupIds)

This will give you a single flat result set of users, which you'll have to group by groupId. The resulting array must be ordered to match the original array of groupIds, so make sure you return an empty array for any groupId that doesn't have any users.
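The grouping step described above could look roughly like the following. This is only a sketch: USER_ROWS and queryUserRowsForGroups are hypothetical stand-ins for a real database call, and groupUserIds is a helper name invented for illustration. It assumes the query returns both id and groupId for each row.

```javascript
// Stand-in for the database so the sketch runs on its own. In a real app this
// would execute something like:
//   SELECT id, groupId FROM users WHERE groupId IN (...groupIds)
const USER_ROWS = [
  { id: 1, groupId: 10 },
  { id: 2, groupId: 10 },
  { id: 3, groupId: 30 },
];

// Hypothetical query helper: returns the flat rows for the requested groups.
async function queryUserRowsForGroups(groupIds) {
  const wanted = new Set(groupIds);
  return USER_ROWS.filter((row) => wanted.has(row.groupId));
}

// Pure helper: turn flat rows into one array of user IDs per requested group,
// in the same order as groupIds, with [] for groups that have no users.
function groupUserIds(groupIds, rows) {
  const idsByGroup = new Map(groupIds.map((groupId) => [groupId, []]));
  for (const row of rows) {
    const bucket = idsByGroup.get(row.groupId);
    if (bucket) bucket.push(row.id); // ignore rows for groups nobody asked for
  }
  return groupIds.map((groupId) => idsByGroup.get(groupId));
}

// Batch function in the shape DataLoader expects: an array of keys in, a
// promise of an equally long, equally ordered array out.
async function batchGetUsersIdsForGroups(groupIds) {
  const rows = await queryUserRowsForGroups(groupIds);
  return groupUserIds(groupIds, rows);
}
```

With this in place, calling load(groupId) on the loader from ten different resolvers in the same tick should result in one call to the batch function, and therefore one IN query.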

Note that here we're only returning the user IDs, but you can batch-fetch the users in one go once you have them. You could tweak it slightly to return the users themselves; you'll have to decide for yourself whether that's the right approach.
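As a rough illustration of fetching the users once you have their IDs, a resolver could chain two loaders. The context shape here is an assumption: loaders.userIdsForGroup is taken to be the loader defined above, and loaders.usersById is a hypothetical second DataLoader keyed by user ID. Only load and loadMany are used.

```javascript
// Resolver for the `users` field on a group.
// Assumed (hypothetical) context shape, created once per request:
//   loaders.userIdsForGroup: groupId -> array of user IDs
//   loaders.usersById:       userId  -> user row
async function resolveGroupUsers(group, args, context) {
  const { loaders } = context;
  // Every group resolved in the same tick ends up in one batch here.
  const userIds = await loaders.userIdsForGroup.load(group.id);
  // The ID lookups are then handed to the by-id loader, which can batch
  // and cache them in turn.
  return loaders.usersById.loadMany(userIds);
}
```

The trade-off is an extra round trip (IDs first, then rows) in exchange for every user passing through a single by-id cache.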

Everything I mention in this article can be achieved with clever use of Dataloader. But the key takeaway is that the values you pass to the load/loadMany functions don't have to correspond to the IDs of the objects you're trying to return.

Streamer answered 18/2, 2018 at 18:31 Comment(4)
Does this mean you would have to create a lot of different dataloaders for each way you want to load your data, even if the same entities are being loaded? In a comment on your article you suggested using a loader such as loader.load({ id, first, last, after, before, filters}) to batch the request for connections. But then you'd also have to use the returned data to prime other dataloaders, which may request the same type of data through a different signature and which may or may not be used depending on what fields the client queries. It seems like this could quickly become difficult to manage? - Franks
I guess if you don't prime other dataloaders you just miss out on some caching but still get the batching effect, which is the main benefit. That may greatly reduce the complexity. - Franks
So in this example, a usersForGroupLoader could return Users, which means you have sufficient data to prime a plain old usersByIdLoader. Ultimately the best strategy depends on a combination of the underlying data fetching patterns and the most common query patterns. In the site I did this for (depop.com), the number of core entities is relatively low (users, products, likes, bookmarks, comments, conversations, messages, and a few others), and I basically had a loader per lookup type, which tended to mean one loader for entity-by-id and another for each major join. - Streamer
I'll admit it did get a little hard to manage; the module responsible for creating all the loaders and coordinating caching between them was easily the most complicated part of the codebase. But it was my first attempt, so I'm sure with a little more experience it could be done more elegantly. - Streamer
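The priming idea from the comments above could be sketched as follows. This assumes the batch function for a usersForGroupLoader has already fetched full user rows (each with id and groupId), and that usersByIdLoader is a DataLoader keyed by user ID. The function name groupUsersAndPrime is made up for illustration; prime(key, value) is the only loader method it calls.

```javascript
// Groups full user rows by groupId (same ordering contract as any DataLoader
// batch function) and primes the by-id loader with every row along the way,
// so later usersByIdLoader.load(user.id) calls can be served from its cache.
function groupUsersAndPrime(groupIds, userRows, usersByIdLoader) {
  const usersByGroup = new Map(groupIds.map((groupId) => [groupId, []]));
  for (const user of userRows) {
    const bucket = usersByGroup.get(user.groupId);
    if (bucket) bucket.push(user);
    // prime() leaves an existing cache entry untouched, so a user that was
    // already loaded by ID is not overwritten.
    usersByIdLoader.prime(user.id, user);
  }
  // One array per requested group, in request order, [] when empty.
  return groupIds.map((groupId) => usersByGroup.get(groupId));
}
```

As noted in the comments, skipping the priming step only costs some cache hits; the batching still applies.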
