GraphQL, Cassandra and denormalization strategy

Would a database like Cassandra and a query layer like GraphQL work well together?

Cassandra's ideology is based on designing tables around your queries and denormalizing data. This doesn't seem to mesh well with GraphQL's ideology, where data seems to be accessible at every level of a query.

Example: Suppose I architect my Cassandra tables like so:

User:
    name
    address
    etc... (many properties)

Group:
    id
    name
    user_name  (denormalized user, where we generally just need the name of a user)

But with GraphQL, one wouldn't exactly expect a denormalized User.

query getGroup {
   group(id: 1) {
     name
     users {
         name
     }
   }
}

So a couple of things:

1.) This GraphQL query could end up hitting our Cassandra database multiple times (assuming no caching): once to get the group name, and possibly once more for each of the users. But let's say our resolver creates multiple User objects with one Cassandra call.

2.) We can't really build an idiomatic, denormalized Cassandra schema with GraphQL in mind, can we? Otherwise we should expect that certain properties of a User aren't returned to us by the query.

To sum up the question: what's the GraphQL strategy for working with denormalized data? Is it acceptable to omit certain properties that the client thinks are accessible? E.g. the client tries to access the address of a user, but we don't have it at that point because our data is denormalized. Or should one not even worry about denormalization and just let GraphQL make the calls, with a caching mechanism in between the database and GraphQL? E.g. GraphQL first gets the group, then gets the user data for that group id.

Quaggy answered 20/2, 2017 at 5:1 Comment(1)
Speaking of Cassandra and GraphQL, take a look at the stargate.io project. It's an open-source data proxy for Cassandra that provides various APIs, including GraphQL. - Perverse

This is a side effect of GraphQL: a single query can become quite complex in how it retrieves its data. But as long as the client requests only the data it actually needs, and you are smart about your resolvers, the end result can actually be faster.
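One way to be "smart" in a resolver is to serve a field from the denormalized copy when that is enough, and only read the full table when the client asks for more. The following is a minimal sketch of that idea, not tied to any GraphQL library: the in-memory dictionaries stand in for Cassandra tables, the `user_names` list on the group row is an assumed variation of the question's `user_name` column, and all function names are hypothetical.

```python
# Hypothetical in-memory stand-ins for the two Cassandra tables.
GROUPS = {1: {"id": 1, "name": "admins", "user_names": ["alice", "bob"]}}
USERS = {
    "alice": {"name": "alice", "address": "1 Main St"},
    "bob": {"name": "bob", "address": "2 Side Ave"},
}

# Counts simulated reads of the full User table.
READS = {"user_table": 0}


def fetch_users(names):
    """Simulate one multi-key read against the User table."""
    READS["user_table"] += 1
    return [USERS[n] for n in names]


def resolve_group_users(group, requested_fields):
    """Resolve Group.users, touching the User table only when necessary."""
    denormalized = {"name"}  # user fields already copied onto the group row
    if set(requested_fields) <= denormalized:
        # The group row alone can answer the query: no extra read.
        return [{"name": n} for n in group["user_names"]]
    # The client asked for more than the denormalized copy holds.
    users = fetch_users(group["user_names"])
    return [{f: u[f] for f in requested_fields} for u in users]
```

With this shape, a query that selects only `name` is answered from the group row, while a query that also selects `address` costs one additional read. The client never sees a missing property; it just pays for the fields it requests.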

Consider tools like dataloader to batch and cache lookups while resolving a query.
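To show what such a tool does under the hood, here is a rough, hand-rolled sketch of the batching-and-caching idea in Python with asyncio. It is not the dataloader library's API (that library is JavaScript); `BatchLoader`, `fetch_users_by_name`, and `main` are hypothetical names, and the batch function just fakes a multi-key database read.

```python
import asyncio


class BatchLoader:
    """Collects keys requested in the same event-loop tick into one batch call."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # async: list of keys -> list of values, same order
        self._cache = {}           # key -> Future (memoizes repeated loads)
        self._queue = []           # keys waiting for the next dispatch
        self._scheduled = False
        self._tasks = set()        # keeps running batch tasks referenced

    def load(self, key):
        if key in self._cache:
            return self._cache[key]
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._cache[key] = future
        self._queue.append(key)
        if not self._scheduled:
            self._scheduled = True
            loop.call_soon(self._dispatch)  # runs after the current tick
        return future

    def _dispatch(self):
        keys, self._queue, self._scheduled = self._queue, [], False
        task = asyncio.get_running_loop().create_task(self._run_batch(keys))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _run_batch(self, keys):
        try:
            values = await self._batch_fn(keys)
            for key, value in zip(keys, values):
                self._cache[key].set_result(value)
        except Exception as exc:
            for key in keys:
                if not self._cache[key].done():
                    self._cache[key].set_exception(exc)


calls = []  # records each simulated database round trip


async def fetch_users_by_name(names):
    """Stand-in for a single multi-key query against the User table."""
    calls.append(list(names))
    return [{"name": n} for n in names]


async def main():
    loader = BatchLoader(fetch_users_by_name)
    # Three field resolutions, two distinct keys, one round trip.
    return await asyncio.gather(
        loader.load("alice"), loader.load("bob"), loader.load("alice")
    )
```

Running `asyncio.run(main())` resolves all three loads while `calls` records a single batch for `alice` and `bob`. A loader like this is usually created per request so that the cache does not outlive the query.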

As for omitting certain properties: GraphQL validates the response against the schema. If a field declared as non-null comes back missing, it will report an error, although it will still return whatever data it could resolve alongside that error. It would probably be better to implement some sort of timeout and throw a more descriptive error if there is an issue retrieving the data.
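A timeout with a descriptive error could look roughly like the sketch below. It uses only Python's asyncio; `UserFetchTimeout`, `with_timeout`, `slow_user_lookup`, and `resolve_user` are hypothetical names, and the sleep merely simulates a slow database read.

```python
import asyncio


class UserFetchTimeout(Exception):
    """Hypothetical error type a resolver could surface to the client."""


async def with_timeout(awaitable, seconds, description):
    """Await something, turning a timeout into a descriptive error."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        raise UserFetchTimeout(
            f"{description} did not finish within {seconds}s"
        ) from None


async def slow_user_lookup(name):
    await asyncio.sleep(0.2)  # simulates a slow database read
    return {"name": name}


async def resolve_user(name):
    # 0.01s is deliberately shorter than the simulated read, to force the error.
    return await with_timeout(
        slow_user_lookup(name), 0.01, f"loading user {name!r}"
    )
```

Calling `asyncio.run(resolve_user("alice"))` raises `UserFetchTimeout` with a message naming the operation that stalled, which is more useful to a client than a generic validation error.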

Pull answered 11/7, 2018 at 16:15 Comment(0)
