How do you prevent nested attack on GraphQL/Apollo server?
D

5

64

How do you prevent a nested attack against an Apollo server with a query such as:

{
  authors {
    firstName
    posts {
      title
      author {
        firstName
        posts{
          title
          author {
            firstName
            posts {
              title
              [n author]
                [n post]
            }
          }
        }
      }
    }
  }
}

In other words, how can you limit the number of recursions being submitted in a query? This could be a potential server vulnerability.

Dissimilar answered 20/5, 2016 at 3:29 Comment(0)
G
65

As of the time of writing, there isn't a built-in feature in GraphQL-JS or Apollo Server to handle this concern, but it's something that should definitely have a simple solution as GraphQL becomes more popular. This concern can be addressed with several approaches at several levels of the stack, and should also always be combined with rate limiting, so that people can't send too many queries to your server (this is a potential issue with REST as well).

I'll just list all of the different methods I can think of, and I'll try to keep this answer up to date as these solutions are implemented in various GraphQL servers. Some of them are quite simple, and some are more complex.

  1. Query validation: In every GraphQL server, the first step to running a query is validation - this is where the server tries to determine if there are any serious errors in the query, so that we can avoid using actual server resources if we can find that there is some syntax error or invalid argument up front. GraphQL-JS comes with a selection of default rules that follow a format pretty similar to ESLint. Just like there is a rule to detect infinite cycles in fragments, one could write a validation rule to detect queries with too much nesting and reject them at the validation stage.
  2. Query timeout: If it's not possible to detect that a query will be too resource-intensive statically (perhaps even shallow queries can be very expensive!), then we can simply add a timeout to the query execution. This has a few benefits: (1) it's a hard limit that's not too hard to reason about, and (2) this will also help with situations where one of the backends takes unreasonably long to respond. In many cases, a user of your app would prefer a missing field over waiting 10+ seconds to get a response.
  3. Query whitelisting: This is probably the most involved method, but you could compile a list of allowed queries ahead of time, and check any incoming queries against that list. If your queries are totally static (you don't do any dynamic query generation on the client with something like Relay) this is the most reliable approach. You could use an automated tool to pull query strings out of your apps when they are deployed, so that in development you write whatever queries you want but in production only the ones you want are let through. Another benefit of this approach is that you can skip query validation entirely, since you know that all possible queries are valid already. For more benefits of static queries and whitelisting, read this post: https://dev-blog.apollodata.com/5-benefits-of-static-graphql-queries-b7fa90b0b69a
  4. Query cost limiting: (Added in an edit) Similar to query timeouts, you can assign a cost to different operations during query execution, for example a database query, and limit the total cost the client is able to use per query. This can be combined with limiting the maximum parallelism of a single query, so that you can prevent the client from sending something that initiates thousands of parallel requests to your backend.
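As a rough illustration of the depth check described in (1), here is a minimal sketch in TypeScript. The `FieldNode` and `SelectionSetNode` types are simplified stand-ins loosely modeled on the nested selection-set shape of a parsed GraphQL document; they are assumptions for this example, not the real GraphQL-JS types, and fragments are ignored. `maxDepth` and `assertDepthWithin` are hypothetical helper names.

```typescript
// Simplified stand-ins for a parsed query (assumption: fields only, no fragments).
interface SelectionSetNode {
  selections: FieldNode[];
}
interface FieldNode {
  name: string;
  selectionSet?: SelectionSetNode;
}

// Returns the deepest level of field nesting at or below the given selection set.
function maxDepth(selectionSet: SelectionSetNode | undefined): number {
  if (!selectionSet) return 0;
  let deepest = 0;
  for (const field of selectionSet.selections) {
    deepest = Math.max(deepest, maxDepth(field.selectionSet));
  }
  return 1 + deepest;
}

// Rejects the query before execution if it nests deeper than `limit`.
function assertDepthWithin(root: SelectionSetNode, limit: number): void {
  const depth = maxDepth(root);
  if (depth > limit) {
    throw new Error(`Query depth ${depth} exceeds limit ${limit}`);
  }
}
```

In a real server this logic would run during the validation phase, so an over-nested query is rejected before any resolver is called.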

(1) and (2) in particular are probably something every GraphQL server should have by default, especially since many new developers might not be aware of these concerns. (3) will only work for certain kinds of apps, but might be a good choice when there are very strict performance or security requirements.
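For (3), a whitelist check could look roughly like the sketch below. `QueryAllowlist` and `normalize` are hypothetical names, and the whitespace normalization is an assumption made for this example: it is naive and would also collapse whitespace inside string literals, so a real tool would more likely compare parsed documents or hashes.

```typescript
// Collapse runs of whitespace so formatting differences do not defeat the lookup.
// Caveat: this also alters whitespace inside string literals in the query.
function normalize(query: string): string {
  return query.replace(/\s+/g, " ").trim();
}

class QueryAllowlist {
  private readonly allowed: Set<string>;

  // `queries` would be the query strings extracted from the client apps at build time.
  constructor(queries: string[]) {
    this.allowed = new Set(queries.map(normalize));
  }

  // Returns true only for queries that were registered ahead of time.
  isAllowed(incoming: string): boolean {
    return this.allowed.has(normalize(incoming));
  }
}
```

The server would call `isAllowed` before validation and execution, and reject anything that is not on the list.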

Gamin answered 20/5, 2016 at 5:20 Comment(3)
Excellent response. Do you know of any tooling that can detect circular dependencies in GraphQL? I would also stress that another big concern is memory exhaustion. Each level deeper in a Posts -> Author -> Post hierarchy is a multiplier (i.e. 1 author with 5 posts -> 5 authors with 25 posts -> 25 authors with 125 posts, etc) that compounds not just SQL/query to the underlying data source, but heap allocation to send back the response. A few levels deep can easily deplete a few GB of RAM and crash the server entirely. 1 query could take out V8! - Wilcox
I think this is where (2) and (3) would help. First, you can simply limit the amount of requests to the database a single query can do (kind of like a timeout). Second, you can have your server accept only pre-approved queries in production, see here for more details: dev-blog.apollodata.com/… - Gamin
Note: GraphQL Ruby has built in analyzers for query depth and complexity. I'm not sure about the implementations for other languages. graphql-ruby.org/queries/analysis.html - Iapetus
L
15

To supplement point (4) in stubailo's answer, here are some Node.js implementations that impose cost and depth bounds on incoming GraphQL documents.

These are custom rules that supplement the validation phase.

Lightyear answered 9/8, 2017 at 23:57 Comment(0)
J
4

A variation on query whitelisting is query signing.

During the build process, each query is cryptographically signed using a secret which is shared with the server but not bundled with the client. Then at runtime the server can validate that a query is genuine.

The advantage over whitelisting is that writing queries in the client doesn't require any changes to the server. This is especially valuable if multiple clients access the same server (e.g. web, desktop and mobile apps).

Example

In development, you write your queries as usual against your dev server which allows unsigned queries.

Then in your client build step in CI, each query is tagged with its cryptographic signature. This signature is sent by the client as a header to the server when making the request, along with the full GraphQL query string.

Your staging and production servers are configured to require signed queries. They calculate the signature of the received query in the same way as the CI server did during the build. If the signatures don't match, they don't process the query.
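A minimal sketch of that flow, assuming an HMAC-SHA256 signature over the raw query string (the answer does not say which signing scheme is used, so that choice is an assumption). `signQuery` and `isGenuine` are hypothetical helper names; `createHmac` and `timingSafeEqual` come from Node's built-in `crypto` module.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Build step (CI): compute the signature that the client will send as a header.
function signQuery(query: string, secret: string): string {
  return createHmac("sha256", secret).update(query).digest("hex");
}

// Server: recompute the signature and compare it with the one received.
function isGenuine(query: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(signQuery(query, secret), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Because the signature covers the exact query string, any change to the query after the build step makes the check fail.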

Limitations:

  • not suitable for public facing APIs since the secret must be shared with developers
  • clients cannot dynamically build a GraphQL query at runtime using string interpolation, but I've never had a need for this and it is discouraged
Jetblack answered 29/11, 2017 at 16:4 Comment(5)
Doesn't this kill the flexibility of GraphQL and set it at the level of a regular HTTP request? - Xylograph
I don't think so, but if you elaborate on which flexibility you have in mind, others will be better able to answer your question. - Jetblack
I mean that you can choose what entities and properties you want to get. - Xylograph
I've expanded my answer to hopefully address your concerns. - Jetblack
I don't mean dynamically interpolated queries. I mean that when a new client wants to use a new query, they have to report it to the API so that it gets whitelisted? - Xylograph
R
2

For query cost limiting, you could use graphql-cost-analysis.

This is a validation rule which parses the query before executing it. In your GraphQL server you just have to assign a cost configuration for each field of your Schema Type Map you want.
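To show the general idea of a per-field cost configuration, here is a minimal self-contained sketch. It is not the graphql-cost-analysis API: the `Field`, `FieldCost` and `CostMap` types, the `queryCost` function, and the idea of keying costs by field name with a `multiplier` for list fields are all assumptions made for illustration.

```typescript
// Simplified query tree (assumption: fields only, keyed by field name).
interface Field {
  name: string;
  children?: Field[];
}

// Hypothetical cost configuration: a base cost per field, plus a multiplier
// estimating how many items a list field returns.
interface FieldCost {
  cost: number;
  multiplier?: number;
}
type CostMap = Record<string, FieldCost>;

// Sums the field costs, scaling each subtree by its parent's multiplier.
// Fields without a configuration entry are treated as free.
function queryCost(fields: Field[], costs: CostMap): number {
  let total = 0;
  for (const field of fields) {
    const config: FieldCost = costs[field.name] ?? { cost: 0 };
    const multiplier = config.multiplier ?? 1;
    total += config.cost + multiplier * queryCost(field.children ?? [], costs);
  }
  return total;
}
```

A server using this approach would compare the result against a maximum allowed cost and reject the query before executing it. Because nested list fields multiply, the estimated cost grows quickly with each extra level of nesting.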

Rameau answered 9/1, 2018 at 10:30 Comment(0)
C
2

Don't miss graphql-rate-limit 👌, a GraphQL directive that adds basic but granular rate limiting to your Queries or Mutations.

Cassel answered 18/1, 2019 at 19:6 Comment(0)
