GraphQL and MySQL: Solving the Join Problem
by Nicola Marcacci Rossi
As GraphQL adoption grows, developer teams have to implement GraphQL APIs for larger and larger enterprises and data sets. The larger the project, the more care has to be put into the database choice. While document databases such as MongoDB are the natural GraphQL companion, SQL databases are the standard for relational data.
Good news! Though trickier, it is possible to couple GraphQL and relational databases to very satisfying results. Unfortunately, documentation about how to achieve this has been lacking. This is what we want to remediate here.
This article assumes familiarity with GraphQL, Apollo Server, MongoDB and MySQL. It is part of a series on enterprise-grade GraphQL hosted by smartive.
- Part 1: Building Enterprise Grade APIs with GraphQL, MySQL and Node.js
- Part 2: Advanced GraphQL Patterns: The Almighty Root Resolver
- Part 3: Advanced GraphQL Patterns: Embrace the AST!
- Part 4: GraphQL and MySQL: Solving the Join Problem
- Part 5: GraphQL and Elasticsearch: A Love Letter 💌
- Part 6: The most popular GraphQL Servers and Clients for Node.js
Sample Data Structure: A Blog
All examples will build on a simple data structure: users can create posts, posts can have comments.
Our data structure
This is how you would describe it in a GraphQL schema (notice the to-one and to-many relationships, we will need to resolve them in different ways).
Simple Queries
Let’s start by implementing the resolver that allows us to answer a simple query without joins:
To build MySQL queries we will use the node library knex:
Linked Queries
So far so good, what we have seen is equivalent to what one would do with a NoSQL database like MongoDB. But what if we want to query linked data?
Let’s look at a number of approaches and their tradeoffs.
Structured Data
With MongoDB one could store the whole document (the post, its author, its comments etc.) as a single entry. Like this we would be done: no more resolvers needed.
This approach does have some downsides:
- It can’t be done for arbitrary data access patterns (what if I want each commenter’s posts? Suddenly more resolvers are needed).
- It leads to redundant/denormalized storage of data, which makes updates to it a complex issue.
And finally, this approach can’t be used with relational databases. We need a different approach.
The Naive Approach
The most natural way to solve linked queries is to fetch each entry independently, and let Apollo join everything:
This is really what Apollo resolvers want you to do. But this approach, too, has some downsides:
- It leads to “N+1” queries to the database (still better than sending “N+1” queries from the client to the server)
- Joining data in the Node runtime is slower than in the database
Let’s look at the MySQL queries these resolvers generate when resolving the above query:
Number of queries = 3 + number of comments. We know we can do better:
Number of queries = 2. The question is: how do we implement resolvers to generates such optimized queries?
The “Apollo Expert” Approach
If classic Apollo resolvers are like training wheels, they are definitely coming off now. We need to take the matter of resolving queries fully into our own hands. For that we need to implement an Almighty Root Resolver (ARR), i.e. a resolver that is responsible for solving the full query (recursively) without delegating things to lower resolvers.
To implement an ARR (and to do a number of other nifty things) you need to understand some key Apollo internals, namely how it represents the GraphQL schema and queries (hint: AST objects). I recommend you read our article Embrace The AST! before you proceed.
Solving To-One Queries
Let’s look at the post
→ user
relationship. It’s a to-one relationship, i.e. for each post there is exactly one user. Thinking in terms of MySQL results, to-one relationships are convenient because the result can always be represented in 1 single row — meaning, one can solve any amount of to-one relationships with a single query (with left joins). Here is a post
resolver that builds such a query:
What happens in this code snippet? We first query a post
and its simple fields, then we left join the User
table to get the relevant user information. unflatten
is a function that transforms the flat MySQL result row into a well-structured document, we’ll leave the implementation of that as an exercise for the reader.
Obviously, this query only solves a single hard-coded relationship, but we’re slowly getting there.
By the way: if you find yourself wondering what that info
object is and what you can do with it (spoiler: a lot), go back and read Embrace The AST! where we examine it in depth.
Solving To-Many Queries
To-many relationships, like the post
→ comments
relationship requires one to build a separate query for the related entity. This is how we do it:
The resolver first fetches the post, then the comments with a separate MySQL query, which it then appends to the post.
Generic Case With Directives
Let’s now look at how to structure an ARR to perform all needed joins (for to-one and to-many relationships) in a generic way. First, we need to annotate the schema with special directives:
In the resolver we can then inspect the schema AST to decide when to join, and how:
To complete the ARR, we need to extend this function to recursively compute sub joins. It’s a matter of further inspecting the query AST and deriving the queries that need to be executed based on the same core approach. But we’ll stop here as that the handling of that goes beyond the scope of such an article.
Just The Beginning
To translate GraphQL queries to efficient MySQL queries is not easy, but worth it. Once we had achieved that for our largest client, we were able to start adding more and more powerful features to our GraphQL schema in a generic way, always keeping the interface with MySQL optimized. Examples:
- Filters and ordering (e.g.
posts(where: { title: "xyz" }, orderBy: [{created: ASC}], first: 10, skip: 0) { ... }
) - Aggregations (e.g.
posts { commentsAggregations { COUNT AVG { rating } } }
) - Aggregation filters (e.g.
posts(where: { comments: {GTE: { COUNT: 6 } } }) { ... }
) - Ordering by aggregations (e.g.
posts(orderBy: [{ comments: { COUNT: DESC }]) { ... }
) - Distinct aggregations (e.g.
posts { commentsAggregations { distinctUsers { COUNT user { username } } } }
) - Polymorphism and joining sub-tables containing non-common fields (e.g.
users { name ...on Student { school } }
) - Handling external (as in: not coming from the database) fields, external filters, write and access permissions, sanitization rules… all with schema directives
- Schema generation based on a tighter (TypeScript) representation of the model
All in all, our decision to embrace the AST and create our own ARRs has allowed us to create a very powerful model-driven API (we keep being surprised ourselves about what use cases are already covered by it).
Build Enterprise-Grade GraphQL Applications
Want to become a GraphQL pro? Follow us and read our whole series on enterprise-grade GraphQL applications.
Part 1: Building Enterprise Grade APIs with GraphQL, MySQL and Node.js
Part 2: Advanced GraphQL Patterns: The Almighty Root Resolver
Part 3: Advanced GraphQL Patterns: Embrace the AST!
Part 4: GraphQL and MySQL: Solving the Join Problem
Part 5: GraphQL and Elasticsearch: A Love Letter 💌
Part 6: The most popular GraphQL Servers and Clients for Node.js
Need an expert team to implement your advanced web application? Check out our portfolio.