Video details

How Prisma Solves the N+1 Problem in GraphQL Resolvers - Tim Suchanek @ Prisma Day 2020

GraphQL
07.01.2020
English

The talk was streamed on #PrismaDay​ | June 26th, 2020
Prisma Day is a community-focused online conference on modern application development and databases.
https://prisma.io/day​
Subtitles for this video have been professionally transcribed. Prisma can provide the subtitles for your tech video as well - get in touch with us at [email protected]

Transcript

- Hello, my name is Tim. I'm working for Prisma as a TypeScript engineer, and I will talk about how Prisma solves the N+1 problem. So as I said, I'm working for Prisma as a TypeScript engineer, and I'm living in Berlin. You can reach out to me @timsuchanek on GitHub or Twitter if you have any questions later. So what we'll cover today: we will talk about this mysterious N+1 problem and what it is about, and we will also see how we can solve it and how Prisma is helping us with that. The N+1 problem is basically as old as relational data, as old as SQL, for example. Here we see a GraphQL query. For the people who are not familiar with GraphQL, we will dive a little bit into it and explain how this query is being resolved. So we will dissect this a little bit. As you can see, we are querying for users here; we say something like users, take 10. And what is this doing? This is giving us the first 10 users. It's giving us the id, email, and name, but also all the posts that each user wrote in our blog; we just have an example blog application. And for every post, again, we get the scalar fields, id, title, content, published, but we're also getting the first 10 comments that have been posted on that post. As you can imagine, there's quite some data fetching involved here. So how would we actually resolve this? In order to understand this, we have this graph here, this illustration. And what is this basically about? In GraphQL you always start at the top level. In our case it's the users, so we start with the users. We resolve the first 10 users, and we get, for example, George, Alice, and Bob. And now for every user, where we've got the scalar fields already, we need to resolve the posts. So we go through all the users and get all the posts; that's the next level of the query. In other words, we need to traverse this graph. Yeah.
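The query being dissected might look roughly like this (a sketch; the exact field and argument names are assumptions based on the description above):

```typescript
// Sketch of the GraphQL query described in the talk (field names assumed)
const query = /* GraphQL */ `
  query {
    users(take: 10) {
      id
      email
      name
      posts(take: 10) {
        id
        title
        content
        published
        comments(take: 10) {
          id
          text
        }
      }
    }
  }
`;
```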
How is this a graph? Any relational database you can see as a graph: the rows in the database are basically the nodes, and the relations between these rows are the edges. As we see here, posts are related to comments. And again, after we got all the posts, we now get all the comments for the posts. So we have to send requests that return the corresponding data. So let's look into how we can implement this. We're using the GraphQL.js implementation to resolve this in Node.js, and we wrap it with Apollo Server. What is happening here is that we have a resolver, and a resolver is basically the thing that is responsible for giving us the data. And this thing is fairly simple. We have a SQL query and we say: please select all the columns and give us back only the first 10 users. And as soon as we have the users, we continue and we say: give us the first 10, or whatever we query for, posts for each user. And again, we go one level deeper; we have a select specified here for each post. For each post, this select will be run. The GraphQL.js engine is passing us the post object that we already resolved in the level before, and now we can just take the post id and say: give me all comments that are pointing to the post with that post id. So let's run this now and try out how this is working. I prepared a simple server. Whoops, let me quickly open that. Yeah, there we go, Apollo with Postgres. Let's log in to this; we're using the GraphQL Playground, which is a console that helps you query GraphQL data. And we will send the same GraphQL query that we have seen before in our slides. So let's query this. Okay, as you see, we get back the data as expected. We get JSON data back. We have the posts; the posts have comments, and so on. So this is what we expected. And now let's actually look into the queries that have been generated under the hood to make this happen.
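The naive resolver shape just described can be sketched like this. The `sql` helper here is a hypothetical stand-in, backed by an in-memory table instead of Postgres, so the one-query-per-parent-row pattern is visible and countable:

```typescript
// Hypothetical in-memory stand-in for a SQL client, counting every query it runs
const tables: Record<string, any[]> = {
  User: [{ id: 1 }, { id: 2 }],
  Post: [
    { id: 10, userId: 1 },
    { id: 11, userId: 2 },
  ],
  Comment: [{ id: 100, postId: 10 }],
};
let queryCount = 0;
async function sql(table: string, filter: (row: any) => boolean, limit: number) {
  queryCount++; // each call stands for one round trip to the database
  return tables[table].filter(filter).slice(0, limit);
}

// Naive resolvers: one query per parent row on every level (the N+1 shape)
const resolvers = {
  Query: {
    users: (_: unknown, args: { take: number }) => sql("User", () => true, args.take),
  },
  User: {
    posts: (user: { id: number }, args: { take: number }) =>
      sql("Post", (p) => p.userId === user.id, args.take),
  },
  Post: {
    comments: (post: { id: number }, args: { take: number }) =>
      sql("Comment", (c) => c.postId === post.id, args.take),
  },
};
```

Even with two users and two posts this already runs 1 + 2 + 2 = 5 queries; with 10 rows per level it balloons into the hundred-odd queries seen in the demo.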
And as we see, whoo, we have a lot of queries. We have many SELECT queries here, and the problem is really that we're generating so many SELECT queries per post. What is actually happening here? This is really a lot. The truth is we are probably sending over a hundred queries. And this is exactly the N+1 problem. What this is about is that for one users query, we have to do N queries to resolve the data for the posts. That's exactly the N+1, or you could also just call it 1+N. Let's look at that in the illustration that we had earlier. Here we see the numbers of the queries. The first query is the users query; we don't get around that, we need to get the users first. Then, for each user, we do another query to get the posts, so we have queries two, three, four. And then again, for each post, we need to do another query to fetch the comments. This is exactly what the N+1, or as I also like to call it, the 1+N problem is about. We need to do a lot of queries. And the problem here is that we're putting a lot of strain on the database, and this is actually exploding exponentially with the number of levels we have in our queries. If we would, for example, have 10 levels in the query (right now we have three: user, posts, comments), then it would be on the order of N to the power of 10 queries that we need to do. And if N is 10, obviously this doesn't scale at all; we would just shoot down our database. And this naive implementation that I've just been showing you here is very slow. I benchmarked it locally, on a MacBook Pro 15-inch 2018, and the truth is, it can only handle two and a half requests per second, which is nothing. That would mean if we have three concurrent requests from users in our live system, we are already at capacity.
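The growth just described can be made concrete with a bit of arithmetic: with a fanout of N rows per level, the naive approach issues one query per row of the previous level, at every level.

```typescript
// Rough query count for the naive approach: 1 + N + N^2 + ... across the levels
function naiveQueryCount(fanout: number, levels: number): number {
  let total = 0;
  for (let level = 0; level < levels; level++) {
    total += fanout ** level; // one query per row of the previous level
  }
  return total;
}
```

With the three-level demo query and a fanout of 10 this gives 1 + 10 + 100 = 111 queries, matching the hundred-odd queries in the logs; at 10 levels the count is already in the billions.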
That's obviously not what we want. So this is very slow, and let's optimize it now. In order to optimize this, I will first show you, without doing any magic, without doing any work, how the same resolvers for the same GraphQL server look if we use Prisma. Again, we have exactly the same GraphQL server around this, the same type system, etc., but instead of using raw SQL, we implement the resolvers using Prisma. Again, we first get all the users that we need. Then we get the posts for the users: for each user, we request that user and we get the posts. And the same deal here: we get the comments for each post. This is already much more readable, and as you can see, it is typesafe. That means if I, for example, have a typo in `where`, the compiler directly helps us and asks: did you mean to write `where`? Obviously, this is one of the main benefits of Prisma. You don't get these types in raw SQL; nothing happens there if I just remove a random character. Again, we didn't do anything special here; we just replaced the SQL code with Prisma Client code. And as you see, it's much more readable and also much more flexible. We can easily put another chain behind it, and it gives us the data that we need. With the chain, I mean we can say: give me the user, and then .posts; we have this chaining API to give back the posts for that particular user. So okay, how can we now fix this problem? We are now aware that this is very slow, we would like to get rid of that, and we would like to optimize it. How can we make that happen? In order to optimize it, we first need to understand what is even happening under the hood. And what is happening is this. This is just pseudocode, not the real code, but it is here to illustrate what's going on.
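Sketched in code, the Prisma version of those resolvers looks roughly like this. The `prisma` object below is an in-memory stand-in mimicking the shape of the Prisma 2 API of the time (`findMany`, `findOne`, and the fluent `.posts()` chaining), so the example runs without a database; the real resolvers would use an actual `PrismaClient` instance:

```typescript
// In-memory stand-in mimicking the Prisma Client API shape (assumption, not the real client)
const data = {
  users: [{ id: 1, email: "alice@example.com", name: "Alice" }],
  posts: [{ id: 10, userId: 1, title: "Hello", content: "...", published: true }],
};

const prisma = {
  user: {
    findMany: async (args: { take: number }) => data.users.slice(0, args.take),
    findOne: (args: { where: { id: number } }) => ({
      posts: async (a: { take: number }) =>
        data.posts.filter((p) => p.userId === args.where.id).slice(0, a.take),
    }),
  },
};

// Resolvers: same shape as before, but typesafe client calls instead of raw SQL
const resolvers = {
  Query: {
    users: (_: unknown, args: { take: number }) => prisma.user.findMany(args),
  },
  User: {
    // fluent chaining: look up the user, then hop to their posts
    posts: (user: { id: number }, args: { take: number }) =>
      prisma.user.findOne({ where: { id: user.id } }).posts(args),
  },
};
```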
So again, we're first getting all the users, then we're looping through all the users and fetching the posts for all of them, and we do a Promise.all and await that Promise.all. It's a very important ingredient, the Promise.all; we will come back to it later. And here we now get all the posts. Once we have all the posts, we say: okay, loop through all the posts with a map, give me all the comments, and again await a Promise.all, which is nothing else than waiting for all the promises to be resolved. If one promise rejects, the whole thing rejects. This is basically what is happening under the hood. And if we now translate that to the graph that we have, this is also how it works on our graph: around all the user queries we basically have a Promise.all, and again around all the post queries we have a Promise.all. And you probably can already guess it: the idea now is to batch all of these very related queries. They basically all look the same; they just query for a different id, for a different user. We can batch these together and optimize them. That's the idea. And it would be beautiful if we could hook in at exactly the moment in time when the Promise.all is being awaited. If we could hook in then and just say: okay, thank you for all the queries, I will now take them, squash them together, optimize them somehow, and send one optimized query to the database instead of sending N queries. And what if I told you that process.nextTick is exactly that? The moment in time when we are awaiting the Promise.all, when we're waiting for all the user requests, that's exactly when process.nextTick kicks in.
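The pseudocode walked through above, made runnable with hypothetical in-memory fetchers standing in for the per-row database calls, looks like this:

```typescript
// Hypothetical fetchers standing in for the per-row database calls
const getUsers = async (take: number) => [{ id: 1 }, { id: 2 }].slice(0, take);
const getPosts = async (userId: number, take: number) =>
  [{ id: userId * 10, userId }].slice(0, take);
const getComments = async (postId: number, take: number) =>
  [{ id: postId * 10, postId }].slice(0, take);

// The under-the-hood flow: one Promise.all per level of the query
async function resolveQuery() {
  const users = await getUsers(10);
  // N concurrent post fetches, one per user, awaited together
  const postsPerUser = await Promise.all(users.map((u) => getPosts(u.id, 10)));
  const posts = postsPerUser.flat();
  // one comment fetch per post, again awaited together
  const commentsPerPost = await Promise.all(posts.map((p) => getComments(p.id, 10)));
  return { users, posts, comments: commentsPerPost.flat() };
}
```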
And you can basically imagine this as follows: let's say we have this long stripe of incoming requests, request after request on a timeline, left to right. Then you can imagine process.nextTick basically cutting out the piece, the requests that we need. And here you already see the actual query that we will do instead: we will use a `userId IN` query. So instead of requesting three different users and getting the posts, which would result in three different SQL queries, we are just doing one query and we say: give me the posts where the user id is in this list. That way we can squash all of these queries together. Again, the post has a column called userId that points to the user table. And the same we can do with all the comments that we want for the posts. We can again wait and hook into the next process.nextTick. At exactly that moment, the GraphQL.js implementation is waiting for all the resolvers to resolve, and we can say: okay, exactly at that moment we kick in, we take all of these queries, we batch them together, and we optimize them into this one query, a comments findMany. And then we get all the comments that are related to any post in that list of posts. And this is basically it. Earlier I showed you that we could just easily replace the SQL queries with Prisma. But now, how can we turn on this magic process.nextTick optimization thing? By the way, this magic process.nextTick optimization thing is also called the DataLoader pattern, and the DataLoader pattern has been popularized by Facebook, who also initially came up with GraphQL. So let's check out how this looks. Again, we have this implementation here. We didn't write any process.nextTick or whatever; we don't even have to know the internals of the DataLoader. And let's now start this server.
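A minimal sketch of that batching trick, essentially the DataLoader pattern stripped down (no caching, unlike the real `dataloader` package): every `load` call in the current tick is queued, and a single `process.nextTick` callback flushes the whole queue as one batch, for example one `WHERE "userId" IN (...)` query.

```typescript
// Minimal DataLoader-style batcher (sketch; the real dataloader package adds caching etc.)
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;

class TinyLoader<K, V> {
  private queue: Array<{ key: K; resolve: (v: V) => void; reject: (e: unknown) => void }> = [];
  private scheduled = false;

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (!this.scheduled) {
        this.scheduled = true;
        // flush once the current tick's resolvers have all enqueued their keys
        process.nextTick(() => this.flush());
      }
    });
  }

  private async flush() {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    try {
      // one batched call instead of batch.length individual ones
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}
```

A resolver would then call something like `postsByUserId.load(user.id)` instead of querying directly, and the batch function would run a single `IN` query and fan the rows back out per key.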
And let's see what kind of queries are being generated with this server. So before, we were running the naive implementation; now we will go for the implementation using findOne in the resolvers. So let's look into that. Again, we have the same GraphQL Playground here; we execute the query. Okay, let's look into the logs. As you see, we don't have like 100 queries, but just a few. Let's look into them. The first query that you see is the SELECT of id, email, name from public.User. This is basically just giving us back the users. By the way, these are all the queries logged by Prisma that Prisma is now doing. Then we get the ids; we need them to do our resolution. Then we already get all the posts. Now, again, we use the id IN query. This is not only what we could write in the Prisma Client syntax; this actually translates directly into SQL syntax with the IN operation. So this is something that the database can directly understand. There's no back and forth with hundreds of queries; this is directly something that the SQL database, in our case Postgres, can resolve. And as you see, we just turned the hundred-odd queries into, what is it, five queries. And this already makes things much faster. So the left implementation is able to handle, on my machine, two and a half requests per second. The implementation on the right-hand side can already resolve about 12 requests per second, which doesn't sound like a lot. But again, remember, we are requesting 10 times 10 times 10, that means 1,000 items, and the response JSON is actually 240 kilobytes. So it's already something. 12 is still not a lot, but we already got from two and a half to 12, which is a huge improvement. And what if I told you that we can even go faster? This implementation here is already quite awesome.
And this is also the recommended way, by the way. But there is a lot of back and forth between Node.js and Rust. It's like a hybrid approach: we do the batching in Node.js, but we do the optimization in Rust. And why do I suddenly talk about Rust here? In case you are not aware of it, the core query engine for Prisma is written in Rust to make the database interaction very fast, and we have a thin layer of TypeScript on top of that Rust core. And this back and forth obviously takes time. What if I told you we can get rid of the back and forth? And we can. For the people who really want to squeeze out the last bit of performance, we have something more. Not we, actually: Ahmed, someone from our community, implemented a beautiful plugin. This is from the community; it's not something we officially support. But if you're really interested in squeezing out the last bit of performance, you can have a look into this. So how does this look? Everything is the same: we use Apollo Server, we use Prisma. But what we do now is we directly take the query as it is, transform it into something that Prisma Client understands, and pass it into Prisma Client. We basically delegate to Prisma Client to handle all of this resolution. So instead of relying on the GraphQL.js engine to resolve all of this data, we are asking Prisma Client to do it. And the people who have used Prisma Client before may know that you can say something like: give me the users, and you can now say select: I want, for example, the id, but I also want to have the posts. That is, for example, something I could do here. And now let's look into that. We resolve this, and we can see here that the response time of this query is actually tracked. And this is exactly what this plugin from Ahmed is doing under the hood.
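The core of what such a plugin does can be sketched as a tiny transformation: walk the requested fields and emit a nested Prisma-style `select` object. This is a drastically simplified illustration, not the plugin's actual code, and the `FieldNode` shape here stands in for the parsed GraphQL selection set:

```typescript
// Simplified field tree, standing in for the parsed GraphQL selection set
type FieldNode = { name: string; children?: FieldNode[] };

// Turn a selection set into a nested Prisma-style `select` object
function toSelect(fields: FieldNode[]): Record<string, unknown> {
  const select: Record<string, unknown> = {};
  for (const field of fields) {
    select[field.name] = field.children
      ? { select: toSelect(field.children) } // relation: recurse one level down
      : true; // scalar: just include it
  }
  return select;
}
```

For the demo query this yields something like `{ id: true, posts: { select: { id: true, comments: { select: { id: true } } } } }`, which Prisma Client, and therefore the Rust engine, can resolve in one pass per level.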
It's taking the GraphQL query and converting it into this select syntax, as we call it, of Prisma Client, so that Prisma Client, and therefore also the query engine, can directly take care of the resolution. And this one we can also just run for fun. It's the same deal as with the others; we can run it. And let's look into what is happening. We have even fewer queries; this one is even more optimized, for Prisma in particular. What we see is what we want to see: only three queries are being done. Again, we have three levels, user, posts, comments, and we are only doing three queries. Coming from the version that does select, select, select, select, down to this, which is only three queries, is obviously a big improvement. And how fast is this particular one? If you also use a faster GraphQL server, it's able to deal with 50 requests per second. And suddenly this is something that we can talk about. So let's look a little bit more into numbers. Here I did some benchmarks. One interesting thing here is that people oftentimes think, "Oh no, I need to have a fast server," that Express is not fast enough or something like that. Yes, you can use a faster server; for example, the Fastify implementation of GraphQL is a bit faster. But as you see here, with the same setup that just gives us 10%. It's nice, but it doesn't give us that much. The change, however, from this very inefficient implementation, where we do N+1 queries per level of hierarchy, to the DataLoader-optimized approach gives us a lot. And if we now go from two and a half to 50 requests per second, we get a 20x improvement in speed. And there's no caching activated yet, nothing. This is the same stuff; we just use Prisma and Fastify, which is very easy to use.
And now, as you cannot unsee, in the first row we have this enormous number: 2,725 requests per second. Here I just wanted to mention that oftentimes people completely forget that you can do caching in GraphQL. Caching in GraphQL is tricky, because you would probably need a cache per resolver and so on. However, depending on your use case, caching can really make a lot of sense. Let's say you need to implement something like the Instagram timeline, where you say: okay, it's totally fine for the data to be stale for five minutes. Then you can cache this data, even in a CDN, and you can have it blazing fast. In this case, I just wanted to show you the pure network overhead: okay, how fast can we shoot our data out? In this particular setup, we can push 656 megabytes per second out if the result is cached. In other words, it will not even hit the database. So this is also interesting to look into; it's a huge topic. You can work with ETags and whatnot. But just FYI, it's not only about solving the N+1 problem; that is the biggest bang for the buck for the non-cached version, but caching can also be interesting. All these benchmarks, and many more, you can check out on my GitHub, timsuchanek/N+1. And that's basically it. Thanks for your attention. - Tim, thank you so much. That was really insightful. - Yeah, thanks. I hope you enjoyed it. - Yeah, I mean, with GraphQL becoming the thing these days, it seems like we've got a lot in store, right? - Yeah, I mean, a lot is happening there. It's quite exciting to see. I have the feeling every second day there's a new library on the front page of Hacker News about GraphQL. And also, while GraphQL is already quite old in the JavaScript ecosystem, I have the feeling people are sometimes already looking for different approaches. There's still a lot to solve, and it's still a non-trivial problem to fetch nested data.
- Yeah, so maybe you're ready to take some questions. - Yep. - So the first question that's coming up is: what is GraphQL-JIT? Is JIT just-in-time? - Yeah, that's an interesting question. GraphQL-JIT is actually a project that comes from Zalando, who are in Berlin. I remember last year we sat down here in Berlin, in Mitte, and had lunch with some of the Zalando developers. For Zalando, the main thing is performance, and what they wanted was to increase the performance of the GraphQL implementation. We talked about an approach that we saw in the Python world, where the leading framework, called Graphene, was working on something similar; I think the new thing was not called Graphene. But the idea was basically that you compile down a function that is already specialized, so to say, for a specific query. What you need to be aware of is: if you have a GraphQL.js server and, let's say, 20,000 scalars come back in a response, you're calling 20,000 functions. And that takes effort. And there are issues in GraphQL.js... - Why are there 20,000 function calls, if I may ask? - That is the case because the whole idea of GraphQL is that you are tying together multiple data sources, which you cannot trust. In the case of Prisma Client it's different, because we know Prisma Client is typesafe, etc. But in the case of Facebook, for example, who came up with GraphQL: they say that by default everything is optional, and you have to explicitly mark something as required. So they have the opposite paradigm in that regard. For them it's rather: okay, I cannot trust the data source; maybe it doesn't even return anything. So GraphQL.js actually checks all the data that comes back; they check: okay, is this required string actually there, does the data actually come back? And that's why they need these 20,000 function calls if you have 20,000 scalars.
And the idea was: we told the Zalando people, hey, maybe you want to look into what they're doing in the Python ecosystem. And they actually did; they looked into it and came up with this library called GraphQL-JIT. The idea is the following: they take your GraphQL schema, they take your query, and they turn the query into something a little bit like a prepared statement in database speak. That means the whole query is already parsed, and the resolver function is already completely generated. What they are doing is using the `new Function` keyword. In JavaScript you can do fancy stuff; you can, for example, create a new function at runtime that can then be instantiated. We're not talking about the evil `eval` statement; we're talking about `new Function`. You can instantiate new code at runtime. - So you essentially pass a string, which is the code for the function definition. - And what GraphQL-JIT is basically doing is looking into your schema, looking into your resolvers, and generating this function at runtime. And how does it solve these 20,000 function calls? You can also check out the GraphQL-JIT website; they have an online compiler to see what's happening. But what they're basically doing is generating one huge ternary statement, and a ternary statement is faster than a lot of function calls. That's how they made it fast. Yeah. - Thank you. That was really cool to hear. So, speaking of performance, what are some of the performance improvement suggestions that you would have for deploying to something like AWS Lambda? We saw in the posts that many of the attendees are deploying to AWS Lambda and function-as-a-service providers, all the serverless platforms. And then obviously the question comes up: whether you're using GraphQL or just vanilla Prisma, what are some of the optimizations that you would do there?
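The `new Function` trick can be illustrated in a few lines. Here a toy "compiler" specializes a serializer for a fixed field list, generating one expression with null-check ternaries instead of a generic per-field function call; this is a sketch of the idea, not GraphQL-JIT's actual output:

```typescript
// Generate a specialized serializer for a fixed field list at runtime (toy JIT sketch)
function compileSerializer(fields: string[]): (root: any) => Record<string, unknown> {
  // Build one expression with a ternary per field instead of 20,000 generic calls
  const body =
    "return {" +
    fields
      .map((f) => {
        const key = JSON.stringify(f);
        return `${key}: root[${key}] == null ? null : root[${key}]`;
      })
      .join(", ") +
    "};";
  // new Function compiles source code at runtime (unlike eval, in its own scope)
  return new Function("root", body) as (root: any) => Record<string, unknown>;
}
```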
- Yeah, so what I talked about in my talk was mostly what you can do within your application. But once you deploy this into production, networking is where you get the main bang for the buck. One common mistake I saw quite often when chatting with people in Slack, etc.: if the database is in a different region than where your function runs, that already makes a huge difference. What you need to be aware of with Prisma is that we are not doing joins. That means if you're giving us a huge, massive query, Prisma is never doing any join. But we are, as I said in the talk, doing the DataLoader pattern. That means we get the first hierarchy level back, then we do the join in memory in Rust, fetch the second hierarchy level, and so on. So the distance between Prisma Client and your database is really essential here. The lower the distance, the quicker the query. Another thing I have to mention here is the big advantage of this approach: we are not putting too much strain on the database, because we are doing the joins in Rust. And this is a pattern that you see bigger companies doing. For example, Facebook is using MySQL, but they are using MySQL mostly as a key-value store. That means they put all the heavy logic into the application layer, because the application layer you can scale horizontally much more easily. That's an interesting pattern that we are using here. So the distance between the database and the Lambda function is very important. The other one is what everyone is talking about: cold starts. The cold start of a Lambda function is a thing. Starting everything and building the initial database connection takes time. Therefore, we recommend instantiating your Prisma Client outside of the handler function. That way the connection is basically still alive even though the Lambda function is frozen.
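The recommendation can be sketched like this. `FakeClient` is a stand-in for Prisma Client so the container-reuse effect is visible without a database; in a real function you would create one `PrismaClient` at module scope the same way:

```typescript
// Stand-in for an expensive-to-construct database client (assumption, not Prisma itself)
let instantiations = 0;
class FakeClient {
  constructor() {
    instantiations++; // opening the connection happens here, once per container
  }
  async findUsers(): Promise<string[]> {
    return ["alice", "bob"];
  }
}

// Module scope: created once when the container cold-starts, then reused while warm
const client = new FakeClient();

// The Lambda-style handler only uses the already-connected client
async function handler(): Promise<string[]> {
  return client.findUsers();
}
```

Had the `new FakeClient()` call been inside `handler`, every invocation would pay the connection cost again.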
Even though the Lambda function doesn't run, the TCP connection is still there; the database and Prisma are, I think, still connected, unless there's a keepalive packet that invalidates it. But basically you can reuse connections, and if you do that, if you instantiate your Prisma Client outside of the function, you will also gain a lot of performance. - Do you happen to know how long the container of a function is kept around after a single call? So if I make a call to a serverless function with Prisma, how long is that connection kept alive before AWS drops it? - That's a tricky question, because nobody really knows. I can post a really interesting comparison or benchmark link in the channel later; I'll look it up, because there is a continuous serverless comparison website. There's not only Lambda, obviously; there are many approaches. And it highly depends on their intelligent scheduling algorithm. It can be a day; I think a day is roughly the time if there were no further requests to that Lambda function, but it highly depends on their scheduler. - Gotcha. Well, Tim, it's been really wonderful having you speak here. Super insightful talk, and you'll be around in the Slack channel, is that right? - Yes, I will be around, so if you have any more questions, you can ask. - All right, all the best, Tim. - Okay, see ya.