Video details

Local Development for Serverless is an Anti-pattern - Gareth McCumskey

Serverless
11.06.2021
English

ServerlessDays Paris - https://paris.serverlessdays.io
In the serverless community, individuals and teams spend a lot of time and effort attempting to build an environment that is a replica of the cloud. Why? Because this is what we have always done. I am going to argue that this is not only unnecessary in the serverless world but actually harmful.
About Gareth
A web developer of nearly 20 years; from humble beginnings doing a lot of work in multiple teams around South Africa and ending up in Portugal with his family, Gareth has seen the industry shift repeatedly from one paradigm to the next. After discovering the world of Serverless Gareth now finds himself lucky enough to work with the team at Serverless Inc, custodians of the open source Serverless Framework, as a Customer Success Engineer and Developer Advocate. On the off days he spends as much as he can with his wife and children.

Transcript

All right. So obviously I'm here today to talk about local development in serverless. My name is Gareth McCumskey, in case you didn't catch the intro. I'm a Customer Success Engineer at Serverless Inc, and basically my job is to talk to people about serverless, help them find solutions to issues they may have, and help them use serverless tools, the Serverless Framework, and so on. Obviously, as with all these talks, everything we talk about comes from our own personal experience over the years we've been doing what we do, just like the other speakers. Timing this talk over the last couple of weeks has been a bit tricky for me. It always came out at about 45 minutes, and I know we don't have that much time, so I've stripped it down a bit, but we need to get going. I'm sorry if I speak a bit quickly at times; there's just a lot to cover.
To start off, I want to set the scene: what I mean by local development, where it's come from, and why it exists. Why do we do this thing called local development in the first place? Really, it comes down to the fact that we want our machine to act like production so that we can test our code. I need to write this code, I need to test it, and I need somewhere to make sure that it works. The problem is that traditionally it has just been too expensive for every developer in a team to have an exact copy of production sitting around just for them to use. That would be the ideal: every developer with an exact copy of the production system right in front of them. But before serverless, you couldn't give everyone a huge cluster of load-balanced machines with MySQL databases in a master-slave setup and all the complex infrastructure that production systems require. And obviously, it's too risky to make these changes in production.
If you can't have a production system of your own, the other possible solution is to just do it in production, but that gets a little bit risky. I've been there, I've done that, and it doesn't work very well. So what we end up doing to solve this problem is copying the production environment on our local machines as best we can. This solves the problems I've mentioned: it reduces the cost, and it means we can actually test this code to some degree. It's also very common in a lot of teams that not only does each developer have some kind of local environment to play around with, but there's usually also a development server of some kind, an integration or staging system, where you can take all your code and that of all the other members of your team and put it onto a single machine that is even closer to what production looks like, to test with and iron out any problems before you actually push out to production. And this is where the mantra comes from. It's become a meme in the development space: it worked on my machine. I've developed things locally, it all works fine, I throw it over the wall at the Ops team and it breaks. Not my problem; it worked on my machine. And because we could get things to run locally, we were then able to say: well, we're now doing this web stuff, which we would normally have to do on a web server, on our local machines, so what can we borrow from other developers who build things locally, like desktop developers? Debug tools, for example, where you can step through code and set breakpoints, are a pretty nifty feature to add on top of this. It's just another advantage of being able to work locally.
We've then gone ahead and, to get ourselves even closer to production systems, used tools like Vagrant and VMs to replicate those production systems. And obviously today Docker has taken over that almost entirely, and development teams are now replicating production systems using Docker containers, even in the serverless world. So, to sum up what local development means and where it's come from: production is just too complex or expensive for us to replicate, so we've tried really hard with virtualization to copy all of this production stuff onto our local machines. It also gives us a few additional optimizations from working locally. But it still requires a step between local and production. No matter how hard we've tried to do local, we often have a development or staging system between our local machines and production anyway.
That's what local looked like even before serverless, and it's become an industry standard. I've joined so many teams where either you work with everybody involved to find ways to build a great local development experience for the team, or you spend a day, a week, sometimes longer depending on the complexity, getting that simulated environment running on your own computer so that you can start contributing. And this became a necessity. It isn't something that was created in isolation just to create a burden. It was an absolute necessity prior to serverless to have these kinds of local development systems. Because of all the problems I mentioned before, we really had no choice. They have helped us develop faster, and when we eventually want to push out to production, most of the work is done, although there is still going to be integration work to do.
But at least local development environments in those situations have actually helped us. So how does serverless come along and change some of this? That's the interesting thing. Now that we're moving into the serverless world, we're no longer bound by the same rules. I tend to look at this from a web application development scenario because that's what I've done most in the past, so my examples are a bit web-app heavy, but this works with any use case you might think of. First of all, the commonality between local systems and production is often the web server. You can have an Apache or an NGINX web server on your production machines, and that's pretty easy to replicate locally. But in the serverless world, we don't really have a web server that we can just hook into, understand, and copy in both local and production. We're going to be using a tool like API Gateway, which is more than just a web server. Sure, it receives HTTP requests, but it handles things differently than a regular web server would and potentially adds additional features on top. One of those is routing. This is an example of a Laravel routes file in PHP, and I didn't know I was going to get so much PHP happening at the conference today, but here we go. This is a relatively simple one; it's actually an example I pulled off Google Images. But this is the kind of stuff that traditionally you'd have to write and maintain in your code itself. It becomes part of the compute of your application, not just part of the Apache infrastructure that you spin up. And that means you've got additional stuff to maintain: frameworks to update, and more code to edit. Something like API Gateway takes this work away from us.
We don't need to manage routing as code inside our application. API Gateway gives us a way to manage that as part of configuration instead. And this continues with things like databases. This is an example of an entity class for the Doctrine ORM inside Symfony. Again, this is a very simple example, and these get hellishly complex, I know from experience. The idea here as well is that I have to maintain my data models and my entity relationships in the code I'm writing. I have to produce this code or have it automatically generated in some way, and I have a framework to maintain, deploy, and so on. This adds complexity to my compute layer instead of it being part of the infrastructure layer, which is what configuration with serverless tends to give us. And it goes even further. This is a cron job. What about cron jobs? These are those horrible things that you only think of later and then need to implement in your infrastructure, and it usually ends up being a t2.micro instance sitting on its own somewhere in AWS, doing cron jobs. Again, this is something we can just do with serverless. Configuring our cron jobs becomes part of our infrastructure, and we don't have to worry about implementing and maintaining this stuff ourselves.
As an example of what I mean, if I want to set up some routes with API Gateway (this is the Serverless Framework, but other serverless tools work very similarly), I configure my endpoints and paths directly in the configuration of the resource that's going to be receiving the HTTP requests. You're not managing this in code, so it's not something you have to go and update and maintain later. You might have to change the configuration for different paths, but that's the extent of the work. Setting up databases can be just as simple. With DynamoDB, I have essentially a key-value store with no fixed schema, so I can store whatever data I need to.
The only real complexity with DynamoDB is the querying aspect, and there's a huge amount of information out there about that. We even had some of the speakers mentioning DynamoDB. But what's cool about DynamoDB is that it gets me up and running with a database, or some way to store data, really quickly and easily, and I can manage it primarily through configuration. I don't have to spend time and effort on ORMs and entity classes and so on. And it runs the gamut. There are multitudes of AWS services that lift out a lot of the code we would normally write. Take SNS, for example, which is a pub/sub service. I no longer need to worry about orchestrating pub/sub with a framework or a library I've installed. It's a service: I use the AWS SDK, push the data at it, and it does pub/sub for me. SQS is a similar thing. I can just push stuff into a message queue using the SDK, and I don't need to maintain a library that manages message queues for me. There's Step Functions as well. State machines used to mean finding the right framework or library to include in your application and configuring it in a certain way so that the state machine ran the way you wanted. Now I have state machines as a service. All I need to worry about is the actual code that runs in the states themselves, not all the additional work around managing state machines. And CloudWatch: we talked about cron jobs before, so there you go. CloudWatch can trigger scheduled events, and that's what manages our cron jobs for us. I don't need that tiny little t2.micro instance that everybody forgets about running our cron jobs. So what does that mean?
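
As a rough illustration of routing, scheduled jobs, and a table being declared as configuration rather than application code, here is a minimal sketch written as a typed object in TypeScript. The `http` and `schedule` entries follow the Serverless Framework's event shapes, and the table uses CloudFormation's `AWS::DynamoDB::Table` type, but the service name, function names, handler paths, and the `listRoutes` helper are all hypothetical, and the hand-written types cover only the fields used here.

```typescript
// Hand-written types covering only the fields this sketch uses.
// A real Serverless Framework configuration has many more options.
type HttpEvent = { http: { method: string; path: string } };
type ScheduleEvent = { schedule: string };
type FunctionEvent = HttpEvent | ScheduleEvent;

interface FunctionConfig {
  handler: string;
  events: FunctionEvent[];
}

interface ServiceConfig {
  service: string;
  provider: { name: string; runtime: string };
  functions: Record<string, FunctionConfig>;
  resources: { Resources: Record<string, unknown> };
}

// All names below (service, functions, handlers, table) are hypothetical.
const config: ServiceConfig = {
  service: "orders-example",
  provider: { name: "aws", runtime: "nodejs14.x" },
  functions: {
    // Routing lives in configuration: API Gateway maps the path to the function.
    getOrder: {
      handler: "src/orders.get",
      events: [{ http: { method: "get", path: "orders/{id}" } }],
    },
    // The "cron job": a scheduled event instead of a dedicated instance.
    nightlyCleanup: {
      handler: "src/cleanup.run",
      events: [{ schedule: "rate(1 day)" }],
    },
  },
  resources: {
    Resources: {
      // The table declares only its key; other attributes need no schema.
      OrdersTable: {
        Type: "AWS::DynamoDB::Table",
        Properties: {
          BillingMode: "PAY_PER_REQUEST",
          AttributeDefinitions: [{ AttributeName: "id", AttributeType: "S" }],
          KeySchema: [{ AttributeName: "id", KeyType: "HASH" }],
        },
      },
    },
  },
};

// Small helper to show that the routes are plain data, not application code.
function listRoutes(cfg: ServiceConfig): string[] {
  const routes: string[] = [];
  for (const [name, fn] of Object.entries(cfg.functions)) {
    for (const event of fn.events) {
      if ("http" in event) {
        const method = event.http.method.toUpperCase();
        routes.push(`${method} /${event.http.path} -> ${name}`);
      }
    }
  }
  return routes;
}

console.log(listRoutes(config));
```

Changing a route or a schedule here means editing data, not touching handler code, which is the point being made about configuration taking over from frameworks.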
Putting all of this together, talking about these services that take over so much of our code, what it means is that in the long term, with serverless, we end up with less dependence on our own code. Previously, as I showed, we'd have to include lots of code to run things that are now taken over by services. Apparently, the more code you add, the more chance you have of bugs. Apparently, bugs only happen when you add code. So if we write less of it, we've got less chance of having bugs. That's a very nice benefit of having code replaced by services. It also means that I have less code to worry about executing on my own machine. A lot of these services just do that for me, so there's less for me to actually test. The other thing that often slips by is that cloud services have now become integral to our application. We're no longer writing code that I can take from one CPU and dump on another CPU and it'll run, because that's the extent of the infrastructure we've coded against. The cloud services are integral to it, and we have to have them available to us. You can't just lift the code out without the cloud services around it and still use it. You need them.
So we've looked at what local development means and where it's come from, and at what serverless has added to traditional development, which to some degree makes local testing easier. Now I want to go through the current methods that I've used myself and that I've seen other folks using to test serverless applications on a local machine. How can I execute or test my serverless stuff here and now and not have to worry about using the cloud? The easiest one I can think of, with the Serverless Framework, is just to run the serverless invoke local command.
It's the quickest and easiest way to execute any old code sitting in a Lambda function that you've written on your local machine. You run the command serverless invoke local -f with the function name, and it's literally just going to execute the code. It's not without its issues, because access to AWS services becomes tricky. If cloud services are integral to your serverless application, you're probably consuming DynamoDB, S3, SNS, SQS, and many others inside your Lambda function itself. You're calling out to AWS, and the serverless invoke local command is not really going to work for that. It becomes a bit tricky to configure and set up, so it might not be the best solution. Also, unfortunately, it does not accurately emulate the conditions of the Lambda environment, that is, what the environment is like when a Lambda function actually executes.
So let's move on to the next one, which is a very common approach I've seen: Serverless Offline. Serverless Offline is a relatively simple-to-implement plugin for the Serverless Framework, and what it does is give you the ability to run the HTTP services in your Serverless Framework service locally. But that's really all it does. If all you're building is a bunch of API endpoints, this might be useful to you. But from experience, most teams, myself included, start out with an API because that's a very obvious use case and serverless is very powerful for it. Then I realize I want to support image uploads, and that involves an S3 event. Once I include an S3 event, Serverless Offline no longer helps me test it, and that becomes a lot trickier to test. Then I want to do some SNS stuff, and that's also tricky because Serverless Offline doesn't quite support that.
There's no support for those AWS events, and for local testing you will also need something like LocalStack, which I'm going to look at in a second, because the AWS services being called inside your Lambda functions need to execute in some way. You can't just have your Lambda functions error out because they can't connect to your AWS account, so you need some way to at least run these services. And it also does not accurately emulate the conditions of the Lambda environment that the function will be executing in, because it's your local machine. It's not actually Lambda.
So about that emulation: how do we do this? I have these services sitting in AWS and I want to use them in my Lambda function, but if I really want to test locally, I need some way to stop my Lambda functions erroring out every time I execute them locally because they can't connect to AWS. One method is to emulate these services, and this is usually done with a really powerful piece of software called LocalStack. As you can see, there's even a plugin for the Serverless Framework to help you use LocalStack. If you haven't heard of it, LocalStack is a package that runs either the local variants of AWS services that AWS itself provides, or close equivalents, to emulate those services as best as possible. Unfortunately, LocalStack does come with its own issues, one of them being that it can get a bit complex to configure at times. This is actually a relatively simple configuration file that I was able to find, where somebody is setting up the proxies and redirects to the LocalStack equivalents of AWS services. It's not absolutely monstrous, so it's not terrible, but you do have some complexity that you now need to manage.
Now that you're using LocalStack, you've also got a bunch of Docker containers, with the services running inside them, that you need to manage as part of running it. The LocalStack team works very hard to make this easy to use, and this isn't me ragging on the team at LocalStack. It's an indication of the issues that come with local development in general. As I said, I've worked with teams where we tried to build local development environments, and they become long-standing things that are difficult to maintain and that you spend a lot of time and effort building, maintaining, and improving. As production changes, you need to add more stuff and update everybody. You'll see that same effort going into managing LocalStack as part of your development environment. It can get tricky to keep it maintained, updated, and running over time for all the developers in your team, and that can cause issues and delays. Not to mention that you've now got a whole bunch of processes, sometimes consuming a lot of resources, on your developers' machines, which isn't great if you're trying to run lean little machines in your development team.
Unfortunately, it doesn't stop there. While the LocalStack team works as hard as they can to make what they provide as close to AWS as possible, it's not quite there. LocalStack is just not equivalent to AWS, and that's true of any local development environment: it's never going to be as good as the production environment. So you end up in a situation where you have to choose between two options. You realize that there's a service in AWS that is absolutely awesome. If you can use it, it's going to completely change the way your app performs. It's going to make things better.
It's going to make your life easier as a developer, and you want to use it yesterday, but LocalStack doesn't support it. This has happened to me a couple of times. You decide you need to use this service in AWS, but you can't use it in LocalStack because it's not supported yet. As I'm sure we saw in some of the other talks, AWS has a ton of services for you to use. The LocalStack team is still adding more and more of them to the package, and they're just not there with everything yet. It's a moving target, and there's always new stuff happening, so at some point there is more than likely going to be a service you want that you can't use in LocalStack. So the decision is this. You can decide not to use that service, and your app suffers for not using something that would make it better. Or you use it anyway, and the problem is that you've now essentially broken your local development environment, because the tool you've been using doesn't support that service. You're caught between a rock and a hard place. Do you want to keep your app less than optimal, or do you want to break local development? It's a tough decision to make. And on top of that, when you execute the code you've written for your Lambda functions, the environment it's executing in doesn't emulate the Lambda environment very accurately.
Back in 2016, in the first team where I started building with serverless, we were transitioning away from a PHP monolith to a serverless microservices architecture, so I'm familiar with what's been talked about here. We went through the process of trying to find a way to develop locally, through all the steps I've described so far, to the point where we were looking at mocking those services. And I was an advocate for mocking AWS services for a very long time.
I wrote a blog post for Serverless about how to do this and why it's awesome. One of the reasons is that it's a very easy way to set up testing for AWS services. There's a really cool npm module called aws-sdk-mock that captures all requests to the AWS SDK and lets you insert your own response. Instead of going to the cloud, the call stays local, and you can respond however you want, with a failure message, a success message, whatever it is, from that AWS service. It also requires no complex configuration. There's really nothing complex about getting mocking set up in your service; you probably add another three lines to the configuration. There's also no complex installation of applications. It's not like LocalStack, where you've got this huge pile of apps that need to be spun up in the background, or a massive development environment that any new developer has to get out of the code repo somehow, spin up on their local machine, and try to run. It has none of that. It's incredibly light and easy to run because it only consumes resources when you're actually executing your Lambda functions locally. And it's very low maintenance: there isn't a bunch of stuff that needs to be upgraded over time, with every developer having to follow suit and pull in updates. It also doesn't limit the services you can use. It just ties into the AWS SDK, so if a service is in the SDK, you can use aws-sdk-mock to mock it.
However, there are some downsides. I'm not going to talk about the usual downsides most folks raise about mocks. One of the big ones for me is that mocks, like LocalStack, are not equivalent to AWS, and this was briefly mentioned in some other talks as well. While you can mock the services out and keep things local, your responses will probably not be what AWS would give you.
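
To make the mocking idea concrete, here is a minimal sketch that stays self-contained by using a hand-rolled mock passed into the handler, rather than the aws-sdk-mock module mentioned in the talk (whose API is not shown here). The `Publisher` interface, the `handleSignup` handler, the topic name, and the `mockPublisher` helper are all hypothetical and exist only for illustration.

```typescript
// Narrow interface covering only what the handler needs from a pub/sub client.
interface Publisher {
  publish(topic: string, message: string): Promise<{ messageId: string }>;
}

// Hypothetical handler logic: publish an event, map the outcome to a status code.
async function handleSignup(
  email: string,
  publisher: Publisher
): Promise<{ statusCode: number }> {
  try {
    await publisher.publish("user-signups", JSON.stringify({ email }));
    return { statusCode: 202 };
  } catch {
    return { statusCode: 502 };
  }
}

// Hand-rolled mock: records every request and answers with a canned outcome.
// The canned response is whatever we chose to write, not what AWS would return.
function mockPublisher(outcome: "success" | "failure") {
  const calls: Array<{ topic: string; message: string }> = [];
  const publisher: Publisher = {
    async publish(topic, message) {
      calls.push({ topic, message });
      if (outcome === "failure") {
        throw new Error("simulated service error");
      }
      return { messageId: "mock-id-1" };
    },
  };
  return { publisher, calls };
}

// Exercise both the success path and the failure path without touching the cloud.
async function demo(): Promise<void> {
  const ok = mockPublisher("success");
  console.log(await handleSignup("a@example.com", ok.publisher), ok.calls);

  const failing = mockPublisher("failure");
  console.log(await handleSignup("b@example.com", failing.publisher));
}

demo().catch(console.error);
```

Each response the real service could give needs its own canned outcome like the two above, which is where the setup cost of mocking comes from.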
And even if your mocked responses do accurately reproduce what AWS returns, this is not sitting in a data center communicating over networks. You don't have the same network latencies involved. You don't have the same IAM interactions involved. There's a lot of other stuff happening around these AWS services that mocks, and even LocalStack, are just not going to give you. Setting up the mocks can also be incredibly time-consuming. If you actually want to mock properly, you need to find every possible response that a service will give you, and that's a lot of potential responses whose actual content you need to code. Added to that, if you're not already doing unit testing as part of this local environment, you pretty much have to write unit tests to get this to work. You need some way to execute the Lambda code locally, and a tool like Mocha or Jest is a really great way to run unit tests locally. The mocking tool ties into that, because you can initialize it right up front in your tests, send data to the Lambda functions, and have the mocks capture the requests and respond to them. Even if you don't use Mocha or Jest, you're pretty much going to be writing unit tests at that point just to get your mocks up and running. And again, it doesn't emulate the Lambda environment accurately enough. It's still running on your local machine.
So, moving on to another approach, which is very popular, and which we even use at Serverless. This is a hybrid approach. The idea is that I may run my code locally, but instead of trying to emulate or mock everything locally, I just use the cloud services themselves, so the services are actually in the cloud.
I'm calling actual DynamoDB, actual S3, actual whatever else I need, and in this way I avoid the inconsistencies that come with emulation and mocks, because I'm testing against the actual services. It's a good step in the right direction. There's no need to emulate services, which is fantastic. You don't bog down your own machine, there's no complex maintenance to be done, and the code is still running locally, so you can still use those local debugging tools I mentioned at the beginning, which is a nice win. At this point most people might be saying: isn't that good enough? Isn't this hybrid approach good enough for us to develop our serverless applications? Obviously, my response is: not quite. Even with a hybrid approach, there's still some special configuration you need so that the code executing locally points at those remote resources. A Lambda function that is just running locally doesn't know where the AWS account is, or where the AWS services are that it needs to connect to while you're building things. Often this results in development-only configuration that ends up going into production as well. You end up deploying stuff that's meant for development only into production, which to me is a bit of a code smell. Some people might find this acceptable; that is a decision you need to make. I typically don't like the idea of taking stuff I would use for my local development environment and putting it in production, even if I've coded it in a way that it should never be used in production. I always worry: what if it is? So that's a potential downfall. And of course, it does not emulate the Lambda environment accurately enough. So let me broach the elephant in the room. I'm sure you've noticed me say this a few times.
It does not emulate the Lambda environment accurately enough, and this is ultimately one of the deciding factors for me. When I'm building my development environment, I want my ability to develop and test to be as close to production as possible, and this keeps coming up. For one thing, the environment that Lambda executes in and the environment your local code runs in will potentially have different libraries installed. There have been many times when I've written Lambda functions locally using, for example, encryption tools, and they run on my local machine with no problem. As soon as I push them into Lambda, I suddenly have errors about a missing file here or there, because Lambda cannot find libraries that I just assumed would be there because it worked on my machine. Those can obviously be resolved. You could swap one library for a different JavaScript module, or whatever it is. But ultimately you're still producing things locally that don't quite match what is actually running in Lambda. At that point you are deploying into Lambda and getting an error, and you now need to make a change locally to fix something you can't replicate locally, unless you go and delete libraries off your computer. You have to push the fix back into Lambda, and now you're doing remote development anyway, just to solve this one problem.
The other side of this is that the way code executes in Lambda is very different to how it executes on your local machine, even in a Docker container. Lambda has a habit of shutting down as soon as possible. It's part of why we love it. It saves us money. It means we don't have this thing running 24/7, costing us a minimum of $5 a month to sit around and do nothing.
Lambda will execute your code and then stop, and this is probably more prevalent in Node.js than in the other runtimes, but it does still occur in them too. It's especially visible with the JavaScript call stack. What often happens, and this is just an example, is that you have a request, and a promise or a callback is generated and gets pushed onto the call stack. You make a call to a database, and that gets put on the call stack. Your code continues to execute, it ends, and it sends a return. On your local machine, the stuff on the call stack then gets completed. That request to the database completes, SNS gets called, emails get sent, and everything else that was queued gets done. In Lambda, as soon as your function reaches the end and returns, everything on that call stack is dropped. Sometimes. Occasionally I've seen a warm Lambda that previously had stuff on the call stack be reinvoked, and suddenly that stuff is executed again, or before the actual code runs. So you're in a situation where some stuff that you want to execute doesn't, but sometimes does, and you're never quite sure how and where this occurs. This has happened to me multiple times. I've had calls going to services, APIs, third-party integrations, you name it. I run things locally, and it's great. I push it into Lambda, and it just doesn't work. And then I'll spend the next hour and a half on it, because there's no meaningful way to get feedback about why things are failing unless you know about this potential pitfall. The problem is that there might be other pitfalls we don't know about, and because we're stuck on a local machine, we can't see them unless we're testing in Lambda. So that's the JavaScript callback issue, anyway.
On to testing serverless applications. Thankfully, there was a very nice talk about that earlier, so I'm not going to go into detail about actually testing serverless applications.
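
The dropped-work pitfall described above can be sketched locally. This example does not emulate Lambda and nothing here is frozen; it only shows that work started without `await` is still pending at the moment the handler returns, which is the moment at which, as described in the talk, Lambda may stop running your code. The function names, the `sent` array, and the timings are hypothetical.

```typescript
// Records which recipients have actually been "emailed".
const sent: string[] = [];

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Stand-in for a slow network call such as publishing a message or sending mail.
async function sendEmail(to: string): Promise<void> {
  await sleep(10);
  sent.push(to);
}

// Pitfall: the call is started but not awaited, so the handler returns
// while the work is still pending.
async function leakyHandler(to: string): Promise<{ statusCode: number }> {
  void sendEmail(to);
  return { statusCode: 200 };
}

// Fix: await the work so it has finished before the handler returns.
async function safeHandler(to: string): Promise<{ statusCode: number }> {
  await sendEmail(to);
  return { statusCode: 200 };
}

async function demo(): Promise<void> {
  await leakyHandler("leaky@example.com");
  // Snapshot taken at the moment the leaky handler has returned.
  const atReturn = sent.slice();

  // A local process keeps running, so the pending work still completes here.
  await sleep(50);
  const afterWaiting = sent.slice();

  await safeHandler("safe@example.com");
  console.log({ atReturn, afterWaiting, final: sent.slice() });
}

demo().catch(console.error);
```

Locally the leaky version looks fine because the process stays alive long enough for the email to go out; the difference only shows up when execution stops right after the return.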
I'm just going to go through a bit of a summary of how this can be done. So one thing to bear in mind is that... okay, I'm being chased. Okay, let me do this faster. The barriers that pushed us to test locally in the past, because production was too expensive, are now gone. It was too expensive to give every developer a replica of production. We can do that now, because every developer can deploy to the cloud for free or for very little. If you've worked with the Serverless Framework, it's serverless deploy --stage with a stage name. Or give every developer their own AWS account. It's cheap, it's easy, it's actually free, and you can do it with AWS Organizations. Serverless is cloud native, and this is something we need to bear in mind. Building our apps locally isn't necessarily practical anymore. We are consuming the cloud, and to make the best serverless application we can, we should be trying to consume as many of the cloud services available to us as possible, because we don't want to manage, maintain, and work with all the infrastructure we had in the past. We want to give that to our cloud provider and say: you sort that stuff out. I want to sort out my business logic, and that's all I care about. So the fact that we're cloud native means that we should probably be looking at developing more in the cloud. And that's it. Am I okay? Okay. Thanks, guys.