Video details

Integration Testing Serverless Architectures, EventBridge and Beyond... - Sarah Hamilton

Serverless
11.06.2021
English

ServerlessDays Paris - https://paris.serverlessdays.io
Event-driven architectures using Amazon’s EventBridge allow us to create highly scalable, loosely coupled systems. However, testing these architectures can be a great challenge. In this session, we'll explore different testing options in order to have confidence in our distributed systems!
About Sarah
Sarah is a cloud developer at the web and mobile development agency Theodo. She specialises in Serverless, with a keen interest in EventBridge, and particularly enjoys working with clients who want to scale their products to more users. From game-tech to beauty-tech, she loves the challenges that each project brings. In 2020, she scaled a video conferencing product that needed to grow quickly due to the impact that Covid-19 had on people’s work.

Transcript

We're off to a good start. Yeah. So today I'm going to be talking about integration testing serverless architectures. It's actually really great to go after Julian, because I'm going to be really zoning in on EventBridge, and he's given us all a good introduction to messaging. So, a little bit of an intro to me. I'm a developer at Theodo, which I'm sure we've all heard about by now. I've been a developer there for about two years, and over that time I've worked with a range of clients, on projects varying from one week to about six months, and touched many different tech stacks. But it became apparent quite quickly that I really enjoyed working with serverless, mainly because when people came to us with an idea, it was often the quickest way to go from idea conception to actually launching a product, so I could really focus on writing the code rather than the underlying infrastructure. Last year, I also became an AWS Community Builder, which many of you may have heard about; it's a program for people who are really enthusiastic about AWS, and it provides mentorship and networking opportunities. And it's not just to do with serverless, it also includes data science and other topics as well, so I would encourage you to have a look at that. At Theodo, we also head up the Serverless Transformation blog, newsletter, podcast and meetup, which have been featured a couple of times. We really have, like, the most up-to-date resources there, so do check it out.
I think we've all been introduced to serverless quite a lot here, so I'll whip through this quite quickly. But what does it mean to me? To me, it means that we launch quicker, we pay less and we scale instantly. As I said, a lot of people come to us with an idea, and they might have a limited budget and limited time to get something to market quickly, and I really think that serverless allows us to do that. So I can see that the GIF isn't actually moving (it says "explain this to me like I'm five"), but yeah, how do we achieve a serverless architecture? Of course, we leverage the cloud: AWS, Azure, GCP. I tend to focus on AWS, mainly because it's what I know, and also it holds the largest market share. And just so we're all on the same page, I know we've touched on these services, but throughout this talk I'll be touching on S3 for our storage, Step Functions for workflows, DynamoDB for our database, Lambda for compute, and SQS for our messaging.
And then we'll go right in, starting with the problem. We all know the famous monolith. A traditional architecture might be that we start with this beautiful code base, and then the business throws extra requirements at us, and the engineering team gets stressed, because if we add this one thing here, then it might bring this thing down over there. Or a new developer comes in and thinks we can just add this feature, not realizing that somewhere else there's a dependency on it, and we end up with bugs, everyone becomes very stressed, the business wants things quickly, and the engineering team builds up tech debt. But at Theodo, we like to use event-driven microservices, which I know we've touched on here as well, in order to avoid this problem, so that when the business comes to us and asks for more and more things, the engineering team is actually quite happy to do it.
And what this means is that when they come to us with an extra requirement, either it will easily be added into one of the microservices that we already have, or we'll be able to spin up a new service in no time, and everyone's happy. Just to dig into that a little bit further, into exactly what microservices mean to us: each one kind of runs as its own app. We use a workshop based on Domain-Driven Design (we have lots of resources on the Serverless Transformation blog, so do check it out later), and using that workshop we actually define domains that can be independent from each other. We use the term loosely coupled a lot, but effectively, each application within the system doesn't actually need to know about the others. So if we take the classic ecommerce example, we might have an order microservice and a payment microservice, and neither one needs to know about the other. And also, if you end up with a really large application, you can actually have different, small, really effective teams working on each microservice; each one will have its own data source, and it will deploy independently. And then the key thing is that if one of the services does fail, the rest of the system hopefully doesn't come crashing down, if you've done it correctly. So there are many advantages to working in this way, and a key part of it is scalability.
At the heart of all of our serverless applications at Theodo, we tend to use Amazon EventBridge as our event bus. Just to quickly talk us through this diagram: the event producer in the ecommerce example might be the order service, which will produce events and put them onto the event bus. The event might say "order created", and it might have some metadata within it. And when it arrives on the event bus, a set of rules will send that event on to one of the consumers, which would be another microservice, such as the payment service, which then picks up on that information and uses it however it needs to. The key thing here is that the consumers don't need to know about the producer and vice versa, and that goes along with deploying independently and failing independently, and therefore being able to scale.
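To make that a bit more concrete, here is a minimal sketch of what the producer side might look like with the AWS SDK in Node.js. The bus name, event source and detail fields are purely illustrative, not the project's real code; a rule on the bus matching that source would then route the event on to the payment service.

```typescript
import { EventBridge } from "aws-sdk";

const eventBridge = new EventBridge();

// Illustrative producer: the order service puts an "order created" event onto
// a custom bus. EventBridge rules matching on Source route it onwards, so the
// producer never needs to know who consumes it.
export const emitOrderCreated = async (orderId: string, amount: number) =>
  eventBridge
    .putEvents({
      Entries: [
        {
          EventBusName: "ecommerce-bus",               // assumed bus name
          Source: "order.created",                     // what routing rules match on
          DetailType: "OrderCreated",
          Detail: JSON.stringify({ orderId, amount }), // event metadata for consumers
        },
      ],
    })
    .promise();
```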
So I guess to put a bit of context around this: a client that I worked on for about six months, at the beginning of the pandemic last March, was a video conferencing product. As you're probably aware, being video conferencing, it went from having a steady flow of people to just going boom, and they had on-prem servers; essentially the website couldn't take the load, and they weren't able to take advantage of the market at the time. So we were brought in as serverless experts to try and help with this and take advantage of the market. The challenge that we were faced with in March of 2020 was to scale to 250,000 users, to train up the existing developers on the project, who were all skilled in .NET, and to release quickly to take advantage of the market at the time. And EventBridge was the key to success. To briefly touch on this before we go into how we actually tested our EventBridge: we had the original system, where the code base was over ten years old, so we obviously were not going to do a rewrite of that in a matter of months. We needed to come up with a clever way to make this work and scale, so we did a migration, and EventBridge was at the center of it.
In the new system, we had EventBridge, and from the old system we were able to pass events into the new system, such as a user signing up with their metadata, to be used by the new microservices that we were spinning up. So in this example, we had a microservice called chat, the chat part of the app. We built that up really quickly, it scaled really well, and we were then able to replace the old chat so that it was scalable. And that was how we ended up working towards this new scalable system.
So, to get into one of the main parts of the talk, about how we actually test these serverless architectures: you also can't see the GIF on here, but it shows how you can have all your unit tests working really well, but then when it actually comes to the integration tests, you realize that when things are put together, they might not work so well. Now, traditionally, I think most of us, I hope, know how to write a unit test. Personally, we tend to write our applications in Node.js and we use a test runner called Jest, so that is what we're going to be talking about in these slides. However, it can all be translated into whatever language; it's just a testing strategy that can be implemented in a different way. Before we dig into how we did it, there's the question of: do we test on the real infrastructure, or do we test using mocks and stubs? We tend to test on the real infrastructure, and I guess it's kind of like, why not? We want to test on something that's exactly the same as production. When we do use mocks and stubs, it might be for a failure test case, something that's not normal, so we would need to mock that out because we can't rely on it just happening. There are also assumptions that we might make when using mocks and stubs that could actually end up causing our tests to pass when they shouldn't. So we test on the real infrastructure.
And how do we do this? Well, we actually put this into our CI process. When we open a pull request, we'll spin up a test stack, and we do this using CloudFormation and the Serverless Framework; but again, as we've talked about before, you can use SAM or CDK, whatever suits you best. We'll give it a unique name, so CI-27 is the example here. We tend to name it after the ticket number that we're working on, so that it's unique, because all of your CloudFormation stacks need to have a unique name so that you don't deploy over the top of someone else's. Once it's been deployed, we effectively have a copy of the production site ready to go, with all of the resources spun up in AWS, and we'll run the unit tests, followed by the integration tests, followed by the end-to-end tests. At the end of this, we'll tear down the stack in CloudFormation, which deletes all of the resources that we spun up. The reason why we do this is, firstly, I think there's a maximum of 500 stacks that you can have open in CloudFormation, and secondly, you're just paying for resources if you don't tear them down. So of course, we tear it down at the end of the CI process, and therefore it's really cheap to do this as well.
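As a rough illustration of that flow (not our actual pipeline, and the stage and script names are assumptions), the CI job essentially boils down to something like this:

```typescript
import { execSync } from "child_process";

// Sketch of the CI lifecycle: deploy an ephemeral copy of the stack, run the
// test suites against the real AWS resources, then always tear the stack down.
const stage = `ci-${process.env.TICKET_NUMBER || "27"}`; // unique per ticket/PR
const run = (cmd: string) => execSync(cmd, { stdio: "inherit" });

try {
  run(`serverless deploy --stage ${stage}`); // spin up the test stack via CloudFormation
  run("yarn test:unit");                     // assumed package.json scripts
  run("yarn test:integration");
  run("yarn test:e2e");
} finally {
  run(`serverless remove --stage ${stage}`); // delete the stack so we stop paying for it
}
```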
So, to actually touch on the testing strategy now, we'll take a really simple example: our ecommerce example again, with two microservices. We've got the order service, which has a Lambda that does some processing and then emits an event onto the event bus, which might be "order created" as we've discussed. Then, due to the rule that we've defined in EventBridge, the second microservice picks up on this, does some processing within a Lambda, and puts an invoice into S3. But of course, this can be applied to any use of EventBridge; this is just a simple example that we're taking here. And again, just to be clear, we do have an article that outlines all of this, and I'm going to be showing a few code examples as well, but all of this is in our articles online, so no need to be taking notes.
Firstly, I'm actually going to go back. Just to outline here again: the end-to-end test would be that when the Lambda is triggered, the invoice arrives in S3, which is a good test. However, if this happens to fail, you wouldn't know whether the problem was in the first microservice or the second microservice. So we want to split this into two integration tests and perform them separately. And again, because each microservice is effectively in its own code base, you need to put each of the tests into its respective code base, so they're not running cross code base; it's one test in one service and one test in the other service.
So firstly, we want to assert that the event was fired. We want to trigger our Lambda, which does some processing within it and then emits the event onto the event bus. Now, to assert that the event was fired, we actually introduced something called an event interceptor. This is something that isn't part of the actual infrastructure; it's something that we add in for the test, because you can't simply check that the event arrives on the event bus using our test runner. So we use SQS in this case to actually poll... sorry, we actually add a rule to the event bus that sends the event to an SQS queue, and then, using the test runner, you can poll the SQS queue to assert that the event arrived. If the event arrived in the message queue, then of course it arrived on the event bus too. So that's how we did that. We kind of had to be clever in doing this, because at the time we had written our code and then we were like, oh, we need to test this, and we wanted to do it on the real infrastructure, not using mocks. And so this is what we came up with, and it's actually a really nice way of handling this case, and we open-sourced it. This is great, because I'll go through code examples of what's actually going on behind the scenes, but what you'll see is that the client code that you'll be able to write is actually really short, because we created a library for people to add this to their projects.
So, the first integration test. This is written in Jest, but I'll talk us through it very quickly. Again, we're asserting that the event was fired and that it arrives on EventBridge. Firstly, we're invoking the Lambda using the AWS SDK, so the Lambda from the first service, then we're waiting for that to come back and calling this getEvents helper, and then we're checking that these events actually have the source that we expect, "order created". I have this meme here because it's a bit like, what the hell is going on? Because we have this getEvents and toHaveEventWithSource. Now, this is all coming from sls-test-tools, so we'll dig into that now.
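In sketch form, that first test looks roughly like this. The getEvents helper and toHaveEventWithSource matcher come from sls-test-tools as described here, but treat the exact signatures, along with the bus and function names, as assumptions rather than the project's real code:

```typescript
import { Lambda } from "aws-sdk";
import { EventBridge } from "sls-test-tools";

const lambda = new Lambda();
let eventBridge: EventBridge;

beforeAll(async () => {
  // Builds the "event interceptor": an SQS queue plus an EventBridge rule
  // targeting it, so the test can poll for events that reach the bus.
  eventBridge = await EventBridge.build("ecommerce-bus"); // assumed bus name
});

afterAll(async () => {
  await eventBridge.destroy(); // tear the temporary queue and rule down again
});

it("emits an order created event when the order Lambda runs", async () => {
  // Trigger the real, deployed Lambda from the order service.
  await lambda
    .invoke({
      FunctionName: "order-service-ci-27-createOrder", // assumed function name
      Payload: JSON.stringify({ orderId: "123" }),
    })
    .promise();

  // Long-poll the interceptor queue for whatever arrived on the bus...
  const events = await eventBridge.getEvents();

  // ...and assert that our event made it there with the expected source.
  expect(events).toHaveEventWithSource("order.created");
});
```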
But I guess why I wanted to show this is that the test is actually really simple to write from a client perspective. Firstly, before we actually run the test, we have this beforeAll, where we're essentially building the infrastructure beforehand, and this is all happening in sls-test-tools. So before the tests run, we make sure that we have all the infrastructure in place, which basically means the SQS queue hooked up to EventBridge. You can see in purple my client code, which is the part that everyone will be writing, and then sls-test-tools, which I'll talk through, is what's going on behind the scenes.
So firstly, behind the scenes, we're setting up the SQS queue, which of course isn't part of the infrastructure that you already have in place. To do this, we use the AWS SDK: we basically just give it a unique name and spin it up, and then we add the EventBridge rule to it. As we said, between the event bus and the consumer, which is our second microservice, or in this case our SQS queue, we have the rule to make sure it actually sends the event there. So that's setting up the EventBridge rule using putRule from the AWS SDK, and then we make the SQS queue the target of the EventBridge rule. You can see that we've simply added a target using the SQS ARN, which is just a unique identifier for the SQS queue, and then we give it a target ID, which in this case is probably just one, but each one needs to have its own separate number.
Now that we've actually set up the infrastructure in the build process, what we want to do is call the getEvents method. Although there's quite a lot of code here, I'll quickly break it down. We use receiveMessage on the SQS client, via the AWS SDK, to get the messages, and we define the wait time as a certain number of seconds; anything above zero means that you're long polling. Because everything is asynchronous, we need to give it a bit of time to long poll on the SQS queue to actually get the message back; if it was zero seconds, that would be short polling, and it wouldn't give the asynchronous event time to arrive. Then we return the result, which is just the event that arrived on the event bus, and subsequently the message queue; everything in between is just deleting the messages off the queue to clean up after ourselves. And just lastly, sls-test-tools actually comprises a lot of assertions to help us out; these are just Jest extensions. You can see that on the bottom line of the client code we're expecting that the messages that have come back actually contain an event with the source that we expect, "order created". This is simply a Jest matcher: we pass those two items into it, and it returns a pass or a fail based on whether there is an event there with that source or not.
So that's the first integration test done, and you'll be glad to know the second one is actually a lot simpler, and we won't be going on for quite as long as that one. For the second one, we want to assert that the event was actually received from the event bus, so that the event bus actually pushed the event to the correct microservice, and that in this case our invoice ended up in S3. And how do we do this? We inject the event onto the event bus using one of our sls-test-tools helpers, and then we just check using Jest that the event actually arrived... sorry, that the file actually arrived in S3. And so this is the client code.
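Again in sketch form, with the publishEvent helper and the toHaveS3ObjectWithNameEqualTo matcher named as described here but with assumed signatures, and with an illustrative bucket and file name:

```typescript
import { EventBridge } from "sls-test-tools";

let eventBridge: EventBridge;

beforeAll(async () => {
  eventBridge = await EventBridge.build("ecommerce-bus"); // assumed bus name
});

afterAll(async () => {
  await eventBridge.destroy();
});

it("writes an invoice to S3 when an order created event arrives", async () => {
  // Inject a copy of the event the order service would have produced. We can't
  // use the real one, because this test lives in the payment service's code base.
  await eventBridge.publishEvent(
    "order.created",                               // source
    "OrderCreated",                                // detail type
    JSON.stringify({ orderId: "123", amount: 40 }) // detail
  );

  // Assert that the consumer did its job and the invoice landed in S3.
  await expect("invoice-bucket-ci-27").toHaveS3ObjectWithNameEqualTo(
    "invoice-123.pdf"
  );
});
```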
So we publish the event, which is just injecting the event onto the event bus, and we give it the same event that we had in the first microservice. Of course, we can't actually use the real event that happened in the first microservice, because all of this code is stored in the second microservice. And then we wait on that to return, to see whether the S3 bucket actually has the object with the correct file name. And again, it's another assertion, toHaveS3ObjectWithNameEqualTo: we just pass in these parameters, and we use a Jest extension to return a pass or fail. Of course, I'll be sharing these slides with everyone afterwards. So that's our integration testing complete, and that is how we test EventBridge.
In sls-test-tools we have a range of assertions and helpers to really help us with our development. Many of our clients do actually use it and find it really helpful; I think one guy said that it saved him about a week of work when writing these tests. So if you are using EventBridge, which I hope you are in your serverless architectures, then this could be really helpful. And we also welcome contributions, because it's quite a young project at the moment. I'd love to have assertions for, like, Cognito, DynamoDB and basically the whole of serverless, which would be great, and you'll be added to the contributors list. So it'd be really cool if you got involved. And yeah, I'd just like to say thank you for listening, and if there are any questions, feel free to ask.