Video details

Advanced Serverless Messaging Patterns for your Applications - Julian Wood

Serverless
11.06.2021
English

ServerlessDays Paris - https://paris.serverlessdays.io/
Using the right messaging patterns between your services can help with durability, availability, and reducing custom code. This talk shows how to use AWS services to build robust messaging patterns into the foundation of your application architecture. Introduce fan-out, queuing, and decoupling between your services and combine multiple services to create powerful architectures that make your workloads scale.
About Julian
I help developers and builders learn about and love how serverless technologies can transform the way they build and run applications. I was an infrastructure architect and manager in global enterprises and start-ups for more than 25 years before going all in on serverless at AWS.

Transcript

It's going to be easier for all of us in English. Otherwise it will be a bit of a mess. This presentation has got lots of information in it. Anyone who's worked with AWS knows we've got lots of services, and love them or hate them, there's lots of choice. And that's part of the power of building with AWS: you've got a lot of different cool things to choose from. The slides are already available up there. You don't need to take notes or photos of the slides if you don't want to, because there's a lot of information I'm going to go through and it's all going to be available afterwards. So when we talk about serverless and we talk about Lambda and we talk about functions and events, how does it all fit together? We like to think of Lambda as the small part, the function, and it forms part of the bigger picture of functions as a service, which forms part of the even bigger picture, which is event-driven compute. When Lambda was announced, it was never announced as a serverless thing. In fact, there wasn't any such thing as serverless. When Lambda was announced, it was announced as event-driven computing: an event happens and some cool compute can do some stuff. That's where Lambda was born. The serverless idea came after that, and we're sort of tacking that on. So that's just to give you an idea of how it all fits. But I'm talking about messaging today because a Lambda function is cool, but one Lambda function in isolation is not going to do much. Serverless applications, and the applications that you are building, are going to be made up of multiple microservices. And the benefit is, as it says here, the more loosely they're coupled, the more the individual services can scale. They'll be fault tolerant individually. They'll have fewer dependencies. And that means the faster you will be able to innovate and build your applications.
And in fact, Tim Bray, one of the cleverest people on the planet, who used to work at AWS, says that if your application is cloud native or large scale or distributed and doesn't include a messaging component, then you probably have a bug. So think of that when you're building your serverless applications: messaging is going to be a critical part of it. If you look at the Amazon.com web page, for example (I should have done the Amazon.fr web page, I didn't think of that, sorry about that), when you place an order on this web page, a lot of things are going to be synchronous. You're going to be viewing the web page, you're going to be able to click and look at things. But once you place an order, there's a whole lot of background processing that's going to happen. There's going to be some payment. There's going to be shipping. There's going to be logistics. There's going to be a van that's going to arrive at your front door, and all of that is asynchronous. All of that is behind the scenes. It's got nothing to do with how you are interacting with the web page. And I'll explain why that's important, because a lot of people ask: if you're going to have microservices and different components connecting together, should every microservice connect to every other microservice via an API? Well, maybe not. Think of your application as having two different paths. There's a public interface, like the Amazon.com or Amazon.fr web page, where your customers or your business partners or people who interact with your website are going to communicate with you. Yes, that's probably a good place to have an API. But behind the scenes, all applications have a whole bunch of different things that they need to communicate and need to do, like the shipping logistics or the payment logistics that happen behind the Amazon site. And I'm sure the applications that you build do too.
So what you can think of is the internal services sitting below the waterline. This is a bit of an analogy with an iceberg. As you know, an iceberg only has a little bit at the top and a whole bunch under the water. Below the waterline of your iceberg, you can think of all your internal services, and you don't necessarily have to have APIs there. There are other options that you can use, and messaging can provide more resilience, availability and scalability for your application. If you're wondering why, take something like scalability. The individual services can scale in different ways, and that can give you cool superpowers: you'll be able to have a more resilient and more available application because each individual microservice is responsible for one or maybe two things, and they can scale individually. And your bigger application can be more resilient, because if there's a problem with one part of your application, it's not going to affect the other parts. So let's have a look at some microservice messaging patterns that you may want to use. Two common patterns are something called a queue and something called a topic. A queue is point to point and a topic is publish-subscribe. In this example over here, you can see we've got a sender component. I've shown one component here on the slide, but it could be a whole bunch of components that are sending messages onto a queue. You can see here messages A, B and C are sent onto the queue, and there are receivers who pull these messages off the queue. Now, with the queue, each message is pulled off by only one receiver. There's a single relationship between your receivers and the messages, so each message is received by exactly one receiver. Topics work slightly differently. Here you've got messages A, B and C, and the topic uses a fan-out: you've got multiple subscribers to the topic.
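The difference between the two delivery models can be sketched with a small in-memory model. This is only an illustration, not how SQS or SNS are implemented; the class names `Queue` and `Topic` and the receiver names are made up for the example.

```python
from collections import deque


class Queue:
    """Point-to-point: each message is taken off by exactly one receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Returns the oldest message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Publish-subscribe: every subscriber gets its own copy of each message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


def demo():
    messages = ["A", "B", "C"]

    # Queue: three receivers each pull once, so each gets a different message.
    work_queue = Queue()
    for message in messages:
        work_queue.send(message)
    queue_received = {"receiver-1": [], "receiver-2": [], "receiver-3": []}
    for inbox in queue_received.values():
        inbox.append(work_queue.receive())

    # Topic: three subscribers, so each one sees all three messages.
    topic = Topic()
    topic_received = {"subscriber-1": [], "subscriber-2": [], "subscriber-3": []}
    for inbox in topic_received.values():
        topic.subscribe(inbox.append)
    for message in messages:
        topic.publish(message)

    return queue_received, topic_received
```

Calling `demo()` returns one dictionary per model, which makes it easy to see that the queue spreads the messages across receivers while the topic copies them to everyone.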
And so you can see here messages A, B and C are sent to all three of the subscribers. So that's topics and queues: two different ways you can think about how to distribute your messages between components. Now within AWS, as mentioned earlier, we've got Amazon SQS, which is Simple Queue Service, and Amazon SNS, which is Simple Notification Service. Both of these services are cloud native. They are serverless: there's no infrastructure to manage, there's nothing to patch. All the scaling is built into the service, and it's just available and can really go up to amazing scale. Now, if we're talking about Lambda as we were in the beginning, Lambda can be a sender onto a queue or a publisher onto a topic. Lambda can also be on the receiver side, so you can have a Lambda function that pulls information off a queue, or a Lambda function that is a subscriber to an SNS topic and gets information from that. So Lambda can be a consumer. And in fact, Lambda could be both: you could have Lambda functions that are putting messages onto a queue or a topic, and you can also have Lambda pulling them off. Another way of thinking about publish-subscribe is a thing called an event bus. Here you can see we've got messages A, B, C and D, and the event bus acts as a bit of a traffic router. You can build up rules and filter rules to say where messages should go. In this example, you can see messages A and B are going to go to one location and message C is going to go to another location. So it's not quite a queue, where the messages are distributed, and it's not quite a topic, where messages are broadcast to everyone. It's a bit of a mix and match, where you can have multiple publishers and multiple targets, and it's a really efficient way to do it. AWS has a product called Amazon EventBridge, which has been spoken about earlier today, and that is again cloud native. It's serverless.
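To make the "traffic router" idea concrete, here is a minimal sketch of rule-based routing. It implements only a simplified subset of what an event pattern can express (exact matching, where each leaf in the pattern is a list of allowed values); the rule contents, source names and target names are hypothetical.

```python
def matches(pattern, event):
    """Return True if every field named in the pattern matches the event.

    Leaf values in the pattern are lists of allowed values; nested
    dictionaries match nested fields. Fields the pattern does not
    mention are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(event, rules):
    """Return the targets of every rule whose pattern matches the event."""
    return [target for pattern, target in rules if matches(pattern, event)]


# Hypothetical rules: (pattern, target name).
RULES = [
    ({"source": ["shop.orders"], "detail": {"status": ["placed"]}}, "payment-service"),
    ({"source": ["shop.orders"], "detail": {"status": ["placed", "paid"]}}, "audit-log"),
    ({"source": ["shop.shipping"]}, "logistics-service"),
]
```

An event can match zero, one or several rules, which is what makes the bus sit somewhere between a queue and a topic.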
And the cool thing about EventBridge is that you can grab events from AWS services. You can grab events from your custom applications, and we've got some great SaaS integrations with partners like Datadog, who are presenting here today, and a number of other SaaS partners, so that events can flow directly into your AWS account without you having to set up any infrastructure to manage. And then you can create the rules and send those events onto any other services. Those services can be things like Step Functions, which is workflow automation, AWS Lambda obviously, Kinesis Data Firehose, or in fact any API destination on the Internet. EventBridge, as our next speaker, Cyril, is going to be talking about, is a really powerful construct for thinking about event-driven applications. The name is EventBridge, but it's really a cool way to have a lot of flexibility for events to flow around and between your applications. So just to summarize, we've got SQS, SNS, EventBridge and Kinesis, and these are some of the messaging services. Kinesis I haven't brought up yet, and Kinesis is basically for streams. The difference between streams and events is that a stream is normally a lot of data, and what is important in a stream is the correlation between the different data points. So something like clickstream data or log data, where the time order is important and the way the different points relate to each other is important, rather than a queue, an event or a notification, where the ordering and the correlation between the individual events don't matter as much. So Lambda fits into this in a number of different ways. There are three different ways that a Lambda function can run. One is a synchronous push system. If you think of a normal web page, if you've written any web pages, you may have an API in front of it.
API Gateway, which Theodore was talking about earlier, will invoke a Lambda function, and that is synchronous: the request goes to Lambda, gets the response, comes back, and your caller has its response. The asynchronous one is very important for a lot of the events we're talking about here. This is when something like SNS or something like S3, the object storage service, invokes a Lambda function asynchronously, and Lambda can go off and do its processing in the background. As soon as the message has arrived at Lambda, Lambda tells SNS or S3 or a bunch of other services: I've got your message, you can carry on, and I'll go and do the work in the background. It's a very powerful and very important way to think about asynchronous messaging. And the third one is a poll-based mechanism. Lambda is, in effect, going to reach into DynamoDB, it's going to reach into Kinesis, it's also going to reach into SQS, for example, and it's going to grab records out of those services, perform some computation on them, and maybe send the data on to some other kind of service. Any time Lambda is invoked by any of these services, or even via the console, it uses the Lambda API. This supports the synchronous and asynchronous models, and it's basically a blob of JSON that gets passed between the various services, and the client is included in every SDK. So whichever way you happen to use Lambda, it's always the Lambda API that is used to communicate and get Lambda functions to run. So, lots of different services. How on earth would I choose between them? The short answer is it's not going to be that easy to choose between them, but there's some good information out there, and the reason it's not easy is not because they're necessarily complex.
It's because applications are different and applications have different needs, and each of these services has its own superpowers that you should be considering when building your application. I wish there was one messaging service to rule them all. But unfortunately, in the past 30 or 40 years, computer science hasn't come up with one single messaging solution, so all of these work in different ways. I like to compare them along these six different axes: scale and concurrency controls, durability, persistence, the way messages are consumed, failures and retries, and also pricing. And there may be other things that you want to consider when choosing between your messaging services; complexity could be something you want to think about as well. So this is where the detail gets heavy on the slide. Don't worry, I'm not going to go through it all. I'm going to do a quick summary, but these are the different scaling and concurrency controls between the different services. And the clicker went, and I was panicking that I was going to have to read through the slide, which would go against everything I said about not getting into too much detail. But the cool thing to think about is that Lambda's got a really cool way to control how many Lambda functions run concurrently. Concurrently basically means at the same time. So if you've got an application that needs to scale up and you need 1000 concurrent requests, Lambda is going to spin up 1000 of what we call execution environments, each of which is going to run that Lambda function, run that little piece of code. Now, as soon as the function is finished, the environment is available for reuse.
So if another invocation comes in for Lambda and a function has just finished, that environment is available for reuse. What you can do is use this Lambda concurrency to control how many functions are running at the same time, which can be useful. SQS has got something extra: with SQS, we can actually batch information. In a queue you can put a whole bunch of messages, and Lambda can pull out those messages in a batch, which is a slightly different way of handling the scaling. And then Kinesis has got this concept of shards, which is literally how many parallel processing units you can have, and you can scale up Kinesis to a crazy number of shards. That's how you get even more throughput for your Kinesis streams. So the two controls for managing Lambda are these two things: one is called reserved concurrency, and one is called provisioned concurrency. Reserved concurrency is basically saying: what's the maximum number of instances of this Lambda function that I want to run at any one time? Why would you want to do that? Well, two reasons, mainly. You may want to guarantee that you have enough concurrency available to run, because within AWS we do have some limits to protect the service and also to protect your applications, so that things don't go crazy and you end up with a huge bill. There are limits built into the system that say you can have a certain number of Lambda functions running at the same time. And so if you've got lots of Lambda functions in your account, you don't want them all grabbing out of the big total concurrency pool. What you can do is reserve concurrency and say, I want 1000 for that Lambda function, and you know that there'll always be 1000 concurrent invocations available for that function. The cool thing to remember, which people may not know, is that if you set the concurrency to zero, it's basically a stop switch.
So that stops all invocations for that Lambda function, and you may be able to use that if you've got a downstream service that is broken, or you need to do some maintenance work, and you don't quite have control of the events that are coming in. You set your concurrency to zero and Lambda just won't invoke the function anymore. So it's a useful little trick to have in your back pocket if you need it. And then there's provisioned concurrency, which Daniel from Datadog mentioned in the first talk that people are overpaying for. Provisioned concurrency is all about being able to pre-create Lambda environments, because it does sometimes take a few minutes to scale up the number of Lambda functions that can run at the same time. And so if you need a huge number of Lambda functions available at exactly 09:00 in the morning, because you have a competition on your website, or you're a sports streaming company that is going to be doing some sort of sports stream, and you know that at a given time something is going to happen, you can basically get Lambda ready and waiting to process all the requests that you're going to need. So, the concurrency models. Another way to think about it is that with SNS, EventBridge and API Gateway, there's no storage of the events. They're just passing through the system, and Lambda is going to do an invocation for each of the events and pass it on to the next system. For SQS, when you've got a queue, there is a concept of storing information in SQS, and you can see there you've got a batch of messages that Lambda is then invoked with. In the stream-based model with Kinesis, this is how the parallelization I talked about works. You have a stream, and a Lambda function gets attached to the stream and can pull records off it, and the more streams you have, sorry, the more shards within a stream, the more parallel processing you can have with Lambda. So these are just different ways that the services integrate to deal with concurrency.
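As an illustration of the batch model, here is a sketch of a Lambda handler that receives a batch of SQS messages and reports which ones failed, so that only those are retried. It assumes the event source mapping has partial batch responses (`ReportBatchItemFailures`) enabled; without that setting the return value would not be interpreted this way. `process_order` and the message contents are hypothetical.

```python
import json


def process_order(order):
    """Hypothetical business logic: rejects messages without an order_id."""
    if "order_id" not in order:
        raise ValueError("missing order_id")
    return order["order_id"]


def handler(event, context):
    """Process each SQS record in the batch and collect the failures.

    Assumption: the event source mapping is configured to accept a
    partial batch response, so only the listed messages return to the queue.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            process_order(json.loads(record["body"]))
        except Exception:
            # Report this message as failed; the rest of the batch still succeeds.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

The design choice here is to catch errors per message rather than let one bad message fail the whole batch, which would cause every message in the batch to be delivered again.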
So durability is about how messages are stored while they are in flight. What happens to messages as they're going through the system? Well, basically, Lambda doesn't have any durability itself, because it's a very short-lived connection between these different services, so we don't need to persist the data in Lambda. But the other services all run across multiple Availability Zones. So you can be really confident that if you give any message to SNS, EventBridge, SQS or Kinesis and we have an issue in one of our Availability Zones, which does happen sometimes, your message is pretty much guaranteed not to be lost. Persistence is something slightly different from durability. Persistence is how long we are going to keep the messages around, and there can be some useful reasons to keep those messages in the service for a little bit of time. Lambda, SNS and EventBridge don't have any persistence. An event comes into one of those services, gets processed as quickly as it can, and gets sent off to another service. But SQS and Kinesis have this concept that you can actually store data in the service, so SQS can act as a bit of a queue, a bit of a buffer. What you can do is put messages in SQS, and they're going to stay there until you are ready to take them out. And that can be useful for a number of different scenarios. One is if you've got a downstream service, maybe a relational database, or an API on premises, or an API on the Internet that's got a throttle, and you know you can't overwhelm that API. You can use SQS to store those messages, keep them all ready for you, and then have Lambda invoke using the concurrency controls to protect that downstream resource. Super useful.
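The buffer-plus-concurrency-limit idea can be sketched locally with a thread pool standing in for Lambda's concurrency control. This is an analogy under stated assumptions rather than AWS code: `drain`, the 10 ms sleep and the counters are made up for the illustration.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def drain(buffer, max_concurrency):
    """Drain a buffer of messages with a hard cap on parallel downstream calls.

    Returns (peak_parallel_calls, processed_count). The pool size plays the
    role of the concurrency limit that protects the downstream resource.
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0, "processed": 0}

    def call_downstream(message):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # stand-in for a slow database or throttled API call
        with lock:
            state["active"] -= 1
            state["processed"] += 1

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        while True:
            try:
                message = buffer.get_nowait()
            except queue.Empty:
                break
            pool.submit(call_downstream, message)
    # Leaving the 'with' block waits for all submitted calls to finish.
    return state["peak"], state["processed"]
```

However many messages are waiting in the buffer, the downstream call never runs more than `max_concurrency` times in parallel, and nothing is lost while the backlog is worked through.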
Another way SQS can be useful is if you need to do some work on a downstream service. Say a third-party API is going to have some maintenance work. You don't want to lose the messages, or have timeouts, or have things that are not able to connect. What you can do is durably persist the information in SQS, and then in your own time, when the service comes back up, the messages are still going to be there and you can pull things off. SQS is really helpful for just carrying on where you left off. Kinesis has a similar concept, where you can configure how long you want to keep the messages in a Kinesis stream. So, the consumption model. We spoke about sync, async and poll. There's a lot of different information on the slide and some guidance on how to use these services. For example, with SNS you can have message filtering. What you can actually do is look at the message attributes and say, I want one message to go there and another message to go there. EventBridge uses rules, and EventBridge rules can also look at the message body, which is something different. For consuming with Lambda, one tip is to always look at Step Functions, because Step Functions works at a bigger orchestration and workflow level, rather than you building a whole lot of complicated logic within your Lambda function. There are some payload size limits you may need to consider, so that's one of the other things you need to think of when choosing between these services. Retries, failures and error handling: all the different services have different ways that they can retry if there's a problem. You can see SNS is up to 23 days, EventBridge is up to 24 hours, and with SQS, as we said about the queue, the messages stay in the queue. So with retries and failure handling, think about what your application needs and how you're going to architect your application with these different services.
Now, things do go wrong within a Lambda function. An invocation may fail. It may be a problem with your code. Not saying your code: any code that I write, I need to be extra super careful that it's not going to have any mistakes. That's just because I'm a rusty coder. I'm sure your coding is all perfect and brilliant and you don't have errors, but sometimes there are errors in your code or errors in your Lambda function, and you need to be able to handle that. So Lambda, being a serverless service, automatically makes retry attempts, and you can configure the number of retries between zero and two. Maybe you want to set it to zero if you never want to retry a failed invocation. There could be a reason for that, and it's something you can configure. There's also a maximum age for Lambda events, so if something isn't available, you can configure how long that invocation window can be, and once that passes, that's going to be a failure of your function invocation. What you can then do is use dead-letter queues, which we spoke about earlier. Dead-letter queues are one way: if something fails, you basically send the message onto an SQS queue. But a more advanced way with Lambda is a thing called Lambda destinations. Lambda destinations allow you to send a message to EventBridge, SNS, SQS or Lambda if something is successful, and also if something is not successful, and the message carries a bit more information about what failed. So if you are using Lambda failure modes with dead-letter queues, DLQs, it's definitely worth taking a look at Lambda destinations. It just gives you a bit more functionality. So Lambda has destinations, but EventBridge and a number of other AWS services do have dead-letter queues.
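The retry-then-destination behaviour for asynchronous invocations can be modelled in a few lines. This is a local simulation, not the Lambda service's implementation: it ignores the delays between retries and the maximum event age, and the names `invoke_async`, `make_flaky`, `on_success` and `on_failure` are labels invented for the sketch.

```python
def invoke_async(function, event, max_retry_attempts=2):
    """Simulate an asynchronous invocation with 0 to 2 retries.

    Returns a record describing which destination (success or failure)
    would receive the outcome. Waiting between attempts is omitted.
    """
    if not 0 <= max_retry_attempts <= 2:
        raise ValueError("max_retry_attempts must be between 0 and 2")

    attempts = 0
    last_error = None
    for _ in range(1 + max_retry_attempts):  # first attempt plus retries
        attempts += 1
        try:
            result = function(event)
            return {"destination": "on_success", "attempts": attempts, "payload": result}
        except Exception as exc:
            last_error = exc
    # All attempts failed: the failure record keeps the original event
    # plus information about what went wrong.
    return {
        "destination": "on_failure",
        "attempts": attempts,
        "error": repr(last_error),
        "event": event,
    }


def make_flaky(failures_before_success):
    """Build a hypothetical function that fails a fixed number of times first."""
    calls = {"count": 0}

    def function(event):
        calls["count"] += 1
        if calls["count"] <= failures_before_success:
            raise RuntimeError("transient error")
        return {"handled": event}

    return function
```

Setting `max_retry_attempts=0` in this model corresponds to the case mentioned above where a failed invocation should never be retried, so the first error goes straight to the failure destination.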
It has been covered earlier, so I'm not going to go through it now, but this is just a way that if something fails, you can keep that error message and route it somewhere else, where you can investigate. A really powerful pattern. There are lots of different pricing things to think of, but the really cool thing to know is that for Lambda, SNS, EventBridge and SQS, your first million requests per month are free, and that's every month. So it's not just while you're within the free tier: you get a million requests for each of these services in perpetuity, every single month. And there are a lot of AWS customers that are literally within the free tier for their whole business. Their Lambda costs are basically absolutely zero. They may be paying for other services, but this is a really good way to be able to try something, because the cost can be really cheap when you get going, and even up to moderate scale. Quick thing: people often ask about the difference between EventBridge and SNS. Basically, EventBridge has got a whole lot of integrations for third parties and can route to more destinations. But SNS is really good for fan-out, so if you've got lots of subscribers you need to send to, even millions, SNS is really good for you. On the filtering I mentioned briefly, SNS filters on just the message attributes, while EventBridge can filter on the whole message body. SNS is a bit faster, so if latency is super important for you, then latency is going to be a factor in your decision tree. And there is a pricing difference: EventBridge is going to be a bit more expensive than SNS. So just to summarize the comparison: if you've used CloudWatch Events before, EventBridge has replaced that. EventBridge is a really good service, built on the history of CloudWatch Events, with a lot more functionality. Use SNS for high throughput. Use Kinesis for real-time processing at huge scale. And use SQS if you need resiliency. And we haven't covered it today.
But if you need some sort of ordering, which is FIFO, first in first out, and you want to buffer messages, SQS is going to be preferable to EventBridge. So with these messaging products, you can use them all individually, but the real power comes when you're able to combine them. Let me check on my timing before I go on. I need to be quick. There's a concept of topic-queue chaining. What you can do is have an SNS topic which fans out, and it puts the messages onto a queue, and then you've got some application that can pull messages off that queue and process them. The benefit of this is you've got the fan-out. You've got messages going to multiple destinations, maybe two different microservices, but they get persisted in the queue and they are stored there for as long as you need them, until an application can pull them out. Scatter-gather is a similar kind of thing, where you put something on SNS and fan it out. Maybe you want to get a request for quotes from a whole bunch of different places, or you want to find out which restaurants are open, or some kind of thing where you need to broadcast something out and you want all the responses to come back in, where you've got an aggregator component that collects all the responses. It's really powerful to use SNS and SQS together. EventBridge can also integrate super well with SNS. Remember, EventBridge can look at the message body. So you can have some really powerful rules that look at the message body and then send that information on to SNS, which can then broadcast it out to millions of subscribers, including Lambda. The cool thing is, you can also do these things across accounts. With something like SNS, you could have something central in your vendor's account and then send the SNS message out to maybe a million subscribers all over the world in their own AWS accounts. A really powerful way to do that.
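Scatter-gather can be sketched in-process as a broadcast followed by an aggregation step. In the architecture described above the broadcast would go through a topic and the replies would come back through a queue; here plain function calls stand in for both, and the supplier names and prices are invented for the example.

```python
def scatter_gather(request, responders, aggregate):
    """Broadcast one request to every responder, then aggregate the replies.

    A responder may return None to indicate it has no answer; those
    replies are dropped before aggregation.
    """
    responses = []
    for name, responder in responders.items():
        reply = responder(request)
        if reply is not None:
            responses.append((name, reply))
    return aggregate(responses)


def cheapest(responses):
    """Aggregator: pick the (name, price) pair with the lowest price."""
    return min(responses, key=lambda item: item[1]) if responses else None


# Hypothetical quote services for a "request for quotes" scenario.
SUPPLIERS = {
    "supplier_a": lambda request: 120 * request["quantity"],
    "supplier_b": lambda request: 90 * request["quantity"],
    "supplier_c": lambda request: None,  # does not stock the item
}
```

For example, `scatter_gather({"item": "widget", "quantity": 1}, SUPPLIERS, cheapest)` asks every supplier and keeps only the best quote. A real aggregator would also need a timeout or an expected reply count to decide when to stop waiting, which this sketch leaves out.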
EventBridge works really well with SQS as well, because you can then persist information into the queue. You've got the messages that can be routed with EventBridge so they go into different queues, which may belong to different microservices, and then you can use Lambda to pull the information off. And for microservice B, which you can see here, we could use the Lambda concurrency controls to limit it to maybe only 50 concurrent connections to the Amazon RDS database, so that we don't overwhelm our downstream system. So those are really cool ways that you can combine EventBridge, in this case, and SQS. Pipes and filters is just a way of having a whole bunch of different services connecting to each other. The problem is it creates these long chains, and that gets really complicated. So what we prefer to suggest is a thing called saga orchestration, where you have a source and a target and some orchestration component in the middle that handles all the orchestration, so you don't have to manage what's going on next in the chain. This uses event triggers, and maybe you've got transactions and rollback and branching and parallel workflows and retries and all the different kinds of things that you want to do in an orchestration. Something goes wrong with creating a user, for example, and you need to undo things in a bunch of different systems, or a flight gets cancelled and you need to refund a card and then release the seat. All this kind of orchestration, being able to roll workflows forward and roll them back, is what AWS Step Functions is really good for. Step Functions and Lambda work super well together. Step Functions is a workflow orchestrator, and it can connect to APIs, can connect to Lambda, and can connect to a whole bunch of services natively. So quickly, to sum up. I did promise you there was going to be a lot of information. If I had cut it down, I'm sure I would have lost you even more.
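The roll-forward and roll-back idea behind saga orchestration can be illustrated with a tiny orchestrator. This is a plain Python sketch of the pattern, not Step Functions; the step names for the flight booking and the `run_saga` helper are hypothetical.

```python
def run_saga(steps, context):
    """Run steps in order; on failure, run compensations in reverse order.

    Each step is a (name, action, compensate) tuple. Only steps that
    completed successfully are compensated.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action(context)
            completed.append((name, compensate))
        except Exception as exc:
            for _done_name, undo in reversed(completed):
                undo(context)
            return {"status": "rolled_back", "failed_step": name, "error": str(exc)}
    return {"status": "completed"}


def fail_to_issue_ticket(context):
    """Hypothetical step that always fails, to trigger the rollback path."""
    raise RuntimeError("ticketing system unavailable")


def booking_steps(issue_ticket):
    """Build the hypothetical flight-booking saga around a ticketing step."""
    return [
        ("reserve_seat",
         lambda ctx: ctx["log"].append("reserve_seat"),
         lambda ctx: ctx["log"].append("release_seat")),
        ("charge_card",
         lambda ctx: ctx["log"].append("charge_card"),
         lambda ctx: ctx["log"].append("refund_card")),
        ("issue_ticket",
         issue_ticket,
         lambda ctx: ctx["log"].append("void_ticket")),
    ]
```

Because the orchestrator owns the ordering and the compensation logic, the individual steps do not need to know what comes next in the chain, which is the point made above about avoiding long pipes-and-filters chains.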
But the thing to think about is: how real time do you need your information? Do you need synchronous or do you need asynchronous? Do you need to create an API, or can you use messaging between your different applications? What's going to break if Lambda fails? How are you going to protect your downstream resources? All of these services are serverless: no infrastructure to manage, pay as you go. They scale down to zero and scale up to a ridiculous amount of processing. They all support IAM, so they've got lots of fine-grained permissions to make this really secure. So the takeaway is that we've got these great messaging services. You can use them in isolation, but they've also got superpowers when you use them all together and combine different things. If you're going to be looking at workflows, Step Functions is going to be your friend. As I said, with serverless pricing, it's really cheap to get started and work it all out. What we do have is an aggregation site called Serverless Land, which is literally everything to do with serverless on AWS. Well, a lot about serverless on AWS: there are videos and learning paths and blogs and a whole bunch of information. But what is cool is we have a thing on there called the Serverless Patterns Collection. So if you're using AWS SAM or the CDK, the Cloud Development Kit, we have a collection of patterns with which you can connect SNS and SQS, or Lambda and DynamoDB, or API Gateway and Lambda, or whatever, all these different kinds of services. And this is just a really good, powerful pattern collection. There's Datadog's thing over there, and I know there are people from the Serverless Framework over there. If SAM isn't your thing and you want to use the Serverless Framework, please go ahead. Whatever you're going to do, we'd recommend you use a framework. It's really going to make serverless development really good. If the Serverless Framework appeals to you, fantastic, we absolutely love that.
It's going to make your life building serverless applications easier. So there we go. That's the link to the quick run-through of the presentation. You can go into more detail on the slides and look at those six different axes that I used to compare the services. And, yes, thank you very much. It was a pleasure and an honor to come to Paris today to spend time with you. I'm available for questions now and afterwards as well. So, yeah, thank you very much.