Video details

Sheen Brisals - Serverless Patterns Made Simple with Real World Usecases - NDC London 2022

Serverless
07.21.2022
English

Patterns are common in all walks of life, so too in Serverless computing. Software architectural patterns are often viewed as complex constructs that are beyond the grasp of many engineers. When mixed with cloud computing and serverless, the perception makes it harder. Textbook-style narratives of serverless patterns often fail to connect with engineers. Sharing a true serverless journey with the community brings authenticity and acts as a great learning platform.
In this talk, I will touch upon the experience of working with several serverless applications. Every pattern will be associated with a use case that can be easily related to. What better way to equip serverless engineers to be efficient and innovative than sharing the knowledge!
Check out more of our featured speakers and talks at https://www.ndcconferences.com https://ndclondon.com/

Transcript

Warm and sunny. Welcome to this session. My name is Sheen Brisals and I work for the LEGO Group. I am active in the serverless and AWS community, I have the privilege of being an AWS Serverless Hero, I often write blog posts about my serverless experience, and you can also follow me on Twitter.
Right, today's agenda. I will give a glimpse of what we are doing with serverless at the LEGO Group, then spend a couple of slides setting the tone on what serverless is and why we need it. Then we start looking at patterns, including some of the serverless patterns we use as part of our work at the LEGO Group, and close with the benefits and a few resources. In between, I will also touch upon how sustainability is becoming an important part of serverless technology.
Okay, so what are we doing at the LEGO Group with serverless? This is LEGO.com. We moved to serverless back in 2019, and people often ask: okay, that's fine, but what exactly are you doing with serverless at LEGO.com? If you visit LEGO.com and purchase something, you pick an item, drop it into the basket, move on to make your payment and then complete the order. All sorts of validation and everything else happen behind the scenes, and all the microservices that support those activities for the front end are 100% serverless. So that's the serverless work we are doing around LEGO.com. It was migrated from the old legacy platform during 2018 and 2019, when we completed it, and now the back end of LEGO.com runs fully serverless. The front end is the typical React, JavaScript, Node.js, and we also have the middle layer, the backend-for-frontend GraphQL layer; those are on Fargate. But if you look at all the services behind the scenes, they are all serverless. So that's where the serverless movement started at the LEGO Group.
When I say LEGO Group, I should make it clear that this work is part of a bigger department called Marketing Channels Technology, which comes under the Digital Technology organization. If you look at the different teams across the LEGO Group, there is a range of different technologies. We still do on-prem, we have big SAP systems in place, teams are working on containers, Kubernetes and clusters, and the typical monolithic systems are there too. As in a typical enterprise, you can think of everything being in place.
Okay, so when we started, this was the first service that we put into production when we moved towards serverless. Based on the experience and the success that we saw, we gradually expanded to all the different services behind LEGO.com. Going back to that time, we had no serverless and few engineers, and by the time we deployed and moved to serverless we had about 20 engineers, in the typical back-end and front-end team setup. If I fast forward to now-ish, there are more than 20 product squads. By product squad I mean typically eight to ten people forming a squad that owns a particular product. For example, for checkout there is a squad that looks after the checkout services as well as the front end, and similarly for payments, loyalty, et cetera. As for the number of services, I have lost count, because we now operate with different cloud accounts. Like I mentioned, the different squads nowadays own their own accounts and operate with their own repositories and so on. So basically over 100 microservices and plenty of Lambda functions.
Okay. And spikes and graphs like these are very common these days. Even a few days ago, with a product launch, we had similar spikes, and when it comes to serverless, we don't need to do anything.
Sometimes, depending on the occasion, we may set provisioned concurrency on a few Lambda functions that are hit hard, but otherwise nothing. It just scales up and comes down as you would expect.
Okay, so what is serverless? How many of you are familiar or working with serverless? Okay, quite a lot. Thanks. Before I get to serverless, let's quickly go through the evolution. I'm not going to go all the way back. You must have heard in the keynote earlier about the evolution of Microsoft technology and so on. It all started with the hosted data center, then we had infrastructure as a service, then came platform as a service, and from there the evolution to function as a service, which brought serverless to the forefront. That's where serverless came in and became popular with the different cloud providers. So what is it? It is a cloud computing model. No management: you don't need to manage containers or clusters or anything. You pay for the compute, and if you store any data you may pay for that. Auto scaling, highly available, all of that.
What I'm interested in talking about is why we need serverless, because often people think that serverless is just function as a service. No, it's not. Function as a service plays a part, whereas the serverless ecosystem itself has a number of managed services that come together to provide the solution. Then there is the granularity of optimization. This is another thing I like, because if you work with serverless you know that you can tune resources individually. If you have two functions in a microservice, you can tune them separately and give them different security credentials and access. That is the power of serverless, in my opinion. It goes all the way down to deeper-level security and data privacy. If you have two data tables, you can set them up completely differently. You don't need to generalize.
So that's the flexibility. Two other important aspects I like: first, it is the ideal technology for iterative development. You don't need to build everything up front. Just start with your POC or MVP, the minimum viable thing, then iterate and move on. This is the more common approach that people and teams follow these days, so it's ideal for that. The other is engineering diversity. What I mean is that if you come from the traditional setup, you say that you are a C++ or Java or JavaScript programmer. Whereas when it comes to serverless, you are more than a programmer. You need to know about security, about event-driven computing, about messaging. All these aspects come together, and that makes you a complete serverless engineer, not just a programmer. That's another beauty of the technology. I always stress this point. Okay, so those are the things to do with serverless. I'm not going to go into details.
So let's go on to patterns. While I was preparing this talk, which I first gave a few years ago, this phrase came to me, and it is absolutely true when we think of patterns: you see patterns on top of patterns, patterns within patterns, patterns by patterns, all sorts of things. There is no clear-cut definition that says this is a pattern and this is not. If there were, we wouldn't have any books other than the one Gang of Four patterns book. Why do we have all the other books? Because there are different patterns. There is J2EE: if you've been through the Java ecosystem, you know the EJBs and those kinds of things. And the microservices patterns book talks about patterns related to microservices, like event sourcing.
The original Gang of Four book was influenced by the technology of those days, C++ and object-oriented programming. It was true, it was good, but then things move on. We have enterprise integration patterns, which describe, say, the dead letter queue (DLQ) as a pattern. Nowadays, if you work in cloud and serverless, you would ask: is that really a pattern? Because it is so commonly used in your design or architecture. But these all come together. That's why the earlier phrase, patterns within patterns and patterns on top of patterns, makes sense.
There are different ways of looking at patterns. They help us design things faster, and they are associated with a problem, like singleton or factory. A solution can also confuse: if you use the right pattern wrongly, or if you don't know how to use a pattern, you can get into issues. And patterns are very opinionated. A few years ago I was giving a talk going through all the cloud and serverless patterns, and someone commented later, 'Oh, you didn't say anything about the classic patterns.' I asked, 'What do you mean by classic patterns?' 'All the factories and the singletons and things like that.' So people think in different ways based on their background and experience. And often people say serverless is a pattern. I don't know.
Okay, so if you look at the traditional design patterns, we can classify them into three groups: creational patterns, structural patterns and behavioral patterns. If you group the patterns in the Gang of Four book, you can see the different patterns that fall under each. For example, observer is a behavioral pattern, adapter is a structural pattern, and there are many others. They all come together.
But if you take a different, high-level view, we can classify patterns in three ways: coding patterns, design or architectural patterns, and, in the modern day, patterns associated with cloud architecture. Cloud has an influence: it has something to say about how we scale up a service or do load balancing, so those areas have patterns. The same goes for serverless, as it falls under cloud, so it has its own way of defining solutions for certain problems. There's always overlap, right? When we have an architectural pattern like CQRS, we need to implement it in order to make it work, so there is always some overlap. When it comes to cloud, the strangler pattern is very common, especially as we migrate towards cloud. It's one of the patterns many teams use when they migrate from a legacy platform to microservices or cloud. Orchestration and choreography: this is always the debate, which do we use, orchestration or choreography? It's like the typical fight between REST and GraphQL, this and that. Then there is the circuit breaker, a classic pattern with thousands of different implementations, so you choose the one that is best for you.
So this is the way I see the classification. But serverless now cuts across all these areas. That's why I said serverless is part of the cloud evolution, but it takes things from all over the place. For example, singleton may not be that popular these days when you write function as a service, but you do write single-purpose functions, so it comes into the picture in a different way. Okay, so as I mentioned, serverless patterns. We will go through a few patterns.
They should be very familiar to you. I'm not coming with any groundbreaking patterns. We will touch upon one or two things with examples. We will start with a simple synchronous one, then touch upon asynchronous. Functionless is gaining momentum; I will talk about it when we get there. Then orchestration and choreography; we can't do without those, because they form the core of the event-driven, asynchronous compute model we have these days. And then circuit breaker. I will also touch upon sustainability. I won't go into the details, but I'll give you a flavor of where we are moving in terms of sustainability.
Okay, so synchronous invocation. Very simple: we have a consumer and a provider. The consumer invokes a service, requesting something, which you have done thousands of times, and the provider fulfills the request. The difference is that the consumer is waiting for the provider to complete the task and send back a response. Simple. Where do we use it? Commonly in API calls, usually of shorter duration because you want to get things done. Especially with browsers or web apps, you have a customer or someone waiting for the response, so you can't hang around and do everything else. Mostly you get into a couple of situations: service to service, where you have knowledge of the service, there's a contract, et cetera. An example is putting something into the basket, like I mentioned earlier, in the typical ecommerce flow.
Now how do we use it? We have a shopping app, an API, a function behind it with the logic, and a platform, a third party or a different service behind the scenes doing the fulfilling. Obviously, when you drop something into the basket, a bunch of things go on behind the scenes, like validation.
If you are a signed-in or registered customer, it will check the age, make sure that you can buy this product in this country, all sorts of things. So performance is again a priority, along with logic and validation. In these situations there is no partial failure: it is either success or failure. You can't sit in between, because the customer won't understand any of that. And the validation step is crucial, because that largely decides success or failure. The thing is, though it looks simple, what if your platform, the third party, is on fire? That needs to be taken care of. Now you have a broken circuit. We'll talk about the circuit breaker pattern later, but it is something we need to take into account. Even though the synchronous pattern looks very simple, it can trip everyone up, especially in a high-volume situation. These are the things to keep in mind when we implement this kind of thing, and this is where the circuit breaker, caching and many other aspects come into the picture.
Okay, now let's stretch it further. You put something in the basket, and now you're ready to make a payment. You need to know which payment methods are available to you. It could depend on whether you are a customer who has already saved a few cards for payment. And if you are in a particular country, the system needs to work out, okay, what are the eligible payment methods for GB? And if you are a registered customer, have you stored any payment methods that you can make a quick payment with? Typically, in a situation like this, you would have a checkout service that deals with the front-end aspect of the customer going through the checkout flow. Then behind the scenes there is a different, completely detached microservice doing the payments business. So checkout is going to ask the payment service: okay, here is so and so, in this country, et cetera.
What are the payment options I should present? This is an actual setup that we have. The checkout service belongs to a team, a squad as I mentioned, and for payments there's another squad, called Brickoin. They focus mainly on the payment APIs and serverless. The difference is that the payment squad is almost a pure service squad, APIs only, whereas checkout is full stack, with serverless, the front end, everything. Anyway, again, performance should be top notch, because there is a customer waiting to make the payment. And this is a service-to-service call, so extra things come into play. The timeouts between these different APIs need to be aligned, because if the payment service is taking longer and checkout times out sooner, we have a problem. The other thing is quota checks. In this case, within one organization, it's fine. But what if the service is provided by a third party? They have limits on how many invocations you can make per second or per day, et cetera. So these things come into play. And also downtime: what do we do if the service is down? Is there a way we can do some caching so that we don't always rely on it? If it's the same customer who was here a few days ago and there is a cache, maybe we can serve from that, for example. So these different aspects come into the picture. We start with the standard pattern, and then there is always an extra bit that we need to put in place.
Okay, let's move on to the async stuff. This is quite popular, especially with event-driven architecture. The pattern is similar. The only difference is that the service provider carries on processing the task, but the consumer is not waiting. The client carries on with its own task, doing something.
This is the sort of situation where you submit a bigger task or trigger something on a service, you get on with things, and later on you ask for the status. We will see a flavor of that with the CQRS pattern. So again, APIs and events come into play. It's not always APIs when it comes to sync or async: you can do a function-to-function async invocation, or an event trigger, which is async. All these aspects come in. In many cases it can be fire and forget: you submit it and forget it. Or you use some kind of callback or query to find out the progress, and in some cases you get an acknowledgment.
For example, take the payment service I mentioned previously. One of the big consumers of that service is the back-end SAP systems, because they do the order fulfilment. When they do the order fulfilment, that's when they take the payment; that's when they capture or settle the payment. Three times a day they submit thousands of payment IDs, like 5,000 or 6,000. There's an API endpoint that receives the batch and just gives back an ID. Then the service goes away and processes the capturing, and SAP polls for the details. So polling is one mechanism. The other, which you might have heard of as a webhook, is a push notification going back to the client. That's the other flavor of it.
Okay, so this is filtered events to consumers. I will explain. This is an event streaming use case. The event producers are web browsers, web apps and so on. Most of you must be familiar with this these days: when you go through an ecommerce website, every action you take is tracked behind the scenes. You click on a product, you put something in the basket, you make a payment, you complete an order, you abandon something. Everything is tracked behind the scenes via events. So this is the event flow.
The event flow gets stored. Then there is a function that fetches each batch of events, thousands of events, and identifies and groups them into different types. And there are a bunch of consumers interested in them. For example, if you spend a minute or two on an ecommerce page, you suddenly get recommendations: hey, would you like to see this product? How does that happen? Because this data flows to your big data platform, data lake or data insights team. They do all their logic and math and send back: okay, for this customer, show these products. So many things happen behind the scenes.
Let's see how it works. Let's assume that the events come into a data store, in this case an S3 bucket. The dotted-line part I will show in a different pattern later. There is a Lambda function that takes each batch of thousands of events, splits the events into different groups, and knows where to send them. The targets could be a microservice, a third party, a different team within the organization, et cetera. The Lambda function invokes the target Lambda functions asynchronously. This is not a synchronous Lambda invocation; that is an anti-pattern. In my opinion, never call one Lambda from another synchronously, because your main Lambda is waiting while the other one is working, so you end up paying more. This is asynchronous invocation, more like fire and forget. So: high-volume events and dynamic routing. In our case the list of targets changes. All of a sudden we are told we need to send data to such-and-such third party or internal team. So what we do is have a fan-out rule.
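As an illustration of the fan-out rule idea, here is a minimal TypeScript sketch. It is not the LEGO Group's implementation: the rule shape, the target names and the event types are all hypothetical, and the rules are inlined instead of being loaded from wherever they are actually kept. It only shows how a triage function could group a batch of events per consumer by reading a rule list.

```typescript
// Hypothetical shapes: none of these names come from the talk.
type TrackedEvent = { type: string; payload: Record<string, unknown> };

// A fan-out rule maps one consumer (target) to the event types it wants.
type FanOutRule = { target: string; eventTypes: string[] };

// Assumption: in practice the rules would be read from a config store; inlined here.
const fanOutRules: FanOutRule[] = [
  { target: "recommendations", eventTypes: ["product-viewed", "added-to-basket"] },
  { target: "loyalty", eventTypes: ["order-completed"] },
  { target: "analytics-vendor", eventTypes: ["product-viewed", "order-completed"] },
];

// Group a batch per target, keeping only the event types each target subscribes to.
function fanOut(batch: TrackedEvent[], rules: FanOutRule[]): Map<string, TrackedEvent[]> {
  const perTarget = new Map<string, TrackedEvent[]>();
  for (const rule of rules) {
    const wanted = new Set(rule.eventTypes);
    const matched = batch.filter((e) => wanted.has(e.type));
    if (matched.length > 0) {
      perTarget.set(rule.target, matched);
    }
  }
  return perTarget;
}

const batch: TrackedEvent[] = [
  { type: "product-viewed", payload: { sku: "A1" } },
  { type: "added-to-basket", payload: { sku: "A1" } },
  { type: "order-completed", payload: { orderId: "o-1" } },
  { type: "payment-started", payload: { orderId: "o-1" } }, // no rule matches this type
];

const routed = fanOut(batch, fanOutRules);
routed.forEach((events, target) => {
  // A real triage function would asynchronously invoke the target here (fire and forget).
  console.log(`${target}: ${events.length} event(s)`);
});
```

With this shape, adding a consumer means adding one rule entry; the `fanOut` function itself does not change.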
The rule on the slide is simple JSON, a simplified version of the real thing. It is kept in one place, and the triage Lambda, the fan-out function, reads it and then knows which events are to be sent to which consumers. That way, when a new service comes into the picture, only this part gets edited and none of the other things change. So this is the asynchronous invocation pattern in a different way.
This is another one. Typically, as a loyalty customer, if you go to a LEGO store, give your VIP number and place an order, you redeem points or vouchers and so on. The order data comes back to us and goes to two different places. One of the services looking for the data is the loyalty platform, which needs to do the points calculations and updates. Also, a satisfaction survey needs to be sent via an external party, so that you get an email asking whether your experience was good or satisfactory, et cetera. For this we have an implementation in place. This is the order processing, the part that takes care of the loyalty order. This is not client facing; it is in the background, so it can take its own time. That's why it is decoupled and asynchronous, all happening behind the scenes. Once an order is processed and everything is good, it puts an event onto the bus, which triggers a rule. And this is something called an API destination. If you're familiar with AWS, this is from there. Basically, what is represented here is the setup of an API endpoint with security credentials, rate limiting, retries and everything. It is asynchronous in the sense that it doesn't need a response because, compared to making a payment, sending an email to get feedback is not that critical. Even if you lose an event or so, that's fine.
With this setup, it just fires and forgets. The other thing I want to stress is that there is no function written here. This is all part of the platform. So this again points towards functionless, which I will be talking about soon. Rate limiting and Secrets Manager are part of it: the credentials are kept there and used only when the invocation happens. As for the problems, payload size could be an issue. Your event bus can have a limit on the data size, and the API destination, the HTTP endpoint at the other end, may have a different requirement, so those need to be matched. The other issue we often have is debugging and traceability. That's always the case when we use a platform integration rather than our own code. It is the trade-off, and things are getting better, but it is an area where this pattern suffers. As I mentioned, I have written a few blog posts about these things.
So this comes down to CQRS. When you finally place an order, a lot of validation happens behind the scenes. One of the main points of validation is the payment, because depending on the provider, the card or the bank, a payment needs to go through many hoops and hurdles before it gives you a success or not. If you think of a 3DS flow, it has different challenges and so on. It can take long seconds, or even minutes or more. We can't hold the connection until that point and keep the customer waiting. So what usually happens is that you take the order and then decouple that part. This is the decoupling part. This function receives the order and performs the initial validation. It gives back an order number, and then it's completely detached: it pushes the order into a queue. The other reason is that when you are in a high-volume sale event, the orders come through in thousands per second, right?
Or per minute, for example. We can't process everything at the same speed as it comes in, so we need to push into a queue to have that buffer, and then do the working out behind the scenes. In this case, this is the command: okay, here is an order, I'm submitting an order to do something. Behind the scenes there is a cache table set up which keeps the status of things, and you provide a query endpoint to get the status of where we are with the order. This is a common pattern, the polling pattern. One thing to highlight here is the statuses. Some of you may be familiar with this. Say the payment failed for some reason. The back-end service knows exactly why it failed, either a wrong PIN or insufficient funds. But the back end would never communicate that to the front end. So you can clearly change the visibility of the error, for fraud-prevention reasons, because people can make use of that information and vary the amount or change the PIN and so on. That's another beauty of this separation. The back-end service knows, and it will log or do whatever it needs to do in such scenarios, whereas the status visibility it provides to the front end is completely different, because the front end doesn't need to know many of the reasons behind the scenes.
Okay, functionless. The term has been around for a few years now. It started when people were initially using a Lambda function for everything. I'll put up the picture and explain. Typically you will see that behind an API there is a function. I once used the Lambda hammer as an analogy: in those days, when this architecture came along, every problem was hit with a Lambda hammer. Okay, we need a Lambda function here. But things exploded uncontrollably, with that many Lambda functions.
Then you have a problem, because you start to hear that code is a liability, that something you wrote today is legacy tomorrow, all sorts of things. And: don't use functions to shift data. That is a bad thing. If you are just grabbing something here and dumping it over there, then obviously there is something missing in your provider's service. This is where the functionless movement started. How can we get rid of functions where we don't need them? In this particular case, if the function is not doing any specific logic, if it's simply fetching data from the data store, can we integrate the data store directly with the API? That's one of the simplest ways to explain functionless if you're not familiar with it. Basically, what we are doing is using the native integration patterns, whether it is an API or EventBridge or, if you work with Azure, Event Grid, and so on. How can we use the native integration? Because if you do, you don't need to worry about cold starts. It's low latency and low cost: you don't pay for the function's compute, et cetera. At the same time, traceability could be an issue, because in most cases a native integration is like a black box. You don't get to see what's going on inside. That's the other thing to remember.
So this is the data integration pipeline that I explained earlier. The clickstream was flowing through, and the services we use are API Gateway and Kinesis Data Firehose. This is an AWS service that can take millions of events per second, so it is ideal in this particular case. And this part I explained earlier. Firehose will batch the events; we can dictate the size and frequency, et cetera. Typically you would have a Lambda function there. But the question is, do we need the Lambda function? What is it doing? Is it doing any logic?
Or is it just shifting the data coming in the API payload to Firehose? In that case, you don't need it, so get rid of it. There's no need for it. So it's a functionless integration pattern. Like I mentioned, there is less code to maintain, but at the same time we need to think about the drawbacks as well. Still, this is getting more popular. Behind the scenes, API Gateway will have some kind of script like this. So there is still code, but it's not part of a function; it's part of the integration setup.
Okay, another pattern, another functionless one: order number generation, customer ID generation, candidate ID generation. In many cases these are all sequence numbers. If you come from the traditional relational database world, you know how to generate a sequence number. Oracle provides a way, and so does every other database. But when it comes to serverless, if you work with NoSQL databases like DynamoDB, we don't have an option as widely known as the ones in the relational database world. To do this, here is an example with DynamoDB, where we can use its atomic number generation. If you set up an API and integrate it directly with DynamoDB, there is no function there. You can do the number generation with a couple of parameters coming in. You can have different tables, not necessarily just one table for everything, and generate the numbers. This is part of the script that sits behind the scenes as part of the API Gateway integration. Again, this is a DynamoDB script to generate the atomic values.
Orchestration. That's an interesting pattern. So what is orchestration? Typically, people relate it to an orchestra, where there are musicians and somebody, the conductor, is instructing them what to do.
This is typically a workflow. In a workflow you have a number of processes. There's a start, there's a parallel step or a decision point, and many things happen together before it comes to an end. It can be long running or short running, depending on the case. In some cases you can pause and resume, not just hit the start button and let everything go through. You can also introduce delays and timers and so on. There's always the question of cross-service orchestration. If you have two microservices, is it good to have one orchestrating resources across multiple services or not? There's no clear answer, but the general consensus is don't do it. If you have an orchestration or workflow in one service and you're trying to connect it to a Lambda function in a different service, that is really an anti-pattern, because if things change, you are changing two things, right? That's why people say: do the orchestration, if at all possible, within one service. Now there are ways to distribute it, which I will talk about towards the end, but that's the general guidance. This became very popular recently because AWS now has Express Workflows in Step Functions, which last up to five minutes but which you can invoke millions of times. So should you shift the logic from a big Lambda function to a state machine or not? Again, there's a trade-off, because a state machine can get complicated with all your conditions and so many other things. But the options are there.
Okay, so this is a simple one: session monitoring. I mentioned earlier someone going to a LEGO store and buying something at the checkout till. We have a session duration of 2 hours within which that should be completed. I think that's more than enough for something to be completed at a till, right?
The reason we have this session-based thing is that if they choose to, say, redeem a voucher or redeem certain loyalty points, we don't know whether the order completed until we receive the order data. So what the loyalty service platform, again a serverless microservices platform, will do in between is put those things on hold. It will say, okay, I'll keep these things so you can't use them elsewhere in between, but I'll wait until your order is finished. But in some cases orders can get abandoned. That means you can't hold on to those rewards and stuff forever. So we need to have a mechanism. How can we efficiently identify that? So what happens is that, as part of the loyalty service, when it knows that there's a session going to be started, it will send out an event and that will kick off a state machine workflow. The labels don't represent the actual thing, but that's what happens. So the timer, the last step, will wait for 2 hours, and once it's over, basically it will say, okay, there's no order for this session so I'm going to send out a cleanup event. So the event goes out. The state machine workflow doesn't do anything else. It will just send out an event onto the bus, and whichever service needs to perform the cleanup can subscribe to that event and act on it. Again, this happens behind the scenes. This is not customer facing. So we have the freedom, the luxury, to carry out these things in an asynchronous and decoupled fashion. Okay, so this comes to the cross-service task coordination, the thing I mentioned. So we have a microservice here. It has an orchestration. So this is one of the main microservices. It needs other services to fulfill its functionality. Say it relies on two other microservices, B and C. Microservice B has an API and also has functions, whereas C is completely self-contained. There's no API or anything. It's a long-running process.
So an API call, that's fine. But if you are making that call like I mentioned earlier, this is probably not a good thing. We don't want to hardwire functions or any other similar resources across different services, and this is a long-running process. What do we do in these situations? So this is where the distributed orchestration comes into the picture. So this is a simple view of the loyalty service platform. It has a bunch of microservices. The two I want to highlight are the order processing and the vendor mediator. The vendor mediator is basically dealing with the SaaS platform. All the interactions with the third-party application happen here. None of the other services are aware of this platform existing. The reason we isolated it is that tomorrow, if the business decides to change the platform to somebody else, we only need to write another microservice or modify this one. None of the schemas or anything else will change. So that's why it's kept separate. So the problem is that the order processing goes through a bunch of things. It requires this one to do two or three things with the third party, and on success or failure of this interaction it will decide whether the order is successfully processed or not. So this is the state machine behind the order processing. A bunch of steps to identify whether it's VIP or non-VIP, a sale or a return, all sorts of things, but somewhere along those lines it needs to interact with the other one. Okay, update the order on the platform. And this service has no public API. It is completely, 100% event-driven. So that's the architecture. So this is where the distributed orchestration comes in. So we have two services and tasks. One service invokes the other one, waits for the other one to complete the task, and then carries on from there. So, if you're familiar with AWS Step Functions, they introduced something called task tokens recently.
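[Editor's sketch] As a rough illustration of the task token mechanism named here, which the talk goes on to explain: a Step Functions task can publish an event that carries the token and then pause until some other service calls back with that token. The bus name, event source, detail type and the `order` field are hypothetical, and the timeout is an arbitrary assumption. The code only builds the state definition and the callback parameters, so it runs without AWS access.

```python
import json


def build_wait_for_callback_state(bus_name: str, source: str,
                                  detail_type: str, next_state: str) -> dict:
    """Build an Amazon States Language task that emits an event and waits."""
    return {
        "Type": "Task",
        # The .waitForTaskToken suffix pauses the workflow until a callback
        # arrives with the same token.
        "Resource": "arn:aws:states:::events:putEvents.waitForTaskToken",
        "Parameters": {
            "Entries": [{
                "EventBusName": bus_name,
                "Source": source,
                "DetailType": detail_type,
                "Detail": {
                    # $$.Task.Token is the context object path for the token.
                    "taskToken.$": "$$.Task.Token",
                    "order.$": "$.order",  # hypothetical input field
                },
            }]
        },
        "TimeoutSeconds": 3600,  # assumed limit so a lost callback cannot hang forever
        "Next": next_state,
    }


def build_callback_request(event_detail: dict, result: dict) -> dict:
    """Build SendTaskSuccess parameters for the service that did the work."""
    return {"taskToken": event_detail["taskToken"], "output": json.dumps(result)}


if __name__ == "__main__":
    state = build_wait_for_callback_state(
        "loyalty-bus", "order-processing", "UpdateOrderRequested", "CheckResult")
    print(json.dumps(state, indent=2))
```

The receiving service would pass the second dictionary to `boto3.client("stepfunctions").send_task_success(**params)`, or use `send_task_failure` when the work fails. In the talk the callback travels back as an event that a listener picks up, so this direct call is a simplification.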
So basically what you can do is issue an event or instruction with a token. It can go anywhere, right? It's not targeted at this service. That's part of the implementation. So whoever is receiving the event with the token can carry out the work. It could be an orchestration or it could be a function or anything. And then once it's done, it sends a callback with the token. So the token will go back and then there's a way we can identify it and say, okay, now you can proceed. This is a simple representation of the token, the yellow boxes. And once it's done, an event with that same token goes back. There is somebody listening for the event, and then the workflow will resume. So you can do this for different arms of a particular state machine flow. It's not just one for the entire thing. You can have as many arms and as many wait-and-resume tokens as you need. So this is another powerful one. This gives us a way of distributing this sort of orchestration across different microservices. But you need to be careful in terms of who can subscribe to these events, because in these situations we are targeting the events to somebody specifically. It's not the typical, I would say, event-driven publish-subscribe where anybody can do anything. In this case not just anybody can act, because we are targeting somebody specifically. So if you have a bunch of microservices, we can carefully distribute the work and get over the problem. Okay? Choreography is the other side. So choreography is always related to the event bus and publish-subscribe. Because, like in any dance movement, for example, each one is expected to carry out their part irrespective of others. So as the music and the tune go, they simply perform, right? Nobody is talking to them, nobody's instructing them, they just perform. It's the same in this case. So something happens in a service, it pushes an event out, okay? Its job is done.
And whoever else needs to subscribe has already subscribed, and they will then act based on what needs to be done. So in the previous example I showed you, the vendor mediator, as it receives the event, knows exactly what needs to be done: update the order, et cetera. Similarly, the different services know exactly what they do. So, a couple of things. Idempotency is important, because not all event bus implementations guarantee once-only delivery. There could sometimes be duplicates. When I go through the circuit breaker, I will talk about that. And traceability can be an issue. So one of the ways we do it is that an event can carry an envelope with some kind of unique ID, so that you can trace it all the way through the different services, wherever the event flows to. So this is a feature we developed a while ago. So someone comes onto the site looking for a product, and they see it's not available. So this feature is basically: register your email so that we will notify you when the product is back in stock. Okay, simple to talk about, but there are quite a few things involved behind the scenes. As a feature it looks simple, but different microservices come into play when we implement it. Because there's a registration part, somebody needs to look after the email addresses, like PII, GDPR, et cetera, et cetera, campaigns. Then somebody needs to notify the customer. And there's a service that is constantly working on the status of the products and stock. And on top you need to generate the data insights and the business analytics. For example, the business will be interested in how many sign-ups we have for this product. And they would also like to know how many customers bought the product based on the emails. Because, if you're not familiar with the email notification process, as we send out emails via, say, SES or another email provider, you can receive the feedback back if necessary.
We can know whether you opened the email, whether you clicked the link inside the email, all these things. So all that information comes back to give the business the details they need. This is roughly an architectural visualization of the solution. Not everything is in place, but on one side we have the registration. Its job is just registering, it doesn't do anything else. So what it will do is get the details from the customer, from the browser, and send the event onto the bus. Its job is done, and somebody else takes care of storing those events. And there's a stock checking one. So that will basically check the stock data continuously and push events. And the notification service will be listening for those events, the product-coming-back-in-stock kind of event types. So then from there it will kick off the event flow. So the email feedback, this is what I mentioned. I think AWS supports like nine or ten different types of feedback you can receive, like marked as spam or undelivered and so many things. So they come back and they are used for the business insights. So in terms of microservices, they are the dotted rectangles. This is a simplified view, but that's the way things are split, and it's another example of how these different services come together and work in a choreographed way, right? So just a quick one. There's always this debate, like I already mentioned. So when do you use choreography? When do you use orchestration? So like I mentioned, if at all possible, stick within a service if you are doing orchestration, whereas choreography is for coordinating between different microservices. These are not hard and fast rules, but following this sort of principle will help us avoid tangling microservices as you grow your ecosystem. So circuit breaker. So there are hundreds of ways of implementing a circuit breaker. So this is a different approach.
So when you invoke an API, if everything is good, it's fine, the circuit is closed. When the service is on fire and you can't reach it or anything, the circuit is open, nothing can go through. So typically in a circuit breaker situation, what you will do is have some kind of service status monitoring. Okay. So you look at it: okay, this is green, is this good? Okay, fine. Is it completely blocked or is it half open? And you keep a threshold and some logic behind the scenes. So the thing is, in most cases when you send requests to a different service or a third party, if they are down, you need to resend all those requests, right? So you need to take care of: how do I keep hold of those failed requests so that I can replay them? So that's probably the hardest part here. Okay, that's fine. So coming back to the service, the vendor mediator. So this SaaS platform is somebody else's, a third-party platform. So we can't take it for granted that it's going to be there all the time. Especially this platform: as part of the SLA, it says that it will be down 6 hours per year. Okay, so there you go. So you can't simply assume that it will be there. And up to three invocations can happen, one to three, for every successful order. So that's why it's important for us. So the way we did it was using a feature of EventBridge from Amazon. They have something called archive and replay. So that is what we use. Just to call out, the storage-first pattern is something you might be hearing about these days. So with the asynchronous, decoupled way of doing things, the pattern is that when a service receives a request to fulfill something, the first thing it will do is store the details somewhere before acting on anything. So this helps if it needs to replay or come back to the request for some reason. So that's a pattern that you will probably hear about now.
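[Editor's sketch] The EventBridge archive and replay feature named above can be pictured as two requests: one that creates an archive with an event pattern, and one that replays the archived events between two points in time. The talk goes on to explain which events are archived and when the replay is triggered. The archive name, ARNs, detail type and retention period here are hypothetical. The functions only build the parameters for the CreateArchive and StartReplay calls, so they run without AWS access.

```python
import json
from datetime import datetime, timezone


def build_archive_request(archive_name: str, bus_arn: str,
                          retry_detail_type: str, retention_days: int = 7) -> dict:
    """Build CreateArchive parameters that keep only one type of event."""
    return {
        "ArchiveName": archive_name,
        "EventSourceArn": bus_arn,  # the event bus whose events are archived
        # Only events whose detail-type matches are stored in this archive.
        "EventPattern": json.dumps({"detail-type": [retry_detail_type]}),
        "RetentionDays": retention_days,  # assumed value
    }


def build_replay_request(replay_name: str, archive_arn: str, bus_arn: str,
                         down_since: datetime, up_again: datetime) -> dict:
    """Build StartReplay parameters for the window the third party was down."""
    if up_again <= down_since:
        raise ValueError("replay window must end after it starts")
    return {
        "ReplayName": replay_name,
        "EventSourceArn": archive_arn,  # for a replay this is the archive ARN
        "EventStartTime": down_since,
        "EventEndTime": up_again,
        # Events are replayed onto the bus they were archived from.
        "Destination": {"Arn": bus_arn},
    }


if __name__ == "__main__":
    bus = "arn:aws:events:eu-west-1:123456789012:event-bus/loyalty"  # made-up ARN
    start = datetime(2022, 5, 1, 10, 0, tzinfo=timezone.utc)
    end = datetime(2022, 5, 1, 12, 30, tzinfo=timezone.utc)
    print(build_replay_request("replay-1", bus.replace("event-bus", "archive"),
                               bus, start, end)["ReplayName"])
```

With boto3 these dictionaries could be passed to `boto3.client("events").create_archive(**params)` and `start_replay(**params)`. Replayed events arrive as fast as EventBridge delivers them, so some buffering in front of the third party is likely needed.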
So when it comes to third-party applications, there are three cases. Success: fine, everything is good. Error: harder. If it's something to do with the data, no matter how many times you try and resubmit, it's going to hit the same error. Okay, so that needs a different treatment; the business needs to look at the data and fix it together with somebody. Whereas the retries, whether it's downtime or a timeout or the typical service unavailable issues, those need to be harvested and you need to replay them. So what happens is, for this particular service, on the back of every order it processes, among the many events it will send out, these three events go out as well: success, error or retry. So we will be looking for the retry events coming out of the service. And then we set up something called an event archive. So with this rule, for the events coming through this pipeline or this flow, we know that only the retry events get here. So they are kept in the archive, which is part of the service itself. And what is an archive? It's basically a collection of events. So you need to set up a filtering and routing rule behind the scenes so it knows, okay, these are the events I'm going to keep. You can create multiple archives. So if there are different types of services or events, you can create a different archive for each, so that you're not putting everything together and making a mess. You can target them individually. Remember the granularity I talked about earlier? So in addition to these things, there is a setup behind the scenes basically looking at the status of that third-party application. That's based on a cron job. It runs, I don't know, every one or two minutes, hitting their status endpoints to check. So on every invocation it updates the details in the status table, and behind the scenes there is a Lambda listening for all the status updates, and there's some logic here. It will look at, okay, what's the threshold?
What threshold do I need to look at now? So if the system has been down for long and suddenly it receives an event via this update saying, oh, hey, the third party is up now, okay, then this will come alive and look at a few parameters and say, okay, now I'm going to trigger the replay of those events kept in that archive between these two time frames, because the function knows when was the last time it was up or down. So based on that, the events will be played back to the bus, they will go back to the service, and the flow will continue just like normal. And if things fail again, they will come back here. This may look like an expensive circuit breaker implementation, but in our case, because of the reasons I mentioned, the decoupled nature of the services, we simply didn't want to scan the table, because we don't know how long it will be down and how many items will be in the table. If it's a handful of items, okay, fine, we can do a query or something. We don't want to query thousands of items. So in our case, behind the scenes it's NoSQL, DynamoDB. So there are all kinds of trade-offs we looked at, and we have this implementation in place. Things to watch out for? Yes, because the third party may not handle the volume of events all of a sudden coming out of your archive and hitting it. So that's why you need to have throttling mechanisms or buffering mechanisms in place to make sure that you don't hit the third party too hard. I mean, people often ask this. For example, many websites these days, LEGO.com also, have a queueing mechanism during peak sale periods. So if you want to buy something, you'll be in a queue. People ask, oh, you're running on serverless, so what's the point in having this queue? Can you not scale?
It's not the serverless side that we have to protect, it's all the third-party applications that we interact with, because if they can't cope, then we have a problem. So it's difficult to explain. But the question comes again and again, because they think that serverless scales up infinitely. Yes, it can, but we need to look out for the ecosystem around it. So that's the thing. Okay, quickly on sustainability, because this is a hot topic these days, especially as cloud providers have started to provide mechanisms for measuring carbon footprint and a bunch of things. So everyone is now conscious of it. But if you look at sustainability itself, it's not specific to the environment. Even though the planet and environment are the focus, it can be anything, right? Because it can be related to a product. Are we building a sustainable product? Things like, for example, can you extend the product beyond your two sprints? Or are you going to start all over again? So that needs to be part of it, because we have gone away from the typical maintenance mode and we are more into sustaining the products. We start with the MVP and iterate and go forward. So we need to think about how I can sustain my product itself, not just build it once and dump it into maintenance. Those days are gone. And also, what are the processes that help in my sustainability journey? And of course the cloud comes in: what do they provide, what do we need to take care of? So this is what I'm saying. So that's gone these days, it's more about sustaining things. How do we operate single microservices once they go out? So for that, there are a couple of things we need to take into consideration during the design or architectural thinking. So can it be modular, can I extend it, how can I observe it? So these elements are important to build sustainable products. So processes, similarly. Liz was talking about the Lean principles in another talk recently. So those things come into play. How do I reduce waste?
So if you have a very bad value stream and you're keeping resources running while things are queuing or deploying, that means you are unnecessarily burning resources. So these are the things that you need to take into account, and also, most importantly, automating everything so that there aren't any manual delays. Everything is automated, things are tidied up once the job is done, et cetera. So when it comes to cloud, AWS puts out this sort of responsibility in two parts. It's a shared responsibility. They do their part. I mean, it applies to any cloud provider, not just AWS, right? So they do their part and we need to do our part as well. So in terms of what cloud providers do, it's renewable energy and resource sharing and all sorts of things. Whereas when it comes to our side, we use managed services, and serverless gives us a step ahead. But on top of that we need to get rid of the unwanted resources and data storage. Get rid of them, don't keep them, because at the end of the day they are all consuming power and other resources. So there are a bunch of patterns that help with how we can improve the situation when it comes to sustainability. Okay? So that brings us to the benefits of patterns. So like I mentioned earlier, this is where we first started. A simple pattern, an API Gateway and a Lambda function. People don't even consider it a pattern these days because it's there everywhere, right? So that's where it started, and from there things moved on. And this one, I showed you the Kinesis Firehose ingestion. This pattern is repeated in, I think, at least three other places. The email feedback mechanism I mentioned has Firehose behind the scenes, because when you send out hundreds of thousands of emails you can expect the feedback to come back in thousands, right? So that's the sort of ingestion pipeline that we have. And the development acceleration.
So this is one of the earliest data processing pipelines we put in place. So SAP feeds will drop into the bucket and kick off a Lambda, then there is a queue and a Lambda that hits the platform on the other end. It's all fine. And SQS allowed us to vary the speed of the flow, because certain feeds, like product price, need to go super fast to the other end. Whereas if a feed is related to somebody's product feedback, okay, that can slow down and let everything else go. So with SQS, if you have worked with it, you probably know that you can vary the delays and the visibility timeout and things like that. So those aspects allow us to do that. So the reason I put out this one is that this is where we started. Now we have over 18 pipelines. From a software engineering mentality, this is nothing but copy-paste of the same pattern over and over again. So this is an example of how these things give you that momentum. Not just for the development team, but also for the business, to gain that velocity to move forward. So, patterns. If you are on AWS, Jeremy Daly has written a bunch of things. And there's CDK Patterns, in the infrastructure as code kind of area. It has a bunch of patterns as well. And there is Serverless Land. So if you're in the AWS area, they have tons of things there. You will also get the code samples and all sorts of things. So make use of all those things where possible. I'll leave you with this one. I think it was Werner, years ago, who said that all the code you write in future will be business logic. So if you think of the functionless and integration patterns that we are moving towards and that are getting popular, maybe one day we will just write the business logic and leave everything else to the platform to do the magic. That's all for today. Thank you so much for listening. If anyone is interested, I'll be around today until evening.