Video details

Dealing with persistence in serverless applications - Marc Duiker

Serverless
07.17.2022
English

Serverless means stateless, right? But surely serverless applications need to persist some state somewhere. What are the right choices for storage solutions that fit a serverless architecture? Join this session to learn about various options in the Azure cloud.

Transcript

Okay. Hi serverless friends. It's nice to be on a real stage again. Super. I'm going to talk about persistence in serverless applications, and in this case specifically about different persistence options in the Azure cloud, because I'm going to talk about Azure Functions. Quite relevant here, since we're in the Microsoft Reactor. My name is Marc Duiker. I'm a senior developer advocate for Ably, and Ably enables developers to build live and collaborative experiences. If that sounds too abstract, you can just see us as a WebSockets-as-a-service company. I'm a Microsoft MVP, quite active in the serverless community, and a co-organizer of ServerlessDays Amsterdam, so I'm hoping we're going to do one of those real live conferences again pretty soon. And last but not least, I like to draw pixel art. So this is the dev rel team of Ably. Maybe you've seen VS Code Pets, digital pets in VS Code. I've made most of them. Lovely, lovely stuff.

But today we're talking about persistence, and mostly about these four types of storing data in Azure. It's going to be quite quick, quite demo heavy. I'm going to show you source code, actually running the functions and storing data. So don't expect really in-depth comparisons between this service and that service. No, it's going to be really quick, some source code. If you want more information afterwards, then please come see me. And if I have a bit of time at the end, I have some slides about data distribution, which is actually also a form of persistence.

But let's get going. First, a definition of persistence. In computer science, persistence refers to the characteristic of state of a system that outlives the process that created it. If you translate that to serverless applications, which usually look like this: we have many different function apps or functions, they usually communicate with queues, which is already a form of persistence, and usually there's some storage involved. And if you see each of these chains as an individual piece of business functionality, usually each functionality has its own storage needs, right? One could be Blob, one could be Table, one could be Cosmos, et cetera. So there's not much sense in just choosing one type of storage in the cloud for everything. No, you should really look into your business problems and pick the right kind of storage that solves them for you.

Now, going back to Azure Functions specifically, let's have a look at the anatomy of an Azure function, like already mentioned before by Jeremy. It's event driven, so there is always a trigger for a function; it can be an HTTP call, a queue, a blob, anything. But the nice thing about Azure Functions is that apart from the trigger, there are also configurable inputs and outputs which have nothing to do with the trigger itself. These things are called bindings, and you usually configure them by using attributes on your functions. So you can have an HTTP trigger, but an input binding related to Blob storage, for instance. And the same goes for outputs: you can have an HTTP trigger, but an output binding related to Cosmos DB. Using these bindings is a very quick and powerful way to get data in or out of your function.

So, like I mentioned, I'm going to show you four options which I think are very suitable for serverless functions. There's definitely more in the Azure space, of course, for persisting data, but these four I've used quite a lot over the last couple of years, and I'm very happy with them.
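To make that trigger-plus-bindings anatomy concrete, here is a minimal sketch, assuming the in-process .NET model used later in the talk; the function name, route, and blob path are hypothetical, not taken from the demo repo:

```csharp
using System.IO;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class FunctionAnatomy
{
    // One trigger per function (HTTP here), plus any number of extra
    // input/output bindings configured via attributes.
    [FunctionName("GetPlayerProfile")] // hypothetical name
    public static IActionResult Run(
        [HttpTrigger(AuthorizationLevel.Function, "get", Route = "players/{id}")] HttpRequest req,
        // Input binding: reads a blob whose name is built from the route's {id}.
        [Blob("players/profile-{id}.json", FileAccess.Read)] string playerJson,
        string id)
    {
        // A string blob input binds to null when the blob doesn't exist.
        return playerJson is null
            ? new NotFoundObjectResult($"No profile for player {id}")
            : new OkObjectResult(playerJson);
    }
}
```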
All right, let's first talk about Blob storage. Why would you want to use Blob storage? Basically, if you're dealing with large amounts of files, because unstructured data means files. It could be photos, videos, audio files, log files, even complete virtual machine images; all of that is perfectly suitable for Blob storage. Straight to the demo.

All right, I'm using VS Code here. I've already got my function app running. That's what I really like about Azure Functions: it comes with this local functions runtime, so you can develop and do everything completely locally. I know Taylor has already left, but I really do like this inner developer loop of doing stuff locally before you deploy to the cloud. So it's up and running. Storage-wise, I'm also using Azurite, which is a cross-platform storage emulator that allows you to do emulation for blob storage, table storage, and queue storage. That's running in the background already. Last but not least, I am using Code Tour, another extension in VS Code, to guide us through the solution, because the solution is really quite big. I will give you a link later, but it contains like 15 or 16 different functions. I'm only going to show you four, and this helps me with the flow. But if you're going to clone this repository later, you can start the same code tour and have this guided tour through the solution yourself. Okay, this code tour is about persistence. Fine, let's go to the next step.

So this is the function app, we see it right here. It's divided into different folders where our functions live, and sometimes there's even a further division between input bindings and output bindings. I'm only going to focus on some of the output bindings. There's also a test folder, and this test folder contains some .http files, and those contain the endpoints of the HTTP-triggered functions that I'm going to run. And since I'm using yet another VS Code extension, called REST Client, I can simply execute these REST endpoints from within Visual Studio Code to test the functions.

Okay, first we're going to focus on the Blob output binding. This is the .http file to test my Blob function. The first thing I'm going to execute is called StorePlayerWithStringBlobOutput, so the name already gives away a bit of how things are going to work. We'll do a POST to this endpoint, and the data that we'll be supplying is down here: I'll provide an ID, a name, an email, and a region. The context behind all of these functions is that I'm going to put some player information into storage. I'm not going to call it yet; I'm first going to show you the elements of the function, and then we're going to execute this endpoint.

Next up is the Player class. So this is our JSON representation of the player, but we also have a .NET representation of the player. It's quite a regular .NET class; you can see here the ID, name, email, and region properties.
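For reference, the Player class just mentioned could look roughly like this; a sketch with the shape assumed from the request payload (id, name, email, region):

```csharp
using Newtonsoft.Json;

// Plain .NET representation of the JSON player payload.
public class Player
{
    [JsonProperty("id")]
    public string Id { get; set; }

    [JsonProperty("name")]
    public string Name { get; set; }

    [JsonProperty("email")]
    public string Email { get; set; }

    [JsonProperty("region")]
    public string Region { get; set; }
}
```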
Okay, let's have a look at the function itself. Here it is. This is the function definition. It's triggered by an HTTP call, and one thing to note here is that we can use our own types. So instead of using an HttpRequest object, which is built into the system, we can actually use our own defined type. We can immediately use Player as our input type here, which is quite nice, I think. Azure Functions will take care of the serialization between the request and this type. Then how do we get this into Blob storage? Well, this is the Blob output binding, right here.

It only needs two things. It needs a path to the blob resource. In this case the path is made up of three different sections. First there's the 'players' part, and that's the blob container. In Blob storage the hierarchy is: it's part of a storage account, then you have the Blob service, and inside that you have containers, which are your folders. Inside a container you can have virtual folders, so there's an 'out' virtual folder here. And the last part is the file name. In this case the file name is prefixed with 'string-', and then there's a binding expression, {id}. This ID refers to the ID that comes in via the HTTP request, because ID is one of the properties in the request. So we can actually define the file name based on things that come in via our request, which is quite useful for properly naming our files. The second thing we need is the proper access. In this case we need write access, because we want to write to the blob.

So how do we get this data in there? Well, we are going to use an output of type string, named playerBlob. What this function needs to do is assign a value to this playerBlob output, and then the binding will take care of the rest. And that's happening here, a bit further down in the function: we serialize our Player object and assign it to playerBlob, and we return an HTTP 202 Accepted back to whoever is calling this function, and that's it.

So let's execute this now. Before we do, I'll show you where it ends up, because otherwise I could have cheated and already put something in blob storage. There's now only a virtual 'in' folder but no virtual 'out' folder, so there's no data yet. That's good. Now let's execute this. We get back a 202 Accepted, so there should be some data in blob storage now. We don't see it yet, but if I hit refresh... yes. Here's the 'out' folder, and this is our file. So that worked. Let me double-click it to open it; it first needs to be downloaded, and here we go. So this was our request, and this is our output. So that's working, with just a very minimal amount of code.
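Put together, the Blob output binding function described above could look something like this minimal sketch, again assuming the in-process model and a hypothetical function name:

```csharp
using System.IO;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Newtonsoft.Json;

public static class StorePlayerWithStringBlobOutput
{
    [FunctionName("StorePlayerWithStringBlobOutput")]
    public static IActionResult Run(
        // The trigger binds straight to our own Player type; the runtime
        // deserializes the JSON request body for us.
        [HttpTrigger(AuthorizationLevel.Function, "post")] Player player,
        // Blob path = container / virtual folder / file name; {id} is a
        // binding expression resolved from the incoming request body.
        [Blob("players/out/string-{id}.json", FileAccess.Write)] out string playerBlob)
    {
        // Assigning to the output parameter is all the binding needs.
        playerBlob = JsonConvert.SerializeObject(player);
        return new AcceptedResult(); // HTTP 202
    }
}
```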
All right, so that's one. Quickly over to the next: Table storage. Table storage is used for large amounts of structured data. What does that mean? If you have more NoSQL-like data, you can consider it key-attribute data. In this case the key consists of two parts: you have a partition key and a row key, and the combination of those two should be unique across all of the records you have in table storage. Then you have up to, I think, 250 custom attributes that you can use, so you can really put quite a lot of stuff in each record. A record is limited to, I think, one megabyte, but still, that's fairly large. You probably know Troy Hunt, the creator of Have I Been Pwned. He's using a lot of table storage for the back end of Have I Been Pwned, and he also wrote quite a lot of blog posts around it.

All right, so let's go to the table output demo now. Let's close this. We're now in the .http file to actually trigger our function, and what we're going to provide as a payload is not just one player but an array of players. We're going to submit three players, and we want those three players to end up in table storage. So let's have a look at the function. This is the function itself, and again it's an HTTP trigger, but now the input type is an array of PlayerEntity.

So it's important to note it's not the same as before: it's not Player, but PlayerEntity. Why is that? Let's have a look at the PlayerEntity. We need to derive it from an interface that makes it compatible with table storage. Like I mentioned before, it's a key-attribute table, and the keys need to be present, right? So some properties need to be there, otherwise we cannot store it in a table. We need this partition key and this row key, and there's also a timestamp, which gets set automatically. We need these elements, because otherwise it's not a valid table entity.

All right, so here's the table output binding. We only need one thing, and that's the name of the table, which is 'players'. That's quite obvious. Then, interestingly, the output is an IAsyncCollector of type PlayerEntity. What this thing is, is sort of a bucket, a collector bucket, and we need to add things to this bucket inside our function. When the function completes, this binding will take care of actually flushing everything that's in the bucket to table storage. Adding to it is done like this: we iterate over our player entities array, and we need to set the keys, because in my request there's only an ID, a name, a region, and so on, but there are no keys in there. So we make sure the partition and row keys are set, to satisfy the conditions for table storage, and then we say to the collector: okay, add this player entity. And we give back a 202 result in the end.

So let's try this. Going to do a POST here. It says 202 Accepted, so we should have stuff in our table, which I failed, actually, to show you up front; you'll have to believe me that no data was there before. So these are our records. As you can see, I've used the region as the partition key here, because the region is in the payload, and I've used the player ID as the row key. The player ID is already unique, but the requirement for table storage is that the combination of partition key and row key should be unique. So it works. Perfect. Let's go back. All right, close this.
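A minimal sketch of that entity class and collector-based function; this assumes the Azure.Data.Tables ITableEntity interface and hypothetical names (the repo may use an older table SDK, where the entity shape differs slightly):

```csharp
using System;
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

// Implements ITableEntity so it is a valid table entity:
// PartitionKey, RowKey, Timestamp (set automatically) and ETag.
public class PlayerEntity : ITableEntity
{
    public string PartitionKey { get; set; }   // region
    public string RowKey { get; set; }         // player id
    public DateTimeOffset? Timestamp { get; set; }
    public ETag ETag { get; set; }

    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
    public string Region { get; set; }
}

public static class StorePlayersWithCollectorTableOutput
{
    [FunctionName("StorePlayersWithCollectorTableOutput")] // hypothetical name
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] PlayerEntity[] playerEntities,
        // The collector "bucket": entries are flushed to the table on completion.
        [Table("players")] IAsyncCollector<PlayerEntity> collector)
    {
        foreach (var playerEntity in playerEntities)
        {
            // The request only carries id/name/email/region, so set the keys
            // here; partition key + row key together must be unique.
            playerEntity.PartitionKey = playerEntity.Region;
            playerEntity.RowKey = playerEntity.Id;
            await collector.AddAsync(playerEntity);
        }

        return new AcceptedResult(); // HTTP 202
    }
}
```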
Okay, next up is Cosmos DB. This is really quite a step up, because Blob storage is, let's say, relatively simple, and Table storage is also relatively simple, still very powerful. But Cosmos DB is really a beast of a database. It's a NoSQL database, so that concept is probably well known, but it allows you to distribute your data globally with just a very few clicks. It has, I think, up to five different configurable consistency levels for your data, which is really quite powerful. It does auto-scaling for you. It used to be fixed request units, so the scale was sort of fixed; you had to provision it up front. But now it also has a serverless option, also mentioned by Jeremy. And it allows you to use different APIs. I'll be using the Core (SQL) API, but you can also talk to it with the Cassandra API or the Gremlin API, based on your needs. So even though this is really a monster of a database, it's still very easy to get started with, as I'm going to show you.

All right, moving over to Cosmos. This is the request I'm going to trigger soon: StorePlayerReturnAttributeCosmosOutput. And we're just going to put one player in there. So let's have a look at the function.

So this is the function, and it looks a bit different than the rest, because right beneath the FunctionName attribute, which defines the function, there's immediately this return attribute, and that contains the Cosmos DB output binding. This is a very unusual way of specifying a return value, but it is also very efficient and, I think, clean. But yeah, when I first saw this, I thought: what a weird way to specify a return or output binding. The Cosmos DB binding itself needs three things. We need to define the name of the database, we need to specify the collection name, which you can see as a table name, and it needs a connection string setting. A long time ago, when I just started out with Cosmos DB, I thought this should contain the actual connection string to Cosmos DB. But no, this is the name of a setting in your local.settings.json. So if we quickly go there: this is the setting name in my local app settings, and that contains the endpoint to Azure Cosmos DB. So I'm actually going to put the record in Azure Cosmos DB. There is also a local emulator for Cosmos DB, but I'm working on a Mac right now, and normally I work on Windows, and I wasn't able to get the emulator up and running on the Mac, so apologies, but it is there.

All right, so this is the entire body of the function, and that's interesting: we have an HTTP trigger, a Player object comes in, and we immediately return that Player object, and that's it. This is like the shortest function ever, I think. If we talk about low code, I think this is definitely low code. The only requirement for putting stuff in Cosmos DB is that your entity or object has an ID. Even if it doesn't have an ID property by itself, Cosmos DB will add one. But our Player has an ID property, so it will use that property for indexing.

Okay, I'm going to trigger this. First I'll show you: this is the Cosmos DB extension for VS Code. Let me actually refresh this, so we know there's no data in there at the moment. This is syncing with Azure. So there's a players collection, and there are no documents in there at the moment. All right, so let's trigger this. Okay, we get back a 200, that's fine. We don't see it here yet, but if we refresh, we should be able to see a document named Ada in here now. And there we go. So this is the document that we pushed to Cosmos DB, which is running in West Europe, and it was still pretty quick executing it here from the US. So that's nice. Okay, so that's also working. Cool.
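That return-attribute function could look like this minimal sketch; the database name and setting name here are hypothetical placeholders, not the ones from the repo:

```csharp
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class StorePlayerReturnAttributeCosmosOutput
{
    // The output binding sits on the return value: whatever the function
    // returns is written to Cosmos DB.
    [FunctionName("StorePlayerReturnAttributeCosmosOutput")]
    [return: CosmosDB(
        databaseName: "serverless-persistence",          // hypothetical database name
        collectionName: "players",
        ConnectionStringSetting = "CosmosDBConnection")] // name of a setting in local.settings.json
    public static Player Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] Player player)
    {
        // Shortest function ever: returning the object is the write.
        return player;
    }
}
```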
The final example I'm going to show you is Durable Entities, which is a whole new thing on its own. It is not an Azure service; Durable Entities is part of Durable Functions, and Durable Functions is an extension that you can add to Azure Functions. It's a NuGet package that you can just add to your solution. Durable Functions is mostly used for creating orchestrations in code, so you can do long-running workflows. It's really a super flexible thing: you can do function chaining, fan-out/fan-in. It's comparable to Step Functions in AWS, but this is really super good; I think it's Step Functions++, to be honest. So part of Durable Functions is Durable Entities, and those are stateful entities, which of course is very weird, because functions are always stateless. But these are stateful entities. So how does this work? The state needs to be stored somewhere, right? It cannot be in the function itself or in the process that runs the function.

It's stored in Azure, because Durable Functions is built, or wrapped, around an Azure storage account, and it uses queues and tables and blobs under the hood. But that's completely abstracted away from you, so you don't have to interact with the storage account API yourself. You just have to know a few commands in the Durable Functions API, and it will handle the rest for you.

All right, let's have a look at Durable Entities now. This example is slightly different: we're not talking about a Player object anymore. What we're going to do is update a player score. I'll trigger two different endpoints. The first one is a GET on the update-player-score endpoint; we just provide a player name, and that retrieves the current score for that player. The other one is a POST, where we provide the player name and a number of points, and that should increment the high score for that player.

So now let's have a look at the HTTP trigger function. This is not yet the entity function; this is an HTTP client function that will interact with the entity function. The only thing that is special about it is that we have specified a route, and the route is updateplayerscore, then the player name, then slash points. The points part is an optional part of the path. The other thing we need in order to interact with entities is the durable client. This comes with the Durable Functions NuGet package that we have installed, and this client allows us to communicate with our durable entity.

Well, how does the communication look? Before you can do any communication with entities, you need to define an entity ID. This entity ID is always a combination of the name of the entity function, in this case PlayerScore, and an entity key. The entity key needs to be unique for this entity. In this case I'm using the player name, which is obviously not going to be very unique if you want to run this in production, but for demo purposes it's enough.

So what's PlayerScore? Well, PlayerScore is the actual entity function, and you can create entity functions in a couple of different ways, either class-based or function-based. This is a class-based entity function, so it's like any other .NET class. As you can see here, it has a high score property, and this is actually what gets persisted in table storage later on. The point is, we cannot really set this high score property directly from our HTTP trigger function; we need to use methods, and that's because we are using generated proxies to interact with this entity. So we need to define this Add method, which will actually increment the high score. At the very bottom of this class, there is the actual definition of the entity function: there's a FunctionName, and we are using the entity trigger, which identifies this as an entity function.
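A minimal sketch of such a class-based entity, assuming an interface for the generated proxies; the member names are assumptions and may differ from the repo:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Newtonsoft.Json;

// Interface used for type-safe signaling from the client function.
public interface IPlayerScore
{
    void Add(int points);
}

// Class-based durable entity: the HighScore property is the state that
// gets persisted (in table storage under the hood); state is changed
// through methods because callers talk to the entity via proxies.
[JsonObject(MemberSerialization.OptIn)]
public class PlayerScore : IPlayerScore
{
    [JsonProperty("highScore")]
    public int HighScore { get; set; }

    public void Add(int points) => HighScore += points;

    // The entity trigger at the bottom of the class makes this an entity function.
    [FunctionName(nameof(PlayerScore))]
    public static Task Run([EntityTrigger] IDurableEntityContext ctx)
        => ctx.DispatchAsync<PlayerScore>();
}
```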
All right, so let's now run it... oh no, not yet, because first we need to talk to it, right? So how are we going to talk to it? The first step: if you do a POST, then we are going to update that high score. We want to call that Add method, and that's done via signaling entities. There's a method on the durable client named SignalEntityAsync, and you provide the interface there so it can discover what kind of methods are on the entity. Here we are calling Add, passing in the number of points. Signaling is one-way communication.

It's fire-and-forget: we just call it, and the Durable framework is responsible for actually executing that command. If you want to read something, and this is what happens in the GET request, then we use the method called ReadEntityStateAsync, providing the entity type and also the entity ID. We get back an entity response, which is not just the state; it's a response, and we first need to check whether the entity actually exists, because we could have given it an entity ID that does not exist. If it does exist, then we can actually use the entity state, which contains the properties we've defined in PlayerScore.

All right, now it's time to actually call this. I'm just going to do a GET first, so you can see what the current state is, because I've been running this demo, of course, a couple of times, so it really has state. So let's send a request, and we get back that Grace has a high score of 200 points. All right? Now let's do the POST, and that should increment the state by another 50 points. So let's do that. It now says 'added 50 points', but this only does a signaling call, so we don't get the state back. If I want to retrieve the state, we can do a GET again. Now you can see that Grace has a high score of 250 points.

So even though we are using this stateless HTTP-triggered function, the state is captured in the PlayerScore entity. It's still somewhere in storage, right? So let me show you where it is. If I go to table storage, in the persistence demo instances table, here is the entity: the key is the combination of the name of the entity and the entity key, so @playerscore@grace. And if I double-click it, yes, this contains the state. You can see right here, it contains the value of 250. Cool.
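For reference, the HTTP client function driving that demo could look roughly like this sketch; the route, names, and response shapes are assumptions:

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class UpdatePlayerScore
{
    [FunctionName("UpdatePlayerScore")] // hypothetical name
    public static async Task<IActionResult> Run(
        // {points:int?} makes the points segment an optional part of the path.
        [HttpTrigger(AuthorizationLevel.Function, "get", "post",
            Route = "updateplayerscore/{playerName}/{points:int?}")] HttpRequest req,
        [DurableClient] IDurableEntityClient client,
        string playerName,
        int? points)
    {
        // Entity ID = entity function name + a unique key (the player name here).
        var entityId = new EntityId(nameof(PlayerScore), playerName);

        if (req.Method == "POST" && points.HasValue)
        {
            // One-way, fire-and-forget signal; the framework runs Add later.
            await client.SignalEntityAsync<IPlayerScore>(entityId, proxy => proxy.Add(points.Value));
            return new AcceptedResult();
        }

        // Read the current state; the entity might not exist yet.
        var state = await client.ReadEntityStateAsync<PlayerScore>(entityId);
        return state.EntityExists
            ? new OkObjectResult(state.EntityState.HighScore)
            : new NotFoundObjectResult($"No score found for {playerName}");
    }
}
```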
All right, so those were four quick examples of how to use output bindings for these types of storage. This all went really fast, but if you want a detailed look, have a look at this repo. Like I mentioned, it contains 15 or 16 different functions, both input and output bindings, for all four of these storage types. It's all based on .NET 6, in-process, in this case. So it's definitely worth a look.

Since I have a bit more time, I'll show you two more slides, about distribution instead of persistence. Because, as I showed you before, if you want to move some data between functions, usually queues are used, and that's also a sort of persistence, right? As soon as you put a message on a queue, function A can just go away completely; it doesn't need to live anymore. And then function B is triggered based on that queue. So this is also a form of persistence. I won't go into any code here, but useful services in Azure are Storage queues and Service Bus, for instance. Another form of data distribution, and also persistence, is when you want to distribute data between clients, or between servers and clients, which also happens quite often. Services for that are SignalR Service and Azure Web PubSub in Azure, or Ably.

Last but not least, if you want to see something real that you can actually play with and touch, but also want to have a look at the source code, you can look at this sample, Quest. It's backed by Azure Functions, using Durable Entities for the game state and the player scores, just like in the small samples I've shown you. And it uses Ably for the real-time communication. If you go to the page, at the bottom there's a link to the GitHub repo. Everything is open source, so you can have a look at the implementation there. Thanks, everyone.