Video details

Build a Zero Knowledge backend with Node.js | Matthias Dugué | nodejsday 2020

NodeJS
08.31.2021
English

Zero Knowledge Architecture is a pattern that allows you to build applications where all data is stored and exchanged in an encrypted way. It enforces end-to-end encryption and client-side operations only. Let's see how to build a Zero Knowledge backend, involved in data transfer and the key-exchange system, with a JS client-side lib. No more excuses not to secure your application design.
www.nodejsday.it Twitter https://twitter.com/nodejsconfit Newsletter: http://bit.ly/grusp-nl

Transcript

Oh, my. So our next speaker is Matthias. He's a tech evangelist at alwaysdata, and he has always made security and privacy one of his challenges for the web. He defines himself as a curious pet and chronic optimist. Today, Matthias will talk to us about zero knowledge architecture. Welcome. How are you?
Yeah, I'm fine. Thank you. I'm really, really happy to be here with you.
Thanks. Me too. Matthias and I met each other, I think, two or three years ago. We already talked about this at a previous edition of nodejsday, so I'm very happy to see you back here. So enjoy the talk and see you in a bit.
Yes. Thanks for being here for this new edition of nodejsday online. We're going to have a quick talk about what a zero knowledge architecture is and how you can use it in your everyday development. We're going to look at what a zero knowledge architecture is and how to build a zero knowledge back end using Node.js. First things first: why use a zero knowledge architecture, and what is it? Its purpose is to protect your users' privacy and your users' data flow. As end users, we are producing a lot of data, and this data travels between client and server, back and forth, every day, continuously. We can't take the risk that this data could be stolen or misused by any company that has access to it. So we have to find useful design patterns that help you protect your users' privacy, because if users can't protect themselves, it is our responsibility as developers to protect them. So we will rely on encryption, and I mean a lot of encryption. But fortunately, this encryption occurs in the client itself, or in the various clients that you will use with your application, and your back end itself has to be agnostic, which means you can use any kind of technology you're comfortable with to build a back end dedicated to the zero knowledge design pattern, or zero knowledge architecture.
So to help you build this back end, we will rely on Node.js, thanks to some advantages brought by the language and the ecosystem itself. But to have something really agnostic, we will stick to the venerable HTTP verbs. The HTTP protocol defines a lot of actions that you can use to simply PUT, GET, or POST data to your server. So it is useless to try to find a new protocol or a new standard to help you develop a zero knowledge architecture or zero knowledge pattern. You just have to stick to the HTTP verbs and use them for every action. So what is a zero knowledge architecture, exactly, and how does it work? It's a rising pattern that helps you enforce your users' privacy, which means that we will try to protect any exchange that occurs from client to client, whatever the back end is and whatever data is transferred between applications. So we will use end-to-end encryption, which means the data will be encrypted in the client, transmitted to the server or back end, and then returned to the client, or transmitted to another client and deciphered on that other client. So you don't have to deal with encryption at all in your back end. It will use asymmetric keys for each end, especially when you want to share content. If you don't want to share content, you can rely on other kinds of keys, but with sharing involved in your architecture and your design, you will probably have to rely on asymmetric keys. As a reminder, with asymmetric keys, each user who can access a piece of data has a pair of keys, a private key and a public key, and content encrypted using the public key can be deciphered only with the private key associated with the public key used for the encryption.
So if I want to share content with another user, I will use that user's public key, and that user will use their own private key to decipher the content that I transmitted to them. This is exactly the concept behind GPG, and it is really useful, but it is also really expensive in terms of computation. So we will not use asymmetric keys for the data encryption itself, but only to encrypt a symmetric key, which is just a small chunk of data. So each time I want to share data or store new data on the back end, I will generate a random symmetric key. I will encrypt the data with the symmetric key, and I will encrypt the symmetric key with the public key used for the transmission, either to another user or to myself if I just want to recover my data later. Okay, I hope you are following me on this concept. It is not really essential for understanding what a zero knowledge back end is and how it works, but you have to keep in mind that you want to encrypt the data in the client using a key, probably a symmetric key, and transmit the data with an asymmetric wrapping around the symmetric key, and these asymmetric keys will be used in the back end itself. All right, let's go then. So what are the back end's roles in this architecture, which is a little bit complex to understand and to put in place? First, your back end has to store and return the encrypted data, which is a kind of persistence and storage directly in your data back end. The back end itself won't encrypt or decrypt any content. It will just store and return the data that is passed to it. That's all. It will manage public keys and tokens that will be used to encrypt data and to secure exchanges between different kinds of clients. So it is also the back end's role to manage the distribution of public keys and tokens.
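
The key-wrapping idea described above (a random symmetric key encrypts the data, and the recipient's public key encrypts only that small symmetric key) can be sketched with Node's built-in `crypto` module. This is a minimal sketch under assumptions the talk does not state: AES-256-GCM for the data, RSA-OAEP for the wrapping, and the function names `seal` and `unseal`, which are hypothetical. In the architecture described, this code would run in the client, never in the back end.

```javascript
const crypto = require('node:crypto');

// Recipient key pair. In a zero knowledge design the private key stays on the client.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
});

// Encrypt the data with a fresh symmetric key, then wrap that key
// with the recipient's public key (assumed: AES-256-GCM + RSA-OAEP).
function seal(plaintext, recipientPublicKey) {
  const key = crypto.randomBytes(32); // random AES-256 key, used once
  const iv = crypto.randomBytes(12); // 96-bit nonce for GCM
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  const wrappedKey = crypto.publicEncrypt(
    { key: recipientPublicKey, oaepHash: 'sha256' },
    key
  );
  return { wrappedKey, iv, tag, data };
}

// Unwrap the symmetric key with the private key, then decrypt the data.
function unseal(payload, recipientPrivateKey) {
  const key = crypto.privateDecrypt(
    { key: recipientPrivateKey, oaepHash: 'sha256' },
    payload.wrappedKey
  );
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, payload.iv);
  decipher.setAuthTag(payload.tag);
  return Buffer.concat([decipher.update(payload.data), decipher.final()]).toString('utf8');
}
```

Only the small `wrappedKey` goes through the expensive asymmetric operation. The back end would store the whole payload as an opaque blob and could not read it without the private key.
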
It will deliver certificates, because we will rely on certificates to host the different keys and identities of the clients that will access the encrypted data and data exchanges. So it will also manage and distribute the certificates, and it will invalidate expired payloads, because you probably don't want to give another client or another user access to some data forever. So you probably want to define an expiration time at some point, and this expiration is managed by the back end itself. And that's all. You don't have to do anything else on the back-end side; all the rest is on the client side. So from the back end's point of view, the client workflow is this one. First, a new client registers with the back end, getting the root certificate hosted by your back-end application. This is the top certificate in the pyramid of authentication and validation between different organizations and different users inside an organization. This is exactly the same mechanism that is used by TLS, which is the protocol that protects the HTTP connection in HTTPS. So your back end has a root certificate, which is the top certificate, and when a new client registers, it gets this certificate to generate an identity on the client side. Then, on the client side, the client generates a new key pair, protects it using the application certificate, signs it with the application certificate, and passes the public key and some hashes, like the private key hash, to the back-end server to store them there. That way the back end is able to distribute the public key to users who want to share content with this new user. Okay. Then your client will encrypt some data using a wrapped key, as we said before, sign it using its private key, and then put this chunk of encrypted data on your back end, on the server, by just PUTting it on the server.
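
The signing step at the end of this workflow, and the matching check a server could run using only the stored public key, might look like the sketch below. The talk does not name a signature algorithm, so Ed25519 is an assumption here, as are the payload shape (`body`, `expiresAt`) and the function names `signPayload` and `validatePayload`.

```javascript
const crypto = require('node:crypto');

// Client side: the key pair is generated at registration.
// Only the public key would be sent to the back end.
const client = crypto.generateKeyPairSync('ed25519');

// Client side: sign the (already encrypted) body together with its expiry.
function signPayload(body, expiresAt, privateKey) {
  const message = Buffer.from(JSON.stringify({ body, expiresAt }));
  const signature = crypto.sign(null, message, privateKey).toString('base64');
  return { body, expiresAt, signature };
}

// Server side: check expiry and signature without ever decrypting the body.
function validatePayload(payload, publicKey, now = Date.now()) {
  if (payload.expiresAt <= now) return 'expired';
  const message = Buffer.from(
    JSON.stringify({ body: payload.body, expiresAt: payload.expiresAt })
  );
  const ok = crypto.verify(
    null,
    message,
    publicKey,
    Buffer.from(payload.signature, 'base64')
  );
  return ok ? 'valid' : 'bad-signature';
}
```

The server never needs the private key or the plaintext: it only confirms that the holder of the registered key pair produced this exact payload and that the payload has not expired.
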
Then the server will validate the timestamp inside the PUT payload, and it will validate the signature before returning a validated payload to the client. This is the basic client and back-end workflow in a zero knowledge application. So in this context, what exactly are the back-end behaviors that we want in our zero knowledge architecture? The first mission is authentication. We want to identify users, we want to create new accounts for storage on the back end, and we want to link the different shares between recipients, which means that each time I want to share content with another user, I have to maintain a link for a dedicated chunk of data from me to that other user. So it is a back-end responsibility to maintain the state of the links. This is the kind of authentication the back end is responsible for. You also have key management, which, as we said before, is dedicated to the back end. So the back end will generate new certificates managed by the root certificate of the back-end application. It will store the different keys, and optionally it could be useful to have a hardware security module, plugged into the back-end server, that is responsible for the cryptographic operations. But this is only if you want to protect the cryptographic operations on the back end, like, say, key generation, so it is not mandatory on your back end, but you could rely on it. So, how do we secure the data? We will rely on the JSON web token. JSON web tokens are a kind of JSON model that is defined in an RFC. The definition has different kinds of data, like a UID, which corresponds to your user, and various parameters, like keys, salt, and the parameters used to encrypt and decrypt your data and your data chunks. This JSON web token is used to protect your data, to sign it, and to ensure the data isn't altered or tampered with in transit when you transmit it from your client to the back end.
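
To make the token format concrete, here is a hand-rolled sketch of an HS256 JSON web token using only Node's `crypto` module. It is for illustration only: a real application would more likely use a maintained JWT library, and the function names `signToken` and `verifyToken` are hypothetical. The `sub` and `exp` claims are standard; any custom claims such as a salt or cipher parameters would be application-specific assumptions.

```javascript
const crypto = require('node:crypto');

const b64url = (value) => Buffer.from(value).toString('base64url');

// Build header.payload.signature, signed with a server-side secret (HS256).
function signToken(claims, secret) {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const payload = b64url(JSON.stringify(claims));
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${header}.${payload}`)
    .digest('base64url');
  return `${header}.${payload}.${signature}`;
}

// Return the claims if the signature matches and the token has not expired,
// otherwise null. Only HS256 is ever computed, whatever the header claims.
function verifyToken(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const [header, payload, signature] = token.split('.');
  if (!header || !payload || !signature) return null;
  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${header}.${payload}`)
    .digest();
  const given = Buffer.from(signature, 'base64url');
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return null;
  }
  const claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
  if (typeof claims.exp === 'number' && claims.exp <= nowSeconds) return null;
  return claims;
}
```

Because the signature covers both header and payload, any change to the claims in transit makes verification fail, which is the tamper protection the talk relies on.
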
This is why the JSON web token is probably the best format for managing the exchanges between your clients and your back end. Persistence will be done in a document-oriented database, simply because it's probably simpler to have trees of content rather than a relational database. So we want to use a document-oriented database in our back end, and we have to transmit some operations. Ideally, your clients have to be notified when a new share is available, and your back-end server doesn't have to be centralized but can be distributed in a mesh. So you will probably have to rely on a message queue solution to ping the different back ends, and rely on ActivityPub to notify the different users of data transmissions. Another useful aspect of ActivityPub is that you can simply rely on it to federate your different server back ends. So, on the back end, what is the workflow you have to put in place to have a zero knowledge back-end application? First, we want to create new users. Those users will be responsible for putting and retrieving data on the server that is encrypted on the client. Then the back end will return the user's JSON web token. This web token will contain all the information your clients have to be aware of, like the certificate that will be used, the parameters to use, the cipher algorithm to use, and so on. Then the back end will handle the POSTed messages that register new data chunks each time a client wants to store new data in the back end. You will simply handle the POST request on the back end. You will also handle the PATCH request on the back end, because each time you post content, the back end will return some validated data, and the client will update its data with this return and then PATCH the data on the back end itself. And the back end will check the signature and the expiration time inside your application.
Then the data will be returned or sent to the clients, or the back end will send error codes in case of errors in the validation process. Okay, so that is the theoretical part, and it is really huge, I know, but we can simply build it using the Node.js ecosystem and by creating a Node.js application. First things first: why Node.js? First, because it's fast, really. It's a simple and really elegant language. We can use any of the modules that are already available through npm, so we don't have to develop a lot of logic ourselves, but just aggregate some useful modules. It uses JSON natively, which is the format we will rely on through the JSON web token. It is perfectly resilient, and we have a lot of front ends that will be compatible, because if you want to develop any kind of client library, it will be really easy to do it with JavaScript. So you can stick to the same language to develop both your back-end and your front-end applications if you want. So this is a good reason to stick with Node.js to develop the back end, but once again, it's perfectly agnostic, so feel free to use another language if you want. The first module we will use is the Restify framework, because it's a framework dedicated to building REST interfaces, which is exactly what we want to do, because we want to use the HTTP verbs to handle content. So we have to build a REST application, a REST API, and Restify fits perfectly. To do it, we have handlers, with pre and use, which are a kind of decorator. It handles hypermedia, which will be really useful when you want to link shares between various clients. It is versioned by design and really smart at failing, because you can easily generate error codes in case of errors in your application. So this is a perfect base, a perfect framework to build our zero knowledge back-end application. Simple setup.
We just require Restify. We simply create a new server. We have some routes that are declared, like a GET on some URL, and we just listen on a dedicated port. So: easy, efficient, perfect. Next, we have to secure our exchanges. The best way to do it is to use node-forge, which is a module dedicated to certificate implementation, completely compatible with X.509, the standard used for certificates, as in TLS, and it's a full native TLS implementation, so it's perfect for handling our certificate logic inside our application. So we will create a new key server in our application by requiring node-forge. We create a new store using the application root certificate, which will be used to sign all the certificates in the application itself, which means the user certificates. Then, each time I want to create a new user by POSTing to users, I will create my user and return the new certificate associated with the user and with the user ID. This is the best response you can give, because your clients just get a certificate in return, and everything is inside it, so it's ready to use and really easy to understand on the client side. Then, each time I want to add new user content by PUTting new data associated with the user, like the login, the public key, whatever you want, you just have to PUT on the user ID, and you will identify your user and store any information signed by the user certificate delivered previously when the user registered, and then return just a 204 code when your user is updated. So that's for securing the user's identity. Then you want to store some content. The best way to store it is to use MongoDB or another document-oriented database. As we said before, we will use a simple document structure. This document tree will be one document per user.
Each document contains the UID of the user, the username, the password signature and the salt used for the password (not the password itself, but only its hash), then the encryption keys, so the public key and the private key hash, and the various auth keys that will be used for proof, like the pub keys, if you want to use another asymmetric key pair for authentication and not for encryption. Then certificates, and a list of items that are data chunks, and for each item a dedicated UID, a salt, some parameters, the data itself, and the ID of the recipient that the data will be delivered to. It could be another user you want to share some data with, or it could be the user themselves, in case you just want to back up some data and retrieve it later. About authentication, there are two steps. The first step is creating the account by POSTing the username and password to the platform. The platform will handle content like the username, password, certificate, and so on, and will return the JWT, the JSON web token that the user will use to protect the data later. So how do we do that? Each time we POST the username and login to the server, we want to protect our data and redirect to the login, and each time we GET the user's name, we get the JSON web token associated with the user. We will use Passport to do that, with a local strategy based on our database. So we find the user in the database. If the user isn't available, we return an error. If the password isn't good, we return an error. Otherwise we return the user itself, and we rely on this Passport middleware to simply do the authentication inside Restify, which is pretty easy and simple to put in place. A good thing about Passport is that if you use it for your application, you can rely on other strategies, like an LDAP strategy or something like that, and you don't have to rely only on a local strategy. Then I want to protect requests.
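
The per-user document and the login check described above can be sketched without any external module. Everything here is illustrative: the field names, the in-memory `users` map standing in for the document database, and the use of scrypt for password hashing are assumptions. `verifyLogin` has the same `(username, password, done)` shape as the verify callback a Passport local strategy expects, where `done(null, false)` signals a failed login.

```javascript
const crypto = require('node:crypto');

// Hypothetical in-memory stand-in for the document database: one document per user.
const users = new Map();

function createUser(uid, username, password) {
  const salt = crypto.randomBytes(16);
  const doc = {
    uid,
    username,
    // Only the salted hash is stored, never the password itself.
    password: {
      hash: crypto.scryptSync(password, salt, 32).toString('hex'),
      salt: salt.toString('hex'),
    },
    keys: { publicKey: null, privateKeyHash: null },
    certificate: null,
    items: [], // each item: { uid, salt, params, data, recipientId }
  };
  users.set(username, doc);
  return doc;
}

// Same shape as a Passport local-strategy verify callback.
function verifyLogin(username, password, done) {
  const user = users.get(username);
  if (!user) return done(null, false); // unknown user
  const candidate = crypto.scryptSync(password, Buffer.from(user.password.salt, 'hex'), 32);
  const stored = Buffer.from(user.password.hash, 'hex');
  if (!crypto.timingSafeEqual(candidate, stored)) return done(null, false); // bad password
  return done(null, user);
}
```

With Passport installed, a function of this shape would be passed to the local strategy's constructor; that wiring is left out here so the sketch stays free of dependencies.
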
We will use TLS. We will use CORS to protect requests across domains, and the JSON web token. So TLS is built in using node-forge. For CORS, we just have to call Restify CORS and use the middleware, simply by allowing some origins and some headers, and rely on these CORS rules directly in our Restify endpoints, so that's perfect to use. So actions are now identified and protected, and they will be with the JSON web token. To do that, we rely on the Restify JSON web token community module. It's another middleware. We have an authorization parser, and each time you want to get a list of items, we first check the JSON web token using the server's secret token, and if the user is not the owner of the items we want to list, we just return an error. Otherwise we can return the list. So what happens when a user wants to post content to your back end? First, the client will derive some keys from the password. Then it will POST the content with the JSON web token to the back end itself: not the encrypted content, but the metadata of the content. That is the salt used, the recipient ID, the algorithm parameters, and an optional expiration timestamp in case you want your data to expire. Then the back end will return the updated JSON web token and the item ID, which is prepared to hold the data on the back-end side. We will use a plugin called body parser. We simply parse the POST request and return the newly created item ID to the user. Then the client will encrypt the content on its own side, creating a payload with the encrypted data, the wrapped key, the optional timestamp, and the sender, and there will be an exception if the data already exists on the back-end side. So I've got a UID for a location prepared for my data on the server, on the back end, so I will just PUT my data on the back end. Given the requested UID for the item, I will find it, and I will return a conflict in case of a data conflict on this item.
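
The two-step upload described above (POST the metadata to reserve an item ID, then PUT the encrypted payload at that ID, with a conflict if the slot is already filled) can be sketched as two plain functions that an HTTP handler would call. The in-memory `items` map, the function names, and the status codes other than the conflict are assumptions for illustration. Signature and timestamp validation are deliberately left out of this sketch.

```javascript
const crypto = require('node:crypto');

// Hypothetical in-memory store standing in for the document database.
const items = new Map();

// POST: register the metadata only and hand back the reserved item ID.
function reserveItem(ownerId, meta) {
  const id = crypto.randomUUID();
  items.set(id, { ownerId, ...meta, payload: null });
  return { status: 201, id };
}

// PUT: attach the encrypted payload to the reserved slot.
function storePayload(id, payload) {
  const item = items.get(id);
  if (!item) return { status: 404 }; // no such reserved slot
  if (item.payload !== null) return { status: 409 }; // data already exists: conflict
  item.payload = payload; // stored as an opaque blob, never decrypted here
  return { status: 204 };
}
```

A second PUT on the same ID returns 409, which matches the conflict case mentioned in the talk. Updating existing data would go through a separate verb.
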
Otherwise I will get the timestamp and the signature, check the signature, check that the timestamp is still valid, and return an error if the signature check fails. Otherwise I save the data. About the user itself: each time I want to get a user reference, I will rely on Passport, I will rely on the JWT token with the UID and username, and I will use permalinks for the items belonging to the user. So each time I want to get a user's content, I ask for it using the Passport authentication, and I return the username, the public key of the user, and a list of items, which is a hypermedia list of links to the different items associated with this user. Then I embed all the content in a signed JSON web token and pass it in the payload from the back end to my client. When I want to get some data chunks and retrieve data from the back end, I have to check the signature using my public key, check the expiration timestamp, and then return the data or an error. So I want to get my collection of items. I request a specific item. If this item doesn't exist, I return an error. If this item doesn't belong to me, or I'm not authorized to access it, I return an error. If the item has expired, I return a precondition failed error, because the back end won't serve content that has already expired. Otherwise, I check the signature, to check that the content hasn't been tampered with on the back end itself by some malicious attack. And if the signature is still good, then I return the content. But that is only part of why you would use a zero knowledge back end. The good thing is sharing data with other users, so I can share my payload with another user using zero knowledge proof, which means I will put my chunk of data into a public space for my recipient user.
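
The chain of checks for retrieving a single item (existence, authorization, expiration, signature) maps naturally onto HTTP status codes. The sketch below is a pure function under stated assumptions: only the 412 Precondition Failed for expired items comes from the talk, while 404, 403, and 500 are illustrative choices, and `verifySignature` is a hypothetical callback standing in for the real signature check.

```javascript
// Decide what to answer for GET on one item. `store` is any Map-like
// object keyed by item ID; `verifySignature(item)` returns true or false.
function fetchItem(store, itemId, requesterId, verifySignature, now = Date.now()) {
  const item = store.get(itemId);
  if (!item) return { status: 404 }; // item does not exist
  if (item.ownerId !== requesterId && item.recipientId !== requesterId) {
    return { status: 403 }; // neither the owner nor the intended recipient
  }
  if (item.expiresAt !== null && item.expiresAt <= now) {
    return { status: 412 }; // expired: precondition failed
  }
  if (!verifySignature(item)) return { status: 500 }; // stored data was altered
  return { status: 200, body: item.data }; // still an encrypted blob
}
```

Note that the function only ever returns the encrypted `data` field. Every check works on metadata, so the back end stays unable to read the content it serves.
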
So if I want to share content with another user, I just put the encrypted data that I want to share with them on the back end. The back end will return a JSON web token to me. I will update this JSON web token, write my content signed with my private key, and push it directly to the user. Then the user will be flagged, and the end user will be able to retrieve my new content directly inside their private space. So in the end it's just a simple REST ping-pong between my client application and my back-end server. Each time I want to create something, the server says, okay, you can have something, here's a token to use. The client side says, okay, I will use it, I encrypt the content using the token, and I send it back to the back end, and the back end just stores it. And each time you want to retrieve it, the back end will just perform some checks on the content (no alteration, no tampering, no expiration) and then allow the content to be retrieved by the user you want to share it with. This is a simple REST mechanism, so it's not really complex on the back-end side. On the front end, it involves a lot of work, but the back end is mostly just a REST API with some crypto sugar, just there to protect the exchanges. But that's all. So your tools will be the JSON web token, to handle the identity of senders, and date-time tokens, to handle the expiration part, and that's all. The rest is on the client side, and on the client side it will require a lot of cryptography. If you want to know more, there are a few good talks about zero knowledge architectures, what they are and how to use them in your client application. But for the back-end side, that's all. You don't have to handle more content or more complex logic.
I'm Matt. I'm a tech evangelist at alwaysdata, which is a cloud provider, and as a cloud provider, we are especially concerned about the security of the applications our clients host on our infrastructure. This is why we evangelize all this content, like zero knowledge content, cryptography content, and zero knowledge architecture, because we prefer that our clients understand how it works and put in place protective back ends for their end users, rather than using risky back ends that aren't protected correctly. If you've got any questions, the time is right now, in our Q&A session, which will be live in a moment. Otherwise, thank you very much for being here at this nodejsday online edition. I hope you now understand a bit more about all of these zero knowledge concerns. And I'm here for your questions. Thank you very much.
So, here we are. Thank you for the talk, Matthias, it was very, very cool. I wrote down a couple of questions for you, but I lost the key, so there's no way of recovering them. So I guess a question I can ask you while I try to find the key again is: is there a way for the back end to access the encrypted data in a zero knowledge architecture?
Yeah, that's a good question. Your back end can't access data that was encrypted in the client, unless the private keys are available on the back end, but in a zero knowledge architecture, the private keys are supposed to stay on the client side only. So there is no way for the back end to decipher any data that we put on the back-end side. This is why it's a pretty secure architecture for your application. You can't access the data itself on the back end, only on the client side.
Okay, interesting. So luckily, I was able to find the key again. Question: I was wondering, while watching your talk, how much does the back end know, and how much should it know, about a user? So let's say you want to log that something has happened. How much do you know about the user?
How much can you also log about them? What should you know about them?
That's a good question. You can log some kinds of exchanges. You can log basic actions like a login, or when a client or a user gets access to some content or puts new content on the server. But apart from that, you can't really log or access any user data on the back end. There are two sides to this. The first is that there is not much information stored on the back end, so it's pretty secure for your users, because there is not much information exchanged and stored on the back end itself. But if you can access the logs and get timestamps of when a user puts content on the server or exchanges content with another user, and if you log all the traces, it can give you some hints about your users' activity. So if you really want to protect your users' privacy, it's a good idea not to log a lot of information, because metadata also tells a lot about your users' activity.
Okay. So yes, I guess it's a question of trade-offs. Try to log only what you really, really need to know. Okay, what are the drawbacks of a zero knowledge architecture? What is a good use case for a zero knowledge architecture, and what is a really, really bad case for it?
A really good use case is when you want to build private exchange applications, like a private messaging application. Signal, the Android and iOS application dedicated to instant messaging, is a perfect example of a zero knowledge application. It's a really good use case. A bad use case, or a not-so-good use case, is if you have to perform a lot of analysis on your users' data because you want to improve their experience in some way, like running some kind of big data algorithm on your users' data, because you can't access the data itself. It's a complete black box for you on your back-end side.
So for this kind of application, when you want to enrich and build recommendations based on your users' data, this is not a really good fit. Or you can do it, but you do the recommendations on the client side. You can push a recommendation algorithm to your client, and on the client side the algorithm can run on your user's data and produce recommendations for your end user, but all the computation is on the client side and not on the back end. So you can't really be aware of what the recommendations are and what your algorithm really recommends to your clients.
Okay. So as a follow-up to what you just said, which I think already partly answers it: can you do a hybrid system? So let's take Twitter. All the tweets are public and handled by the back end, but all the private messaging is done in a zero knowledge fashion. Would it be possible to go for a hybrid solution like this?
Yeah, definitely. And it could probably be a really nice solution, because you can easily have a mix of public information and private information, and the private information stays private because it is encrypted on the client side. This is a good way to prevent any leak of private information in case of a malicious attack or a bug in the back-end software. It's a bit more complex to implement, because you have to implement all the crypto logic on the client side, but yeah, this is definitely possible.
Okay, great, nice. Very nice. So given that today we talked about serverless, a question comes to mind: is implementing everything with serverless a good thing, given that in the end what the zero knowledge architecture is really doing is serving chunks of data?
Yeah, we can rely on a serverless architecture because, as I said, the zero knowledge back end is just a REST API with some crypto stuff, just for control.
In the end, that means controlling the certificates and controlling the expiration time using date-time tokens. So it's totally possible to rely on a serverless architecture to host and develop a zero knowledge back end. It's probably simpler, because if you really have to implement something on the back end, it's probably just the crypto layer, which is really easy to develop with some existing modules. So relying on a serverless architecture is a perfect use case, because you can easily scale up using all the power brought by serverless architectures, since your back end just has to store some data and return it to your users. So it's perfect for a meshed architecture with decentralized services. So yeah, a serverless solution.
I see. So it would be a good option, especially given that crypto is usually very, very expensive and synchronous, so being able to scale very quickly is definitely a good thing.
Yeah, definitely. And because all the complex crypto operations are on the clients, they do not depend on the power of the back end, so it's easier to scale up.
Okay. Okay. Nice. Well, thanks a lot for your time and for your talk. It was very, very cool. I hope to see you next time in person. And also, from backstage, they told me that the first time we met each other was at the Node.js Italian Conf, which is the old name, in 2018. So two years ago.
Yeah.
So see you next time, and thank you again. Oh, God.