Video details

Jason Kuhrt - Introduction to Data Modeling with Algebraic Data Types in TypeScript with Alge

TypeScript
09.02.2022
English

Alge is a new library that Jason has developed this summer for working with Algebraic Data Types (ADTs) in TypeScript. It has three capabilities that build upon one another:
1. build one-off records
2. with multiple records, build ADTs
3. pattern match against ADTs
He will demo each of these, sharing thoughts along the way, taking questions, etc.
Jason is a developer passionate about system design, developer experience, static typing, and functional programming. Educated in design theory, practice, and social responsibility, he fell into programming through the portal of open source, Node.js, and GitHub. Over a decade later, he has found himself in love with TypeScript and at Prisma, leading development on the Prisma Data Platform Control Plane. In his personal life, he works on various personal and open source projects, but closest to his heart are the backpacking trips he takes his two boys on across the beautiful rugged Canadian wilderness! 🗻🇨🇦
Connect with Jason: https://twitter.com/JasonKuhrt
This talk was recorded at the TypeScript Berlin Meetup #9. Join our Meetup group here: https://www.meetup.com/typescript-berlin/

Transcript

Thanks everyone. I have a small presentation about a little library that I wrote in TypeScript this summer. It's pretty casual, and the pitch is not to use it. It's about more of a conceptual topic that I like a lot, and maybe it will draw some attention to that topic in your own code in the future. So that's kind of my hope. The library itself is whatever you make of it. We'll see. So the library is called Alge, and it's for working with algebraic data types in TypeScript. If you don't know what that is, I will do kind of my interpretation of the topic very quickly on a couple of slides. For people that already know everything about algebraic data types, sorry if I get things a little bit wrong there in terminology or whatever, but hopefully what I present makes enough sense to move the topic forward.
A little bit about myself. My name is Jason, I'm a tech lead at Prisma. I'm based in Montreal, Canada, and my handle is JasonKuhrt; I'm on various social media.
So what is an algebraic data type? I'll work with an example to motivate the subject, and it's going to be pretty straightforward. There's probably a bunch of academic concepts behind it as well, but I'm just going to stick with one motivating example. So you have some data, and the data in this case is a Pin, which is for a semver, a semantic version. Maybe you're building, I don't know, the npm website or something, and you've got URLs that have semvers in them, and then you parse them into a data structure. And in this data you've got some invariants, but mashing it all into a single data structure like this, which is I would say fairly common, especially in JavaScript, has some, I would say, cons. So what's the problem? The problem is the impossible states that are allowed by this data, and that comes through in your code. So now your code will have some possibilities where you might have a comment like "not possible", or throw an error, "this shouldn't happen", stuff like that.
The problem in this case can be addressed with algebraic data types: you can encode the invariants in the data structure. So that might look like this, where you want to say the data can come through either as an exact pin or a range pin, and the algebraic data type is Pin, which is a union of the two possibilities. And in doing this you make certain things clear. Before, you had range here, but range would always be undefined if exact was true. That kind of ambiguity is removed, so branches can be removed with algebraic data types. There's a bit of an art to that. Anytime you have an optional property, you have the potential to think about whether this could be represented or encoded in the data. You can go really far and overdo it, so there's sort of an art to data modeling in your domain.
Pointing to just some of the key parts: you have this upper part, which is the exact, and you have this lower part, which is the range. So, like I said, we separate those. The key part you add here that isn't represented here is the discriminant property. It could be any key, that doesn't really matter. It has to be common across the members of the union, and it has to be some kind of scalar. Usually we would use a string for that, and the value itself, by convention, would probably just represent the name of the data type. But again, that's not super important technically. The important part is that every member of the union has that key and its value is not the same between members. And that way, when you are in your code and you're dealing with a pin, you can say, well, if the pin tag is equal to ExactPin, you narrow down that branch. In another part of your code you say, oh, if the pin value has a tag of RangePin, you narrow down that. So you can branch, and you don't have this ambiguity of the big soup. Okay, that's all I'm going to say about algebraic data types. Hopefully that made some kind of motivation or sense.
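As a rough sketch of the idea described above in plain TypeScript, without any library: the field names (isExact, major, minor, patch, range) and the helper describePin are assumptions made up for illustration and are not taken from the talk's slides. Only the _tag discriminant follows the convention the talk mentions later.

```typescript
// The "big soup": nothing stops isExact: true from also carrying a range,
// so consuming code needs "this shouldn't happen" branches.
type LoosePin = {
  isExact: boolean;
  major?: number;
  minor?: number;
  patch?: number;
  range?: string;
};

// The same data split into two records that share a discriminant property.
type ExactPin = { _tag: "ExactPin"; major: number; minor: number; patch: number };
type RangePin = { _tag: "RangePin"; range: string };

// The algebraic data type is the union of its members.
type Pin = ExactPin | RangePin;

// Hypothetical consumer: checking the discriminant narrows the union,
// so each branch only sees the fields that can actually exist.
function describePin(pin: Pin): string {
  if (pin._tag === "ExactPin") {
    return `${pin.major}.${pin.minor}.${pin.patch}`;
  }
  // Here TypeScript knows pin is a RangePin.
  return `range ${pin.range}`;
}
```

With the union in place, accessing pin.range inside the ExactPin branch becomes a compile-time error instead of a runtime surprise.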
Alge is an attempt to take some of that concept and make it easy to apply in TypeScript. So there'll be some live-coded demo in a bit, but at a high level, there's three parts to the Alge library. There's a builder part, there's a controller part, and then there's a matcher part. So the builder part is responsible for defining, hey, here's what my ADT is. The controller part is a result of the builder, and it's what you use at runtime to construct your data types, do identity checks, and serialize to and from. And then your matcher doesn't come from any of this stuff; rather, it composes. So the matcher will take your ADT values that you've created with your controller and just do some pattern matching, which is pretty common in other languages, like Rust, Haskell, lots of languages, but unfortunately we don't really have a native mechanism. So it's sort of a poor man's pattern matching.
Okay, so with that we'll jump into some code. Alright, so the library is on npm, Alge, like I said, and you import a namespace with a capital A and then you get going from there. So everything comes off that single namespace. There's a couple of parts before we jump right into ADTs. There's a lower building block, which is just for records. So we talked about how you could separate Pin into two distinct concepts, the exact pin and the range pin. So we could say those are records, and then the ADT is when you take them and say this is a union of those things. The record part is the lower building block, and so that's what I'll quickly show. And then we'll get to the ADT after, which builds on this concept. So in Alge you define a record, you give it a name, and you can have a schema, but you don't actually need one. And what you get back is this controller bit. So this is the builder, and then with the builder you get an API back, and so you can say create some data. And so this is using Quokka.
It's a cool VS Code extension that lets you run code in your editor. So that's what this blue text is. Yeah. So you have this constructor, and it's very simple. The underscore is an implementation detail. It doesn't actually show up in the static types. So from TypeScript's point of view, and therefore in the code that you are writing that interacts with this, you wouldn't access the underscore. That's there to support things like nominal typing, which we'll get to a little bit later. But basically, by default, what you're getting from the constructor is this automation of the underscore tag property. So that's a special property; it's the discriminant property.
Now if you want to do something more useful than just an empty record, schemas are what you're generally going to be using, and that uses Zod. Maybe in the future that would be something that gets expanded to support other things, although these days I think Zod's kind of the best out there, especially in terms of TypeScript support. So a lot of Alge is just taking Zod and running with it. There's actually not a lot of stuff that Alge brings to the table. From an implementation point of view, it's quite thin. Zod does a ton of the heavy lifting. So the z is the Zod namespace. And I think we had Fabian mention Zod in the last talk. So you build your schemas with that, bring the z namespace in, and just whatever. Here we have a circle with the radius, and once we have a schema, it becomes a static type error to not pass the radius now, and not just a runtime error, which is nice. There's also validation, again thanks to Zod. So if I come in here... no, that's not going to work. It's not going to work for a different reason. There you go. So this is something you don't get from TypeScript, right? TypeScript is totally happy with 1.1, but your business domain is not. And so Zod gives us a bit of that at runtime.
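To make the constructor behaviour described above concrete, here is a minimal hand-rolled sketch. It does not use Alge or Zod, and it is not their API: createCircle is a hypothetical helper, and the rule that the radius must be a positive integer is an assumption inferred from the remark that 1.1 is rejected. In the talk, Zod performs this validation.

```typescript
type Circle = { _tag: "Circle"; radius: number };

// Hypothetical constructor: the caller never writes the discriminant,
// it is filled in automatically, and the input is validated at runtime.
function createCircle(input: { radius: number }): Circle {
  // TypeScript accepts any number here (1.1 type-checks fine),
  // so the domain rule has to be enforced when the code runs.
  if (!Number.isInteger(input.radius) || input.radius <= 0) {
    throw new Error(`Invalid radius: ${input.radius}`);
  }
  return { _tag: "Circle", radius: input.radius };
}

// Omitting radius is a static type error:
// createCircle({});

const circle = createCircle({ radius: 2 });
```

Calling createCircle({ radius: 1.1 }) compiles but throws, which is the gap between static types and runtime validation the talk points at.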
We catch that a bit later in our development process than we might hope. But that's okay if I write tests. And lastly, defaults. That's also Zod. So in here you can say, well, I do want a schema, but I also don't want to force my consumers to have to supply it. I think there's some sensible defaults for this piece of data. I will start circles with a radius of one. So you can just pass nothing. You could pass the empty object, you could pass a radius of two. There you go. And you can see that if you decide you want to change it, it comes through. If you don't want to change it, it comes through as one. So, pretty basic stuff. That's the constructor aspect that you get out of the box.
Record codecs. So we're still talking about records; we haven't talked about ADTs yet. There's a second kind of API. In terms of the builder, here we just had a string and then a schema. But if you want to pass a codec, which is how you serialize to and from your data, there's another API, a chaining API, that has a bit more features, basically for more advanced use cases. So here we define the same circle, the same schema, but now we can also put a codec. Maybe for this particular record we want a way to represent it graphically. So we have a to and a from, and we call that codec graphic. Then once we define it, we get access. All records have this from and to, so those are the codecs to and from that data type. And on there we have a built-in JSON one. I'll get to that in a second. But the one we've added now statically shows up as graphic, as we put it up above, and with it we can serialize, right? So radius twelve. We can take in strings and convert them to our data type. If it's invalid, then we'll get back null. The JSON one I mentioned, that's built in, so you always get that. It puts in the structure here. I think the fact that this shows up in there might be a bug. I don't think it's supposed to be there.
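The defaults and the to/from codec pair described above can be sketched by hand as follows. This is an illustration of the shape of the idea, not Alge's implementation: the controller object is hand-written, and the parenthesized text format such as (12) is an assumption, since the transcript does not say what the graphic encoding looks like.

```typescript
type Circle = { _tag: "Circle"; radius: number };

// Hand-written stand-in for a record controller with one custom codec.
const Circle = {
  // Default radius of 1, so callers may pass nothing, {} or { radius: 2 }.
  create(input: { radius?: number } = {}): Circle {
    return { _tag: "Circle", radius: input.radius ?? 1 };
  },
  to: {
    // Encode a circle as text (assumed format: the radius in parentheses).
    graphic: (circle: Circle): string => `(${circle.radius})`,
  },
  from: {
    // Decode text back into a circle. Invalid input yields null rather
    // than throwing, so the failure case is visible in the return type.
    graphic(text: string): Circle | null {
      const match = /^\((\d+)\)$/.exec(text);
      return match ? Circle.create({ radius: Number(match[1]) }) : null;
    },
  },
};

const encoded = Circle.to.graphic(Circle.create({ radius: 12 }));
const decoded = Circle.from.graphic(encoded);
```

Because from.graphic returns Circle | null, the caller has to handle the null case before touching decoded.radius.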
You can go from JSON, bring it back in. And then the orThrow. So sometimes a union with null, code that doesn't throw, is kind of nice, because throwing doesn't get captured by the type system. But there's actually quite a few cases where you're in a situation and you just throw the damn thing, because at this point maybe it's a small script or whatever, and throwing is actually what you want. So there's an orThrow variant for all codecs automatically. For example, we called our codec graphic, but here you would also get the graphicOrThrow method. And note the return type there. If I capture it, you can see it's always going to be a successful decode, because if it's not successful, your program is throwing. If you go back to the other one with the null, you can probably guess what this is going to be. You've got the record or null, right? It's a union, so you have to deal with that in your code. But if you want to throw, it's there.
Okay, record identity. So this is about type guards, which is awesome that that was talked about tonight, so there's all the more context for this little part here. You get type guards for free from Alge. So here we've added another record. We had the circle, so same one again, but now we have a square. Its schema has a size instead of a radius, and we're going to create some instances. So we'll have some data, and then we'll go use the controller. It has helper methods, and it has an identity method, is, and it also has is$. So is here, it's just a predicate function. True, false. It's not very useful for records, because the only thing it will accept is a circle. So if I pass a square, that's a static type error. Statically, you'd always have to pass a circle. So why would you check? So this particular part of the API is more useful in the ADT case. We'll see that in a second. is$ is just basically looser, so you could pass anything. You could still pass, of course, a circle.
You can also pass the square, you could pass a string. The type of that is it accepts unknown, so it doesn't care what the input is. That can be useful sometimes, but especially in the ADT variant of is, you would prefer the stricter one when you can.
Nominal simulation. Okay, so I mentioned that underscore thing. That's not supposed to be an enumerable property. It currently is, that's a bug, but it's not supposed to be enumerated, just to make it that much more hidden. And one of the reasons we have that is a symbol. If you think of this sort of ADT system at scale, with different packages and things, from a collision point of view there's not a lot of protection in that discriminant property against two developers deciding they both want to make circles. And then you can get into some potentially not good situations from a higher level of consumption. These things start to mix when you don't expect them to, and at runtime things pass when you don't expect it. I could choose to ignore that problem. Currently I've tried an idea out, to have symbols. So here are two instances of what appear to be structurally the exact same record. But at runtime you'll see, circle one and circle two, they're not the same. And that is because it's a symbol check. The cool thing with a symbol check, I mean, you could have done that with the tag, the discriminant property, as well, but it's just fast, right? It's not like a full deep check. It's just a very quick identity check, like a reference check.
So now we get to the ADT part. We've defined, again, circle and square. But now we want to talk about them together as a shape. So we have a shape; it could be a square, could be a circle. So we're going to look at some ways that we construct ADTs. There's an inline mechanism here that's more abbreviated. You could say it's lightweight, and for many simple use cases it's probably fine.
So you say, instead of Alge.record, it's Alge.data. Give it a name, give it some members, some names of records, and then just give it the schemas for each member. So the following three ways of constructing an ADT are identical. The results of these three blocks here are absolutely the same. They just have different interfaces. Here is more composition. So if you already have some records somewhere else in your code base, or imported from a library maybe, you could just pass those in and build up your shape from those records. There's also a chaining API for, again, the codecs. There, in this example, you're still defining those schemas inline, just like you can pass them here. So that's ADT construction, or the ADT definition part.
So let's play with it a bit. Passing in our circle and square, we get back an ADT controller. And now you see we still have these APIs for circle and square, but they're nested inside of Shape. So in Shape we get some built-in API stuff that we'll look at in a bit. But you also get, for every member of the ADT, kind of a namespace on the top-level ADT. And just like before, when we created the constructor, it's the same concept. So here, up above, the circle and square have some defaults, so I don't need to pass anything here. If I come in here, the radius will be one, right? So it's exactly like we saw with records. Same thing with square; the default for square was size zero. You remember, before, you couldn't pass a square to Circle.is. Now you can, because it's in the context of the ADT. And so the type for is, the display is not so useful, thanks to VS Code, or maybe because my font is big, I don't know. But what it accepts here is the full ADT, so all the members. If I try to pass in something that's not part of the ADT, that's a static type error, so that won't fly.
But when you have a value flowing through your code that is a member of that ADT, there might be a part in your code where you say, only if it's the circle member of the ADT do I want to continue. These are type guards. The is$, same thing: it accepts anything, so it's a lot looser. Regarding the narrowing part, the fact that these are type guards: one example where this is useful out of the box, say you have the shapes array. It could be some circles, it could be some squares, so it's a mix of those two members. It's the full Shape union ADT. But then you want to filter down to just, say, the circles. And so if you try to do that manually, something pretty reasonable, right? Like, oh, I'll just go look at that discriminant at runtime, no problem, solved. But it kind of leaves out the TypeScript part of the equation. So this is sound from a runtime point of view, but TypeScript doesn't know it's sound. I'm not sure why it works this way, but type guards are how they solve this problem, and this is not a type guard. So you see, in the type of circles1, it still thinks that the array could be circle or square. If you use the type guard, and you can just pass that right into filter, in circles2 it's correctly narrowing the type. So that can be good for function composition and stuff. Just quality of life stuff. It's nice to have.
Codec definition. So we saw how codecs are for serializing in and out of a record. Same thing here, but they have some special powers in the context of an ADT. So we define Shape again, the circle schema, and we'll give the circle the exact same codec as before. We'll define our square and we'll also give it a codec, that's like square brackets, for that graphic. We'll also give square another codec called something else. It doesn't matter what it does. So let's see what happens. Just like with records before, you can access Shape.Circle to and from, and you can serialize the stuff.
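The filtering problem described earlier in this passage can be reproduced in plain TypeScript. The sketch below writes the type guard by hand (isCircle is a hypothetical stand-in for the guard the talk says Alge ships), and the variable names circles1 and circles2 mirror the ones mentioned in the talk. One caveat that postdates the talk: TypeScript 5.5 and later can infer a type predicate for simple arrow functions like the inline one here, so on newer compilers the first filter may narrow as well.

```typescript
type Circle = { _tag: "Circle"; radius: number };
type Square = { _tag: "Square"; size: number };
type Shape = Circle | Square;

const shapes: Shape[] = [
  { _tag: "Circle", radius: 1 },
  { _tag: "Square", size: 2 },
  { _tag: "Circle", radius: 3 },
];

// Inline predicate: sound at runtime, but before TypeScript 5.5 the
// callback is typed as returning boolean, so the result stays Shape[].
const circles1: Shape[] = shapes.filter((shape) => shape._tag === "Circle");

// Explicit type guard: the "shape is Circle" return type tells the
// compiler what a true result means, so filter narrows to Circle[].
const isCircle = (shape: Shape): shape is Circle => shape._tag === "Circle";
const circles2: Circle[] = shapes.filter(isCircle);

// radius is accessible without a cast because every element is a Circle.
const radii = circles2.map((circle) => circle.radius);
```

Both filters keep the same elements at runtime; the difference is only in what the compiler knows about the result.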
So you get the graphic, and you can get back into the data, and there's nothing different, but it's in the context of the ADT. The square, right? There's a little error here. This should statically pass and doesn't, I don't know why. So just a bug, but at runtime it works. And so, same thing as the circle, but the square is doing square stuff. What's interesting, though, is that because graphic is a codec on every member of the ADT, you can also just do Shape.from. You can do Shape.from, Shape.to. You don't have to access a codec for a specific member. You can work with it generically. So let's say you have some code and you have a circle or square; Shape.to.graphic will accept circle or square, right? If I got more specific here and said I want my circle serializer to work, that's a static type error, because that's the serializer for that one record of the ADT. That's not going to work. It's only going to work with circles. That also should be to.graphic. But there you go, there's the type error. But when you access the same codec logic on the ADT level, it accepts any union. It could be a subset. It doesn't have to be the full ADT. It could be two members when there's four members in the ADT or whatever. It will accept the whole or some subset of the ADT, and it will properly dispatch to the actual member serializer. And then what you get back... right, so there's a Math.random here, right? So every time this evaluates, sometimes the serializer dispatches to the square, sometimes it dispatches to the circle graphic, and that code is more generic. It doesn't have to pick the right serializer. It just works with the ADT, so it's kind of nice. Similarly, there's the from. So here again there's the Math.random, right? It's running, and so with the circle or square, you don't have to worry about finding the right serializer. At the ADT level it's going to dispatch correctly. So here, sometimes it's the square and sometimes it's the circle. Okay. Last thing about codecs at the ADT level.
When there's a member that has a codec that isn't common across all members of the ADT, you can still use it on that member of the ADT. But you're not going to be able to use it, first of all, on other members of the ADT, and you're not going to be able to use it as a common codec. That might make sense for some use case, but that's just a thing to know. Almost done.
Static types. So this is an area where the leverage of using a library starts to make a bit more sense, because so far we've just seen the runtime and a lot of type inference. But there are times where you need to annotate things manually in your code. You need to be able to refer to a type manually. Maybe some people use Prisma here and have had cases where, like, Prisma's type inference is great, but sometimes you need to reach in and, like, oh, I really want the input for a particular Prisma operation. I want to reuse that, because I want to abstract Prisma but keep the type safety. So that can hurt. When you don't have a solution to that, you can end up duplicating types or trying to reimplement stuff at the type level. It's not pretty. So the inference here from Alge is for that kind of situation. Maybe it's more advanced, but I think in any nontrivial code base some use cases for this kind of thing arise. So Alge has, at the type level, Infer. Up here we've defined our ADT Shape, and we raise it to the type level with typeof, we pass it to Infer, and we get back a Shape here, which is like an object at the type level with some properties. You can see what those properties are. Here there's a star, and then there's also the name of every member in the ADT. Star is itself a union of all the members. So from there, if we go back to our type guard problem, right, that's a case that we can already solve for you, because we can ship the type guard automatically.
But imagine that you... yeah, we're going to basically reimplement the type guard here. I think that's just a motivating example; you would use the ones that are shipped with Alge. But you could have lots of cases where you do need to annotate manually. As I was saying, we're going to annotate the type guard manually with the types that we just inferred. So, shapes.filter. shapes is the array of circles and squares. We're going to say that the anonymous function we pass to filter accepts a shape. And now here we're explicitly typing that, right? We use the star to say that it's any member in the ADT. And then we use this is, the type guard syntax. We say shape, which is the parameter name, is. And then in this particular filter we only want circles, so we access the circle from the inferred type. We do the exact same runtime implementation. But the big difference here is that TypeScript now understands this as a type guard, and so Alge gives you that power. There's probably all sorts of things you could do with the inferred types in other situations.
I personally don't like the syntax. I find it weird. It looks like data access, and really what it's trying to be is a namespace. So it's a bit more boilerplate for the implementer, for the producer here, but if this is in a library, I would recommend a pattern like this, where you define a namespace and then for every member of your ADT you define types that just re-access those things. Maybe put in an Any; that's a bit more of a question mark as to what you would want to call that. And then if we reimplement this type guard, for me, I would say this looks better, right? Shape.Any, and shape is Shape.Circle. It doesn't have this data access appearance, which I find is just harder to read and a bit confusing. Also, ADTs are probably where this is most useful, but there is also InferRecord. So if you're just doing the record stuff, you can also get the same thing; it basically does this for you.
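The two annotation styles compared above can be put side by side in a sketch. The ShapeTypes object type below is written by hand as a stand-in for what the talk says the inferred type looks like (a star key plus one key per member); it is not produced by Alge here, and its name is made up for illustration. The namespace follows the pattern the speaker recommends, with Any as the tentative name he mentions.

```typescript
type Circle = { _tag: "Circle"; radius: number };
type Square = { _tag: "Square"; size: number };

// Stand-in for the inferred type: a type-level object whose "*" key is
// the union of all members and whose other keys are the members.
type ShapeTypes = { "*": Circle | Square; Circle: Circle; Square: Square };

// The namespace pattern: re-export each member so annotations read like
// namespace access rather than data access.
namespace Shape {
  export type Circle = ShapeTypes["Circle"];
  export type Square = ShapeTypes["Square"];
  export type Any = ShapeTypes["*"];
}

const shapes: Shape.Any[] = [
  { _tag: "Circle", radius: 1 },
  { _tag: "Square", size: 2 },
];

// Style 1: indexed access into the inferred type.
const circlesA = shapes.filter(
  (shape: ShapeTypes["*"]): shape is ShapeTypes["Circle"] => shape._tag === "Circle",
);

// Style 2: the same guard using the namespace.
const circlesB = shapes.filter(
  (shape: Shape.Any): shape is Shape.Circle => shape._tag === "Circle",
);
```

Both guards have identical runtime behaviour and produce Circle[]; the second is only a readability choice.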
So we make it for consistency. It's there, but that's basically a one-liner that we have under the hood, using Zod for that.
Pattern matching. So this is the last part. Again, we'll have the same shape that we've been working with, and we're also going to define some instances: a circle, a square, and a shape that could be either one. And this is how pattern matching works. So Alge has a top-level match function. You pass in your shape, and then based on the members of the ADT, you get functions automatically. So here you have circle or square, because that ADT has those defined. It would be different for different ADTs. And you can do different kinds of matches in the pattern matcher. So these are tag matches, because all they do is say, well, if it's a circle, you're going this way, and if it's a square, you're going that way. End of story. And then here, this is the exhaustiveness check. If you've ever done a switch case statement, then if you really want to be correct, you would have the default at the end and then you would have some kind of never check. Anyway, there's a whole pattern there, but this is just a declarative way that takes care of it for you. So if we run this, with the run on the done... There we go. Okay, so here it's got the randomness, right? So it will change as I execute. Sometimes it's got the circle, hitting the circle branch. Sometimes it's hitting the square. And just to look at the static types, the return type is whatever you've implemented. In this case, it's always a string. But if I for some reason turn this into a const, right, now it's a union of these two possible string literals. So that type safety just flows through your code based on your implementation.
Data matches. So they have two parameters. Same thing, though. It's still circle, square, so it still has to flow through; the tag has to match.
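For reference, the manual switch pattern with a never check that the speaker contrasts with the declarative matcher looks roughly like this in plain TypeScript. This is the hand-written alternative, not Alge's matcher, and describeShape and its return strings are made up for illustration.

```typescript
type Circle = { _tag: "Circle"; radius: number };
type Square = { _tag: "Square"; size: number };
type Shape = Circle | Square;

// The return type is a union of string literals, matching the point
// that the result type follows from what each branch returns.
function describeShape(shape: Shape): "I am a circle" | "I am a square" {
  switch (shape._tag) {
    case "Circle":
      return "I am a circle";
    case "Square":
      return "I am a square";
    default: {
      // Exhaustiveness check: once every member is handled, shape has
      // type never here. Adding a new member to Shape without a case
      // makes this assignment a compile-time error.
      const unreachable: never = shape;
      throw new Error(`Unhandled shape: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The default branch should never run for well-typed input; it exists so the compiler flags a missing case, which is the job the talk assigns to done.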
But now it's also got additional qualifications: the data, the shape of that instance of a circle, also has to match radius 13. In this case, running this, we got an unlucky circle. So we got 13, 13. Put this down to three. Here we are. So now we got the unlucky square for that case, and then we got the else case, because the circle flowed through but it wasn't unlucky, because it wasn't radius 13. You can mix these. So you could say, when a circle is radius 13, do this; when it's any circle, do that. Now, you have to make sure... I believe there's a static enforcement of this. Yeah, so it's kind of cool. If you put a circle matcher, a tag matcher, before it, you're not going to be able to... it doesn't make sense, right? Because it's more general. So this circle thing after it is never going to run at runtime. So statically, we just make that API more usable by catching that for you in the editor. Then you could say, okay, square, when that's unlucky, do that. So you can mix and match these things. The reason there's two: done is exhaustive. So with done, if I don't put square here, that's a static type error, because this is no longer exhaustive, right? If you get a square that is not 13, you have a case your code doesn't handle anymore. So done is for when you really want to express that everything that's possible here has been handled. Else is a bit different. Else is like, no, there are some other things that could happen here, and when it is one of these other cases, just do this other thing. And that's it. Thanks.
I'm just wondering about some of the techniques you used to actually write the library. Is there anything kind of esoteric or strange?
Yeah, one thing that's interesting is most of the library is type logic. That's partly because Zod does a great job at what it does. So from a runtime point of view, it's like 200 lines of code, maybe 200 or something. The type logic, like, all of this is type logic, and there's a lot more than that.
So I think what I find interesting is, if you have a TypeScript-first library approach... the whole @types thing is fine, it's interesting, but I think you think about library design differently when it's TypeScript first. And in this particular case it just so happens that, because I had some great primitives, I just wanted to put them together in a pattern. I think there could be a talk about it. I'm not particularly ready, on the spot, to point at specific weirdnesses. I also just finagle a lot with TypeScript and get to where I want, and half the time I don't understand how I got there. Which actually does bring up a good point, though: for the testing, I put in a lot of effort, because of not fully understanding why things are behaving the way they are. There's a lot of static type testing, so let me see if I can find a couple of examples. So this is expectType from tsd. There's a couple of other libraries out there that are all good, but this just happens to be the one I'm using. Yeah, sorry, this is a test suite. There's actually a lot of test coverage, because it's an open source thing and I want to be able to come back in six months and not be like, I forgot how this worked. I don't know, it makes me more motivated when I know it's tested. So in a test case, what is this one doing? This is adding custom encoders to the controller. So I guess with codec, what am I doing... for a type of A. So I have at runtime A.from.string. That's a codec, and I want to make sure that the type of that is going to be something specific, and if it doesn't match that, even if I'm not testing the runtime, which I think I am also testing, I want to catch static type errors in my test suite. So that's been hugely important, I think, for libraries that are TypeScript first, because a static type bug is as much of a bug as a runtime one.
And the value proposition of a TypeScript-first library is that you get this special power at build time. If you have a bug in that, I think that's as much of a contract violation as if you had one at runtime. Even if it's correctly functioning at runtime and you only have a static type error, that, to me, is still a significant issue with a library. Anyway, that hopefully answers your question a little bit.
Maybe the message, I'm wondering... it's almost like a program in the test, especially if you have type functions at the type level.
And so the way I think about it is, it's almost like writing tests for the type program, and even stuff like the comments: I want this to fail, and I test that. I think it was only in the last year or two that this feature arrived, because they had ts-ignore. But this thing has changed my life. For testing TypeScript, it's been really fun to use. It's really cool, because basically, for those that aren't familiar, if this here is not a static type error, then TypeScript will make this a static type error. So I'm guaranteed that, yeah, this thing is failing when this passes.
But that's cool. I totally agree.
Cool. Well, it's kind of late. Any other questions?
A colleague of mine is also missing, like, pattern matching in TypeScript?
Yes. There's one called TS Match or something like this, and it looks quite extensive. It focuses just on pattern matching, and it's got this big README. So that's what I'm aware of, and there's probably more. I think the pattern matching I have is pretty minimal, and it plays very closely with... I mean, you could throw in your own ADT if you made it. You could define this tag thing, you could throw it in, and it would work. But I haven't thought about that. So the intention behind it is just a minimal little add-on for Alge, but I don't know if it's any better than what's already out there or anything.
Cool. Where is Natalia? Are we done? Natalia.