Video details

Domain Driven Design Patterns in Python

Python
07.16.2021
English

Domain-Driven Design (DDD) is an approach to software development that emphasises high-fidelity modelling of the problem domain, and which uses a software implementation of the domain model as a foundation for system design. This approach helps organize and minimize the essential complexity of your software.
DDD has been used with success within the traditional enterprise programming ecosystems of Java and .NET, but has seen only limited adoption in the Python community.
In this talk we introduce Python programmers to the core tactical patterns of DDD and show how they can be realised in idiomatic Python, freeing the most valuable parts of your system – the domain model – from onerous dependencies on particular databases or application frameworks.
In this talk we share what we've learned from applying DDD in Python to large projects.
PUBLICATION PERMISSIONS: Original video was published with the Creative Commons Attribution license (reuse allowed). Link: https://www.youtube.com/watch?v=Ru2T4fu3bGQ

Transcript

There's a lot of content in this talk, as you'll see in a moment. I'm not sure we're going to get through all of it, but I'm going to rattle through it, and you can come and talk to me afterwards if you need clarification on anything I talk about.
I'm going to start with a quick introduction to domain-driven design: what is it, anyway? Then I'm going to move on to some strategic DDD topics: ubiquitous language and bounded contexts. Then we're going to go to the other end of the scale and talk about the smallest, simplest patterns in domain-driven design: value objects and entity objects. Then we're going to move on to aggregates, a topic that some people find very confusing when they first get into DDD. Then I'm going to shift gears and talk about software architecture, and how the architecture of an overall application can be centered on a domain model. Then, if we have time, I'm going to talk about repositories and domain events.
So let's get back to the program and think about what DDD is. A bit of a health warning, though: there are no answers in software design or modeling, but there are plenty of choices to make. Whenever we design a software system or a domain model, it's all about making choices, and there aren't necessarily any right answers. So I'm not going to give you right answers today. Discovering good answers for your context is very much up to you.
What we've noticed is that domain-driven design is often encountered in classic enterprise computing projects, which are nearly all written on the JVM stack in Java or on the .NET stack in C#. Some other languages have picked up quite a big DDD community, F# for example. But we see relatively little DDD activity in Python, which is a shame, because Python is a great fit for DDD. There's nothing happening in DDD that we cannot implement in Python.
Domain-driven design is about distilling a problem domain down to its essence, and Python allows us to produce very low-ceremony solutions, so they're actually an excellent fit for each other. We're going to talk about how to bring these things together, and hopefully you can get inspiration from this and go away and think about using DDD in your own projects.
One approach to learning DDD is to go out and buy the original book by Eric Evans, which is, what, 15 years old now, from 2003. It's a very heavy blue book. It's almost completely indigestible unless you've already been through a DDD project. So this is not the right place to learn DDD. It's a great book to read when you've been through a couple of DDD projects, ideally one of which has failed, and then you will understand what Eric Evans was talking about. This is definitely a book that you will come to in your journey with DDD, but it is not a good starting point. Later in the talk I'll refer you to some much better starting points for getting to grips with DDD.
DDD is a philosophy for bringing software closer to the problem domain. By involving domain experts, we come up with software solutions that are driven by the problem rather than by technology. It's a systematic approach which involves some strategic design practices, which I'm not really going to talk about today, and a bunch of principles, guidelines, rules and patterns, which I largely am going to talk about today.
DDD is not about specific technologies. It's very largely technology agnostic. Technology is, of course, important, but it isn't the driver. It's not a heavyweight method. It doesn't imply that we need to do waterfall and do all of our design up front. It's completely possible to use it in an agile context. And it's not always appropriate. Anybody who stands up at a conference and tells you that you're doing it wrong unless you do it their way is a charlatan.
All the advice you get at software conferences is context dependent, and there are plenty of contexts in which DDD is not appropriate. So you should be aware of the contexts in which DDD works well and be prepared to use it there, but be prepared to search for other types of solutions in other contexts.
So let's take some pictures from Eric Evans, the father of DDD, if you like. This is very difficult to read, so we'll zoom in on part of it. The practices and patterns of DDD are broadly divided into two areas. There is strategic DDD, which has a broader context. These practices are very influential on how a project or a product turns out, but they are not really related to the kinds of decisions you're making every day as a programmer, as a developer. I'm going to briefly talk about some of these, particularly bounded context and ubiquitous language. And then there are the tactical DDD patterns. These are the things that, as a developer, you're dealing with day in, day out: you're creating entities, you're creating value objects, you're working with aggregates. As programmers, these are the things that you'll interact with much more frequently.
Before we get into the tactical aspects, let's quickly look at ubiquitous language and bounded contexts. One definition of domain-driven design, by Vaughn Vernon, is that DDD is primarily about modeling a ubiquitous language in an explicitly bounded context. That is actually a very deep statement, and it makes a lot of sense to me because I've been doing this for a long time, but I'm going to try to unpack what it means for you. All domains have a language which people working in that domain use to talk about the things in that domain, including DDD itself. This is a kind of meta ubiquitous language: the words we use to talk about DDD.
When I'm giving this talk, you'll hear me use words like domain, design, language, system and context, because this is the ubiquitous language of DDD. Any domain that you're working in will have its own language, and the experts working in that domain will have their own language. One of the key things we need to do in a DDD project is to learn that language and, rather than imposing our own language on the domain, take the language from the problem domain into our solution. This is called the ubiquitous language, ubiquitous meaning used everywhere, or being everywhere.
One of the key objectives of DDD is to create a supple, knowledge-rich design, which calls for a versatile, shared team language. That's not just shared amongst developers; it's shared amongst everybody involved. It also calls for lively experimentation with the language. It takes time to discover a language. You won't get it right first time. You need to experiment, discover, refine, and be prepared to change decisions you've made earlier.
Ubiquitous language is context dependent. If you're doing fighter aircraft avionics, that's going to have a very different ubiquitous language from sending out people's utility bills. A ubiquitous language is valid within a context, within a limit of applicability, and that is called the bounded context. I'll come on to that in a moment. The language is used by a community of domain experts, it has a shared meaning, and it's unambiguous, with a single meaning. And, this is really important, the language is expressed in the software model. We use those words when we're writing the software, and this helps reduce the distance between the problem domain and the solution domain. Because the solution is very close to the problem, any change that happens in the language (the domain will evolve over time) should be reflected as a change in the software, as a change in the model.
So we need to strive to use the ubiquitous language when we're naming things in code, when we're naming things in databases, and when we're naming things in tests.
A bounded context is the limit of applicability of a ubiquitous language, and many larger domains will have multiple bounded contexts within them. If you work for an energy trading company, there'll be an energy trading bounded context, but there might also be some kind of systems administration bounded context, where different things are going on and different languages apply. It's very easy to come up with words which mean different things in different contexts, like the example I have on the slide.
One of the failure modes of DDD projects for people who are new to it is that they try to model everything as a single grand unified model rather than as many smaller models with narrower scopes, each in its own bounded context. When you're modeling, you should definitely exhibit a bias towards separate, smaller models rather than one grand unified model of your domain. If you build a grand unified model of the larger domain, it will be suboptimal for everybody, and nobody will have a good time working with it. You'll spend a lot of time wondering why the things that are going on in the real world don't map very neatly onto the things that you've expressed in your software.
So models exist within bounded contexts, and you should prefer to have multiple bounded contexts for anything other than trivial domains. The context is the scope of the language. We have separate, segregated models and independent, autonomous implementations. You don't even have to write them all in Python, right? Maybe another language is appropriate for some of these models. Ultimately, these bounded contexts will align with technical components. If you identify three bounded contexts in your domain, it's a really good idea to have three systems.
If you try to map that onto two systems or four systems, you're setting yourself up for difficulty. So ideally they should align with technical components and be loosely coupled. Then you're going to have to do some work, which I'm not going to talk about further today, on context mapping: how concepts in one domain map to concepts in another. The context map is a conceptual thing, but ultimately it needs to be a piece of software that is able to move and share information between these domains, mapping from concepts in one domain to concepts in another. For example, you might have a user in one bounded context, but that might be a customer in another bounded context. They are clearly very related concepts, but you might wish to model them differently.
Let's go back to the other end and look at value objects. These are just about the simplest pattern in tactical domain-driven design. Value objects are used to measure and describe things in the domain: characteristics like a currency value or a telephone number. Equivalent values are interchangeable. When we're dealing with value objects, we don't care about the identity of the object; we care about its value. They're comparable by value, and they lack an intrinsic identity. If we have a string which contains someone's name and another string which contains the same name, the fact that they're equal is fine. The fact that they are actually different string objects is irrelevant. They are a name. Value objects are self-contained, and normally they are encapsulated and owned by entities, which we'll come on to next. It's very good practice to make your value objects immutable, which means instances of them can be shared. We have some really nice value objects in Python already, things like datetime and string. They are immutable and shareable.
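To make the value-versus-identity point concrete, here is a minimal sketch using date from the standard library's datetime module. The variable names are made up for illustration; nothing here comes from the speaker's slides.

```python
from datetime import date

# Two separately constructed objects that describe the same calendar day.
first = date(2021, 7, 16)
second = date(2021, 7, 16)

# Value objects are compared by value, so these two are interchangeable.
assert first == second
assert hash(first) == hash(second)

# Equal values collapse to a single entry in a set; which of the two
# objects survives is irrelevant, because identity carries no meaning.
assert len({first, second}) == 1

# date is immutable, so sharing one instance between many owners is safe.
try:
    first.day = 17
except AttributeError:
    mutation_rejected = True
else:
    mutation_rejected = False

assert mutation_rejected
```

Whether first and second happen to be the same object in memory is deliberately not checked: for a value object that question should never matter.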
Generally speaking, though, you should avoid using bare value objects like integer, float or string. You should try to build some kind of domain-specific abstraction around those types. The reality is that your actual problem domain does not have things in it called string or integer. It might have something in it like a currency amount, an amount of money, or a duration.
So strive to make value objects immutable. An easy way of doing that is to lean heavily on the built-in immutable types in Python: tuple, frozenset, str, et cetera. Validate them in the initializer, __init__, or in __new__. You might use __new__ here because, since instances are immutable, you can share them, so you can do interning. Provide getters with property. Remember to override the equality operator. In Python 3, I don't think you need to override the inequality operator any longer; you get that for free. Give them a useful __repr__. Think about whether rich comparisons are necessary for that type; quite often they are. Consider overriding __setattr__ to prevent people mutating the object. And of course always give your types a useful __str__, and consider a __format__ as well. It's quite often useful for many of these value types.
Use side-effect-free functions to produce new values. If you do need to modify a value object, then rather than having a mutating method on it, have a method which returns a new value object with the modified value. datetime is a good example of that in the standard library. datetime is immutable. You can create new datetimes by, say, replacing the day with a new number, but it doesn't actually mutate the datetime object. It returns you a new datetime.
All the other numbers are the same, but the day is different, so that's a very good approach to follow. Also consider using what I call named constructors, using class methods, which give you an opportunity to express the domain language, the ubiquitous language, in your class when you're creating things. I've already mentioned interning with __new__.
So here's a very simple value object. It's an email address. This is quite a naive implementation, but it's simple enough to show on a slide. I have a class method which is a named constructor, because I want to construct an email address from a string. I have an __init__ which accepts a local part and a domain part. Those are the parts either side of the @ symbol. I have a __str__ and a __repr__; they're not very interesting. I've implemented __eq__ for equality, and __hash__, which needs to go along with __eq__. I have two properties for read-only access to the local part and the domain part of the email address. Then I have a single method here, replace, which allows me to replace the local part, the domain part, or both. Notice that it returns a new email address object, so you can kind of fake mutation, if you like, by always returning new objects.
So value objects are very simple to build in Python. The thing is to focus on immutability. In fact, most of your model can usually be implemented in terms of value objects, so it's a very good place to start. The reason immutability is so important is that it makes them so much easier to reason about, test and debug, because there's essentially no behavior in them beyond some initial validation.
Let's move on to entity objects, which are the next level of sophistication. Entities are different because they represent things that are distinguishable by identity, even if they are completely equivalent in value. The fact that one of them is a different instance from another one is an important thing.
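The email address slide is not reproduced in the transcript, so the following is only a sketch of what a value object with the features just listed might look like. The names from_text, local, domain and the exact validation rules are assumptions made for illustration, and the validation is intentionally as naive as the speaker says his own is.

```python
class EmailAddress:
    """A deliberately naive, immutable email address value object."""

    __slots__ = ("_parts",)

    @classmethod
    def from_text(cls, address):
        """Named constructor (hypothetical name): parse 'local@domain'."""
        if address.count("@") != 1:
            raise ValueError(f"Expected exactly one '@' in {address!r}")
        local_part, domain_part = address.split("@")
        return cls(local_part, domain_part)

    def __init__(self, local_part, domain_part):
        if not local_part or not domain_part:
            raise ValueError("Local part and domain part must be non-empty")
        # Bypass our own __setattr__, which blocks all later assignment.
        object.__setattr__(self, "_parts", (local_part, domain_part))

    def __setattr__(self, name, value):
        raise AttributeError(f"{type(self).__name__} is immutable")

    @property
    def local(self):
        return self._parts[0]

    @property
    def domain(self):
        return self._parts[1]

    def __str__(self):
        return f"{self.local}@{self.domain}"

    def __repr__(self):
        return (f"{type(self).__name__}(local_part={self.local!r}, "
                f"domain_part={self.domain!r})")

    def __eq__(self, other):
        if not isinstance(other, EmailAddress):
            return NotImplemented
        return self._parts == other._parts

    def __hash__(self):
        # Must agree with __eq__: equal values hash equally.
        return hash(self._parts)

    def replace(self, local=None, domain=None):
        """Return a new EmailAddress; the original is left untouched."""
        return EmailAddress(
            local_part=self.local if local is None else local,
            domain_part=self.domain if domain is None else domain,
        )


original = EmailAddress.from_text("alice@example.com")
moved = original.replace(domain="example.org")
```

After this runs, original still reads alice@example.com while moved is a distinct object with the new domain, which is the "fake mutation" the speaker describes.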
The identity needs to be stable over time, so an entity object cannot change its identity, and it needs to be unique within the bounded context of that particular model. For entities, life cycle is important. It's important when the entity is created, it's important how the entity evolves over time, and it's important if the entity is deleted (that's not something we often do) or deactivated, which is more common: we say this entity is no longer in use. Entities are more often mutable, although immutability is always to be preferred, and they tend to be composite: they tend to be made of other entities and value objects.
In Python, it's important to distinguish between creation of an entity, which only happens once for a particular entity, and instantiation of the entity as an actual live object in the system. You might register a user with your system as an entity. That registration only ever happens once, but that user might be put into a live Python object millions of times over the lifetime of their interaction with the system. So we need to distinguish between creation and instantiation.
Creation needs to happen via a factory function, which needs to establish all the invariants, all the things that always have to be true about an entity. It probably needs to either create or get hold of a unique ID for the entity from somewhere. I've discovered to my cost that you should really strongly prefer what are called factless IDs: not an ID that comes from the domain. Just cook up an ID somehow. It could be an integer, it could be a UUID, a GUID, whatever, but don't make it some fact from the real world, because things that you think are immutable in the real world often aren't. So prefer an internally generated ID. Often the factory can just be a module-scope factory function sitting next to the entity class. You don't have to do anything more than that.
On the other hand, instantiation of the entity might happen many times over the lifetime of that particular entity, each time creating a distinct instance of it. It doesn't need to do anything much. It needs to accept the ID and all the state, which has already been validated by the factory at this point, so it tends to be very simple. It's generally just assigning to attributes. Consider maintaining an object version, an entity version, within the entity, and possibly also an instance ID. It is very important that __init__ always succeeds, because you might be calling __init__ when you take an existing entity out of storage, and you never want that to fail. The entity needs to be in a valid state.
So I generally use an entity base class. I'm not going to read through this in detail, partly because it's too small on my screen here, but you'll notice the __init__ here is very straightforward. It's just assigning two attributes. It's not doing any validation work, because we're assuming that validation work has already been done elsewhere, in the factory. Nearly everything else on here is machinery to do with version keeping, plus a marker for whether the entity has been discarded and really put beyond use, which is generally a better technique than actually trying to delete the entity. That's an abstract base class; I tend to carry something like this around with me.
Here is an actual entity, which subclasses Entity. This is a customer. We call the superclass constructor, we assign an attribute, and we have some getter and setter properties. Notice that the queries and mutators check the liveness of the entity: has this entity been marked as discarded? Because if so, the call needs to fail. Notice the factory function there on the right, register customer. That gives us an opportunity to express the domain language. Again, we have two opportunities to express the domain language.
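The slides with the entity base class and the customer entity are not part of the transcript. The following sketch reconstructs the shape described above under stated assumptions: the names Entity, Customer, DiscardedEntityError, register_customer and the version-bumping details are illustrative guesses, not the speaker's actual code.

```python
import uuid


class DiscardedEntityError(Exception):
    """Raised when a discarded entity is queried or mutated."""


class Entity:
    """Base class carrying identity, a version counter and a discarded flag."""

    def __init__(self, entity_id, entity_version):
        # Instantiation only assigns attributes. It performs no validation,
        # so rebuilding an already-valid entity from storage cannot fail.
        self._id = entity_id
        self._version = entity_version
        self._discarded = False

    @property
    def id(self):
        return self._id

    @property
    def version(self):
        return self._version

    def _increment_version(self):
        self._version += 1

    def _check_not_discarded(self):
        if self._discarded:
            raise DiscardedEntityError(f"Attempt to use discarded {self!r}")


class Customer(Entity):
    def __init__(self, customer_id, customer_version, name):
        super().__init__(customer_id, customer_version)
        self._name = name

    @property
    def name(self):
        self._check_not_discarded()
        return self._name

    @name.setter
    def name(self, value):
        self._check_not_discarded()
        self._name = _validated_name(value)
        self._increment_version()

    def discard(self):
        # Mark the entity as beyond use instead of deleting it.
        self._check_not_discarded()
        self._discarded = True
        self._increment_version()


def _validated_name(name):
    """Invariant: a customer always has a non-blank name."""
    if not name or not name.strip():
        raise ValueError("A customer must have a name")
    return name.strip()


def register_customer(name):
    """Factory: creation happens once, with a factless, generated ID."""
    return Customer(
        customer_id=uuid.uuid4().hex,
        customer_version=0,
        name=_validated_name(name),
    )


customer = register_customer("Ada Lovelace")                          # creation
same_customer = Customer(customer.id, customer.version, customer.name)  # instantiation
```

The last two lines show the distinction: register_customer runs once per customer, whereas the Customer initializer can be called any number of times with stored state and yields the same identity each time.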
Those two opportunities are the entity name, as a noun in the class name, and the name of the factory function. So it isn't just something like make entity or make customer. We can actually say register customer, because that's what we're actually doing, and that's maybe the language that's used in the domain.
Okay, so at the most basic level, domain models are constructed from graphs of entities, shown in the rectangles here, which own value objects. The immutable value objects may be shared. The entities are identifiable, they have a life cycle, they're probably mutable, and they're composite. The values measure or describe quantities, and they are equivalent, interchangeable, self-contained, and preferably immutable. This is very simple, and almost any Python system, whether it's built using DDD or not, will involve a domain model something like this. What DDD really brings is what happens next, when we get to aggregates.
So let's look at aggregates. Here's the previous picture again. What is an aggregate? Well, you might look at this and think it's just a cluster of closely related entities, and that's partly true, but aggregates are much more than object clusters. Although an aggregate may have more than one entity, often aggregates will have only one entity, and there are good reasons to prefer having only one. Aggregates really are consistency boundaries. We are going to require that the model is always consistent within an aggregate, but we are going to allow the model to be eventually consistent between aggregates.
It turns out that, in your code, aggregates don't really exist. You don't have to write any code for your aggregate. It's really a convention that you follow about how you use your entities. This is a part where people get a bit confused, because we have this very, very important idea, the aggregate, which isn't necessarily reflected as a separate class in the code.
But this idea of consistency boundaries is really the most important idea here. You'll notice that every aggregate, shown in this kind of orange color, has one entity which is special. That's called the aggregate root entity, and it is the entity that is responsible for maintaining consistency within the aggregate. Because it's responsible for maintaining consistency, it is the entity which hosts mutating commands on that aggregate. You can put mutating commands on the child entities within the aggregate, but then it gets more awkward to maintain consistency amongst the other entities within that aggregate. The aggregate is responsible for maintaining its own consistency. So while it's generally fine to put query methods on entities wherever they are in the aggregate, I find it works well to put the mutating command methods only on the aggregate root, because then it's in a good position to maintain consistency within the aggregate.
Notice also that the aggregate root is special because it is the target for any inbound references to that aggregate. We are not allowed to have deep references to entities within another aggregate. If we want to refer to a specific sub-entity, we must provide enough information in that conceptual reference to go to the aggregate root and then navigate within the aggregate to a particular child. So that's a very important idea: inter-aggregate references are to the root, and all inter-aggregate references are by root entity ID, ideally the opaque, factless ID that we talked about a moment ago. This is important because then I can instantiate one part of my model, one aggregate, without having to instantiate all the other aggregates that it depends on. It's fine to use regular Python references within the aggregate, and it's fine to have the whole aggregate instantiated simultaneously.
But between aggregates we need to use an ID. Generally speaking, the factory is responsible for creating a whole aggregate, so it may configure an entity with particular child entities at the time it is created. So factories facilitate aggregate creation, allow us to express the ubiquitous language, and hide construction details such as entity ID generation.
Getting the aggregate boundaries right is really tricky, and you probably won't get it right first time. You need to iterate. Look for conceptual wholes which can be instantiated, queried and modified independently of other parts of the model. Look for parts of the model that it is useful to do work on alone. Use the delete rule of thumb, which I'll come on to in a moment. And, as I mentioned already, enforce consistency at all times within an aggregate, but allow eventual consistency between aggregates, possibly with asynchronous updates. Try to keep aggregates small. Many, or perhaps even most, should have only one entity.
From a Python point of view, I find one module per aggregate is a pretty good way of organizing things. I might have multiple entity classes within that module if it's a more complex aggregate, or, if it's a very complex aggregate, I may end up promoting that module to be a package in its own right. That's generally how I find myself organizing the code. As I said already, the command methods should be on the aggregate root, with only query methods on the child entities. It's okay for methods to accept transient references to other aggregates as regular Python references, but you shouldn't keep hold of those things. It's okay to use them for the lifetime of a single method call to do some work, but you shouldn't keep hold of those references.
You should remember that your models need to be useful and not necessarily realistic.
I've seen a lot of time wasted on producing incredible models of reality which turned out not to be very useful for the actual problem the system is solving. So try to keep things as simple as possible, but as complex as they need to be to solve the problem.
I talked about how finding aggregate boundaries can be tricky, so I'm going to show you quite a simple example. Think of something like Trello, Kanban board software, and let's think about some entities that are involved. We have a work item. In my model here, I'm going to allow work items to exist independently of the Kanban board they are on. We have a board, and we have some columns on the board, so we have three entity types. It's a very simple model: the board owns some columns, and the columns own some work items, maybe.
Thinking about where the aggregate boundaries could be, we could do it this way, but this kind of feels wrong, doesn't it? You wouldn't model it like this, because you wouldn't separate the columns from the board into different aggregates: you need to keep those things consistent with each other. But I've already said that work items can be allowed to exist separately. This might be okay. I could go for this. But I think the right answer, if there is one, is something like this. The board and the column are separate entities, but they live together within the same aggregate, and work items are their own aggregates, because they're allowed to exist independently.
A simple trick to follow here is to use the delete rule of thumb and ask: if I start deleting things, what else necessarily needs to be deleted? If I delete the whole board, it doesn't make sense for me to keep the columns around, but it might make sense for me to keep the work items around. The delete rule of thumb is quite an effective way of getting at least a first-pass stab at where your aggregate boundaries need to be.
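As a rough sketch of the boundaries just described, the code below keeps Board and Column together in one aggregate with Board as the root, and makes WorkItem its own aggregate. Every class and method name here is hypothetical, chosen only to illustrate three of the rules from the talk: commands live on the root, child entities expose queries only, and references across aggregates are held as IDs.

```python
import uuid


class WorkItem:
    """Root (and only entity) of its own aggregate; exists without a board."""

    def __init__(self, work_item_id, title):
        self.id = work_item_id
        self.title = title


class Column:
    """Child entity of the board aggregate. Its public API is query-only."""

    def __init__(self, column_id, name):
        self.id = column_id
        self.name = name
        self._work_item_ids = []  # other aggregates are referenced by ID only

    def work_item_ids(self):
        return tuple(self._work_item_ids)


class Board:
    """Aggregate root: hosts every mutating command for board and columns."""

    def __init__(self, board_id, name):
        self.id = board_id
        self.name = name
        self._columns = []  # regular Python references inside the aggregate

    def column_names(self):
        return [column.name for column in self._columns]

    def work_item_ids_in(self, column_name):
        for column in self._columns:
            if column.name == column_name:
                return column.work_item_ids()
        raise ValueError(f"No column named {column_name!r}")

    def add_column(self, name):
        if name in self.column_names():
            raise ValueError(f"Board already has a column named {name!r}")
        self._columns.append(Column(uuid.uuid4().hex, name))

    def schedule_work_item(self, work_item):
        # The WorkItem reference is transient: only its ID is retained.
        if not self._columns:
            raise ValueError("Cannot schedule work on a board with no columns")
        if self._index_of_column_holding(work_item.id) is not None:
            raise ValueError("Work item is already on this board")
        self._columns[0]._work_item_ids.append(work_item.id)

    def advance_work_item(self, work_item_id):
        index = self._index_of_column_holding(work_item_id)
        if index is None:
            raise ValueError("Work item is not on this board")
        if index == len(self._columns) - 1:
            raise ValueError("Work item is already in the last column")
        # The root updates both columns together, keeping them consistent.
        self._columns[index]._work_item_ids.remove(work_item_id)
        self._columns[index + 1]._work_item_ids.append(work_item_id)

    def _index_of_column_holding(self, work_item_id):
        for index, column in enumerate(self._columns):
            if work_item_id in column._work_item_ids:
                return index
        return None


board = Board(uuid.uuid4().hex, "Conference talk")
board.add_column("To do")
board.add_column("Done")
item = WorkItem(uuid.uuid4().hex, "Write slides")
board.schedule_work_item(item)
board.advance_work_item(item.id)
```

Applying the delete rule of thumb to this sketch: discarding the board takes its columns with it, because nothing else refers to them, while the work item survives, because the board only ever held its ID. The root reaching into the column's underscore attribute is a convention confined to this one module, in line with the one-module-per-aggregate layout mentioned earlier.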
We would allow our internal reference within the aggregate there, from the board to the column, to be a regular Python reference, but we'd want the columns to know about the work items by work item ID. And of course, once I've finished on my whiteboard, I might want to make this more formal, but I wouldn't often make the effort of drawing pictures like this.
Any questions on aggregates before I go on? I'm just going to take a couple of questions now, because there's more to come.
Yes, I can repeat the question. The question was: how prevalent is the use of UML in domain modeling? My answer is that UML sees hardly any use in any project, including DDD projects. I would say it's well under 10%. I have this particular diagram because it comes from a training course on a real project. Would I draw this? Probably not. I have worked on projects where we built UML models for documentation after we'd written the code, but I would not recommend building models like this, and I certainly wouldn't recommend trying to generate any code from the model. I think you're just wasting your time. We can still be quite agile with this, and the model will perhaps still need to evolve quite quickly. If you're dragging some horrible diagram behind you while you're doing that, you're just costing money. I wouldn't overestimate how long it takes to do some of this stuff. I would go through cycles like this that last a few days. I'm not talking about people doing weeks and weeks of design here. Small bounded contexts, small aggregates. You can work on parts of the system, you can divide the work up, you can delegate to people, so you don't need to have some big design effort at the front for this to be a useful technique.
Yes? How should I find the borders between microservices? I am going to mention microservices in a moment, because I'm going to talk about software architecture now. Can I come back to that shortly? Yes. Okay.
So let's talk a little bit about software architecture. Nothing we've seen here requires complicated frameworks, proprietary infrastructure, fancy architectures or enterprise anything. A lot of DDD is associated with a kind of heavyweight enterprise computing. It doesn't need to be like that at all. I find it very interesting to compare the way we approach building things in the Python community with the way things are built in other communities in which I've also been involved. I think to some extent we tend to build models in spite of some of the frameworks we have to work with. Like I said, DDD is a choice, and sometimes using those frameworks is the right thing to do. But I would also like you to think about the alternatives, and that's really what this talk is about: getting you to think about some of the alternative approaches.
This picture here is from, I think, not the DDD book itself but a shortened version of it that came out in 2004. If you look at what this layered architecture is about, we have a user interface, which these days tends to live in the browser. We have an application, we have a domain model, and we have some infrastructure: databases and actual computers, I guess. Then you match that up against, for example, a Pyramid web app, where we have some JavaScript front end in the browser, we have Pyramid, and we've got our model, which is maybe implemented in terms of something like SQLAlchemy. Or a Django app, where we have the browser and the model, and we use the Django ORM, right?
Well, probably around ten years ago now, everyone in the DDD community was crazy about using object-relational mapping technologies to build their models. I was at a conference in London about five years ago, and Eric Evans, the man behind DDD, came out with this wonderful phrase, which I just think is fantastic.
The object-relational mapper takes two brilliant ideas, object-oriented programming and the relational model, and incapacitates them both. I think this is very true. You end up making such horrible compromises in your object model and your relational model in order to get them to work nicely together.
So okay, this is fine. I can stand up here and rant about the problems of using object-relational mappers or modeling frameworks. But what's the alternative? Well, people have been talking about the alternatives for a very long time. Bob Martin talks about the clean architecture. Notice that in the middle of this he has entities, and around that, use cases, and then you can think about controllers. And then look what's in the outside blue ring there. We have the UI on the outside, which you would expect. But we also have the database on the outside, which you maybe don't expect. We tend to think of the database as being in the middle, underneath everything else. He has moved the database to the outside.
Alistair Cockburn, another signatory of the Agile Manifesto, proposed the hexagonal architecture. He talked about ports and adapters. The application is in the center. But again, look at where the database is. It's on the outside. My favorite presentation of this is Jeffrey Palermo's onion architecture. Again, the domain model is in the center. Then we have domain services, application services, and the user interface on the outside. But again, infrastructure and the database are also on the outside.
This approach is called externalizing the infrastructure. It places the model at the center. Your application is not about databases, right? It's about some domain problem you are solving. That should be the most important thing in your software. The database is just something you need. It's just a tool.
So the alternative to expressing a model in terms of somebody else's framework, over which you probably have little or no control, is just to use plain old Python objects. Just write some Python code and own all of it. Right? Don't become beholden to somebody else's technology. If your project is successful, it's going to be around for a long time. It's going to be around for longer than probably many of the technologies you might choose to use. So as a software architect, you should think about that. It's a risk. So mitigate the risk and don't depend on other people's frameworks where you don't need to. So we can have a pure Python domain model, and we can prefer not to build that domain model in terms of persistence frameworks, because persistence isn't a domain concept. Go and talk to your users about persistence. They won't even know what it means. It doesn't feature in your ubiquitous language. What on Earth is it doing in your domain model? So I've talked about bounded contexts and a bias towards separate models. Let me just mention how to answer the microservices question. So in terms of microservices, I would draw it like this, based on Jeffrey Palermo's diagram from a decade ago, and say each of these microservices represents a bounded context. It is self-contained. It has its own model. It doesn't care about the other models. It communicates with messages to other bounded contexts. Okay, so they can be written using different technologies. They can be written using the same technology, but each one is optimized for a different part of the greater domain. Does that make sense to answer your question? Yeah. Okay, good. I really am going to get through 100 slides in 60 minutes. Okay, so let's think about repositories. I've said that we need to not build models, or we should consider not building models, in terms of persistence frameworks.
That doesn't mean you shouldn't think about your context. There's a very large number of use cases where building a CRUD app using your favorite object-relational mapper is exactly the right thing to do, and I'm not railing against that. I'm just trying to open your eyes to some of the alternatives here. But ultimately we do need to store the data, right? It's not enough just to hope that our computer is turned on and never gets turned off. So we need to have repositories, indicated by the red circles here. And a repository is somewhere where we can put an aggregate and go back and get it later. How do we know which aggregate we want? Well, in the simplest way, it might be just by its ID. Maybe it's referred to by another aggregate, and we have these ID references between aggregates, so one aggregate can go to a repository for another aggregate and retrieve it by ID. Or it might be some more complex domain-specific query. So repositories store aggregates. They retrieve aggregates. We usually have one repository per aggregate type, and they are an abstraction over the persistence mechanism, and they are very architecturally significant. So generally we want to be able to take an aggregate that we've instantiated and put it into a repository, get hold of one based on some criteria, persist changes to some store, whatever that store is, and possibly also remove aggregates from a repository if they're no longer required. It's very tempting at this point to get involved in database transactions and things like that. You should resist that temptation. Transactions are something that belongs in the application layer, not in the domain model. The domain model shouldn't be concerned with database transactions. Why? Well, go and ask your users whether database transactions are a thing in their domain and how they talk about that in their domain language, and they'll have no idea what you're talking about, so it doesn't belong in the model.
There are many ways of building repositories. We can have what are called collection-oriented repositories. In a Python sense, you can think of a repository that behaves like a dictionary. Imagine building a class which has an interface that looks like a Python mapping, looks like a dictionary, where we can go and fetch things by ID, and how it actually pulls them out of some storage is an implementation detail. So it's certainly possible to make dictionary-like repository interfaces. Or we can have what's called a persistence-oriented design, where we put an aggregate in, we remove an aggregate, and there's a much more explicit notion of saving, if you like, with a persistence-oriented repository. There are other repository types as well. I'm not going to talk about event sourcing today, but that is certainly an option here. Probably a too popular option. So collection-oriented repositories are very easy to use, but they can be quite complex to implement, because you're trying to put a kind of dictionary abstraction on top of some machinery that might be quite complex, and they're a good fit if you're using something underneath like SQLAlchemy, which can be an intrusive dependency. Persistence-oriented repositories are much simpler to implement. They're a good fit with NoSQL stores like MongoDB, but they do require some diligence on behalf of the application programmer, because you have to remember to actually save the stuff into the repository. And as I mentioned, there are other options. Generally, we will have an abstract repository inside the domain model. Why? Because the different aggregates need to be able to get hold of each other somehow. So we need to have a repository interface in the domain model. But we defer the implementation of that repository to outside, in the infrastructure layer. And because of that relationship, that dependency inversion, repositories need to be instantiated even further out, in the application layer.
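As a minimal sketch of the arrangement described above, the following shows an abstract repository living with the domain model, a persistence-oriented implementation in the infrastructure, and instantiation in the application layer. Every name here (`Account`, `AccountRepository`, `InMemoryAccountRepository`, `save`, `get`, `remove`) is hypothetical and not taken from the talk's slides, and the in-memory dictionary merely stands in for a real store.

```python
from abc import ABC, abstractmethod


# --- domain layer (all names are illustrative assumptions) ---
class Account:
    """A toy aggregate root, identified by account_id."""

    def __init__(self, account_id, balance=0):
        self.account_id = account_id
        self.balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount


class AccountRepository(ABC):
    """Abstract repository: the domain sees only this interface."""

    @abstractmethod
    def save(self, account): ...

    @abstractmethod
    def get(self, account_id): ...

    @abstractmethod
    def remove(self, account_id): ...


# --- infrastructure layer ---
class InMemoryAccountRepository(AccountRepository):
    """Persistence-oriented: nothing is stored until save() is called."""

    def __init__(self):
        self._rows = {}  # stand-in for a document store or table

    def save(self, account):
        # Store a plain record, not the live object, to mimic a real store.
        self._rows[account.account_id] = account.balance

    def get(self, account_id):
        try:
            balance = self._rows[account_id]
        except KeyError:
            raise LookupError(f"no account {account_id!r}") from None
        return Account(account_id, balance)

    def remove(self, account_id):
        self._rows.pop(account_id, None)


# --- application layer: the concrete repository is chosen out here ---
repo = InMemoryAccountRepository()
account = Account("acc-1")
repo.save(account)

account.deposit(50)
balance_before_save = repo.get("acc-1").balance  # the deposit is not stored yet
repo.save(account)
balance_after_save = repo.get("acc-1").balance
```

The two balance reads illustrate the diligence a persistence-oriented repository asks of the application programmer: a change that is never saved is never seen by the next reader. A collection-oriented variant would instead expose a mapping-style interface and track changes behind it.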
There are some pictures coming up that make this more evident. So you need to consider testability. In fact, this is very good for testability, because you can build fake or very simple in-memory repositories for testing your domain model, although not for testing your larger application. Some advice here. It's very tempting to... thank you. It's very tempting to implement generic selector queries, something like I have in the top right here, like a repo "employees with" method and then some lambda that says age greater than 60. That's really nice to use as a programmer, it seems. But you're not really expressing the domain language here. And also it's very difficult to implement that in terms of different repository implementations. How do you convert that lambda to a SQL query? We have technology to do that, but it's probably not a technology you necessarily want to get involved in. It's much better to take the opportunity to express the domain language and have a repository query method on the employees repository, something like "eligible for early retirement", because that's the actual question that's being asked. Okay. So in terms of how I might organize code for something like this, I would generally have multiple packages: an application package, which is the outer layer if you like, an infrastructure package, which contains concrete repositories, and then a package per bounded context, which actually contains the domain model. So they have this kind of dependency relationship. The application depends on the infrastructure, and the infrastructure depends on the domain, rather than the domain depending on the infrastructure. So this is the key dependency inversion in the Clean Architecture or the Onion Architecture. If you're using something like SQLAlchemy, this is probably the exact opposite of what you're doing today. Here's another picture, superimposed on the onion. So I have just under ten minutes left.
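The advice about query methods can be sketched as follows. This is an illustration under assumptions, not the code from the slide: the names `Employee`, `EmployeeRepository`, `InMemoryEmployeeRepository` and `eligible_for_early_retirement` are hypothetical, and the rule "older than 60" is borrowed from the lambda example purely for demonstration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Employee:
    employee_id: str
    name: str
    age: int


class EmployeeRepository(ABC):
    """Domain-layer interface: queries are named in the domain language."""

    @abstractmethod
    def add(self, employee: Employee) -> None: ...

    @abstractmethod
    def eligible_for_early_retirement(self) -> List[Employee]: ...


class InMemoryEmployeeRepository(EmployeeRepository):
    """A fake repository, good enough for testing the domain model."""

    EARLY_RETIREMENT_AGE = 60  # assumed threshold, for illustration only

    def __init__(self) -> None:
        self._employees = {}

    def add(self, employee: Employee) -> None:
        self._employees[employee.employee_id] = employee

    def eligible_for_early_retirement(self) -> List[Employee]:
        # A SQL-backed implementation could answer the same question with
        # a WHERE clause; callers would not need to change.
        return [
            e for e in self._employees.values()
            if e.age > self.EARLY_RETIREMENT_AGE
        ]


employees = InMemoryEmployeeRepository()
employees.add(Employee("e-1", "Ada", 64))
employees.add(Employee("e-2", "Grace", 45))
employees.add(Employee("e-3", "Edsger", 61))

eligible_names = sorted(e.name for e in employees.eligible_for_early_retirement())
```

Because the question is a named method rather than an arbitrary lambda, each concrete repository is free to answer it in whatever way suits its storage, and the fake used in tests stays a few lines long.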
So let's crack on and talk briefly about domain events. We've just done an entire day of training, right? Without the exercises. So now we're on to day two in the last five minutes. Let's see how we get on. Domain events and modeling time. So what is a domain event? It's capturing the memory of something interesting which happens in the domain. Right. Now, "interesting" means interesting to the domain people, not interesting to software people. Right. So if your disk fills up, that's not a domain event. It's something your logging needs to know about, but it's not a domain event. If a new customer registers with your system, that's a domain event. Okay. So generally speaking, aggregates will emit events. This allows us to model time explicitly. They're significant to the business, and they're first-class members of the domain model. This is something that was really missed back in 2004, at the beginnings of DDD, but DDD has become very event-centric recently, and of course it's the foundation of event sourcing and projections and process managers, which I'm not going to talk about today, but you have to have domain events for these things to work, and they allow us to establish causality between things. And of course, if you have them, they are wonderful for logging, although that's not their primary purpose. Okay, so given the time, I'm going to go to this one. So you should name your events in the past tense. So "money deposited", not "deposit money". They should be immutable value objects themselves. Why should they be immutable? Well, you don't want people going around screwing around and changing the attributes of something that's just happened, right? It needs to be an immutable record. It's good practice to get a timestamp in there somehow, monotonic hopefully, and to include the aggregate root ID of the aggregate that created the event and any information required to navigate to child entities.
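One way to sketch such an event in plain Python is a frozen dataclass. The class name `MoneyDeposited` follows the naming example above; the field names (`account_id`, `amount`, `occurred_at`) are assumptions made for illustration. Note that a wall-clock timestamp like the one used here is not guaranteed to be monotonic, so a real system wanting strict ordering might add a sequence number.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> datetime:
    # Wall-clock time: convenient, but not guaranteed to be monotonic.
    return datetime.now(timezone.utc)


@dataclass(frozen=True)
class MoneyDeposited:
    """Past-tense name: a record of something that has already happened."""

    account_id: str  # ID of the aggregate root that emitted the event
    amount: int
    occurred_at: datetime = field(default_factory=_utc_now)


event = MoneyDeposited(account_id="acc-1", amount=100)

# Attempting to rewrite history fails: the event is an immutable record.
try:
    event.amount = 1_000_000
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Being a value object, two events with the same field values compare equal, which makes them easy to assert on in tests.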
I generally define my event classes as nested classes within the entity, the root entity, and you may consider organizing them in the same way we do with exception classes, so that you can listen to particular events based upon the inheritance hierarchy. And prefer a publish-subscribe messaging system rather than subject-observer. So something like PyPubSub within the bounded context of your application is fine. You may want to republish those messages to a message bus for other bounded contexts to listen to and act upon. But within your single bounded context, it's probably just going to be quite a simple Python application, and you can just use very simple technologies for that. I'm not going to talk about that because I'm going to be out of time shortly. I do want to cover this, though: descriptive events versus prescriptive events. So it's very tempting to write methods which modify the model in some way and then emit an event which describes what's just happened. That's the kind of natural way to write things, and you probably do that a lot. The problem with that is that what you actually did and what you just published don't necessarily match. Those things can get out of sync. One of them gets changed, you forget to update the other one, and then your events aren't a reflection of what actually happened. A much better approach, in your command methods which mutate the model, is to describe the mutation you would like to make in an event, right? And then apply that event as a way of mutating your model, and then those things cannot diverge. They have to be the same. So, probably the last bit of code here. With descriptive events, on the left, you can see I mutate the underscore name attribute to the new value, and then I publish an event. In the alternative approach, in the name setter here, I create the event object and then I apply that event on self, and that is what actually causes the mutation to happen.
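The prescriptive style can be sketched like this. It is an illustration under assumptions rather than the code from the slide: `Customer`, `NameChanged`, `_apply_event`, `publish` and the `published` list are hypothetical names, and the list is a stand-in for a real publish-subscribe mechanism.

```python
from dataclasses import dataclass

published = []  # stand-in for a publish-subscribe hub


def publish(event):
    published.append(event)


class Customer:
    """Root entity whose only mutation path is applying an event."""

    @dataclass(frozen=True)
    class NameChanged:
        # Event class nested inside the root entity, as suggested above.
        customer_id: str
        new_name: str

    def __init__(self, customer_id, name):
        self._id = customer_id
        self._name = name

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Prescriptive: describe the change first, then let the event
        # itself carry out the mutation, so the two cannot diverge.
        event = Customer.NameChanged(customer_id=self._id, new_name=value)
        self._apply_event(event)
        publish(event)

    def _apply_event(self, event):
        if isinstance(event, Customer.NameChanged):
            self._name = event.new_name
        else:
            raise TypeError(f"cannot apply {type(event).__name__}")


customer = Customer("c-1", "Ada")
customer.name = "Grace"
```

Since the setter never touches `_name` directly, the published event and the resulting state come from the same object. The `isinstance` check is just the simplest dispatch that works here; a larger model might dispatch on event type in some other way.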
That apply call dispatches to the function at the bottom of the screen, which actually does the work. So just to summarize, we've actually covered, in an hour, quite a big chunk of everything in the DDD book, and it will take you a lot more than an hour to read it. You've had a bit of a flavor of what's in there. So we've covered domain events, entities, value objects, aggregates, factories, repositories, and the layered architecture. We haven't covered services, and there are lots of other things which sit around this and aren't on this diagram, like process managers. How do we model long-running processes that affect the model, where the user may come and go during the course of that process occurring? We've done pretty well in an hour, I think. I'm just going to close on some references for you to go and look at other things, if your interest has been piqued by what I've said today. There is the Blue Book. Remember what I said earlier: it's not the best starting point. A very lightweight starting point is Vaughn Vernon's book, Domain-Driven Design Distilled. That's about a centimeter thick. The Blue Book is about five or six centimeters thick. If you choose to read only one book, read this one, sorry, Implementing Domain-Driven Design by Vaughn Vernon. And the tweet there has a suggested approach to reading this book, which is to read chapters one to four and then go away and try and build something, and come back and use the rest of the book as a reference.