Python comes with many standard library packages included without any "pip install"! In this beginner's tutorial we will go through a few of these, with some interactive challenges during the session. Specifically, we will dive into pathlib, datetime, collections, itertools and functools and how these can help you.
GitHub: https://github.com/simonwardjones/pydata-talk-2022
PUBLICATION PERMISSIONS: PyData provided Coding Tech with the permission to republish PyData talks.
CREDITS: PyData YouTube channel: https://www.youtube.com/c/PyDataTV
Great, thank you. Welcome to Introducing More of the Standard Library. I'm Simon Ward-Jones, a senior data scientist at Deliveroo. I work in the experimentation and measurement team, partly on experimentation and a platform we've built in Python to calculate A/B test statistics and power for all the experiments we run at Deliveroo. The other part of what I do is working on our customer behavior model, where we try to project out all of our customer behavior with causal inference and statistics. I say that to give some context on what I do at work. I've been using Python for about five years, and I thought I'd start with a fun fact: Python has actually got nothing to do with snakes. The name comes from Monty Python. That is straight from the docs, so you can read it there.

Let me talk a little bit about how Python code is organized. This is very basic, so I'm just covering it quickly. If you were to start a Python interpreter or go into Jupyter, you'd be able to start using Python as a calculator, as I'm sure lots of you have. Or you could create a list and assign it to a variable, and you can think of these as core data types. You can also use the built-in functions, things like list and abs. There's a whole collection of them, and we call these the built-ins. They are all loaded immediately when you start Python, so you can use them straight away. But there are also things you have to import. Python code is always organized into modules, and I can break the things you have to import down into standard library modules, which come with Python and are what we're going to be talking about today. There are also PyPI modules, which are third-party, and the packages that you're probably writing in your company.

Before jumping into all the code we're going to go through, just very briefly: why learn the standard library? The first reason is, again, a quote from the docs.
"The Python standard library is very extensive, offering a wide range of facilities." So it's already there; let's make use of it. The second point is: don't reinvent the wheel. Your wheel is likely not as good. Here's my wonky wheel. In terms of code, if there's a method written in the standard library, it's probably going to be faster and more memory efficient than whatever you can come up with yourself. It's just about knowing that it's there in the first place. On this point, certainly working in data science, you get a lot of graduates who are really intelligent and they've learnt the list and the for loop, and I'm certainly guilty of this, and you just go ahead and attack every problem with a really small tool set, when if you'd spent a couple of hours learning about some of the libraries we'll go through today, you could probably have done it a little bit better. The third point is that it saves time: it's less code. And the fourth point is that it's standardized for other developers. They are likely to be more familiar with the standard library than with something you implement yourself, so you save them the time of learning how you've implemented it.

Also, Python is considered to be "batteries included", with more than 200 modules. A selection of my favorites is here, but what we're going to go over today is pathlib, datetime, collections, itertools and functools. So I've given us quite a lot to get through, but hopefully you'll understand why these modules are there, what they can do for you, and some of the core API of each. We've got pathlib for representing file system paths. datetime is really in the name: it gives you classes for representing dates and times. collections gives you a selection of container data types. The ones we're really familiar with in Python are lists, dictionaries, tuples and sets.
But collections gives you a few more on top of that to use in specific scenarios, and we'll come to that. itertools helps you with iteration. It gives you building blocks to iterate in a smarter way, or to build new iterators from existing iterators. If you're not familiar with iterators, we'll get to that as well. And functools is tools for higher-order functions. A higher-order function is just a function that takes another function as an argument, so it kind of modifies an existing function.

Okay, so that's it for the slides. Let's jump into the code. In the abstract to this talk I put a link, so you could have already downloaded the repo, or you can download it now, or you can open the code in Colab directly from this link. If the internet is good, you can run the code directly from Colab, which is Google's version of a Jupyter notebook. I'm going to use JupyterLab. Just a note before we jump into the code, and tell me if that's too small, I might go one more: the modules get more advanced, or at least they interact with more advanced concepts of Python, as we go. So if you're completely new, hopefully you'll understand the first half, and the people who are more advanced in Python will maybe learn something in the second half of the talk. We won't have time to go into every single method in every one of these modules, but I'm trying to give a flavor so you can go away knowing where and what you can use these tools for. There's also going to be an optional exercise at the end of each section, just to keep you on your toes and see if you've understood what I've gone through.

So let's start by talking about pathlib. pathlib gives you a class called Path, and we can import that by doing from pathlib import Path.
A path is a specific location on your computer. You can think of a text file here, or a configuration file there, or an Excel file, and the path is just saying where that is, with the forward slashes, on your system. So here we can make a current working directory by creating a Path and giving it a string with just a dot as the argument for where the path is. We can have a look at that, and it says it's a PosixPath. It is automatically detecting that we're on a Mac or Linux, or really just not Windows, so it's giving us a PosixPath instead of a WindowsPath. We can call the absolute method on this instance to see exactly where we are on my computer, so this is the location the path is representing. There's also a shortcut method on the Path class, cwd, for current working directory, which is going to give us exactly the same thing.

In terms of building paths, how do you go ahead and build a path? There are a few different ways. You can start off with a Path instance, here the current working directory, and call the joinpath method to build up a data folder from the current working directory and then the subfolder student_data. There's a theme of student data here. So that's my path. Or, Python has overloaded the divide operator on this class, so you can just do a forward slash and then the string. I quite like it; other people don't. I think it's quite a cool way of building a path intuitively, starting from this directory and then adding on subfolders. And we can do the same thing again by just giving the whole thing as a string. So there are three different ways of building our path. Now let's say we're going to have a file in this student data folder called data.json. So now we've got a path, we can have a look at it, and it's data.json within that subfolder.

So, a quick bit about the attributes you can use on these paths.
We've got name, stem and suffix, representing the whole name of the file, the stem, and the suffix, the .json bit. We can call parts to get all the different parts of the full path: the folders and the file name. When we print it, we get a nice string representation. Calling absolute is going to give you the full path from the root of the system. This one is quite nice: you have parent, a property on the instance, which tells you the parent of that file or directory. As it returns another Path rather than a string, it comes with all of the methods of a Path, because this is an object representation of the path. So you can call parent, get a new Path, and then go to its parent as well. You can keep chaining that, and we get back to the data folder.

Now we can talk about changing things and manipulating the path. We have our student data path, data.json. We can call with_suffix with .py to change that from data.json to data.py, and we can change the whole name using the with_name method. Again, because it's returning a Path object, we can chain methods, which is quite nice. Here we're saying with_name numbers and then with_suffix.

Here's the point I want to make. So far, we've not actually changed any files; we've just created representations of a specific place in our system. That doesn't necessarily mean anything is there. We've just said this is representing a path at this location. So now we can use some further methods that actually interact with and check on our computer. We can ask, does this file exist? exists is going to go and check, and we don't actually have a folder called data, we don't have the subfolder, and we don't have this file in it. There are also two methods, is_file and is_dir, which check that the path is there and whether it's a file or a directory.
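As a rough sketch of the path-building and attribute steps described so far: the folder name student_data and the .csv suffix below are assumptions, since the transcript does not spell them out. Nothing in this block touches the disk.

```python
from pathlib import Path

cwd = Path.cwd()  # shortcut for the current working directory

# Three equivalent ways to build data/student_data under the working directory.
via_joinpath = cwd.joinpath("data", "student_data")
via_slash = cwd / "data" / "student_data"   # overloaded division operator
via_string = cwd / "data/student_data"      # one string with a separator in it

student_data_path = via_slash / "data.json"

print(student_data_path.name)                # data.json
print(student_data_path.stem)                # data
print(student_data_path.suffix)              # .json
print(student_data_path.parent.parent.name)  # data

# These return new Path objects; no file is renamed or created.
as_python = student_data_path.with_suffix(".py")
# ".csv" is an assumed suffix, chosen only to show method chaining.
renamed = student_data_path.with_name("numbers").with_suffix(".csv")
print(as_python.name, renamed.name)          # data.py numbers.csv
```

Because every one of these calls returns another Path, the chaining the talk mentions works without any string handling.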
So if it's a directory and it did exist, is_dir would return True and is_file would return False. Hopefully that makes sense. I'm just going to see if the student_data folder exists; this is data/student_data. I took the file path, found the folder by looking at its parent, and checked, and that also doesn't exist. So we could go ahead and call the mkdir method, which is going to try to make the path. It's going to fail, hopefully. The reason is that it's trying to make a folder called student_data inside a folder called data, but neither of these exists. So it's complaining that it can't make a folder called student_data in a data folder that doesn't exist. What you can do is say parents=True, which will go ahead and make any missing parents when you create this folder. And then exist_ok: if the folder already exists and you try to create it, it will error and say it already exists, so I'm going to say exist_ok=True, and if it was already there, we wouldn't get an error. Often you see people checking whether the folder exists or having to make the parents themselves, and this makes it much easier.

So now we can see that the folder exists, but our data.json doesn't. Let's go ahead and make that. We've got some student data here. We've got John: he's ten, he's not on vacation, and he has some test scores. We've got Jane Doe: she's also ten, and she's not on vacation either. And we've got Isaac Newton: he's 30 and he's got some test scores there. We can write those into a list of dictionaries. Now we're going to import json, another standard library module. I was trying to avoid using other standard library modules in the talk about standard library modules, but you can't really get away from it. So here you take this path and use the write_text method, and I can dump the JSON in a nice format. That's going to go ahead and write the JSON file.
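A minimal sketch of the mkdir and write_text steps just described. The dictionary keys and test-score values are made up for illustration (the transcript does not give them), and a temporary directory is used here instead of the working directory so the example cleans up after itself.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records: key names and scores are illustrative only.
students = [
    {"name": "Jane Doe", "age": 10, "on_vacation": False, "test_scores": [80, 90]},
    {"name": "Isaac Newton", "age": 30, "on_vacation": False, "test_scores": [100, 95]},
]

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "data" / "student_data"

    try:
        folder.mkdir()  # fails because the parent folder "data" is missing
        made_without_parents = True
    except FileNotFoundError:
        made_without_parents = False

    folder.mkdir(parents=True, exist_ok=True)  # creates data/ and student_data/
    folder.mkdir(parents=True, exist_ok=True)  # already there, so no error

    data_file = folder / "data.json"
    existed_before = data_file.exists()
    chars_written = data_file.write_text(json.dumps(students, indent=4))
    loaded = json.loads(data_file.read_text())

print(made_without_parents, existed_before, loaded == students)  # False False True
```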
So we could go in here, clicking data, then student_data, and there's the data.json file. Let me just close that. There's also a method called read_text, which reads the text from this file location. I'm going to print that out so you can see I've made the file. It's just a bit big.

Now, renaming a file, another common thing you're going to want to do. We first create the location where we want the file to go. Here I'm taking my student data path, going up to the parent folder and then to the parent of that one, so I'm basically going one step up, and I'm changing the name to new_location.txt. I've not done anything yet. I've just made this theoretical location, which is data/new_location.txt. And then this is the line that's actually going to move it: you take this file path and rename it to that file path, and it moves the file. I'm just checking first that the location where I want to put it doesn't exist. Let's go ahead and run that. Now you can see this new location exists, and if we go into the student_data folder, the file is not in there any more, because we moved it up into the data folder.

Also, if you have any questions at any point... Yes, great question. With os, pretty much every method you've got in pathlib you're going to have in os. In fact, there are many more in os. But just for paths, pathlib provides this object-oriented way of viewing a path. You can use these nice forward slashes and you don't think of it as a string; it returns another Path rather than a string. And it makes working with paths easy because of some of the bits above, where you can get the stem and the parents, and you can call the move methods more easily.

Then deleting files. To delete a file you use the unlink method, and to remove an empty folder you call rmdir. So I can delete the files. Obviously this one would have existed, and the original student data location would no longer have existed.
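The move-then-delete sequence could be sketched roughly as follows. The file name new_location.txt is a guess at what the talk uses, and the work happens in a temporary directory rather than the repository folder.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "data" / "student_data"
    folder.mkdir(parents=True)
    source = folder / "data.json"
    source.write_text("[]")

    # One level up from student_data, with a new file name (assumed name).
    target = source.parent.parent / "new_location.txt"

    if not target.exists():
        source.rename(target)  # this is the call that actually moves the file

    source_still_there = source.exists()
    target_there = target.exists()

    target.unlink()        # delete the file
    folder.rmdir()         # student_data is empty now, so rmdir succeeds
    folder.parent.rmdir()  # data is empty too
    data_dir_left = (Path(tmp) / "data").exists()

print(source_still_there, target_there, data_dir_left)  # False True False
```

Note that rmdir raises an error on a non-empty directory, which is why the file is unlinked first.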
But I'm just checking whether they exist and then deleting them. Same with the directories: a directory has to be empty before you can delete it. I'm just going to make those again, because I actually want them.

Another really useful method is iterdir. It returns all of the files and folders within this folder. I can show you that as a list; these are all the different things inside this folder. And a nice method is glob, which matches a pattern and gives you an iterator over the matches. Here I want to find, in any subdirectory (the double star means check any subfolder), any file name as long as it ends in .jpeg. Let me run that. Yeah, a fairly bad example, because I don't have any JPEGs here, but we could do .json, and hopefully we'll find data.json. And there's also this random one from Jupyter. So that is a whistle-stop tour of pathlib.

As a first exercise, I'm going to give you one minute to try to write a function called replace all text with markdown. I want you to find all the files in the current working directory and replace .txt with .md, so you change them from text to Markdown. See if you can write a function to do that. I'll give you about a minute. For those having a go, I'll give you one more minute, see how you're getting on, and then show you an answer. I'm sure there are many ways we could do this.

Okay, I think I'll show you an answer now, and hopefully you're almost there. Here I've got replace all text with markdown. I've just scrambled it so that you can't see it immediately. If we go up and paste this into our example here, I'm saying for file path in Path.cwd(), so that gets us a Path instance for the current working directory, and then we call the glob method, matching any file as long as it ends in .txt. Then we rename that to the file path with the suffix .md.
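One way the exercise answer might look is sketched below. The function name and the optional folder argument are assumptions: the talk's version works on Path.cwd() directly, and the argument is added here only so the check can run in a temporary directory.

```python
import tempfile
from pathlib import Path


def replace_all_txt_with_md(folder=None):
    """Rename every .txt file directly inside `folder` to .md."""
    folder = Path.cwd() if folder is None else Path(folder)
    # Collect the matches first so we are not renaming while still scanning.
    for file_path in list(folder.glob("*.txt")):
        file_path.rename(file_path.with_suffix(".md"))


with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "example.txt"
    example.write_text("some text")
    replace_all_txt_with_md(tmp)
    names_after = sorted(p.name for p in Path(tmp).iterdir())
    text_after = (Path(tmp) / "example.md").read_text()

print(names_after, text_after)  # ['example.md'] some text
```

Using "*.txt" only looks at the top level of the folder; a pattern starting with "**/" would descend into subfolders as well.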
So that's a nice example, and we can test it. We can write a file: we can say Path of example.txt and write some text into it, and it returns nine, which is the number of characters you've written, I think. Then we can call the function I've just written. And before I unlink it, we can call read_text to check it's there. So you can see this example.md file now exists, because we moved example.txt, and I can go ahead and unlink that to delete it. So that was pathlib. It's a really handy library, and I think it's nicer to use than the os.path type methods.

We're doing okay for time. As I said, the topics are going to get gradually more advanced. datetime is a module supplying classes for manipulating dates and times. Just a really quick note on time zones: datetime objects are categorized as either aware or naive, depending on whether they include time zone information. In general, I tend to just not use time zones, but if you do, I'd recommend looking at other libraries as well, like pytz or Pendulum, that make it a bit easier. The key classes are date, time, datetime and timedelta. date represents a date, time a time, datetime a combination of the two, and timedelta is a duration between two dates or times.

Some more notes. Objects of these types are immutable, which means you can't change them once you've made them. They are also hashable, which means you can use them as dictionary keys, and they support efficient pickling via the pickle module. That's straight from the docs. So let's make a date. We call datetime.date with three arguments, year, month and day, for the 24th of April. Yeah, I've named it 24 April. Good name. We can also use the today method, which returns a date for today, the 17th of June. And we can get the attributes day, month and year on that date instance.
And I'm just building them into a tuple. Next, time. We can build 4:30 pm by passing 16, 30, zero seconds and zero microseconds. Now we have 4:30; that's the time. And we can build a datetime, here the 16th of September at 8:30 and 12 seconds, so we can fill out all of those. Again, I'm giving no time zone info. That's a specific order, potentially, at Deliveroo. You can get from the datetime back to the date and the time with the date and time methods respectively. And similar to the today method on date, there's a useful now method to get right now as a datetime. You can call the combine method to get from a date and a time back to a datetime.

We can print that as a string, which gives a nice string representation by default. That's what the datetime looks like printed, and that's what a date looks like. ISO format is a specific standard for formatting dates as strings. Dates have an isoformat method, which gives something very similar to print. For a datetime it's also very similar, but it has a T between the date and the time. You can also go back the other way. If you have an ISO format string, here the 24th, we can call date.fromisoformat to get a date from the string, which is really handy. And we can do the same for a datetime, using datetime.fromisoformat.

Now, there are loads of different formats for dates when you're working with data. If you look down here, we've got a method called strftime, as in string from time, and it takes a specific format string. I'm going to print that out to show you what now would look like using these different formats, so %a or %A, and you can combine these. Here %a picks out the day of the week as a three-letter abbreviation, and %A is the full name of the day. All of these different, what are they called, directives are available.
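To make the construction and formatting calls concrete, here is a small sketch. The year 2022 is an assumption (the transcript gives only days and months), and fixed dates are used instead of today and now so the printed values are reproducible.

```python
from datetime import date, datetime, time

april_24 = date(2022, 4, 24)                   # year, month, day (year assumed)
half_four = time(16, 30, 0, 0)                 # hour, minute, second, microsecond
order_time = datetime(2022, 9, 16, 8, 30, 12)  # naive: no tzinfo supplied

print(order_time.date(), order_time.time())    # 2022-09-16 08:30:12

combined = datetime.combine(april_24, half_four)
print(combined)                                # 2022-04-24 16:30:00
print(combined.isoformat())                    # 2022-04-24T16:30:00

# Round trips from ISO-format strings.
print(date.fromisoformat("2022-04-24") == april_24)                # True
print(datetime.fromisoformat("2022-09-16T08:30:12") == order_time)  # True

# strftime with numeric directives; %a and %A give weekday names,
# whose spelling depends on the active locale.
print(order_time.strftime("%d/%m/%Y %H:%M"))   # 16/09/2022 08:30
print(order_time.strftime("%a / %A"))
```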
So here I've got a date string that I might have, and this is a function for going back the other way, strptime. If you have a string representation of a datetime and you know its format in terms of these directives, you can say, take me from a string back to the datetime, as long as it's in this regular format. So here this %A would be the day name, then you have a dash and the day of the month as a number, and you can use these to get back to the datetime. These can be really handy. This is a fuller list of the directives; they're also in the Python documentation.

replace is a really handy method that's on both date and datetime objects. It replaces whichever attributes you specify. Here we can do today.replace with year equal to today.year minus one, so we get the same day, but one year earlier. No PyData last year, I don't think. Similarly with now, this datetime, we could replace the hour and make it this morning at 6:00 am.

timedelta is the duration. With this big timedelta here, you can specify days, seconds, microseconds, minutes, hours, weeks, whatever you want. This one has 50 days, 8 hours and so on, and it adds those all up and gives the result back, by default, in days, seconds and microseconds. So you can see the sum of all of those different durations: I've now got a timedelta representing 64 days and that many seconds. You can get the total number of seconds by calling total_seconds on the delta, if you want the duration in seconds. And here I'm showing you that these two things are exactly the same. I've done a simple version, 365 days, and this is another year, but written as 40 weeks, 84 days and so on. I can show you that those two are equal. However you build it, it just represents one duration. So that's a pretty short covering of datetime.
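A hedged sketch of strptime, replace and timedelta. The timedelta arguments follow the example in the Python documentation, which appears to be what the talk is showing; the date string, its format and the fixed "today" are assumptions chosen for reproducibility.

```python
from datetime import date, datetime, timedelta

# String -> datetime, given a known directive format (format is illustrative).
parsed = datetime.strptime("16-09-2022 08:30", "%d-%m-%Y %H:%M")
print(parsed)                                  # 2022-09-16 08:30:00

today = date(2022, 6, 17)                      # stand-in for date.today()
last_year = today.replace(year=today.year - 1)
this_morning = datetime(2022, 6, 17, 14, 5).replace(hour=6, minute=0)
print(last_year, this_morning)                 # 2021-06-17 2022-06-17 06:00:00

# Many units in, normalised to days / seconds / microseconds.
delta = timedelta(days=50, seconds=27, microseconds=10,
                  milliseconds=29000, minutes=5, hours=8, weeks=2)
print(delta.days, delta.seconds, delta.microseconds)  # 64 29156 10

# Two spellings of the same duration compare equal.
year = timedelta(days=365)
another_year = timedelta(weeks=40, days=84, hours=23, minutes=50, seconds=600)
print(year == another_year, year.total_seconds())     # True 31536000.0
```

One caveat worth knowing: replace raises ValueError if the result is not a real date, for example moving 29 February to a non-leap year.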
I didn't go into time zones, but I thought it might be a fun challenge to see if you can write a function that takes two arguments, a month and a day, and returns the number of days until your next birthday. So if you're born on September 16, we can call this with month 9, for September, and day 16. Again, I'll give you a minute and then show you an answer. We want a function that gives us the days until our birthday. Probably one more minute. Okay, I think everyone's still typing.

I'll go through the answer I've got here. I can unscramble my hidden answer, and I'll quickly explain how I used what we went through above. First, I get today using the today method on date, and I say birthday is equal to today with the month and day replaced as given. So basically I'm just keeping the year from today. But that could be in the past, because it could be February 2. So if birthday is smaller than or equal to today, I go ahead and add a year to the current year. Then I return birthday minus today and use .days on that duration. We can run that and see it's 244 days until February 16. Or in my case, what is it? 91. Okay, not quite time to get excited yet.

collections is the next one we're going to go through. We've covered pathlib and datetime, which are great, and we use them all the time. For instance, I told you I was working on a customer lifetime value model at Deliveroo. The first thing we do at the beginning of that Python script is load the configuration file for the model, so we read that from a specific path or folder. And then in that we read a date, because we want to predict for customers who made an order yesterday. So that's an example of the kind of thing where we're using these all the time.
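The birthday exercise described above can be sketched as follows. The function name and the optional today argument are assumptions: the argument is added so the result does not depend on the day the code is run, and the fixed date of 17 June 2022 matches the day mentioned in the talk.

```python
from datetime import date


def days_until_birthday(month, day, today=None):
    """Days from `today` until the next occurrence of month/day."""
    today = date.today() if today is None else today
    # Keep this year, swap in the birthday's month and day.
    birthday = today.replace(month=month, day=day)
    if birthday <= today:
        # Already happened (or is today), so use next year's birthday.
        birthday = birthday.replace(year=birthday.year + 1)
    return (birthday - today).days


talk_day = date(2022, 6, 17)
print(days_until_birthday(2, 16, today=talk_day))  # 244
print(days_until_birthday(9, 16, today=talk_day))  # 91
```

This sketch does not handle a 29 February birthday: replace would raise ValueError in a non-leap year, so a real implementation would need a rule for that case.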
So, collections. This is a module that implements a selection of specialist container types. As I said, you're probably familiar with a dictionary for a mapping, a list for a series of objects or numbers or whatever you want, a set and a tuple. They tend to be covered straight away, first thing. The ones we're going to go through now, and we're not going to go through them all, start with namedtuple, which is very similar to a tuple, but each of the elements has a name. For instance, if you had a record representing a specific thing, you might say the second item in this record represents someone's age. But you just have to know that if you're using a tuple, whereas with a namedtuple, instead of looking up that second item, you can say age in the code, which is much easier if you're reading it.

deque, which I think is pronounced "deck", is a double-ended queue. It's similar to a list: it's just a series of things. But it's useful when you often want to put items on the end of the queue, or at the beginning, or take items off either end, so it's performant for operating on both ends of the queue.

ChainMap allows you to combine multiple mappings together. If you had two dictionaries and you want to be able to look up a value from either one, and you don't want to have to create a whole new dictionary and put all the items from both into it, you can use a ChainMap. It will check the first mapping, and if the key is not there, it will go ahead and check the second mapping.

Counter is really good for counting, and I'll give you some examples of that. If you've got an iterable or a list of things and you just want to go through and count all the ones, all the fours, or just count the items in that list, that's what Counter is great for.
And I'm not going to go through all of them. defaultdict we will go through, but I think it's best shown through some examples. So we're going to import Counter, deque, defaultdict and namedtuple, and go through and show you how they work.

Here we're going to use the Counter class with a string. I think it's good, when you're looking at code, to have a think ahead of time about what it's going to do before you run it. Hopefully you're thinking we're going to get something like this, which is a mapping counting each of the individual letters within that string. In this instance we've got an iterable of pets, a list of pets, and we can call Counter on it, and it goes through and counts three dogs, sorry, three cats, two dogs and a goldfish. We can actually initialize it straight from a dictionary if we want to. Using a Counter makes it clear to the person reading the code that you're counting something, so it also helps if you're reading someone else's code. We could have used a dictionary here, but a Counter gives you a little bit more intuition: they're definitely counting something.

Another useful thing is that all missing elements have a count of zero. If we look up sharks in our pets Counter, we don't have any sharks, but it returns zero rather than throwing an error, as a dictionary would. We can go back to the individual elements with the elements method, and a really useful method is most_common. Now that we've counted them, you often want to find the most common. Here we can see that's cat and dog. It's a fairly small example, but you can imagine it being more useful on a bigger one: what are the two most common pets? This is nice and fast, using another standard library module called heapq, which I considered going through. But it's a pretty algorithmic library and I thought that would be pretty full on. Not for me.
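A short sketch of the Counter behaviour described above. The string being counted is an assumption (the transcript does not say which string the talk uses); the pet counts match the ones mentioned.

```python
from collections import Counter

letters = Counter("mississippi")  # assumed example string
print(letters["s"], letters["m"])  # 4 1

pets = ["cat", "dog", "cat", "goldfish", "dog", "cat"]
pet_counts = Counter(pets)
print(pet_counts["cat"], pet_counts["dog"], pet_counts["goldfish"])  # 3 2 1

# A missing key reports zero instead of raising KeyError,
# and looking it up does not add the key.
print(pet_counts["shark"], "shark" in pet_counts)  # 0 False

# Back to individual elements, and the top-n counts.
print(sorted(pet_counts.elements()))
# ['cat', 'cat', 'cat', 'dog', 'dog', 'goldfish']
print(pet_counts.most_common(2))  # [('cat', 3), ('dog', 2)]

# Initialising straight from a mapping gives the same Counter.
print(Counter({"cat": 3, "dog": 2, "goldfish": 1}) == pet_counts)  # True
```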
Here I can make another Counter with three cats and ten dogs, and it does the intuitive thing with all the usual operations. So if we add those two counters together, we get a Counter with six cats, twelve dogs and still that one goldfish. I'm just showing you what we had in the original one: three cats, two dogs and one goldfish. And this is how you might increment a specific key: we can say counter, look up dog, and add one. So now we have three dogs. So Counter is really useful for counting.

Next, we'll go on to the double-ended queue. We can take range of five, make a deque and have a look at it. We've got zero through four, as you'd expect from the range. We can append five to the far end of the queue, and we can appendleft. This is the kind of method that we don't have on a list; you can insert, but it's not as performant. So if you're doing lots of adding onto both ends, specifically implementing a double-ended queue type of algorithm, then you'd use a deque. Now, if we have a look at it, we have that extra five I put on the right and the extra minus one I put on the left.

We've got similar methods, extend and extendleft. So we put six, seven, eight onto the right-hand side of the queue and minus two, minus three, minus four onto the left. It's interesting to note that those are the other way around: here it says minus four, minus three, minus two. You can think of it as adding the minus two to the left first, then the minus three, and then the minus four. That's why they appear in the other order. We can use the index method to see where the three is: it's in position seven. We can call the reverse method, which reverses our queue. It says this is in place, and you'll hear that term a lot. It just means that it modifies the deque itself; it doesn't create a new one.
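The Counter arithmetic and the deque operations just walked through can be reproduced with a sketch like this; the variable names are illustrative.

```python
from collections import Counter, deque

pet_counts = Counter(cat=3, dog=2, goldfish=1)
more_pets = Counter(cat=3, dog=10)

combined = pet_counts + more_pets
print(combined)            # Counter({'dog': 12, 'cat': 6, 'goldfish': 1})

pet_counts["dog"] += 1     # increment a single key
print(pet_counts["dog"])   # 3

queue = deque(range(5))    # deque([0, 1, 2, 3, 4])
queue.append(5)            # add on the right
queue.appendleft(-1)       # add on the left
queue.extend([6, 7, 8])
queue.extendleft([-2, -3, -4])  # added one at a time, so the order flips
print(list(queue))
# [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8]

position_of_three = queue.index(3)
print(position_of_three)   # 7

queue.reverse()            # in place: returns None and mutates the deque
print(queue[0], queue[-1]) # 8 -4
```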
So now, if I print that out, you'll see it's in reverse order. So that's pretty much a double-ended queue: similar to a list, but with queue functionality on both ends.

You've also got this argument called maxlen, and this really does change things, because now we have a double-ended queue with a maximum length of five. I'm going to start it off with zero to four, so five numbers in it. Now, if I append five, look what we have left. The zero fell off the end when I pushed five on, because we said the maximum length here is five. So it just gets rid of that zero, and that's gone. And if I appendleft, putting the zero back on the left, now the five falls off. This can be great for just keeping track of five things, if you don't mind them falling off the end. Similarly, if I extend with five, six, seven, you'll see the zero, the one and the two have fallen off. If I extendleft again, they're in reverse order, two, one and zero, but the five, six and seven have fallen off.

And you have another method that's conceptually quite easy to understand: rotate. rotate(2) just rotates them all round. We're not losing any here; we're moving them all two places to the right, and when they get to the end, they loop back around and join the other end of the queue. You can read the maxlen attribute on the instance to see what the max length is. And let's just clear the queue, so now there's nothing in it. Just a note: if you initialize it from five things with a max length of three, it puts them in one by one, so it would have put in the one, then the two, then the three, and then it starts knocking them off, as you'd expect with maxlen.

Cool. defaultdict. So here's a sentence. Imagine we want to take a sentence and store the words in lists in a dictionary, keyed on the letter each word starts with. That's a sentence, and I can print that.
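The bounded deque behaviour is easier to follow with the intermediate states written out. This is a sketch that mirrors the sequence of calls described above; the values passed to extendleft are inferred from the result the talk reports.

```python
from collections import deque

recent = deque(range(5), maxlen=5)  # deque([0, 1, 2, 3, 4], maxlen=5)

recent.append(5)              # 0 drops off the left  -> [1, 2, 3, 4, 5]
recent.appendleft(0)          # 5 drops off the right -> [0, 1, 2, 3, 4]
recent.extend([5, 6, 7])      # 0, 1, 2 drop off      -> [3, 4, 5, 6, 7]
recent.extendleft([0, 1, 2])  # 5, 6, 7 drop off      -> [2, 1, 0, 3, 4]
recent.rotate(2)              # nothing is lost       -> [3, 4, 2, 1, 0]
after_rotate = list(recent)
print(after_rotate, recent.maxlen)  # [3, 4, 2, 1, 0] 5

recent.clear()
print(len(recent))            # 0

# Five items into a deque of length three: only the last three survive.
small = deque([1, 2, 3, 4, 5], maxlen=3)
print(list(small))            # [3, 4, 5]
```

A bounded deque like this is a simple way to keep only the most recent few items of a stream.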
Here I'm imagining how I might do this if I didn't know about defaultdict. At the end of the day, I want a dictionary with letters as keys and, for each, a list of all the words that start with that letter. That's the data structure I want to get to. So I'm saying for word in sentence.split(), which gives me each word. If the first letter is not already in my dictionary, I set that letter to a new list with the word in it. So the first time I find a new letter, I need to create a new list inside the dictionary. Otherwise I already know there's a list, so I can just go ahead and append the word to it. That's how you might implement this if you didn't know about defaultdict.

With defaultdict, you specify what to call, or what to create, when you look up an item and it's not there. So when I look up a letter in my words-starting-by-letter dictionary and it's missing, it creates a new list for me and returns it, and I can append to it immediately. This is the beauty of defaultdict. Basically it calls what's known as the factory, list in my example, when it looks up a key that isn't there.

Similarly, this is an example from the official docs. Here we create a defaultdict where the factory is int. If we try to look up something in the defaultdict and that key isn't in it, it adds an integer, which starts off as zero. So now, for Mississippi, I can say for letter in Mississippi, look up that letter and add one. You can look up that letter and add one immediately because you know it's not going to cause an error when the key is missing; it just starts it off at zero. So this is another way of getting to the letter counts.

namedtuple. They're great. Here I've got a student. This is our student data from earlier.
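The defaultdict discussion above compares a manual grouping loop with the defaultdict version. A sketch of both, plus the int-factory example, might look like this; the sentence and the variable names are made up for illustration.

```python
from collections import defaultdict

sentence = "the quick brown fox jumps over the lazy dog by the barn"  # assumed

# Without defaultdict: create the list the first time a letter is seen.
manual = {}
for word in sentence.split():
    if word[0] not in manual:
        manual[word[0]] = [word]
    else:
        manual[word[0]].append(word)

# With defaultdict: a missing key is filled by calling the factory, list().
words_by_letter = defaultdict(list)
for word in sentence.split():
    words_by_letter[word[0]].append(word)

print(words_by_letter["b"])             # ['brown', 'by', 'barn']
print(dict(words_by_letter) == manual)  # True

# Factory int() returns 0, so counting needs no existence check.
letter_counts = defaultdict(int)
for letter in "mississippi":
    letter_counts[letter] += 1
print(sorted(letter_counts.items()))
# [('i', 4), ('m', 1), ('p', 2), ('s', 4)]
```

One design point to keep in mind: merely looking up a missing key in a defaultdict inserts it, which is different from Counter, where a lookup of a missing key returns zero without adding anything.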
Here I might be making a tuple to represent a student. Simon Ward-Jones, 30, he is on vacation and he's got some test scores, but you don't know what they are, and it's quite hard to read because you just have to rely on the person knowing which one is which. So maybe I'm writing a simple function to display that student and I want to print student zero. So I'm going to say their name is student zero and they are student one years old. But it's quite hard: you have to check which position that specific thing you want to look up is in. So I can call that and it will print my information about the student. But instead let's create a named tuple called Student. So this is the namedtuple function from collections. First you give the name of the named tuple, so this is going to be a Student, and both of these are the same: you can either give a string with space-separated names for each of the items in your named tuple, or you can give a list. So here I'm specifying the first thing represents a name, the second is an age, and it's just a little bit more readable than a tuple. Otherwise it's very similar to a tuple. So now I can create a student in a much more readable way. It's still a tuple, in that you can still look up the item at index two and get that I am on vacation. But you can now use the on_vacation attribute as well. So it just makes your code a little bit easier. Now, if I just rewrite that same function, when I'm putting the name and the age and the test scores into my string, I can reference .name and it's going to make it a bit easier to read, easier to edit, but still... do you have a question? Sure. I think one thing that's quite nice about a named tuple is you could use it as a key in another dictionary, as long as its fields are hashable, whereas you couldn't do that with a dictionary. That's one example. Named tuples are immutable, so you can't change them once you've made them, and that comes with some benefits as well.
I think those are the key ones that jump to mind. So I can run that function again. And it comes with this _replace method. A tuple is immutable, so you can't change something in the tuple once you've made it, but this _replace will return a new instance, a new tuple, but with that thing changed. So here I can change the age to 31. It's going to create a new tuple, an older Simon, and you can see it has the age 31 when you run it. So that's sort of a convenience. You can look up the fields, the names of those specific items in the tuple, with _fields. But we can also do the same thing with NamedTuple from typing. So it's just another way to create the same kind of data structure, but with a slightly different syntax using type hints. So this is from the typing library. You can import NamedTuple, and I thought I'd just show you here because it's creating the same thing, but you're specifying the types as well. So now we've got name, age, on vacation and test scores, but with some additional information: that is going to be a string, an int, a bool and a list of integers for the test scores. So again, I can create the same thing and it's still a tuple, I can pick out the first item. Sorry, I've shown you that. I know, I know why. So we're now going to look at this method called _make. So imagine I did have a list of dictionaries. So here's my fake data from earlier. We had our three students with these properties in dictionaries, and we can have a look at the first one. We can loop through and use the _make method on our named tuple, passing it a list or an iterable of the values for each of our students in student data. So there I'm making some named tuples; I'm just converting dictionaries into named tuples there. Back to your question, why would you make named tuples instead of dictionaries? At least then you could use them as keys somewhere else if you wanted to later.
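The named tuple features mentioned so far (attribute access, _replace, _fields, _make and the typing.NamedTuple syntax) might look like this in code. The student values here are invented for illustration and are not the data from the talk.

```python
from collections import namedtuple
from typing import List, NamedTuple

# Field names can be a space-separated string or a list of strings.
Student = namedtuple("Student", "name age on_vacation test_scores")

simon = Student("Simon", 30, True, [70, 80, 90])   # hypothetical values
older_simon = simon._replace(age=31)               # new tuple; simon is unchanged


# The same idea with type hints, using typing.NamedTuple.
class TypedStudent(NamedTuple):
    name: str
    age: int
    on_vacation: bool
    test_scores: List[int]


# _make builds an instance from any iterable of values. This relies on the
# dictionary keys being in the same order as the fields.
student_data = [
    {"name": "John Smith", "age": 20, "on_vacation": False, "test_scores": [50, 60]},
    {"name": "Jane Doe", "age": 21, "on_vacation": True, "test_scores": [70, 80]},
]
students = [Student._make(row.values()) for row in student_data]
typed = TypedStudent("Isaac Newton", 25, False, [95, 100])
```

If the dictionary keys might be in a different order, `Student(**row)` is a safer way to do the same conversion.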
So, challenge number three. Remember, we've gone through named tuples, which are great for giving the items in your tuples names. We've gone through collections as a whole: we've gone through double-ended queues, Counter, which is great for counting, and one more, but I can't remember it. So the challenge is to create a named tuple called Point. It should have an x and a y, and it represents a point on a grid. That's what you've got to create. And then create a list of 100 points with x values in one, two and three and y values in one, two and three. So just some points on a very small grid: make 100 of those random points and then find the most common point. Now that we've done the simulation and picked a list of 100 random points, how might you find the most common point in this simulation? Also, you might need to import random and use random.randint between one and three, but that's not what we're trying to learn here. So I'll give you one minute to have a go at this. 30 more seconds, and then we'll go through how you might do this. Okay, I'm seeing a few people who look like they're done, so let's go through this one and see how we can use some of what we learned above. So firstly, I'm just going to import randint from random, as a bit of a tip. So this is one way you could have created a Point. You give the name of the named tuple, so Point with a capital letter, and then x and y. You could have done a list with the strings x and y in it, or you can use the NamedTuple class from typing and again do x and y. So now we have this nice named tuple representing a point. I can use that like so to generate 100 random points on my grid of one to three. And then I'm going to create a Counter using the points, and so it's going to go through and count each occurrence of each point. Then I can call most_common and get just the first most common element, and that's going to be a list with however many you asked for. So it's going to have one item in it, the most common.
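One way the challenge solution described here could be written is sketched below. The fixed seed is an assumption added only so the run is repeatable; it is not part of the challenge.

```python
import random
from collections import Counter, namedtuple

Point = namedtuple("Point", ["x", "y"])

random.seed(0)  # assumption: seeded purely for reproducibility

# 100 random points on a 3 x 3 grid.
points = [Point(random.randint(1, 3), random.randint(1, 3)) for _ in range(100)]

# Counter works here because named tuples of ints are hashable.
point_counts = Counter(points)

# most_common(1) returns a one-item list of (point, count) tuples.
most_common_point, count = point_counts.most_common(1)[0]
print(f"{most_common_point} appeared {count} times")
```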
So I get that out of the list, and it's a tuple with two things: the point itself and a count. So here, let's go ahead and show you this step by step. Just run this Point. There are my points, and now we can do Counter of points. In fact, let's break it up a little bit. Now we can look at the counter, and that will look like this. So it's got the counts for each of the random points that we made, and then we can call most_common on this counter. Let's get the most common point, and you can see it's this point and that's the count. And then all I did there was take that out, and then I can print it. So that's how you might do that using Counter. And I think it's quite intuitive to a person reading this code: okay, we're counting the points and then we're getting the most common one out of it. I'm sure you could write different code to do the same thing, but it's very readable using Counter. Does anyone have any questions about any of those, or should I move on to itertools? Question? Only if it's hashable, so that's the rule, so it can tell that it's counting the same thing. Cool. So let's go into itertools, which I think is the most fun one we're going to go through, and potentially the most useful, at least in the work I do in data science. So just to remind you, we've got our students, and we're going to carry on using them just so I don't have to keep creating new pretend concepts. As a quick aside, we can think about what is an iterable, what is an iterator, and how does a for loop really work. So this is a slight aside and I'll jump back to itertools and the functionality that offers, but I think it's good to sometimes think about what these things are conceptually. So an iterator is an object that implements the iterator protocol, and that means it implements these two special dunder methods, or double underscore methods, __iter__ and __next__. __iter__ returns the iterator object itself and __next__ returns the next element.
So in terms of when you're looping over an iterator, you can think: you start at the beginning, it's going to give you the first thing, then you call the method next to move on to the next thing, and that's going to enable us to loop through it. And I'll give an example of why that's useful. And an iterable just implements the __iter__ method, which returns an iterator. So don't worry if you're a little bit lost. I will give an example. Now, we have student data, which is a list, and it's an iterable. And we can tell it's an iterable because it has an __iter__ method and we can iterate over it using the normal for syntax. So we can say for item in student data, print item. So you can see I've looped through and I've printed our three student dictionaries out. So hopefully that's familiar. That's our for loop. But we can also call the __iter__ method directly on our list, and it returns a list iterator. So this is the double underscore version. You can achieve exactly the same thing as calling the double underscore method by calling iter, the built-in function, directly on that list. And that's also going to return a list iterator. So let's create it now. I've called it student data iterator, and it has the special method __next__. And to get the next element, we can either call double underscore next, the special method, or you can use the built-in next. So if we call that on our iterator, it returns the next thing in the list. So we're right at the beginning of the list iterator, so it's going to return John. We call it again, it will return Jane, and then Isaac. And if we run it one more time, it's going to raise a StopIteration error and say I'm out of items now. So under the hood, that's what's happening. So this is a summary, in pseudocode, of what's happening in a for loop. So the first thing the for loop syntax will do is call iter on the thing you're looping over. And now we've got an iterator, and we know that this must have the special __next__ method.
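To make the iterator protocol concrete, here is a small sketch of stepping through a list by hand and of a hand-rolled equivalent of a for loop. The list of names is a simplified stand-in for the student dictionaries, and manual_for is a hypothetical helper written for illustration.

```python
names = ["John", "Jane", "Isaac"]   # simplified stand-in for the student data

# iter() gives a list iterator; next() pulls items off it one at a time.
iterator = iter(names)
first = next(iterator)
second = next(iterator)
third = next(iterator)

# One more call raises StopIteration because the iterator is exhausted.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True


def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does under the hood."""
    it = iter(iterable)
    while True:
        try:
            item = next(it)
        except StopIteration:
            break
        body(item)


collected = []
manual_for(names, collected.append)
```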
So a for loop essentially says: while True, try to get the next item from our iterator. This bit is effectively the block of your for loop, and then at the end there's the except: finally, when it raises StopIteration, let's just break and return. So this is basically a pseudo-implementation of what a for loop is doing. But it's nice to think, what's an iterator? It's something that you can call next on. So now let's have some fun with itertools. So again, I'm going to try and ask you to think about what would happen, what does this represent, before I run the example. So here I've got three lists, three iterables, and itertools.chain is going to chain them together. So it might not surprise you if I go through for city in numbers... that's quite a bad name, let's change that: for number in chain, print number. So it's going to chain them all together into one long iterable. Why might you want to do this? I'm just kind of adding this on the fly. So you could do this, you could add the lists together, and maybe I'll cover this more again. But that is going to build a new list. It's going to go and make that space in memory and build that list. Whereas when we did it this way, using itertools.chain, it's not creating any new list in memory. It's going to go through the first one, then go and find the second one in memory and go through that, and then go and find the third one and go through that. So it's more memory efficient to use chain than to build a new list like that. So you could have done that and called it all numbers, if I can type, and then we could have looped through that instead. But we won't. We'll just use the chain because it's easier to read and understand, I think. Again, chain.from_iterable is very similar, it's just for when you have them already in another iterable. So here we've got a list, and in that we have lists.
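A short sketch of chain and chain.from_iterable as described above, with made-up lists of numbers:

```python
from itertools import chain

a = [1, 2, 3]
b = [4, 5]
c = [6]

# chain walks each iterable in turn without building a combined list first.
chained = list(chain(a, b, c))

# Adding the lists gives the same items but allocates a new list up front.
all_numbers = a + b + c

# chain.from_iterable takes one iterable that already contains the others.
nested = [a, b, c]
flattened = list(chain.from_iterable(nested))
```

The calls to list() are only there so the result can be inspected; in a loop you would normally iterate over the chain object directly.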
So you'll see the difference between chain and chain.from_iterable is just that if we have a list of lists, then you might want to use chain.from_iterable. But it's going to do the same thing: it's going to go through each of the items of each of the sublists. So count is an iterator representing a count, and you give it an initial value. So here we are starting with three. Good question. I think it probably does. My guess would be it does. I've never tried an infinite chain. We could try that in a second. I guess we'll come back to that. I think I could show you an example of that in a minute if we've got time. It's always risky trying to do something on the fly, but we can use cycle, which I'm going to show you in a second, and then come back to it. So let's have a look at count. It starts with a count, and each time you call next on it, it's going to give you the next number in the count. So for n in itertools.count, start at one, step is one, we're going to print what the value of n is. And I'm actually just going to say if we get to five, and I hope we get to five, then let's break out and stop. So there I'm printing the numbers one to five. Just to show you another way of doing exactly the same thing, we could say n equals one, so let's just start at one. While n is less than six, print it. If n is equal to five... actually you don't even need this. n plus equals one. So it's just a different way of creating a count, a series of numbers. combinations is probably the one I use the most day to day. So if we've got the numbers one to four and you want to create all the combinations of two out of those numbers, you can call combinations. So here it's going to give you tuples with one and two, one and three, one and four, and all the other combinations. And then you've got combinations_with_replacement, which is the same thing but with replacement. So you can get one and one, whereas in combinations each pair is always made of different items. cycle is quite a fun one.
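The count and combinations examples above could be sketched like this; the input lists are illustrative.

```python
from itertools import combinations, combinations_with_replacement, count

# count is infinite, so the loop needs its own exit condition.
first_five = []
for n in count(start=1, step=1):
    first_five.append(n)
    if n == 5:
        break

# Every way of picking two different items, ignoring order.
pairs = list(combinations([1, 2, 3, 4], 2))

# The same, but an item may be paired with itself.
pairs_with_repeats = list(combinations_with_replacement([1, 2, 3], 2))
```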
Basically, when you get to the end of whatever your iterable is, it will just go back and start at the beginning. So it is an infinite iterator. So we could try it out with chain if we wanted to. So here I've got a function. I'm simulating some game: player move. It's going to take a player, and if a random number between one and four is equal to one, that player won the game, so we'll return True. Else we just say that the player moved. That's my simulation function. Now I'm going to do the simulation. So I'm going to say for player in itertools.cycle of A and B, if the player move returns True, i.e. they won, then we'll break out of the loop. So if we run that once, you can see it didn't take that long. And if we run this again, it's randomly playing a move, but it's going to cycle through our players, and that could be really useful in those kinds of contexts. groupby. I think it's best seen as an example. If we take Mississippi, it returns tuples, the first item being one of the members of this iterable and the other being the run of consecutive occurrences of that item. So to make it a bit easier, we'd say for key, group, and then I can just convert this group into a list so we can see it. So you can see in the first loop around it's the M, and you get just one because there's only one M. Same with the I. But with the S, it finds the next one too, so it groups these two into a run. So it's got S, S. So that's where you get the S, S. So it's good for grouping runs of items in an iterable. Yes. I think islice is another really useful one. So here, if you have an iterable, you can specify a starting point, an ending point and a step size. I guess with this one I do go into why it's better, or potentially more useful, than another way of doing it. So here I'm creating a really long list with a million things, and I've just written a function that's going to tell us how big it is in memory.
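Here is a rough sketch of the cycle simulation and the groupby example. The player_move function and the seeded random generator are assumptions made for illustration, not the code from the talk.

```python
import random
from itertools import cycle, groupby

rng = random.Random(0)  # assumption: seeded so the run is repeatable


def player_move(player):
    """Hypothetical move: the player wins with probability one in four."""
    return rng.randint(1, 4) == 1


# cycle repeats A, B, A, B, ... forever, so we break once somebody wins.
moves = []
for player in cycle(["A", "B"]):
    moves.append(player)
    if player_move(player):
        winner = player
        break

# groupby groups *consecutive* equal items, so "i" appears several times.
runs = [(key, list(group)) for key, group in groupby("mississippi")]
```

To group every occurrence of a value rather than each run, the input is usually sorted by the same key first.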
So if I show you, it's a 7.6 megabyte list. And if I use this syntax, which is one way of slicing it, it will create a new list. I'm going to start at the beginning, not stop until the end, and step every two, and it's half as big. So it's not surprising, because I picked out half the items of the list, half of the million. However, if we do the same thing with itertools.islice and look at the memory of that, it's 72 bytes. That's completely tiny, because when we loop through it, it just looks up the items as you go from the existing iterable. So again, it can be a bit more memory efficient. permutations is really similar to combinations, but order is important. So now you have all the different permutations of the numbers, and r tells it how many you want in each. So I want all the sets of two, in all the unique orderings. Or I could say I want three, and it'll give you all the sets of three from my iterable in all the different permutations. product is for if you've got two iterables and you want to find the product of the two. So you have one and four, one and five: that's pairing each element in each of the iterables with all of the elements in the other iterables. And you can also use this repeat argument if you just want to find the product of the list with itself. And the last function I'll go through in itertools, and there are a few more, but these are the ones I find most useful day to day, is zip_longest. So zip is a standard built-in, and if we zip one and two with A, B and C, and I purposely made them different lengths, it kind of zips them together. So it's putting the first item of the first one with the first item of the second list, and so on. But because there's no third item in the first list, we've just lost the C. So with itertools.zip_longest it will pair them all, and where it can't find something, it will just put a None in. That can be quite useful.
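The functions covered in this stretch (islice, permutations, product and zip_longest) can be tried out with a few small inputs. The values below are chosen for illustration only.

```python
from itertools import islice, permutations, product, zip_longest

numbers = list(range(10))

# islice(iterable, start, stop, step) yields items lazily; list slicing copies.
lazy_every_other = list(islice(numbers, 0, None, 2))
copied_every_other = numbers[::2]

# Like combinations, but order matters, so (1, 2) and (2, 1) both appear.
ordered_pairs = list(permutations([1, 2, 3], r=2))

# Every element of the first iterable paired with every element of the second.
pairs = list(product([1, 2], [4, 5]))
self_product = list(product([0, 1], repeat=2))

# zip stops at the shortest input; zip_longest pads with None instead.
zipped = list(zip([1, 2], "ABC"))
zipped_longest = list(zip_longest([1, 2], "ABC"))
```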
I also found out, when I was writing this talk, that you can pass strict equals True to the normal zip function, and that's just going to error if they're not the same length, which could also be handy if you are expecting them to be the same length and you want it to error otherwise. So that was itertools, and the challenge here is to find how many times each digit appears in the multiples of three less than 1000. So three, six, nine, twelve: the twelve has a one and a two in it. How can you find the counts of these? And because we're running a tiny bit short on time and I want to go through functools and have a little bit of time for questions, I'll just show you the answer to this one. So this is how you might do it. What have I got? So I've got my three times table: I've got a range of 1000 and I'm going to step through it in the three times table, so starting at three, no end, and taking steps of three. So this is going to be all the numbers in the three times table, and we convert them into strings. I now have a generator, because I've not used square brackets, just to show you that if you really care and you don't want to use any memory here, you want it in the generator form. And then I'm going to do chain.from_iterable. So you can think of this as one iterable, and inside it, it has little strings, so we can call chain.from_iterable and then put that in a Counter. So I'm putting everything together as a bit of a complex example here. But we get to the point where we've got each of the different digits and how many times they occurred, so we can see there's this kind of nice three-ish pattern going on in the three times table, which makes sense. What was the challenge? To find out how many times each digit appears.
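A sketch of the challenge solution as it is described above, combining islice, a generator expression, chain.from_iterable and Counter. The variable names are assumptions.

```python
from collections import Counter
from itertools import chain, islice

# Multiples of three below 1000: start at 3, no stop, step 3.
three_times_table = islice(range(1000), 3, None, 3)

# A generator expression (round brackets), so no intermediate list is built.
as_strings = (str(n) for n in three_times_table)

# Each string is an iterable of characters; chain them all into one stream.
digit_counts = Counter(chain.from_iterable(as_strings))

for digit, times in sorted(digit_counts.items()):
    print(digit, times)
```

Here `range(3, 1000, 3)` would produce the same numbers more directly; islice is used to mirror the walkthrough.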
So we've done the challenge, and you can go back through that and see how we've used islice, which is really memory efficient, and we've used chain over each of the separate strings to chain them into one. So you can think of it as going through the characters one by one: the three, and then the six, and then the nine, and then the one of the twelve, and then the two of the twelve. So we step through all of the numbers like that. Cool. So that's itertools. Does anyone have any questions before we go on to the last topic? I'm throwing quite a lot of examples at you, but I thought it was good to give you a flavor of what you can do with the libraries, or the modules. So let's import functools, the last one, and we're going to talk first about the least recently used cache, lru_cache, and, as of Python 3.9, the cache decorator. So let's first talk about this function. It's called factorial. So we want to find the factorial of, say, five, which is five times four times three times two times one. So one way you could implement that is to say: start with the number and multiply it by the output of the previous factorial if n is not zero, and if n is zero, return one. And I'm going to do the same function again. I'm going to call it fast factorial, but I'm going to apply functools.lru_cache with no maximum size, and I'll explain a bit more about what this is doing. A decorator is a function that acts on another function. This is the function it's acting on. It's going to take this one and make a new function. It's going to give it the same name, fast factorial, but it's going to add a cache as well. So when you call it and it tries to find the factorial of ten, it's going to check first in a little cache it keeps and say, do I know what the factorial of ten is? If so, return it. Otherwise, let's start calculating. So that's the idea of a cache, and this is one implemented for you in the standard library.
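The two factorial functions described here might look like the following sketch. The cache_info call at the end is an addition to show the cache being hit; it is not mentioned in the talk.

```python
from functools import lru_cache


def factorial(n):
    """Plain recursive factorial: recomputed from scratch on every call."""
    return n * factorial(n - 1) if n else 1


@lru_cache(maxsize=None)  # no size limit; functools.cache is the 3.9+ shorthand
def fast_factorial(n):
    """Same logic, but results are remembered between calls."""
    return n * fast_factorial(n - 1) if n else 1


fast_factorial(10)   # fills the cache for 0 through 10
fast_factorial(10)   # answered straight from the cache
info = fast_factorial.cache_info()
print(info)
```

The arguments have to be hashable for the cache to work, since they are used as dictionary keys internally.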
So we can time how long it takes to run the factorial. For 200, it takes 36 microseconds, and then we can do fast factorial: 60 nanoseconds. So it's hundreds of times faster. And that's because, you can think of it like this, it's calculating the low-number factorials a lot because it's very recursive, and with the cache each one only has to be calculated once. So having a cache on a recursive function can give massive improvements in how fast the function runs. The maxsize is just how big you want your cache to be. Often I just want to have no maximum size and go ahead and use loads of memory. And as of Python 3.9, you can just use cache to do that, so you don't have to specify maxsize equals None. Cool. So another example I wanted to go through was total_ordering. So we're going back to the student example, but now we're going to do a class implementation. So this is how you'd create a basic class version of what we had, with an __init__ method, where we're going to take in the name and age, some test scores and on vacation, and save those as attributes on the instance. We've got a __repr__ to make it look nice when we print it in a Jupyter notebook. And we've also got this property where we're going to return the mean test score. So we're just summing the test scores and dividing by the length. So that might be a property you want to know about your students. So now, using the data from earlier, we're just going to make two students. So we've got one called John Smith with the same data as earlier, and one Isaac Newton. And you can see it's using the __repr__ that we defined here, as normal. If you want to compare John and Newton out of the box with what we've written so far, Python is going to error and say that it doesn't know how to compare these two objects, which is fine, but we can tell Python with the special methods here how to compare two instances of this class.
So you implement these special dunder methods: less than, less than or equal to, greater than, greater than or equal to, and equal. If we were comparing a student with another student, what I actually want to do is check whether the mean test score of one student is bigger than the other's. So say I want to compare these two objects, what I really want to do is just compare their mean test scores. So if Newton is bigger, and using the term bigger might be slightly strange, it means that he's got a higher test score. That's how you would implement this in a normal setting. And now I can go ahead and compare: John is not greater than Isaac Newton, which isn't surprising. And what's happening under the hood when you run a comparison of these two custom instances is that it's calling the matching special method. So for John greater than or equal to Newton, it's going to go and find my implementation and compare their mean test scores. But total_ordering, from the functools library, is a class decorator which means we only need to implement two of the comparisons: equal and one of the ordering ones. And it makes sense that you only need two, because you can infer all the others from two. If you're not less than and you're not equal to, it means you're greater than. What Python does is go and implement in the background all of the other ones that you're missing. So by adding total_ordering, I now only have to do two of them. And you can actually see, if you use the double question mark in Jupyter, you get some information about that method, and you can see that it calls less than. And then it says if not the result and they're not equal, then it must be greater than. So it's given us this for free, and that's the total_ordering decorator from functools. So again, it's a slightly more advanced concept, you implementing these instance comparisons, but you only need to do two and then let total_ordering do all the others. Okay, cool. Yeah, we're nearly there at the end.
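A cut-down sketch of the Student class with total_ordering follows. It keeps only the name and test scores, and the scores themselves are invented; the class in the talk has more attributes.

```python
from functools import total_ordering


@total_ordering
class Student:
    """Simplified student, compared by mean test score."""

    def __init__(self, name, test_scores):
        self.name = name
        self.test_scores = test_scores

    def __repr__(self):
        return f"Student({self.name!r}, mean={self.mean_test_score:.1f})"

    @property
    def mean_test_score(self):
        return sum(self.test_scores) / len(self.test_scores)

    # Only __eq__ and __lt__ are written by hand;
    # total_ordering fills in <=, > and >= from these two.
    def __eq__(self, other):
        if not isinstance(other, Student):
            return NotImplemented
        return self.mean_test_score == other.mean_test_score

    def __lt__(self, other):
        if not isinstance(other, Student):
            return NotImplemented
        return self.mean_test_score < other.mean_test_score


john = Student("John Smith", [50, 55, 60])        # hypothetical scores
newton = Student("Isaac Newton", [90, 95, 100])   # hypothetical scores
print(john > newton, newton >= john)
```

The derived methods are a little slower than hand-written ones, which the functools documentation notes as the trade-off for the convenience.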
The other one I think is quite useful is partial. So imagine we have this function called is_pass, and it's taking a student and it's taking a pass mark. It says they have passed if their mean test score is bigger than the pass mark, then we print whether that test score is above or below the pass mark, and then it returns whether they passed. So let's try that. Has John passed? John Smith has a test score below 60, and False, he hasn't passed. Unlucky. And then we can do the same thing with Newton. He has a test score above 60. Good news, he's passed. Now, imagine you want to write a function called is_top_set, which is very similar, and it's going to take a student and just return whether they've passed with a 30 pass mark. You could implement it like this, where it defers to the is_pass function and fixes the second argument as 30. And now I can call that is_top_set. So yes, Newton is top set, his mark is above 30. But there's another way you can do this. You can actually create the function on the fly. So you don't need to have the function definition, which means that you can do this programmatically using other lists and objects. But let's say this is the function and I want to fix the pass mark at 80, and this partial returns a new function called is_top_set. So here I've done the same thing. I've said fix the pass mark argument of this function as 80 and give me a new function. So now I've got is_top_set. It's a different way of implementing is_top_set. And again, I can call that and see that Isaac Newton does have a test score above 80. So let's do a different example. We're going to import median from statistics, and I can call min on a list of tuples. And you might know that it looks at the first item in the tuples when comparing them to find the minimum. So one is the minimum of five and two and one.
However, you can also pass key to min, and you specify how to look up, in each of my items, what to compare against. Say here I take the items and look at the second thing, at index one. So it's going to compare based on the two, the one and the three. So if I run that, it's going to pull out this one as the minimum item, because it's now comparing the second things in each of the little tuples. So now let's use this and create a partial of the min function, but let's fix the key to look up the student's median test score. So now I'm taking the min function, and this partial is going to return us a new function. I'm calling it min student because I'm fixing the key as this lambda, so it's going to look up the median test score of each of the items. So now that I've got this min student function, I could use it on a selection of students to find our worst performing student by median. So that's a way you could, for instance, use partial. I was coming up with a random example. So the last exercise to have a go at is to create a function called student pairs to find all combinations of two students. Actually, as we've only got a few minutes left, I will show you the answer to this one as well. So here, what have I done? Yeah, this is slightly abstract, and maybe you wouldn't do it this way, but to explain what it's doing: it's taking the combinations function from itertools and pinning r, which is the number of items you want in each combination, as two. So now this is a function, student pairs, that's going to return the combinations but fixed at just two items. So then I can call that on my students to find all the combinations of students. It's not very many here, but you can see how I found John and Isaac, and Jane and Isaac. Cool. So that was all I wanted to go through in these libraries. Yeah, sure. Yeah, with mypy and partials: I've not used partial since I've used mypy, so I have not seen that one, I guess. Any other questions?
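The three uses of partial discussed here (fixing a pass mark, fixing the key of min, and fixing r in combinations) could be sketched as below. Students are plain dictionaries with invented scores to keep the example short; the talk uses a Student class.

```python
from functools import partial
from itertools import combinations
from statistics import mean, median

# Hypothetical students and scores, for illustration only.
john = {"name": "John Smith", "test_scores": [50, 55, 60]}
jane = {"name": "Jane Doe", "test_scores": [70, 75, 80]}
isaac = {"name": "Isaac Newton", "test_scores": [90, 95, 100]}
students = [john, jane, isaac]


def is_pass(student, pass_mark=60):
    """Return True if the student's mean score is above the pass mark."""
    return mean(student["test_scores"]) > pass_mark


# partial fixes pass_mark at 80 and hands back a new callable.
is_top_set = partial(is_pass, pass_mark=80)

# min with its key fixed: finds the student with the lowest median score.
min_student = partial(min, key=lambda s: median(s["test_scores"]))

# combinations with r pinned to 2: every pair of students.
student_pairs = partial(combinations, r=2)

pairs = [(a["name"], b["name"]) for a, b in student_pairs(students)]
```

Because partial objects are ordinary callables, they can be built in a loop or stored in a dictionary, which is the "programmatic" use the talk alludes to.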
And I can show you just one quick example that I put in the code. So I've written some code here, hopefully you can see it, and I'll show you what it does first. It's pretty exciting. So we can run the tic-tac-toe script, and it's an implementation of tic-tac-toe, and it says you can undo a move, redo a move, quit, save and quit, load a saved game, and select the number of players. So I'm going to select one player, and you can see we've got my tic-tac-toe board and I can specify a move. So I've gone in the top left, and that's the computer's go, and I can undo some moves. So undo that computer move, I didn't like that one, have a different one. Then I can go in the middle, let the computer go, go for the win, and I've won. That's the game. But what I wanted to show you is just a few of the things we've gone through, very briefly. So at the top of the file I'm going to specify a data directory, and I use this in quite a lot of scripts. So I'm saying, from the current location of the file, let's create a Path, find the parent, and then add on a data folder. So this is a place where I'm going to save games. So that's a thing you might use. I use it if I'm saving features in a model or I'm looking up a configuration file, or I've got a training folder and a test folder. You can build them using pathlib like this. And now I've got a named tuple. This is going to represent my moves during the game. And the move has a player and where they've gone, so just the number from one to nine. That's an integer, and that's a player. So that's how I'm using my named tuple. So I just wanted to go down to here. This is a slightly contrived example, I admit: we're going to try and calculate the winner of tic-tac-toe after every single move. And I've actually implemented some separate sub-functions called rows, columns and diagonals, and they return the rows of my board, then the columns, and the two diagonals.
So here, when I want to check who has won, I want to loop through all of these different iterables. And so this group is going to be three long, and it's going to be either a row or a column or a diagonal as we loop through. And then I'm going to use a Counter, because we've covered Counter, and I'm going to count the things in this group of three. And it's either Xs for player X, or Os for player O, or None because there's no one in there. So then I can just check: okay, let's look up the count for the X symbol, and if that's equal to three, then X has won because they've got three in a row, and do the same for O. And we know that if there are no Xs, it's not going to fail, because Counter just returns zero when we try and look up a thing that it hasn't got. So this is quite a contrived example, but I think it shows you how you can chain three things together in a slightly real-world setting. That's all I wanted to show you, and thanks very much for listening. I'll take any questions if you've got any.
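A guess at how the winner check described above could be put together, using chain over the rows, columns and diagonals and a Counter per group. The flat nine-item board and the helper implementations are assumptions; the actual script in the repository may differ.

```python
from collections import Counter
from itertools import chain

# Assumption: the board is a flat list of nine cells holding "X", "O" or None.


def rows(board):
    return [board[i:i + 3] for i in (0, 3, 6)]


def columns(board):
    return [board[i::3] for i in range(3)]


def diagonals(board):
    return [board[0::4], board[2:7:2]]   # cells 0, 4, 8 and cells 2, 4, 6


def winner(board):
    """Return "X" or "O" if either has three in a line, otherwise None."""
    for group in chain(rows(board), columns(board), diagonals(board)):
        counts = Counter(group)
        # Counter returns 0 for symbols that are absent, so no KeyError here.
        if counts["X"] == 3:
            return "X"
        if counts["O"] == 3:
            return "O"
    return None


x_wins = ["X", "X", "X",
          "O", "O", None,
          None, None, None]
print(winner(x_wins))
```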