Video details

How to make your Python projects more elegant - Singapore Python User Group


Speaker: Mabel Lee


Nice to see everyone again physically after a long time. Okay, so today I'm going to talk about making Python projects more elegant. It's something that I think, going forward, Python developers will be very concerned about. It also combines my experience with machine learning projects and Python over the last few years. As Martin has already introduced, I'm a machine learning engineer at Dyson. I worked as a JavaScript full-stack developer before, and I first learned Python in, I think, 2013, when it was at around version 2.7. I've used it on and off ever since, and it's one of my favorite languages. I'm not sure which is more frustrating sometimes, JavaScript or Python, but it's still one of my favorite languages. Sorry that I used something other than Django for the front end, but nowadays I use Streamlit too. Not that bad. I also used to be a Unity 3D game programmer using C#.
So today's agenda: I'm going to talk about all the things that can help your project be more elegant. Elegant in the sense that it's easier to maintain, cleaner, and less frustrating in general. Even though people usually say that you need this only for production, I kind of disagree. I also like to implement this for my personal projects, especially if your personal project is something you're very passionate about and you want to work on it for a long period of time. Once you start adding more and more features, this gets more and more important. So do consider it even if you are just doing Python in your personal projects.
Yeah, so I'll start with virtual environments. I think anyone who has learned Python for even a month has definitely heard of virtual environments. But just in case some of you forgot, here is a refresher: without a virtual environment, Python installs dependencies into the site-packages of your base installation.
And this will lead to a few problems, as we might have found out the hard way: you change something in your OS installation, then your computer stops working, your Linux stops working, and I actually needed to reinstall Ubuntu. That happened to me a few times. So we slowly realize that virtual environments really are very important. Eventually you have multiple Python projects, one on 3.7, one on 3.8, one on 3.9, and you have random dependencies everywhere. That's when you really need a virtual environment.
Which one to use is a very tough issue, even for my team. We debate all the time whether we use venv or Conda. If you learned Python through machine learning, you have probably used both. And actually I even use both together. Later I'll explain how I use both together.
For venv, there are projects that use requirements.txt, mostly web projects for example. The cost is that you cannot really decide the Python version unless you use something like pyenv. If you use pyenv, you can switch Python versions. But even then it's not really integrated: the virtual environment doesn't decide your Python version, it only decides the dependencies. It also installs packages to the project directory by default, so by default you get the venv installed in the same directory as the project. And I do find that installation of packages is faster. So you see, even in this table, there are some pros. These three things, I guess, are the pros compared to Conda.
Okay, but I really like that each Conda environment can have a different Python version, which means that I don't need to install pyenv and fight with the Python version. The environment is stored in the Conda directory, which I also feel is more flexible because I have projects that share environments. I have models and some model training projects that share Conda environments, and it saves time. So we also use Conda environments.
So with Conda, we usually share an environment across projects, whereas a venv is really one per project. venv is usually used like that. And yeah, we find that Conda installation of packages is slower. I won't go into the details here, but it's because Conda has to retrieve the entire channel. So that's the default channel, every single package in Conda. Over time, Conda gets bigger and we keep complaining that it's slower. They've been trying to solve these issues, but I think so far there isn't a best answer yet. So it is indeed slower. It's quite technical; you can research how to speed up Conda installation yourself.
I just wanted to show the difference between Conda and venv. But whatever you choose, at least you have a virtual environment. Good. You should definitely have a virtual environment. I'm not really rooting for either of them. In my current company we have both: another team uses requirements.txt and my team uses Conda. And we kind of put all our dependencies under pip instead of directly under dependencies, because we came from requirements.txt, so everything goes below pip. The best thing, in my opinion, is that I can specify the Python version, and you also specify the channel, which is where the packages come from, that is, which Conda channel.
You also create them and activate them in different ways. I'm not trying to say that venv is bad just because of the three lines over here; this is just an example of how you activate a venv and a Conda environment. This command-line skin is quite nice. I really love this skin. I think it's called Starship. Anyway, it's good because it actually shows you the environment name. It's very important to know which environment is activated right now. It's not just for show, not just colors; the most important thing is that you know which environment you're in at all times. That's the most important takeaway.
I also sometimes create a Conda environment specifying only Python. I find it very easy to do that in Conda.
I literally just run conda create, name the environment, and specify the Python version. That's something I really like to do if I'm starting a new project from scratch. I might not even have the environment YAML file yet. And I like to use Conda to make sure my Python version is correct. This is how we do it: after creating the Conda environment, we activate it and then install the requirements inside the Conda environment. So in this case, your project can still have a requirements.txt, but you install it into a Conda environment with a specific Python version. So you can mix them together. It's not really found commonly online, but this is one of my favorite tricks.
I use VS Code a lot, also because I used to be a web developer, so I haven't switched to PyCharm. Anyway, because I use VS Code, my examples are in VS Code. In VS Code we can switch between the Conda environment and the venv quite easily. So even if you have a venv, you can switch to the Conda environment as well. It all depends on which interpreter you choose. I'm not from PyCharm, which handles this a bit more automatically for you. I think PyCharm is a bit more user friendly in that way, but I also find VS Code more flexible because I can switch any time.
Okay, so I think everyone knows that's important already. Next I'm talking about one of my favorite topics, which is testing. When I mention testing, people are usually not very happy. But really, after you work on a few bigger projects, you'll find that it's very hard to do anything without proper tests in the long run. In the long run: if your project lasts for one week, okay, fine, you can skip writing tests. You can delete the project after one week. And really, if you start the project with no tests, you can iterate very fast and close a lot of Jira tickets. But eventually you'll find that you have more and more utilities, and your product owner comes with a lot of requests.
So as you have more ad hoc requests and scripts, you start to think: what features did I have before? Did the new feature break my old feature? My team and I realized that we really need to get test coverage going. So let's talk about testing.
Some benefits of testing, which you might already know: having comprehensive tests makes it easier to refactor. If you have tests that cover at least the important features of your code, you know that there's no breaking change if all the tests pass. You shift the classes here and there, and if the tests pass you can be somewhat sure, maybe not 100% sure, that your important features are not broken, unless you have 100% test coverage. But even if you have 100% test coverage, it depends on the quality of the tests, because there are a lot of different types of testing, which I'll elaborate on further.
Oh yeah, I also want to mention that tests serve as good documentation. I know a lot of people like to write docstrings, many docstrings, but that kind of defeats the purpose of Agile in some way. I thought we were trying to move away from documentation, and now you want me to add a multi-line docstring. So I think of tests as valuable software documentation. The reason is that with good tests, you know which functions in a script are important. We don't test every function in the script, we test the important functions. So if you see a function being tested, you know it's an important function, and you will know the parameters and the expected output. That really gives an easier understanding of the code and works as a kind of documentation.
This is one of my favorite videos. When I first joined, this was one of the first videos one of my friends sent me.
One of my colleagues told me that if you don't do integration testing, it doesn't matter how much unit testing you do: your product doesn't work. You need the full range of testing, which we will talk about later with the pyramid. But first, unit testing and integration testing are automated tests. When I think of manual testing, in my mind it's kind of like this: when you are manually testing every part of your program, it really becomes tedious work. You need to change each parameter and make sure that your solution is correct. That's the impression it gives me.
I want to mention a bit about the testing pyramid. This is not a testing talk, but you can use this to reflect on your own project. How many unit tests do you have? You should have more unit tests, so that you cover the basic functionality of each function just as a unit. But then, like I said, if you only unit test and the integration fails, the product still doesn't work. So you still need integration tests, to test that both parts are working well together. That's the meaning of integration: different components of the project working well together. End-to-end tests are really expensive. They test the program from start to end, like a human clicking on your program and interacting with it. End-to-end is like a human tester, just automated, and we should have fewer of those because they're expensive in terms of being slow. It's also easy for such tests to fail, because usually you need to wait a while for the app to be ready, then test it, then wait for the app to be ready again. So end-to-end is the most expensive kind of test.
Yeah. So after rambling so much about the benefits of tests, we need to find out how we can actually do proper testing in Python.
Most people will definitely point towards the unittest package in Python. It's the built-in one, and that's what most people start with. But today I'd like to recommend trying out pytest. pytest is kind of an improvement on unittest. unittest is built in, right? So if people go for a package from outside the standard library, it must be better; otherwise, why are people using it? We need to purposely install pytest, but it's really worth the effort, in my opinion. It was built to address the shortcomings of unittest, and it has a lot of features. Even to date, I'm still learning new things about pytest every day. And the best thing is that pytest can run unittest tests. So if your team refuses to change the tests, it's fine. You can still run the unittest tests. Yeah, that's the cool part about pytest.
I won't go through this in much detail. Tests don't always need to be in a class. In unittest they are always in a class; in pytest they can be in a class, but it's optional. And for pytest, you can use mocker for mocking dependencies. Later I'll show an example. I think examples are better.
Okay, so what I meant by "pytest uses the standard assert" is that you don't have to import or call special functions. It's actually using the assert keyword from Python itself. The assert keyword belongs to the Python language. So here I assert that this function returns four. I guess it's clear to the reader that this will fail. So this is an example of a test that will fail in pytest. And it really fails. After it fails, we see this very colorful output showing where the values are not the same. That's why there's plus three, minus four: four is on the right side and three is on the left side. So pytest gives you helpful output to find out why your test is failing.
You can see the exact line where the error occurred, and there's a lot of other information over here. Good for debugging. Not important for now.
Okay, so for unittest, the classic unittest, I'm not sure how many of you use it. If pytest looks like this, which is kind of cleaner, the classic unittest looks like this: you are forced to be in a class, and you inherit from TestCase. And then you use assertEqual. assertEqual comes from the fact that you inherit from unittest, so you need to use the unittest assert methods. That's very different from pytest, which directly uses the Python assert. With unittest you need to go and search the documentation. There are a lot of other assert methods, so you need to be very familiar with the different ones, and it's a bit harder to read. I would say I'm quite biased already, so just ignore my bias. If you like unittest, just go ahead. At least you have tests, and that's the point of my talk.
Yes. So I wanted to show more differences between pytest and unittest, so I created this. We always use foo for our example function. It just uses get_one, get_two, get_three, get_four, and get_five, which return 1, 2, 3, 4, 5. These are the functions, and foo is the function that I want to test. These are the imports. Whether we should mock these imports or not is a different topic for a different talk. Just assume that I definitely want to mock these dependencies.
Okay, so this example is from unittest. In the unittest example, when you need to patch, you use the patch decorator to mock every single dependency. Remember that from the utility module we import get_one, get_two, get_three, get_four, all the way to five. Because I want to mock these dependencies, I don't use the real functions, I mock them out. And in unittest it looks like this.
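A minimal sketch of that decorator style, using only the standard library. The slide code is not in the transcript, so this is an assumption about its shape: the `Utility` class is a hypothetical stand-in for the utility module, only three of the five getters are shown, and `foo` is assumed to add the values up. `patch.object` is used instead of a dotted import path so that the sketch does not depend on a module name.

```python
import unittest
from unittest.mock import patch


class Utility:
    """Hypothetical stand-in for the talk's utility module."""

    @staticmethod
    def get_one() -> int:
        return 1

    @staticmethod
    def get_two() -> int:
        return 2

    @staticmethod
    def get_three() -> int:
        return 3


def foo() -> int:
    # Assumed behavior: foo combines the helper results.
    return Utility.get_one() + Utility.get_two() + Utility.get_three()


class TestFoo(unittest.TestCase):
    # One decorator per mocked dependency. The mocks arrive as parameters
    # bottom-up: the decorator closest to the function supplies the first
    # argument after self.
    @patch.object(Utility, "get_one", return_value=0)
    @patch.object(Utility, "get_two", return_value=0)
    @patch.object(Utility, "get_three", return_value=0)
    def test_foo_with_mocks(self, mock_three, mock_two, mock_one):
        # Check that each parameter really is the mock we think it is.
        self.assertIs(Utility.get_one, mock_one)
        self.assertIs(Utility.get_three, mock_three)
        # Every dependency now returns 0, so the sum is 0 instead of 6.
        self.assertEqual(foo(), 0)
        mock_two.assert_called_once()
```

Outside the test the patches are undone, so `foo()` goes back to using the real helpers. The reversed parameter order is the part that is easy to get wrong.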
The decorator lines go on top of where you define the function. These are the decorators, and you can specify the return value of each mock object. unittest usually looks like this. And I just mentioned that pytest gives you more choice, because in pytest you can also use this decorator style. But one benefit of pytest, I would say, is that there's a plugin, which is, let me change the slide, pytest-mock. You must install it with pip. Just imagine it as a parameter given to you. The plugin gives you this mocker object, and with this mocker you can patch inside the function like that, using mocker.patch. I guess it's less scary. The reason is that for every single decorator you put on top of the function, you need a corresponding parameter. Your parameters need to match your decorators, and even then I get the order wrong. You'd expect 1, 2, 3, 4, 5, but in the test it's 5, 4, 3, 2, 1, from the bottom upwards. So I always get the patching wrong and the parameters mixed up. Sometimes the only way is to print the object just to check what it is actually mocking, because the order matters when you use the mock package this way. But now that we're using mocker, we don't need to bother about passing the mocks as arguments to the test, and we can just use the mocker argument. This does the exact same thing as the previous slide. So I set the return value to zero. Remember, I promised you that these functions return one, two, three, four, five. However, after the patching, I made everything return zero.
I think I've talked a lot about testing, and my favorite package is pytest. Now we move on to linting. For linting, I don't have a favorite package, so I'll just talk about linting in general. We definitely always refer to these enhancement proposals in Python.
So in Python we have thousands and thousands of Python Enhancement Proposals, in the past usually proposed by Guido, the creator. PEP 8 is one of the famous ones. You might have heard of it before because it used to be the name of the package that does the linting, but then they renamed it. PEP 8 actually refers to the style conventions that they recommend you follow when you are writing Python code. We always say that in programming we have the tabs versus spaces debate, right? Well, they have an answer to a lot of the things that we might argue about at work. After you see PEP 8, you can stop all the arguments at work. We need to stop arguing in the pull request, because there is a style guide and we can just follow the style guide. So sometimes when you are reviewing a pull request, you can just refer your colleagues to this.
I'm just going to talk about Flake8, because for my projects at work I usually use Flake8, but I know there's a lot of other stuff out there. If you go to VS Code, there are several linters to choose from; one of them is more of a security thing, but never mind. Flake8 combines three useful packages together: PyFlakes, pycodestyle, and McCabe. You need to make sure that you install it into your virtual environment first, and then select the interpreter. That's why I mentioned the interpreter over here. In VS Code, when you select the interpreter, you select the environment you are using. So if your virtual environment has Flake8 installed inside, VS Code will automatically report the issues to you. And that's true for almost all the linters out there, not just Flake8.
So it combines three useful packages. It's quite irritating that I never know which variables are not used. And there are a lot of random imports, especially when you copy code from a Jupyter notebook, right? Then we find there are a million Jupyter notebook imports that we never used. So PyFlakes reminds me in VS Code to remove such imports.
And then this one, McCabe, is about complexity, but honestly, I've never used it much. We usually use Flake8 as a static analysis checker. Anyway, this one checks the complexity of your code, and I don't think it's enabled by default in Flake8, so you need to enable it yourself.
Okay, so I have an example of some very bad Python code. There's an unused import, unused arguments, bad function naming, and other errors. So let's see what happens when we go to the linter, if you see this in your project. Once you activate the environment that has Flake8 and go to VS Code, you can select Flake8 as your linter, and then you get warnings, mostly complaints about whitespace, imports, and variables that are not used. That kind of complaint. Yeah, it's usually the things we complain about in a pull request, but after this you can stop arguing on the pull request. It's very easy to run: you just run flake8 on your project. Obviously there are ways to ignore some files, but I will not go into the details here.
Okay. I think the biggest competitor to Flake8 now is probably Pylint. Pylint actually has more checks compared to Flake8. Pylint will even check naming style, whether you conform to a certain naming style. It will check whether you have a docstring or not. Even our favorite snake case in Python, it will check for that, and it even complains about the final newline. So you can also consider Pylint if you want to be stricter, for a more production-ready project, but I think it's very strict and there are a lot of things to follow. So Flake8 is quite good as a starter, I think, for linting.
Okay, then we have formatting. Linting is like a kind, caring mother, right? It will just tell you that you have this error and that error, but it doesn't force you to change.
Formatting is like a very angry mother that forces you to change your code immediately. With a formatter, you just press Ctrl+S and your code is forced to change and conform to a certain standard. I'm probably the first person to call it a strict Asian mother. Anyway, in Python, I've recently been using Black. The name Black comes from Ford, because Henry Ford used to say this. I think in one of his meetings he said that the car can be any color as long as it's black. So Black has this philosophy: you don't need to configure everything, just let Black handle it for you. Like your strict mother that doesn't want you to go out. That kind of feeling. One of the words in its tagline is "uncompromising"; it tells you that straight ahead. So you agree to give it control, and in return it says that you save time and energy for more important matters. I really do believe in this, because I don't want to hand-format my code. I don't want to add tabs by myself. I don't want to argue about spaces versus tabs again. I don't want to have to decide whether there are two lines in between each function. So I let Black make these nitty-gritty decisions.
Okay, then a dedication to my colleague here. He does feel that Black is a bit too strict, because Black has a max line length of 88 characters. Whether 88 characters is enough depends on the project you are working on. In machine learning projects, a lot of the variable names are really very long, and I guess the code is kind of complex to begin with, with a lot of loops and a lot of indentation. So we decided to give more leeway for the max line length, and my team found that it looks a lot nicer for our code. I mean, it depends on the team. Actually, the documentation for Black mentions around 90, so you can experiment with the max line length.
I think this is basically the main thing that you should change in Black; everything else, just leave Black to decide. If not, it defeats the point of using Black. If you really don't want something as strict as Black, you can consider other auto-formatters also. The most recent one is YAPF, which is developed by Google. I think it's gaining popularity also. It's the total opposite of Black: it's very configurable. So I guess people did get fed up with Black being too strict on them. You can consider this. The name stands for Yet Another Python Formatter, which is kind of true. It's one of the latest formatters on the market. And even though it's developed by Google, they do have a few different styles. I believe they have Google and Facebook styles, and the Facebook style is quite close to Black based on my investigation. You can search for articles that compare the formatters if you want to evaluate which one is better for your project. autopep8, I don't really want to talk much about, because it really just conforms to PEP 8: it automatically tries to format your code to PEP 8, but nothing extra.
In the last part of this talk, we talk about type hints. Python is traditionally a language that doesn't really focus on types. In fact, when I first learned Python, I didn't even know the different types of variables. Sometimes when we learn Python, it's like that. We just go along with the flow. But then, of course, in a production project that you want to maintain for a long time, after a while you might forget what the type of a variable is. Type hints are a relatively new feature in Python which can hint to you the type of a variable. But I must say that Python won't tell you that the typing is wrong unless you use a static analysis checker.
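To illustrate that last point with a minimal sketch (the function `double` is a made-up example, not from the talk): Python itself does not check the hints when the code runs, so a call with the wrong type goes through silently.

```python
def double(value: int) -> int:
    """Hypothetical example: the hints say int in, int out."""
    return value * 2


# Python does not enforce the hints at call time, so passing a str runs
# without any error and quietly returns a str instead of an int.
result = double("ab")
print(result)        # abab
print(type(result))  # <class 'str'>
```

A static analysis checker run over this file would be expected to flag the `double("ab")` call, which is the gap that such tools fill.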
mypy is one of the static analyzers that actually checks whether your type hints are correct or not. Normally you wouldn't know whether your type hint is correct. It's something like a docstring, yet still readable by a static analyzer.
Yes. So I think even the developers of Python anticipated the concern. Before we start to worry whether we need to keep putting types everywhere in Python, they emphasize that Python is still a dynamically typed language, and they have no desire to make type hints mandatory.
Another benefit of having types on your parameters and function returns is the IDE. I don't know if you use VS Code also, but when you're trying to call a function and you pass the wrong type as the parameter, it will highlight it for you and tell you, oh, you're sending the wrong type over there. So it really gives control back to the IDE, because the IDE cannot guess what type you intend just from the variable name. By giving the type hint, the IDE can help you. Just to give an example: if I assign the type hint pd.DataFrame, a pandas DataFrame, to a certain variable, then when I type the variable name and add a dot, the IDE will give all the suggestions for the DataFrame object. It will suggest the methods.
Yeah, so now we know it's very beneficial for the IDE, for static analysis, and also for yourself. I just want to say that type hints look quite different across Python versions. I think Python is at 3.11 beta now, or normally 3.10. But before 3.9, say from 3.7, we import Tuple, List, and Dict from the typing library. Everything needs to be imported from the typing library, which reminds me a bit of TypeScript. But after a while, all of this gets annoying, right?
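A minimal sketch of that older style, where the container types come from the `typing` module. The two functions are hypothetical examples, not taken from the slides.

```python
from typing import Dict, List, Tuple


def count_words(lines: List[str]) -> Dict[str, int]:
    """Count how often each whitespace-separated word appears."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def most_common(counts: Dict[str, int]) -> Tuple[str, int]:
    """Return the (word, count) pair with the highest count."""
    return max(counts.items(), key=lambda item: item[1])


word_counts = count_words(["a b a", "b a"])
print(word_counts)               # {'a': 3, 'b': 2}
print(most_common(word_counts))  # ('a', 3)
```

With hints like these, an editor knows that `word_counts` is a dictionary and can suggest dictionary methods after the dot, in the same way as the DataFrame example in the talk.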
Why do I need to import something specially for my types? So they allow you to use the lowercase versions. The lowercase versions are actually, if you remember, constructors: with the lowercase tuple I can create a new tuple, and the same goes for the other constructors. So these are normal functions that we are used to calling, but now you can use them in type hints. But in 3.7 you still have to do from __future__ import annotations, and later I'll elaborate on what that import really does. So there is a way to use the lowercase versions without importing from the typing library, but you need to import annotations from __future__. So there's no good way around it in 3.7.
If you don't do from __future__ import annotations in 3.7, you will actually get a real error. I tried it for everyone: I ran the code and I got this error: 'type' object is not subscriptable. A very mysterious sounding error. It's saying this because tuple used to be just a constructor with which we create new tuples, so we are not supposed to subscript it. With a list, we can subscript it and say we want the zeroth index of the list using square brackets. So subscripting means using the square brackets, as with a list. The error appears because the type hints are actually getting run as code in this case. Just know that this is the error you get because Python really ran the type hint as code.
Okay, so in 3.9, finally, we don't need to import annotations from __future__, and we can use the lowercase versions officially. If you open the typing documentation for Python and change the version from 3.7 or 3.8 to 3.9, all the examples change for real. They flip from one version to another, and that's because these type hints really look different in different Python versions. I guess that makes them a bit unpopular, because people get confused by it, right?
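A minimal sketch of the lowercase style, assuming Python 3.9 or newer (on 3.7 or 3.8 the same annotations would raise the "not subscriptable" error mentioned above unless annotations are imported from `__future__`). The function names are hypothetical.

```python
def word_lengths(words: list[str]) -> dict[str, int]:
    """Map each word to its length, annotated with built-in containers."""
    return {word: len(word) for word in words}


def first_and_last(words: list[str]) -> tuple[str, str]:
    """Return the first and last word of a non-empty list."""
    return words[0], words[-1]


# The same lowercase names are still ordinary constructors at runtime.
letters = tuple("abc")

print(word_lengths(["hi", "there"]))      # {'hi': 2, 'there': 5}
print(first_and_last(["a", "b", "c"]))    # ('a', 'c')
print(letters)                            # ('a', 'b', 'c')
```

No import from `typing` is needed here, which is the difference between the documentation examples for 3.8 and 3.9 that the talk points out.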
That's a bit sad. You feel affected, right? Yeah. Oh, no, it's super easy once you get it. But I wanted to ask if anybody has used type hints in 3.10 and 3.11. I guess it supports generics now. In Java you can have, like, a class hierarchy of types. There's a problem where you have this huge list of types that could be a parameter or a return value. But now you can use generics, like a short statement that will include everything. Yeah, I remember generics from Java. So you can set the parameter as T and it can be any type, a generic type. Yeah, and you could set a bound. It doesn't mean it has to be every type; it's more like a range of types, that kind of thing.
I'm almost at the conclusion, sorry. Yeah, almost done. So that's why we need this import from __future__, right? It's because Python was evaluating the type hints as code in 3.7. Type hints can be treated like strings, because they shouldn't be code that we run. That's actually the meaning of importing annotations from __future__: we are telling Python to treat the type hints like strings, so they don't run as code. It's described in PEP 563. Take the case where you create a class and you reference the same class in a type hint: a Vector method that wants to return a Vector object. You really don't want Python to run this type hint as code, because then Python will complain that Vector is not defined. It's a kind of recursive definition: while we define Vector, a method wants to return Vector, and Vector has not been defined yet. The import of annotations from __future__ helps with that by treating the types as strings. With it, Python will stop trying to evaluate whether Vector exists or not. Yeah. Okay.
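A minimal sketch of the self-referencing case described above. The `Vector` class body is a hypothetical reconstruction, since the slide is not in the transcript; only the idea of a method that takes and returns its own class comes from the talk.

```python
from __future__ import annotations  # has to sit at the top of the module


class Vector:
    """Hypothetical class whose method refers to its own type."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def add(self, other: Vector) -> Vector:
        # While this line is being read, the name Vector is not bound yet.
        # With the import above, the annotation is stored as a string
        # instead of being evaluated, so no lookup of Vector happens here.
        return Vector(self.x + other.x, self.y + other.y)


total = Vector(1.0, 2.0).add(Vector(3.0, 4.0))
print(total.x, total.y)            # 4.0 6.0
print(Vector.add.__annotations__)  # {'other': 'Vector', 'return': 'Vector'}
```

The second print shows the annotations kept as plain strings, which is the "treat it like a string" behavior the talk attributes to this import.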
The conclusion is that I have gone through a lot of the tools that I've used in my daily life with Python over the past few years, and I have only scratched the surface. Maybe some of it is a bit too much for today, but I hope that you get some inspiration and can read about it online. There's also one more thing I didn't mention: everything that I talked about here cannot be properly implemented in a project unless you have a continuous integration pipeline. Otherwise you always have to run it manually yourself before every commit, and that's like torture. You should still run it before you commit, but in case you never run it, or in case the way you run it in your environment is wrong, there is still the CI pipeline to catch such cases. And that is content for another talk. Thank you. Bye.
All right, thank you very much. I think we don't have enough time for questions. There would be thousands of questions, because we were shown so many different tools, right? So I would say, set up your laptop and let's jump right into the next one.