Video details

Python

11.28.2021

Presented by WWCode Connect Forward 2021 Speakers: Karen Wong and Chethana Gopinath Moderator: Stephanie Rideout

Are you looking for a place to practice your code interview with a group of Python enthusiasts? Join us for our LeetCode study group to learn solving algorithm patterns problems! We will talk through intermediate/medium level problems on LeetCode by recognizing patterns in them and solving them based on those patterns. This session will focus on understanding the power of hash tables and how one can leverage them to solve a problem. We will cover what a hashtable is, how to use it in a problem, and most importantly - why and what kind of problems can we use them in.

___ đź’» WWCode Digital Events: https://www.womenwhocode.com/eventsâ€‹ đź”Ž Job Board: https://www.womenwhocode.com/jobsâ€‹â€‹â€‹â€‹ đź’Ś Make a Donation: https://www.womenwhocode.com/donate

This section. I will do a little bit introduction about the track and ourself and also a little bit about the study group. So my name is Karen. I am a programmer at Research and Development Center. I'm located in Hong Kong and we also have Channa who will be their speaker today. Channa, would you like to introduce yourself a bit? Hi everybody. Thank you so much for joining. My name is Chitna and I am an associate software engineer at Realtor Dot. Com located in Austin, Texas. Thank you, Shanna. And we also have Somia from India. She is a final year of students to do computer science. She and Stephanie have been the most amazing people to help us to get through this session. Prepping everything. So now I'm going to use the chance to do a big shower for both of them. So a little bit about women. Nicole Python Track in case this is the first time you join in. We are a group of Python enthusiasts who shinnovation Python and share their passion to help the community grow. We offer a lot of we offer to Steve study group and workshop hearing levels from beginner to a very top specific event. And by the way, we are very fun group. As you can see, we are definitely having fun today as usual. So for any future events, please check out the website WWCO. Compython. We also post all our awesome resources and materials on the GitHub link as shown on the slide. So a little bit about our study groups. So we call ourselves a Leeko Study Group. Lee Co. Is a website where it provides a lot of awesome technical questions for soft engineers, even programmers like me to practice and get experience because some of those questions that actually would be asked on an interview where if you want to join in those big corporate like Microsoft and Google. So the reason we created the group because we want to solve a medium pop like we want to solve a medium level everyone problem together because it's not as easy to do, especially by yourself. And you go through the struggles and everything. It could be very discouraging. So for group, the goal of each session is to recognize talents of a specific problem and figure a strategy to solve it together. So the study group happens every other Thursday and they're women in Python track and it will be at 08:00 p.m. Eastern time. So in case you're interested in joining in, feel free to join the front group. We are interactive section. I always learn a lot from China and Sonia, so feel free to join us. All right, now we going to start for today's section and I will head it into Shannon, the speaker. Awesome. Thank you so much, Karen. Welcome everybody to today's session. And if you're joining for the first time, welcome. Thank you so much for being here today. We're going to be talking about the power of hash tables and how important they are when it comes to problem solving. So without talking too much, let's get into the session. Today's agenda is a brief discussion or a brief overview about hash tables, Python, dictionaries, and Python sets. We're going to be focusing on dictionaries more, but we will do a quick overview of all three of them, and after that we will look at group anagrams. That's the problem we're going to solve within the code today. We're going to start with some discussions and then look at some test cases and talk about different approaches and time complexities and space complexities for those approaches. And we'll head on to a little bit of coding. And after that we'll talk about what base on what knowledge or anything we've gained from this session. What are the next steps we can work on? What other problems can we tackle? So that's what we'll talk next. And then we'll end it with a little bit of Q and A. Also Disclaimer. I wave my hands a lot. I'm so sorry in advance. Okay, let's get right in before we get into what a hash table is and all of that fun stuff. Let's talk about this problem at hand. From this snippet. What we can see here is we have a people array, an array that consists of tuples. For each tuple is a person detail like a person. So like, the first element of the tuple is the ID of that person, and the second element is the name of that person. As you can see, I'm Super innovative with naming people. So P one and PX P of ID, whatever you want to say. So ID and then name. So we're have it organized like this. The dot is for we just have, like, nine or something here. We can have millions. Right? So, yeah, that's what that indicates. Now that we've seen the structure of this. Right. How do we access a single person's name? Maybe based on their ID? Maybe. Suppose I want to get the name of the person whose ID is five. You could do that with the current way we've organized our data. We could do it in a single for loop. We could go over each tuple and check the IDs, see if anything pertains to see if anything is five. If it's five, then we return. So we'll do 1439 and then we'll hit five. And then once we hit five, we know that this is the idea that we want. So we'll just return that. So that's what the snippet is. We can also see that you can't really do it's. Not really sorted in terms of anything here. Like it's not sorted in terms of ID. There's no sort of ordering over here. So you do have to go over every single entry. Fine. This looks fine for what we have. What about like in the worst case, if we're trying to find something, what would the time complexity be for this in the worst case possible. Right. We'd have to go over the entire array, so that would be off. N like a whole path. Great. I do like our initial method of doing this. It seems super ambitious, but there are many other things that can change here. Like we said, we have millions of records here. So our time complexity each time. In the worst case, it's going to be off. And it would be like a full pass over the entire array. And what if we have? We can just have an ID and a person name in the way I've written it in the real world. We're going to have many more attributes to describe something. What if we have a lot of attributes? What do we want to get? Like, I don't know. Like, maybe the age of a person with ID equals six. You see where I'm going with this. Right. So in that case, this is not really going to perform well for this particular purpose. It's fine, but I'm trying to do a smooth transition here. Can we do better with how we organize our data? So drop in the chat like, I didn't see the chat so far. So drop in the chat and let me know how we can store the data. Like the people details data. Yes. Awesome. Because we're talking in terms of Python. We could use a hash table or in terms of Python, a Python dictionary. Great. Now let's revisit the same problem from before. This looks super like just because we're comparing and we're taking in just transforming what we had before into what we want now. So we have now a People details dictionary with ID as key and the name as value. So the ID, it has to be unique. You cannot have duplicates the key has to be unique. It cannot have duplicate values. And now accessing a single person's name based on the ID is a whole lot better. So we could just do people of the name of the dictionary. People of that ID, it would look up for us. And the lookup time or the search time is constant. We'll discuss how it may be in the later slides, but it literally transformed our game for us. From that to this. Now, I think we're ready to learn a little bit more about hash tables. Okay. So what we saw there that was a Python dictionary, Python dictionary and Python set. They're internally implemented using a hash table. So we should learn what hash tables are because those two are in those video structures within Python. We just use them. But sometimes it's good to know the basics before we get into it. In case you're feeling adventurous and want to try to implement a hash table. I don't know. Just saying. So let's talk about hash tables specific three basic elements here. So you will have the key value pairs that you want and then you'll have a hash function and then you'll have an array. This is weird, because anytime we think of, like, a dictionary or something like that, we just think, okay. Like immediately when you think of a dictionary, we think of like the curly braces and that's it. We don't think beyond that. But this is very interesting to know. So the key value pairs are going to be put into the array that you see on this end. And how do we know where to put each of that information? Like, for example, here we have name and age. Right. So we take that pair of information. And where do we put it in the array? What decides that that is a hash function. So basically the hash function, it would take in your key and it would return an index. And this index is where you put your key value pair from before in this case and how we write the hash function. That's a whole different session in itself. So we're just touching on a few important things here so that you can explore on your own as well. Okay. So hash off name, give index grip. So we have P one and P two here very small data set. Right. But for demonstration purposes, let's go with this. So hash of P one gives you the index one. Great. So P one and 50 is the age they're stored here. That information is stored in the array at the first index, and then hash of P two is stored at the zero index. Great. Basically, this forms the crux of a hash table. Now this looks super rosy and beautiful. Right. You just like you have two indices. And then there's nothing weird happening here where both those indices map to the sorry. Both of those, like both of those key value pairs, they map to the same index or something like that within the array, the hash function cannot always give you indices where you always don't end up in that situation where it gives you the same index for two or more records or key value pairs. And off of the build up, I'm going to talk about it now collisions. So this is super important to know, because real world data is really messy and you don't really have it's, not about the data. It's about the hatch function here in general. So how do you know if your hash function is not going to give you the same index for multiple entries? So let's talk about this. In this case, you have P one, P two, P four and P three. These are the names and then the corresponding ages. And let's look at what the hash function gives out in this case. So hash of P one spits out one. Great, no collision or anything there. And then hash of P two out zero and one. Great. We only have, like, two from the previous slide. We only have two spaces in our array. Okay. So we come to P four. So P four has an index of one. It gives you the index one, but P one and its data is already in that index one. Okay. We'll talk about this and then same with P three. Right. So it's like P one and P four and P two and P three. Okay. So now we see that there are collisions here in this hash table. How do we resolve the collision? A couple of methods. This is one of the methods to do it. It's called chaining. Basically, the collision that you saw before it is sort of fixed by chaining them together as a link list. What I mean by that? So you initially start off with P 150 and that would be within the zero index in this array and P two and 21st index in the array. Next we come to P four and 60, and that also has an index of zero. What we do here is we have a pointer from the zero index to a linked list where the head of that linked list is going to be the most recent collision that we saw. So what's the most recent collision for index equals zero, which is P 460. That value just refresh P 460. Right. And then the same thing goes for P three and 30, because P three and 30. Basically, what I'm trying to say here is if you have multiple collisions or anything like that, you chain them together and the most recent collision will be on, and that would be the head of that link list. Okay. So this is fine. But there are some disadvantages to this. What do you think are some of the disadvantages for this method? Drop it in the chat. Nice. I love the chat. Everybody still. Wow. Awesome. I love it. I love the answers that are coming through. Suppose your hash function is not the best, and then for some reason it's not the best, and it's not doing any chance of anything to avoid collisions. It's just basically routing everything to the same index. Suppose you have a capacity for this array of maybe 100 and everything is being sent to index zero or something like that. So you'll have a long chain there, like, a long link list there, and then the rest of it will all be empty if you think about it. That's also another disadvantage, right. Like you basically waste a lot, and you also would need to look up. You need to look through all of the elements in that link list as you go on to search or something like that. So it's fine. For now. This is good. This is one of the approaches that we can try, but there's also another approach called open addressing with open addressing. What do we do? A way to implement open addressing? It's called open addressing, but what we do within it is called probing. We probe or look for nt spaces in our data structure when there's a collision. I'm going to explain to you. I'll answer some of the questions later at the end or like in the middle when we take a break. So let's talk about what probing means. Right? So suppose our hash function, it returns index zero for this particular four capacity array that we have here. And we're seeing that there's already something adding zero. Right. So what do we do now? Linear probing, like probing. Generally, what it does is from the point you're at, it checks the rest of the elements or the rest of the cells for empty slots. And another thing about open addressing is that you won't be using any kinds of extended link lists or anything like that. This will just be your array. That's it. It would just be the key value pairs and the hash function and then the arrays. So in this array, in the case of chaining, we also had a linked list to deal with the collisions. But in this, it's just going to be the array. Right. So you start at index zero and then you see, oh, okay. This is full. What do we do now? Linear probing goes step by step. It linearly traverses along the array. So the next empty slot until it finds the next empty slot. So that's the ending point. So it goes to one. It sees it's empty, and then it inserts there. This is for the second element. What about the third element? And like any other third element for which the hash function returns zero as the index. If zero is returned, it would come here. It would be like, okay, I need to look at the rest of it, right. To see where I can find the empty place or the empty slot. So I'll go here. That'll be one which is already filled because we discussed that there was another element before this. So what it would do next is it would look again to the next element and be like, okay, this looks empty. This is an empty slot. So I'll insert myself here or I'll put myself here, and then the same thing happens on and on again for every single entry that needs to be added to this hash table. And so the particular thing that we were talking about that algorithm, it boils to this. It's called linear probing. We can describe it like this with this. In this way, J is the index, right? It's the index that the hash function gives out. So the index plus one, like I marked it as J. And the terms here, you see, here are a bit fuzzy because these are the things that were there in the C Python implementation. So anytime, if you want to look back or something like that, this would be a good starting point to explore those as well. So you have your index index plus one mod size. So for example, the first time you come in, the zero index is spelled right. So zero plus one mod four. So one mod four modulus it returns the remainder. So it would be one. Right. You'd get back a one so that's where you need to be. So this is the empty slot. Or this is the next place to check for the second element that you need to, for which the hash function returns a zero that would be here. Right. And then the next step would be. Okay. So zero plus one mod four. So one mod four, which is one. We go here and then you see this is also filled. Then what do we do? Okay. So the index one? No. So one plus one, that's two, two mod four, which is two. So it has to go to the next index. It has to go to index two, basically. So whatever this returns, that is the index you'd have to jump to. This is very interesting, but also there are a couple of disadvantages to doing something like this. For example, if you're starting with zero, like we said here and the hash function returns zero for three key value pairs. Okay. So in that case, you'd probably fill up 0123, right. Like right next to each other because linear probing. And for example, if it returns 15 or something later on and then it returns two more based on 15 like it returns 15 for two more evaluate pairs, then it would be 15. 1617. So if you see here there's like that huge. It sort of groups things together. It sort of clusters them together, and it may or may not be good. Mostly, in my opinion, this is not the best because you have clusters of data. And every time when you're going to look up the same data like the same based on the key, you're going to look up the value again. This is for the insertion. We still need to look up right later on. So for that, it would still return zero. But then you'd have to check all of the other elements right next to it. Like you need to traverse everything next to it. So it just feels like, okay, until you get to that point because you're storing your key value information in each of the cells within the array. So clustering really feels a little bit. It doesn't feel the best. But this method is much better than chaining. Both of them have a lot of negatives to it, and they're almost constant time. There's, like, mathematical proofs for this in that there would be some extra constant space or some extra constant that we need to add into our lookup time or something like that. I'll make sure to link all of those things, all of those explanations and stuff within our get up repo after the session is over. Okay, enough talking about linear probing and all of its disadvantages and stuff. Python actually uses a probing algorithm that looks like this, the one right after this. So it's like the extra crop. You see the five and something called perturb. All of that. They're trying to make sure that the randomness or the information is equally distributed in the array, the key value pair information. It is equally distributed within the array. So that's why they've chosen to add things like this to sort of mess up the usual flow, because here it goes right after each other. You'd probe or look right after probing for empty spots or something right after each other. And this seems to be a little bit more different than what we were doing with probing. At least it would help with not clustering or grouping things together. So it would make sure that our key value pairs are equally distributed in the array, because at least it tries to do that. So what this means is if we apply the same kinds of like this may not work with our four element array capacity that we've decided, but it will definitely work for larger sets or larger array sizes. J is the index again and five times index plus one plus perturb. So this perturb variable is a hash code that Python generates, and this hash code at every iteration. It is right shifted by five. It's super weird, but they're trying to improve the randomness of this, or they're trying to make sure it doesn't have the same issues as linear probing. So it can be like when I say iteration or iteration, suppose you end up at one point the next time. If you end up there again, percher would have shifted by five, shifted by five, so it would land up at a completely different location in the array. Okay. I'm not sure if this is sort of broken. A lot more understanding, or it gives you some more thoughts to explore based on this, I think it's definitely fun to understand these things more as we go on. There's one more thing we need to talk about before we talk about other things, like our problem and stuff. So a good hash function with all of this. A couple of things about hash functions are that one. They need to make sure to minimize collisions if it's not blatant or if it's not super obvious, more collisions. There's more problems and stuff. So you need to make sure to avoid. You can't really avoid collisions. You can maybe minimize collisions, and they also have to make sure to distribute or to give out indices such that the key value pairs are uniformly distributed in this array that we're looking at. So those are two important things that a good hash function would have. There are many more things, but these seem to stand out the most. Okay, so let's look a little bit at the Python dictionary and set. They're going to be mostly dealing with dictionaries in the session or like solving the problem based on dictionaries in the session. But we will look at a very brief introduction for a set as well. So a dictionary, like why we spoke about open addressing, why we spoke about hash tables and stuff. So a dictionary is implemented internally in Python using a hash table. And the structure of it is that it has key value pairs where the keys are unique, and then it is ordered by insertion. What that means is it preserves insertion order. I think after 3.7 or so. Yeah, that's good. And with the set, it's the same thing. It's also implemented using a hash table. And here it's an unordered collection of data and has no duplicates. Pretty much the no duplicates thing. It basically should not have duplicates both of it. And at least in a dictionary, you can look up based on keys, but it may be different. The ordering technique that we were talking about before, I think it's mostly a dictionary. I think it's also a set, but to get into the explanation a little bit more of open addressing, like I said, it's a whole other session, but maybe we can revisit it in the future or something. We have a people dictionary, and this is just an example to show the structure of a dictionary and a set. So we have a people dictionary. We have two keys and two values corresponding to the keys. People's names is the set, and it's a set of names, people's names. And I have to learn how to name people better. This is pretty bad. Okay. So that's about it. There. Awesome. So Python dictionaries a little bit more. The insert delete and search times are all constant internally. How it works is we say it's a constant. It's an approximation. There would be some other it wouldn't be variables that add up to the time. If you have variables, then they would make your time complexity Huger or like, they'll blow it up or something. But if you have constants, like, I don't know, maybe 100 or something like that, you know, that they're, like, limited, and there's a limit to that extra constant that we add to the time. So that would still end up boiling down to constant time, because big goal is upper bound. Right. So yeah, some of the ways you can use this and some of my thoughts are maybe if you find yourself writing, I don't know, three, four loops or something, and it's super convoluted. Basically, you're taking up a lot of time, but you could offset some of that. You could trade in some space for time. Then you can definitely use a dictionary hands down. And if you're ever like, oh, okay. So I need to look for a value based on my key, or I need to look this up, and I don't know where to store this. And you always have Python dictionaries as like, the dictionaries are like the top of your choices of data structure. When you're solving a problem, it always has to be there, like in your Arsenal. So yeah, if you need to have a key value pair to easily look up values based on the keys, go for the dictionary. So the third one is more like a thought. The more you do, the more you'll see. Right. Like with most of these patterns and most of these things that we're learning initially, the worries that when will I ever get there? When you see more, it automatically. Trust me on this. As you look at these kinds of problems, more and more solve more of these, you'll naturally get that kind of an intuition. It happens with every data structure and every algorithm. You literally end up finding that template or something in your head and then you start doing it that way or your brain looks for these connections and you'll get there. Definitely you'll get there. Awesome. So within a dictionary, you'd also just if you want to recompute the value based on the key and then push it back, you can still do that with the dictionary. Right. The values, you can change the keys. You cannot. That's not ideal. Okay, awesome. With that, I'm going to take a quick water break and you should as well. If there are any questions or something like that, please drop the chat or either Summer Karen will look at it or yeah, quick water break. Thank you very much, Shana. While for those of you who the first time joining, we will be fishing to the live coding section after the break. Before that, just want to say quick shout out to everyone who is very interactive helping each other solving the problem. It's just been amazing. The chat is an amazing scene like everyone just helping each other. So just a shower to everyone in this study group. We not only use the link, we also use Repo as the link as the platform to do the life coding, which I just sent it over the chat. So if this is the first time you use the repo and then because Repo requires a membership, they do require you to sign in with your email account for this time being. You can also use the Collab notebook which we create. We just make a copy version from the Repo content so you can use this for the time being. So let us know if you have any questions opening the notebook. Definitely fixing the slack. So please let us know if everything is doing okay. You guys like doing the opening? No problem. Also quick two things. If you're doing repo the first time you have in your account, first time joining in, make sure you fork the repo, which should be on the top right corner so that you can create your own copy. This account contains all the PVs meeting problems that we've done, so you can take a look at that as well, for those of you who using Colab right now, Google Collab, make sure you go to the file on the top left corner, click it and save your copy in your own drive so that you can start editing it. All right. We're going to give a couple of minutes for everyone to get ready. I do see a couple of questions. Sonia is being very awesome answering all of them. Thank you. Sovia this is amazing. Yes. For any questions that we haven't answered. I can call, but we will be doing it at the end of the session and I can read up the question and I can help answering. Let me know how you folks are doing. If you want to slow down or something like that, let us know in the chat and maybe a couple more minutes, or if we can start, send a thumbs up or like a beer or something in the chat, I see some amazing questions. One moment for everyone to set up. Some music being super cool. Like I'm just looking at the chat and the discussions. So amazing. Awesome. Is now a good time to start. Is everybody doing okay? Great. Super excited for this problem, loving the energy we're back with after our water break and some interesting discussions. So let's talk about today's problem. It's a super fun problem and definitely enjoyable the first time you do it and later on as well. It's called group anagrams. This is the problem description right out of lead code. This is the link to the problem. The lead code link. If anybody could drop it on chat, that would be awesome. Group anagrams. Right. Given an array of strings, group the anagrams together, you can return the answer in any order. Very interesting. Anagrams. What are they? An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly. Once in a nutshell. What does that mean? So if you have like, maybe if you have a word with, like, four letters or something like that, an anagram is basically scrambling those and putting them in different positions and different directions and stuff. So if you have eat an anagram of that would be Te a T. The one thing within anagrams is that if you have maybe two E's or something like that in your initial word, even the anagram should have the two E's. So basically you should have the exact number of letters from the original word to the anagram in different positions or anything. The positions doesn't matter whether it's a word or not. That doesn't matter. But it will be an anagram if it satisfies this condition. Okay, so let's talk more like we were talking about an anagram of 80. So basically scramble the original word, put them in different positions and stuff, and you line up with an annogram. And also it's obvious that the length of the original word. It should be equal to the length of the anagram, and it should have the same number of letters as well. So I'm pretty sure at this point you sort of understand where we're going here. This is our input strings. It's a list of strings. So you have et and so on and so forth. Basically, our output should be to group the anagrams together and return the answer in any order. So what does this mean? Like results equals group one or group two. What are these groups? So you have eat what is an anagram of eat it's tea. And what is an anagram of T or like, what are these groups? So eat T eight. Right. What are some other anagram groups that you see? Drop it on the chat. Awesome. So basically, you sort of get the gist of what we're doing here. That's it. That's the entire problem. Do we have any other questions about how our input should be? So far? It looks like it only has lower case English letters. Do we have any more questions pertaining to that or something like that? Just drop it in chat so we can drop all your questions or anything. Yeah. Okay. Case sensitivity, empty list. Amazing questions. We'll jump into answering all of those questions in the Rupple in just a second. I'll have all of these in mind. Awesome. So let's code. At this point we're going to jump to our repo. Usually how it happens is we go over there and then we talk about the problem a little bit more and then we look at some more examples and stuff like that. I'll stop talking and let's go. Okay. So basically, this is just how I arranged. This is these are the previous sessions for our lead code study group. Binary search greedy approach hash tables is what we have now sliding window and two pointers. So these are the folders and then I usually call them into in main pipe instantiate. Just call the method that we want and then I pass in the examples here. If you have any weird, I seem to have pretty normal examples. We'll discuss the prerequisites or like the conditions for this problem as per lead code once more. And then if you have any examples or something posted in the chat, so we'll evaluate at the end. These are the conditions that are there for this problem. Your strings array your input. It only contains lowercase English letters. This is super simple. Like if it becomes this easy, then that's it. And then your strings, it can be from one through ten. Power four. That's the Max. And then your strings of I, which is every single. This is a string of I. Thank you so much. I forgot to ask. I'm so sorry. Let's see here. What is the font size? Let's try. Huge. Is that better? I'm going to move this away so that we get more space. Is that better everybody. Okay. Awesome. Just checking chat once more. Great. Okay. So this entire thing is your strings array. And then this is strings of I so we can have an empty like this. This is an example where strings of I is zero, but the length of your strings is one. So that's what I've written here in case you're inferring later and stuff. So these are the conditions for this particular problem. We're going to look at a couple of examples here just to sort of make sure to understand where we're going with this. And if you have any kinds of examples or something, just drop it down in chat. I'll check, like at the end. Once we go through this strings equals this, the result is going to be a list of lists of strings. So, like, eight T and eight. So they'll be grouped together. And this is group one. This is what we were talking about before and then Tan and not together, and then all of them together. Okay. And then if this is your input array, what is your output? Your output is just going to be a list, but like an empty one at that. And then if you have single element, like a single word, then it's just one group. Right. So you just return that if you have a single letter, that's just one group as well. So you're going to return that. Great. These are a couple of examples that we have going on. Let's talk about approaches. There are a lot of ways to solve this, but these two that we're going to talk about are some of the most popular and the most intuitive ones out there. So we're going to talk about both of them use dictionaries, by the way, but they have a little bit of trade off in terms of time and space. We'll discuss that in just a second. So the first one, we're just going to call it sort and compare. So basically, what we're planning to do is this right. So this is our input array. Okay. Yeah. Should have just gone with that. Could still go with beat. But whatever you take each word, you take each word and then you sort the letters in it. So if you have E-A-T, what is E-A-T sorted? E-A-T-A-T need to remember the Alphabet a little bit. Okay. And once you sort them, you basically add that into your dictionary like this your sorted word, and then the word before sorting. So now E-A-T sorted as A-E-T. And then you add E-A-T over here. Simple. Cool. Okay. Next one is Te a. What is Te a sorted. It's A-E-T. Great. And AED seems to be in the dictionary already. So we just add it to the next. We just append it to that same group. So tea. Cool. So we're not at Tan. What is the sorted thing? What is Tan sorted. So it's aunt, right. A-N-T. You don't see it here. You'd first look up first for that particular sorted thing within your dictionary. You don't see it there. So you add it. Now you've added it, and then you would add that word. So the sorted word and then the word before sorting. Okay. So you'd add in ten. Now we are at eight. Right. Sorted. You'll get the AET, and then you'll have to add that here. That would be ape. Now we are here again. Sorted. You'll get and you get the drill at this point. Just going to fill that out. And then B-A-T. Sorted is A-B-T. It is not in the dictionary at all. So we add it and we add the word as well. We're here. That's it. This is the approach, literally. Okay. We have this beautiful dictionary here. So what do we return at the end? We just return the dictionary's values, and then we got it. That's our solution. Right. So it's pretty amazing when you go over this little by little and you can return the answer in any order. So that's awesome as well. Great. So now let's talk about time and space are important. This is super important to continue. Right? Any idea about the time and space complexity for this approach, please drop it in the chat if you have an idea. Yeah. To start off. Okay. Any more answers, I'm going to give it like, ten more seconds. Depends on the sort. So the fastest sorting algorithm. I think it's N login. Pythonstem sort. It is. Timsword. Timsword is like a combination of Merch sword. That's another sword that I'm not getting in my mind right now. But even that is like N login. Okay. So let's talk about this. Basically, one thing we all know for sure is that it's all done first. Right. You're going to do, like, an entire path because you need to find the anagrams for everything. Basically, you need to group everything in your strings. Okay. First things first, we'll just add that in there. So that's open. Now, what is one more thing that we're doing every single time some folks said and squared. Basically, you need to consider the sorting part of this. You're going through the entire array. That's cool. You're taking each word and you're sorting it right. Because you sort it. You have another overhead of n log in. But N is used to depict the entire array. So let's call it maybe M or something. Yeah. So imagine M is the maximum length of any particular word. Like here. Everything has three, three and three. That's why I put in beat. Okay. So this is four. Right? If this is four, that's like, your maximum, you need to basically take the sorting into consideration. So when you sort a word, that would be another overhead of M log n. So in this case, if we're trying to do M right. So that would be M log. M just wrong. Okay. Not able to type properly. Basically, that would be your time complexity n times M LOGM or whatever you call your variables there. So the entire path plus each word, you sort it. So the sorting that you need to consider that. So that is going to be your time complexity. You're not taking eat and comparing with tea, eat with Tan eat. You're not doing it that way. If you do it that way, that would be N squared, I think. But what we're doing is we're taking each word separately, sorting it, adding it to the dictionary, taking the next word, sorting it basically just appending it to whatever group was there in the dictionary. So that is your time. What about space? Any thoughts about space? Anybody see this huge but beautiful dictionary here? So basically, if you're going to be super particular about, like, in general, it's off n. But if we want to be extremely I don't know particular about it or something like that, then it would be all of n times every single like the sorted thing, the thing that you have in your dictionary. So it's going to be of n times M, which would be your total like, this is your off. What is that? This is going to be your entire list. Right. Like your entire strings list. And then these are like your sorted strings or something. So just to remember it would be like old and definitely you could go with that. But I think something to think about would be to also mention that you're storing the sorted list as well, and each word is sorted and then you're storing that as well. I'm just going to write it as off, but just think about it. If we're considering that that would be of n times n for the entire list and for every single character in the list. Okay. So now that we've looked at that, what is one thing we can cut down on sorting? You can do it. But there's a much I'm a fan of this particular, like, the most important part of the next method. So that's computing a character array for each word and then comparing. So when we look at character arrays, I'll talk about that in just a second. Actually, like showing is better than just talking about it. So just write with me on this one. We'll get to the time complexity again. And then you'll see that we can definitely go with this approach more than the first one. Just because of the time complexity, the space is the same for both. But because we're storing it in this result like dictionary that has the entire list and then the maximum length would be like M. So yeah, within this, you basically take each word and you will find out the character count like chupel representation of that word. Actually array. I'm going to say array. What does this mean? This only works when you have conditions like, oh, I'm just going to have lower case English letters for this problem. In cases like that, you could basically just start with a 26 length array entirely initialized with zeros. Okay. Start with that. And what we're going to do here is we're going to go over the entire word. And basically ordinate what it does is it takes a Unicode character like a letter in this case, and it returns the Unicode code point of the character, which we pass in. Right. So in this case, you have ordinate of maybe e or something, and then ordinate of a like what I mean here is that you take a word, you get the counts of each letter in that word, and then you put it at some position within this character array, which is just completely beautifully laid out here. So how do you decide where to put it? That is where this part comes in. So you could calculate the offset like this. So ordinate of E minus ordinate of A ordinate of e is going to return one0 one or the net of a, like, we're always going to start with lowercase A, because that is the starting point of the lower case Alphabet. So it'll be 97 for small A, 98 for small B, 99 for small C, up until small Z, which is I'm not sure which one that would be, but it would go to there. You will start offsetting by that. And this is only 26. So if you do ordinate of E minus ordinate of a, you end up with four. So in our 26 count array, you'll basically seek four, and then you'd insert there. That is the point in the array that you'd have to increment by one. So 01234. So this was initially zero. But we're looking at e for e, maybe like the first one. Right. We basically increment the count of e, and then we're going to do it for the rest as well. So that's how we compute the character array situation. And then after that, what we're going to do is we're going to actually let me show it. Okay. So I'm just going to type it as character count one for now. For the first one, you're going to take each word. You're going to compute the character count array, and then you're going to basically add this word wherever you're going to add that character count array thing. If it's not already present in your dictionary, and then you'll add in the word for it. So for example, we find the array for this one, and then we would check here initially, it wouldn't have been present in the dictionary. So we inserted it over here, and then we add the word here as well. Great. So next we're going to do it for T. We do the same thing. We find that character count array. It's obviously going to be equal to this. So we add it here before all of that. Why are we doing it as a tuple instead of an array? Because I'm talking continuously about doing something like this. And the weird thing is we're basically just converting it into a tuple. Why if you have a list as a key, keys are mutable so their state can change because their state can change. They're not Hashable. So if you try, go ahead and try to have a list and then try to have that as key and then add your value or whatever to it. It's going to throw you a typer, saying key type of key list is not Hashable or something like that. Generally, you don't want to have mutable things as keys because you can change it from anywhere, and keys should not be changed. It just doesn't make sense for a key to just change like that. If you have a database or something like that, would you keep changing your primary key, like modifying it now and then? So it's basically something like that. So that's why we have a twopo. Okay, we had a quick tangent. Let's head back to what we were doing. So we have Tan. We compute the character count array. You don't see it here. So you add it here. So we have character count. I can't type character count two, and then you add ten. Okay. So we were here. Then we see eight. We do the same thing, compute an ad, and then we do the same thing for not character count array. And then we basically append that word to the character count thing that we found before. So this would be Nat. And then you find the thing again. It's not present here already. So you'll add it. It's the same thing. If you don't see it, you'll add it as key, and then the next time you see something having the same character counterfeit, you'll basically just depend to it that's it. Okay. Time and space. I'm so sorry. I feel like I'm talking a lot. So let's quickly do the time and space for this. Essentially, if you actually think about it, you're doing of n times every single word. You're going to look at every single letter in that word. So of n times your word, your maximum word length, which is going to be or something like that. And we did talk about time in the space before. So that's the same here as well n times. I definitely think even the space here is going to be n times the maximum length of that word, because that's what we're storing here. But yeah, we can talk about that later. So the thing here is there would be another aspect to this. Why I keep adding 26 here is that every single time you compute the character count array and then you add it. Basically what you're going to do is you're going to check that if that entire character account is matching something else on the already existing keys within results, if it already exists, then we just append to it. If it doesn't exist, we have to add it. Right. Just add that key, which is this that may take up a little bit more processing time, right. Because that particular tuple is going to be of length 26, but that's constant time, it doesn't matter. Look up is always constant time. And this is what I meant by you'll have that tiny little extra time, like constant overhead when you're looking up in a dictionary. So this is what I meant. It doesn't matter. It still boils down to end times M. So I'm going extremely quick at this point because I've been talking too much, but yeah. Okay. So now we're seeing we don't have that same like the sorting overhead that we had there. We just have this. So do you think this is a good solution to go with? Just let me know in the chat, I can't say the word chat. It's going to be super simple. Just the concept is difficult, but the actual doing part of it is like super simple. I'm going to wait for a couple more thumbs up. Awesome. Love it. Love the energy. Okay. Cool. So let's begin. Right. So what are we going to start with? We're going to initialize I'll talk through as I work so that we don't get stuck. None of us get stuck anywhere. So let's call this rest. Okay. And I'm saying let's do a default DICT instead of a dictionary, because if you've seen default Bicks before, they handle key errors. Which means if you're trying to look for a key that's not already present in the dictionary, but when you search for it, if you use a dictionary, it will throw you a key error. But in this case, this is going to give you a this is going to initialize it with the default value in general. So if you have lists like this, the default value for this is going to be like an empty list. Maybe. Why did I put in a comment then it's already a comment. Okay. So this is going to be an empty list. If you have maybe integers that you want to add as values, your default is going to be zero. If you want strings, you can do an empty string. So it's going to give you an empty string. Basically, if you don't already have it, it's safe to use default instead of a dictionary. In those cases. I think if you feel like you're going to be ending up in those kinds of situations, right. Okay. And for that, what do we need to do? Because it's from the Collections module from Collections import default date. Great. Let's go back. Okay. We have Collections default date off. We're starting with we're going to initialize it to an empty list. Right. So we can start with lift. It's going to give you an empty list at this point. I mean, you won't have anything at this point. Okay. What should we do next? Let's go back here. We're going to iterate through the entire strings, like your strings input. And then you're going to take each word and you're going to compute the character array. Count the character count array. I can never say that properly, even at the end of the session, but we're going to compute that. We're going to check the way we did here, and then we're just going to return. Okay. Let's quickly go ahead and do that. Let's just say word in strings. We're going to call it character character count. Three cannot type. Let's just write a helper it. Right. Find character count off the word. Let's just have it in there. We haven't filled it out yet. We're just laying down the blocks. Okay. And then what we're going to do here is after that, this is an array. This function will return an array. So we're going to convert it into a tuple, which would be your key. Right. So character account array. And since this is a list the first time, it will be an empty list. And if you're adding more things to it, it's just going to be like you're just going to append it to the list. So dot append off your word like the word that you want. Right. Okay. And at the end, what do we do? We just return result values, values is what you want, because that's what looks like this. Basically, you just need to return that and that would give you a list anyway. So you just return that. Great. So now we finished basically 50%, 60% of this. Actually, I'm going to call it 80%. Now we just have to figure out how to find this character. Maybe we could do it here. Actually, we could do it right at the top. Maybe. Let's just write out our fine character count, and then we're taking our Verdon running at this time. Okay. So what? We're going to start off. We spoke about initializing the entire 26 count, 26 length, 26 capacity array. Two zeros. Right. So Python has a really fun way of doing this. You just do times 26 and then you got it. So this is a character count array with 26 zeros. Now we're just going to have to loop through each character in your word. Right. And after that, what are we going to do? We're going to implement that count. So how do we find that count care count, care count of what was that again? Yeah. It's going to be the ordinate of your character minus ordinate of lowercase a plus equals one. Just going to increment it by one. That's it. And then you just return care count. And what are we doing here? Yeah, it's just supposed to be imported default already. So because we didn't import collections. Okay. So I think this is done, folks. We've literally covered everything. Basically, this is pretty good. We can just look at our main to see everything is okay. We have a couple of examples here, and then we have somewhere over here. We're just going to run it to see that nothing is weird at this point. Okay. The font is super small, and I'm running behind time. So this is basically you get it. It works. Okay. So we're good at that. I'm just jumping back quickly. Two things, just one quick thing. So the next steps from here, try to look at these two problems. They're very super similar. You have a lot of confidence right now at this point to tackle these kinds of things. So definitely look at those two problems and see what you think. Maybe post in our slack channel or something like that. And, yeah, with that, I'm so sorry, but I'm handing it back to you, Karen and Stephanie, do we have time for Q and A. We are pretty much a time at this point. We have another minute. So I just wanted to say a big thank you to chase not and Karen for a fantastic session. Thank you. Also, to sum you up for being a spectacular chat moderator. So big round of applause, everyone, for our amazing team. This is a fantastic session. Thank you, everyone, for joining us today. You are a spectacular audience, and I loved all the content participation. It was fantastic. We have a lot of amazing sessions over the remainder of the day today as well as tomorrow. Please check out some of those sessions. Now, we also have a really amazing networking feature, which I totally recommend trying that out. I love it every single conference that we have. It's a lot of fun. Please also check out our Expo booth. There's a lot of really exciting things happening happening over the next couple of days, and please enjoy the rest of your time at Connect Forward, so thank you all so much and have a wonderful day. Thank you so much, everybody. Bye.

Blog