Video details

Machine Learning for JavaScript Developers 101 | Jason Mayes


Jason Mayes

Discover how to achieve superpowers by embracing machine learning in JavaScript using TensorFlow.js in the browser.
Learn what machine learning is, get a high-level overview of how it works, get inspired through a whole bunch of creative prototypes (from invisibility to teleportation) that push the boundaries of what is possible in the modern web browser, and then take your own first steps with machine learning in minutes.
PUBLICATION PERMISSIONS: Original video was published with the Creative Commons Attribution license (reuse allowed). Link:


Hello, everyone, my name is Jason Mayes, I'm a developer advocate for TensorFlow.js here at Google, which basically means that if you're using machine learning in JavaScript in some shape or form out in the wild, there's a good chance we'll cross paths at some point. And if that's today, I get to talk to you about using machine learning in JavaScript, of course. So let's get started. Now, first up, I want to talk about how machine learning has the potential to revolutionize every industry, not just the tech ones, but all of them. In fact, we could be standing right here at the beginning of a new age. We've already been through industrial and scientific revolutions, but what about the future? That could be a machine learning one, too. And we could be at the very beginning of that right now, which is a really exciting time to start learning about machine learning, as you can jump on the bandwagon early and really get involved and have impact. Of course, before I get started on that: what's the difference between artificial intelligence, machine learning, and deep learning? I'm sure many of you today have very different backgrounds, and it's important to understand what this is all about, where it comes from, and what the key terms mean, so we can understand what we're going to be making later on. First off, I want to start with artificial intelligence, also known as AI. This is essentially the science of making things smart, or more formally, human intelligence exhibited by machines. But this is a very broad term. In fact, right now we're actually in a phase of narrow AI, which basically means that a system can do one or a few things just as well as a human counterpart could in that niche area, such as recognizing objects. And a great example of that is when people in the medical industry are trying to understand what brain tumors look like.
Nowadays, experts use machine learning to work alongside them, to help point out what parts of an image may contain a brain tumor, for example. And this leads to better results, because sometimes the image is just too grainy for the human eye to see, but the ML can pick up on these fine differences, which leads to better results for both the patient and, of course, the doctor. Now, machine learning on the other hand, or ML for short, is an approach to achieve the artificial intelligence that we just spoke about on the previous slide. A key part of these systems is that they can be reused, and this is done by creating systems that can learn to find patterns in the data presented to them. This is at the implementation level, if you will. So if you have an ML system that is trained to recognize cats, you can use the same system to recognize dogs just by giving it different sample training data. If we just roll back to traditional programming, as you can see on the slide here, in the old days we used lots of conditional statements in order to find spam emails, for example: if the email contains a certain word, mark it as spam. Now, this is not very efficient, because a spammer can just change the words slightly and get around those conditional statements. Fast forward to today, and machine learning programs essentially get tons of emails to classify, which are marked as spam by you, and the system tries to find what attributes of those emails led to them being classified as spam, all by itself. So now there's no battle between programmer and spammer, and instead the engineer can concentrate on making great software. So what are common use cases of ML? Well, actually, there's quite a few. Here are the typical use cases where I see machine learning being useful.
There are others, of course, but we've got things like computer vision, like the object detection example we just spoke about; numerical things like regression, predicting a number; natural language, for example text toxicity or sentiment analysis; audio, for speech commands, for example. And my personal favorite is generative, which is essentially things like style transfer and the creative kinds of applications of ML. You can see on the slide an example from NVIDIA whereby they are generating human faces, and these faces do not actually exist in the real world. The model has been trained on celebrities in this case, and you can see how this research can produce very cool imagery. So what about deep learning? Essentially, deep learning is a technique for implementing the machine learning that we just spoke about on the previous slide, and one such deep learning technique is known as deep neural networks. So you can think of deep learning as the algorithm you might choose to use in your machine learning program, essentially. If you haven't heard of deep neural networks, don't worry. Essentially, these are just programming structures that are arranged in layers, and that loosely try to mimic how we believe the human brain works, essentially learning patterns of patterns. We'll get into that in more detail later in the talk. So, in summary, you can see how all these terms interlink. We have deep learning that feeds into machine learning, so the algorithm goes into the implementation, and that machine learning gives us this grand illusion of artificial intelligence, which is what we're trying to aim for longer term. And these terms actually go back to the 1950s and 60s. It's not anything new. It's just that now we have the power, with all the cheap processors and memory, to actually make use of these techniques at scale with all the data that we now have, which previously wasn't possible in the old days.
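Going back to the spam example for a second, the contrast between the two approaches can be sketched in a few lines of plain JavaScript. The banned words, feature names, and weights below are all invented for illustration and are not from the talk; a real system would learn the weights from emails the user marked as spam.

```javascript
// Old-school approach: hand-written conditional rules, which a
// spammer can dodge just by tweaking the wording slightly.
function spamByRules(emailText) {
  const bannedWords = ["winner", "free money", "act now"];
  const lower = emailText.toLowerCase();
  return bannedWords.some((word) => lower.includes(word));
}

// ML-style approach: score weighted features of the email. In a real
// system the weights would be learned from training data; these
// numbers are made up for the sketch.
function spamByLearnedWeights(features, weights, threshold = 0.5) {
  let score = 0;
  for (const [name, value] of Object.entries(features)) {
    score += (weights[name] || 0) * value;
  }
  return score > threshold;
}
```

Changing "winner" to "w1nner" defeats the first function entirely, while the second keeps working as long as the underlying signals (excessive capitals, suspicious links, and so on) still score highly.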
So how do we train machine learning systems? That's a great question. Essentially, we need features and attributes. You can see here from this example, if we just pretend to be farmers for a second trying to classify apples and oranges, two features or attributes you might want to use would be weight and color, because these things are easy to measure digitally and can be assessed at scale. So once you've got those, if we go back to our high school math, we can plot those features and attributes on this 2D graph here. We've got weight on the Y axis and color on the X, and you can see how the green apples and red apples kind of cluster together there at the bottom, in their respective color spectrums. And then the oranges, because they're juicy, are actually slightly higher up on the weight axis there. And we can draw a line to separate the apples and oranges, and in a way, that is actually a very simple form of machine learning, if we could get a computer to figure out the equation of that line. Because if we now classify a new piece of fruit, we take its weight and its color and we plot it on this graph. If it falls above the line, we can say with some level of confidence that that piece of fruit is an orange, and if it falls below the line, we can assume it's probably an apple. And that's kind of what is going on in all of these systems: the machine learning is essentially just trying to figure out the best way to separate the data so that it can classify it later on. But what about bad features? It's not always obvious what you should choose here, and here's a great example: ripeness and number of seeds. This could lead to a scatterplot like you see on the chart right now, and there's no easy way to separate this data with a straight line, or even a curved line, for that matter. And this is a good example of a bad choice of features. And you might be like, well, why, Jason, would you choose such things?
Well, it's not always as simple as apples and oranges. Imagine the brain tumors we were talking about earlier on: what features and attributes would you use to be able to distinguish a positive from a negative result? In that case, it gets very hard, very quickly. And this is known as feature engineering: finding the set of features and attributes that give you the best separation in the data, and that's what data scientists get paid a lot of money to figure out properly. But what about higher dimensions? In our simple example, we had just two dimensions. Let's assume we had three. In that case, we'd need to plot on a three-dimensional graph, as you can see on the right-hand side, and here, instead of using a line, we need a plane, a rectangle in 3D space, if you will, to be able to separate the data in a meaningful way. Now, it's actually interesting to note that most machine learning problems are using much higher dimensions than three. Unfortunately, our human brains just can't comprehend what that looks like, but you'll have to trust me that the math is actually the same. Instead of using a plane, you use something called a hyperplane, and that just means it has one dimension less than the number of dimensions you're working with. The math works out the same, and you're dividing up this high-dimensional space in much the same way. So it should be easy, right? We've got a dog, we've got a mop, what could possibly go wrong? Well, some dogs look like mops, and vice versa. And my point in bringing this up is that you've got to be aware of the bias in your training data. One of the biggest challenges you'll face is finding enough training data that is unbiased for the situations you want to use it in.
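The separating line and its hyperplane generalization can both be sketched in plain JavaScript. The line coefficients and weight vectors below are invented for illustration; in real machine learning they would be learned from the plotted training data.

```javascript
// Classify a fruit by which side of a separating line its point falls
// on: weight in grams on the y axis, a color score on the x axis.
// The default line is a made-up stand-in for a learned one.
function classifyFruit(weightGrams, colorScore, line = { slope: 30, intercept: 140 }) {
  // y value of the separating line at this fruit's color score.
  const boundaryWeight = line.slope * colorScore + line.intercept;
  // Oranges were the heavier cluster in the talk's chart, so points
  // above the line are called oranges, points below are apples.
  return weightGrams > boundaryWeight ? "orange" : "apple";
}

// The same decision rule in N dimensions: a hyperplane is just a
// weight vector plus a bias, and we check which side of it a feature
// vector falls on via a dot product.
function hyperplaneSide(features, weights, bias) {
  const dot = features.reduce((sum, x, i) => sum + x * weights[i], 0);
  return dot + bias >= 0 ? "above" : "below";
}
```

In two dimensions hyperplaneSide reduces to exactly the apples-versus-oranges line test; in higher dimensions the same dot-product check still works even though we can no longer draw the picture.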
So in the case of recognizing a cat, something as simple as a cat, you might need to have 10,000 images of cats: of different breeds, at different stages of the life cycle, of different shapes and sizes, in different environments, in different lighting conditions, taken on different cameras. All of that is required to have the best chance of understanding what cat pixels actually are, and without that, you may end up having biases in your machine learning model, which would be very bad. The other point to note here is that data is not always imagery. It could be tables of data with text, or sensor recordings, sound samples, and pretty much anything else you can think of. As long as it can be represented numerically, we can use it in an ML system. So that brings us, of course, to JavaScript. Why would you want to do machine learning in JavaScript? And that is a great question, too. In fact, JavaScript can run pretty much everywhere: in the web browser, on the server side, on desktop, on mobile, and even Internet of Things. And if we dive into each one of those, you can see many of the technologies that you already know and love. On the left-hand side there are popular web browsers you might use; on the server side we have Node.js; for mobile we can support React Native, and also other things like WeChat and progressive web apps, of course. And for desktop, Electron can be used to write native desktop applications, and of course Raspberry Pi for Internet of Things. And JavaScript is the only language that can be used across all of these devices without any extra add-ons or plug-ins, and that is a very unique point about JavaScript on its own, which I'm sure you're already aware of. And let's not forget: you can run models, you can retrain via transfer learning, and you can write your machine learning models completely from scratch if you so desire, just like you could do in Python, if you're familiar with machine learning in Python.
And that allows you to basically dream up anything you might want, from augmented reality, gesture and sound recognition, conversational AI, whatever it might be, you can do that in JavaScript now as well, giving you superpowers in the browser and beyond. So there are three ways you can go about using machine learning in JavaScript, and I'm going to go through all of those now. The first one is pre-made models. These are essentially really easy-to-use JavaScript classes for common use cases, and you can see we have many already, from object detection, to body segmentation, which allows you to find where the body is in an image, to pose estimation, to detect the skeleton. And we've got speech commands and much, much more. And in some of our newer models on the right-hand side, you can see we now support Face Mesh, which can recognize 468 landmarks on the human face; we've got HandPose, which can detect similar things for your hand; and also the BERT question-and-answer model, which allows you to do question-answering-based natural language processing, all in the web browser. So let's see some of these in action and see how they perform. First up, I want to talk about object recognition. This is using COCO-SSD, which is the name of the machine learning model that we're using to power this, and it has been trained on 90 object classes, such as these dogs on the right-hand side. So 90 common objects can be recognized out of the box. Now, what's important is that you can see this also gives back the bounding box data, which allows you to localize the object in the image, and that's why we call this object recognition instead of image recognition. Image recognition is where you know the thing exists, but you don't know where it is. So this is a pretty cool one to start with, and I'm going to show you how we can write code to make this actually work ourselves. So let's dive into the code now. First up, let's look at the HTML.
This is pretty boilerplate stuff. We're simply going to import a stylesheet there at the top, and then in our main body we have a demo section that initially is going to be invisible; you can see the invisible class set at the very beginning there. And then we have some images that we want to be able to classify on click, so each has the classifyOnClick class, with an image contained within that containing div. There can be as many of these images as you want. And at the end there, you can see we simply have three script imports. The first one is essentially bringing in the TensorFlow.js bundle, the second one is bringing in the COCO-SSD machine learning model, and the third one is, of course, the JavaScript we're going to write to get all of this working. So looking at the first lines of the JavaScript: first of all, we're just going to define a constant called demosSection, and that's just going to get a reference to the demo area where all of our images are living. We're going to set a variable modelHasLoaded to false, and also define a variable for the model, to store it once it has loaded. Next, we need to load the model, of course. So all we need to do is call cocoSsd.load, and because this is an async function, we use the then method to call back an anonymous function with the results. You can see the anonymous function simply takes the loaded model as a parameter, and we can assign that to our more global variable called model, and we set modelHasLoaded to true, so we know that things are ready to use. Finally, we remove the invisible class from our demo section to make sure it's now visible and not grayed out like it was before. So next we get a reference to the image containers, i.e. all the divs that have that classifyOnClick class. We can then loop through all of those and essentially add a click handler to each, so that we can decide what to do when each image within is clicked. And here we go: here's the handleClick definition.
We simply check if the model has loaded. If it hasn't, we're going to return straight away, because there's no point doing anything unless the model is available to use. And if it is available to use, we're going to essentially call model.detect, and we pass it the image that was clicked, via event.target in this case. And again, this is an async operation, so we use then to call our other function, handlePredictions, once it's ready. And in handlePredictions you can see we now get passed a predictions object, which we can log if we wish, to inspect as we so desire. But essentially, this contains all the machine learning predictions that came back for that single image we tried to classify, so we can loop through predictions, create a new paragraph element for each, and set its text to what we saw, along with its confidence. And then we can also set the margin of this paragraph so it sits nicely at the bottom of the bounding box. And then, of course, the highlighter div is essentially the bounding box that I've created, and we're just setting the x, y, width, and height coordinates of that element so that it sits in the right place in the context of its parent div. And then, of course, we just add these two elements to the DOM, and that should now be visible. And finally, the CSS is pretty self-explanatory; it's just styling the various elements we're creating. So if we put it all together, this is what we get. As you can see, this is the code running, and I can now click on one of these images, and you can see that instantly I get results coming back, with the bounding boxes showing the items it found in each image. I actually did a little extra bit of code here to do the same thing but with the webcam, and if I enable this, you can see that I can now see myself, too. And you can see the performance is pretty cool, running at a high frames per second.
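Putting the walkthrough together, here is a condensed sketch of that demo code. The element ids and class names are illustrative; it assumes the TensorFlow.js and coco-ssd script tags from the HTML are already on the page, so the browser-only wiring is left as a comment and only the pure helper at the bottom runs outside a browser.

```javascript
let model = null;
let modelHasLoaded = false;

// Browser-only wiring, roughly as described in the talk (not executed here):
// cocoSsd.load().then(function (loadedModel) {
//   model = loadedModel;
//   modelHasLoaded = true;
//   document.querySelector(".demos").classList.remove("invisible");
// });

// Click handler attached to each classifyOnClick container.
function handleClick(event) {
  if (!modelHasLoaded) return; // nothing to do until the model is ready
  model.detect(event.target).then(handlePredictions);
}

// Pure helper: turn raw predictions into caption text plus the
// x/y/width/height values needed to position each highlight box.
function describePredictions(predictions, minScore = 0.5) {
  return predictions
    .filter((p) => p.score >= minScore)
    .map((p) => ({
      label: p.class + " with " + Math.round(p.score * 100) + "% confidence",
      box: { left: p.bbox[0], top: p.bbox[1], width: p.bbox[2], height: p.bbox[3] },
    }));
}
```

In the real demo, handlePredictions would feed the output of something like describePredictions into new paragraph and highlighter div elements positioned over the clicked image.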
And all of this is running live in the web browser, which means, of course, that your privacy is also preserved, because this data is not being sent to a server for classification. So the next thing I want to talk about is Face Mesh. You can see here how it can recognize 468 unique points on the human face, and it's just three megabytes in size. In fact, many people are starting to use it in creative ways, such as ModiFace, which is part of the L'Oréal group, who are using it for AR makeup try-on, as you can see from the image on the right. This lady is not wearing any makeup on her lips; in fact, the lipstick is being chosen dynamically at runtime in the browser, and then we are placing it, because we know where the lips are from Face Mesh. Pretty cool. But let's see this running for real, using my face, so I can explain a little bit more. OK, so now you can see my face in the web browser, and as I open and close my mouth, you can see it reacts really well. It's running at a high frames per second, but this is just running on the CPU. If I flip the switch at the top right, I can get even better performance by running on my graphics card. Now, in addition to doing the machine learning in real time, because JavaScript is obviously great at graphics, we're also rendering a 3D point cloud that we can tinker with at the same time. As you can see, I can move my face around in the point cloud too, so you can use this to make pretty much anything you want. So next up is body segmentation. This model allows you to distinguish 24 unique body areas across multiple bodies in real time, as you can see from the animation at the bottom here. You can see how well it segments, and it even gives you an estimation of the pose of each body too, seen as the skeleton, which can be used to do gesture recognition and much, much more. Now, models such as BodyPix can be used in really delightful ways.
Here are two examples I created in just a couple of days that allow you to do some powerful things. On the left-hand side, you can see how I remove myself from the webcam in real time, rendering myself invisible, much like a Harry Potter cloak or something like that. And as I get on the bed, you can see how the bed still deforms, even though I'm removed from the camera feed in real time. Now, on the right-hand side, you can see another demo I created that allows me to measure my body size in real time. And I don't know about you, but whenever I'm buying clothes, I never know what size I am. So I made this to help me find my size for different brands on the websites that I use, and in under 15 seconds I can get a result back for my chest measurement, my inside leg, and all that kind of fun stuff, in a much more frictionless way. And of course, all of it runs in the web browser, so my privacy is preserved; none of these images are going to a server. And of course, all of this can give you superpowers, too. What if you combine the technology with something like WebGL shaders? In that case, you can get an effect like this, made by one of the guys in our community in the USA, which can shoot lasers from your mouth and eyes, all in real time at a buttery-smooth 60 frames per second. But let's not stop there. If we combine it with WebXR, a very exciting emerging web standard, you can now even project people from magazines into your room in real time, too. This guy is using it on his phone, and then he can walk up to the person and kind of meet them in real life, virtually speaking. So that's pretty cool. And I thought, well, if I can do this, then why not go one step further and combine it with WebRTC to teleport myself in real time? And you can see here how I can project myself from my bedroom into another living space. It could be somewhere else in the world, to meet my friends and family, such that I can be closer to them even when I'm not.
And having tried this myself, it actually does feel better than a regular video call, because you can walk up to the person and move around them and all this kind of stuff, which you just don't get with a regular video call. Now, the second way you can use TensorFlow.js is via transfer learning, which is where you retrain existing models to work with your own data. It's the next logical step after using our pre-trained models, to make things more customized to your needs. Now, if you are an expert, you can, of course, code all this stuff yourself, but I want to show you two ways to do this in a super simple fashion. The first one is Teachable Machine. This is a website created by Google that allows you to retrain models in the web browser for very common tasks, like recognizing an object, speech recognition, or pose estimation, for example. And in just a few clicks, you can make your own model. So let's try this out right now and see how easy it is to use for something like a prototype. So here's Teachable Machine. We can click on Image Project to start, and I can click on webcam, and you can see I'm just going to take a few samples of my head in front of the webcam. And then I'm going to do the same thing for class two, and we take a similar number of samples, but this time I'm going to use this deck of cards. Now that we've got a similar number of images, as you can see, I'm going to click on Train Model, and essentially that means it's retraining the top layers of the model that we're using, so that we can classify new data using things it learned from before. So in just a few seconds, this process will be complete, and we can see a live prediction coming from the webcam. And hopefully you can see that class one is predicted right now, and if I put the deck of cards in front, it should now show class two. Class one, class two. And look how responsive that is. It's really, really fast, and you can get this great performance in just a matter of seconds.
So in around 30 seconds, we made a custom machine learning model. Do try it out in your spare time. And you can use this in a prototype: you simply hit Export Model at the top right there, and you can save the JSON files, and you can then load this model in your own custom website later on to do something more useful. So maybe I can show a deck of cards and reveal a YouTube video, or whatever I want to do. Now, the next method I want to show you is for when you want to do something for a production use case, which is more than just a prototype. You might have a lot more data, and of course, in the web browser, you're limited by the RAM you can use in a single tab in Chrome. So if you have, like, gigabytes of data, you can use Cloud AutoML, and this allows you to train custom vision models in the cloud, which you can then export to TensorFlow.js, just like we did before. So here you can see I've just uploaded lots of data, of flowers in this case: lots of different photos of different types of flowers. And all you need to do is specify if you want to train for accuracy or for faster predictions. And of course, with machine learning, there's always a trade-off between these two things, but you can choose which you prefer. You click next, and then after a few hours of training, it will give you the option to export to TensorFlow.js, as you see on this slide. And it's super simple to use this exported JSON file. In fact, here's the code, all on one slide. All we need to do is include the TensorFlow.js library at the top here. We then include the AutoML library as well. And then below this, we have a new image that the model has never seen before; this is just a daisy image I found on the Internet, and we can use this as the image we want to classify. And then, in just three lines of JavaScript below, we can classify the image. So the first thing we do is wait for the model to load.
So we use tf.automl.loadImageClassification, and we simply pass a reference to the model.json file that you would have downloaded from Cloud AutoML, which can be hosted on your CDN or your website or wherever you so desire. Because this is an async operation, we use the await keyword, of course, and then that gets assigned to model when it's ready. We then get a reference to our daisy image, which is the new image we want to classify in this case, and we simply use model.classify, pass it the image, and await the results to come back. Once this is allocated to the predictions object, which is just a JSON object, we can parse through and see all the predictions that came back from the model for that single image. And of course, you can call model.classify multiple times once the model is loaded, so if you were to use, say, a webcam, you could then of course do that instead and have it running in real time on webcam data. The third way to use TensorFlow.js is to write your own code from scratch. Now, this is for the machine learning experts out there, or people who want to do more hands-on, low-level work, and of course, going into that would be too much for a 30-minute presentation today. But there are plenty of tutorials on our website, which I'll share with you later, to get started with this. So today I'm just going to give you a flavor of the superpowers and performance benefits you can get by running in JavaScript and Node.js, for example. First, I want to talk about the different APIs we have available. There are two APIs. The first one is the Layers API, which is essentially like Keras, if you've used Python in the past; that is a high-level API that's super easy to use. Now, below this, we have the Ops API, which is much more mathematical and is like the original TensorFlow, if you will, and that allows you to do all the funky linear algebra and all this kind of stuff. So depending which way you want to go, there are two flavors of TensorFlow.js to choose from.
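Going back to the AutoML snippet for a second, the code described above looks roughly like this. The model URL and element id are placeholders for wherever you host the exported model.json and whatever your page calls the image; the browser-only function is left uninvoked, and the topPrediction helper is a hypothetical addition, not from the talk.

```javascript
// Browser-side sketch: load the exported Cloud AutoML model and
// classify an image. Assumes the tfjs and tfjs-automl script tags
// are already included on the page.
async function classifyDaisy() {
  const model = await tf.automl.loadImageClassification("model.json");
  const image = document.getElementById("daisy");
  // Resolves to an array of { label, prob } objects.
  return model.classify(image);
}

// Pure helper (hypothetical): pick the highest-scoring prediction
// from the array the classify call resolves with.
function topPrediction(predictions) {
  return predictions.reduce((best, p) => (p.prob > best.prob ? p : best));
}
```

Because the model stays loaded, you can call model.classify repeatedly, for example on each frame of a webcam feed, and just keep taking the top prediction.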
Which one you use is based on your experience and capabilities. So you can see how this all comes together. Essentially, we've got our models at the top there, sitting upon the Layers API, and then that sits upon the core Ops API just below that. Now, that can talk to different environments, such as the client side, and within the client side you might have different environments as well, like the browser, WeChat, or React Native, for example. And each one of these environments knows how to talk to different backends: the plain JavaScript CPU backend, which is always available, but also things like WebGL for graphics card acceleration on the front end, or WASM, WebAssembly, if you want better CPU performance. And there's a similar story, of course, for the backend on the server side with Node.js. And here it's important to note that we actually have the same performance as Python, because here we are calling the same TensorFlow CPU and GPU bindings that Python calls, into the C libraries that TensorFlow itself is written in, and that allows us to get the same CUDA acceleration and AVX support on the processor, to make sure things are running as fast as possible. And in fact, if for some reason your machine learning team is still using Python, then of course you can train and save Python models from the Layers API using Keras, and you can use the same model format to load them back directly into Node.js without conversion. So you can just take a saved model and then use that in Node.js. Now, if you want to use one of those saved models on the client side, then you have to use our command-line TensorFlow.js converter, and that will convert the model into the JSON format we need to run in the web browser. So let's look at performance, then. Here is TensorFlow.js versus Python running MobileNet, and these are the inference times: how long it takes to classify the thing we're looking for in the image at the top.
There you can see, running on the graphics card in Python, 7.98 milliseconds, and in Node.js, just 8.81 milliseconds. So that's within a certain margin of error anyway, and it's pretty much the same for all intents and purposes. Now, where it gets interesting, of course, is if you have a lot of pre- and post-processing, which a lot of ML models actually do, because in order for the model to digest your data, you need to manipulate the original data into something that is usable in machine-learning land. Then you're going to get further performance increases in Node.js because of the just-in-time compiler of JavaScript itself. In fact, we've seen with people like Hugging Face, who are quite famous for making natural language processing models, that they've seen a two-times performance boost just by switching to Node.js for their machine learning pre- and post-processing. So now let's focus on the client side for just a second. Here are five superpowers you get which are hard or impossible to achieve on the server side. The first one is privacy. As I kind of hinted at before, all of these machine learning models are running in the web browser on the client machine. That means at no point is any of the sensor data going to a third-party server for classification, and that's really important in today's world, where privacy is always top of mind. And let's not forget, you get that for free. Linked to that is lower latency: because no server is involved when you're running on the client side, we don't have that round-trip time from the mobile device, let's say, to the server, which could be over 100 milliseconds or more on a bad mobile network connection. And of course, that leads to lower cost. If you have a reasonably popular website, you might be spending tens of thousands of dollars on graphics cards and beefy processors to run those machine learning models on the server.
All of that hardware is no longer needed; you can just execute directly on the client machine. Next, interactivity is a big thing for JavaScript. It's kind of been designed for that from day one, so we have a much richer ecosystem for graphics and charting and all that kind of fun stuff. And the final point is reach and scale, which we all know and love, being web developers ourselves. Essentially, anyone can click on a link in the web browser and have that machine learning model loaded for free. Compare that with trying to do this in other ways on the server side, which would require you to, first of all, understand Linux and install Linux; then you need to install TensorFlow and the CUDA drivers from NVIDIA; then you need to clone the GitHub repo and compile it and make sure it runs with the environment on the server side. All of that hassle goes away when you're running on the client side, and that can get you more eyes on your research and machine learning, which could be very valuable if you're a researcher, for example. Maybe that means 10,000 people can try your model out instead of the five people in your lab, and they can maybe uncover bugs or biases in your model that you can then fix before it sees prime time. Now, flipping to the server side for just a second, there are also some benefits there, too, if you choose to use Node.js. Obviously, we can use a TensorFlow saved model without conversion, as we spoke about. We can also run larger models than we can on the client side, due to the memory limitations of a Chrome tab. And of course, it allows you to write code in just one language, which is, of course, JavaScript, and needless to say, a lot of devs use JavaScript. According to the Stack Overflow Survey 2019, I believe 67 percent of people are now using JavaScript in some capacity, which is pretty cool.
And then there are the performance benefits, of course, that you can get from the just-in-time compiler boost in Node.js over using machine learning in Python, for example. So with that, I would like to talk to you a little bit about the resources you can use to get started, if you're interested. If there's one slide you want to bookmark today, let it be this one, and the next one, actually. Essentially, these are URLs you can use to get started with our codelabs; you can work through them step by step and learn as you go, and they're a really robust way to learn some of the things we've done with TensorFlow.js and machine learning principles in general. And this next slide has pretty much everything else. Here's our website to get started. The models that you've seen in this presentation, and many more, are available on our GitHub there, and we have a Google group to answer any more technical questions that you may have or may be thinking about later on. And then finally, we have CodePen and Glitch, which have boilerplate code you can use to get started. Now, on the right-hand side is our recommended reading material. This is a great book that covers everything. Even if you have no machine learning background at all, that's completely fine; if you know some basic JavaScript, this book will take you through everything you need to know to get your machine learning chops up to scratch. And with that, please come join our community. Pictured here are just a few more examples of what people have been making in just the last few weeks, and this is growing every week. If you check out the #MadeWithTFJS hashtag on Twitter or LinkedIn, you can find what people are making right now, and please do contribute your own for a chance to be featured at future show-and-tell sessions, or even conferences and such in the future. So the final thing I want to leave you with is this last demo from a guy in Tokyo, Japan.
He is actually a dancer, and he's now using machine learning with TensorFlow.js to make his next hip-hop video, as you can see here. And it's really great to see creative folks starting to embrace machine learning as well. It's no longer just for the one percent of people with PhDs; it's now for everyone, and hopefully TensorFlow.js can make this even more accessible to all in the future. And I'm really excited to see what you will make. Please do tag us with #MadeWithTFJS if you do make anything in the future, so we can share it with the team. Do stay in touch; I'm happy to answer your questions after the talk, or connect with me on LinkedIn or Twitter, and I'm happy to answer questions over there as well. Thank you very much for watching, and see you next time.