Video details

JS Monthly #17: Ben Butterworth - Anonymous Video Calling App Using Machine Learning

JavaScript
10.18.2021
English

By turning on the webcam during video calls, we reveal our face, identity, race/ethnicity, disabilities and living conditions. We also don't always feel like putting our camera on when talking to other people over video calls, and we are unusually quiet (or muted) when listening to someone on these calls. What if we could have video calls, but still avoid all these issues?
Anonymous video calls allow you to communicate with someone using your facial expressions and animation, without revealing your identity, physical background or other sensitive information. They allow you to safely share expressions and emotions. Your webcam feed is processed locally, and only the minimal amount of data that represents your virtual face is sent.
You can try it at https://github.com/ben-xD/club/
The JS Monthly Meetup was organised by Aris Markogiannakis (JS Monthly host) and hosted by Prisma. ✨ Join the Prisma Meetup group here: https://www.meetup.com/js-monthly/
Follow Ben on Twitter: https://twitter.com/orth_uk
👉 Next video: Heavily Connected Data Applications - https://youtu.be/D7C-97HgXTs

Transcript

We're going to wrap up and move to the next speaker, which is Ben, and he's going to take our brains out with machine learning and some of the cool stuff that he's developing. So yeah, thank you, both of you. And we're going to bring Ben on the stage. Hello. Here you are. Ben works for Ably. Ably Realtime is going to be sponsoring CityJS this year. And we also have someone who works with you, Serotika, right? So you've got talent. We haven't seen you before. Yeah, this is my first time giving anything like this in public. I just thought I'd show something cool I made. When I first joined Ably, they gave me a chance to do anything I wanted as long as it had a bit of Ably in it. So I decided to use JavaScript, or actually TypeScript, and basically learn a whole bunch of things like three.js, MediaPipe, Next.js, Cloudflare Pages and Firebase Functions, all of that. Okay, let's see your screen. I'm going to go, and I'll come back when you finish the talk and we can have a chat about it. I'm so looking forward to learning from you. Yeah. So I've shared my screen. I'm also watching the YouTube comments, so if you have any questions, I can answer them as you ask them. Like I said, when I joined Ably in April this year, my team lead said I could do anything I wanted as long as it had a bit of Ably in it. Before that, I was working on mobile apps, doing machine learning processing like image processing, face detection and clustering. So I wanted to do something that had machine learning as well, but this time using JavaScript, or actually TypeScript. You can have a look at the code on GitHub; that's the repo there, and here's a demo. I think the issue is that it struggles to load because I'm already using my webcam for this video call and presentation, so for some reason it's not able to find a face in the camera. So I'm just going to leave that to load.
But what I do have is a screenshot, which I took earlier today during the presentations, of three clients connected. Maybe you can also try it yourself: I'll put a link in the YouTube comments. If you visit it, you can press the green connect button and join the call, and maybe I'll see you here. If you connect to the call, you should show up on this page. What I wanted to do is explain some of the things that I used to build this. If you go to the website, there's a list of the technologies I used. The first one is Next.js. What Vercel says it is, is "the best developer experience with all the features you need for production". For static site generation... oh, you can't see the link in the chat? Sorry, I've just sent the link again. Static site generation is when you render all the React files into HTML, CSS and JavaScript ahead of time, so you don't have to run a server to do the processing when someone visits your site. It's already generated, so only static files need to be downloaded by the client, and that makes it really quick. It's also better for SEO, because search engine crawlers, like Google's, can visit your website without needing to do any processing, which is ideal for them because it lowers their cost. With Next.js everything worked out of the box, and that's not what you get with a lot of JavaScript tools. Sadly, there are so many things in JavaScript that don't work out of the box. So for me, Next.js is about building pages quickly and happily. Let's have a look at the pages. I've only got two here: the home page, which is quite small, and the video room, which is a bit longer and is what you see here.
This is the home page, and this is the video room, where you can pick your face and the color of your face. We have a few people connected; well, one person. So that's the first thing, the one that allowed me to build and serve these pages. The next thing I'm using is FlatBuffers. To send data across the internet between people, or from a client to a server, there are a few formats you can use. A lot of people use JSON, and before JSON it was XML. The issue with these is that they're still text, so they're not very efficient in the amount of space they use over the internet. If you wanted to send lots of frames per second of someone's face, using JSON for that data is going to be very slow. So instead you can use other serialization formats. One you might have heard of is Protocol Buffers, which is very popular on servers, for example between microservices: gRPC uses Protocol Buffers as its serialization format. But Protocol Buffers wasn't that efficient in this case because, I think specifically in the JavaScript library, or maybe in Protocol Buffers generally, it didn't support float32 or the smaller number types I wanted to use. For example, I had a Float32Array. If I serialized it into a protocol buffer, it would be cast to a Float64Array, and then instead of 2 KB it would take 4 KB. So I'd basically doubled the size of my messages, which costs more data for the user and for my servers. To reduce the amount of data being used, I used FlatBuffers instead. It's made by Google, and it's used by a lot of games to send data and also to save data on the device, like in Android apps, for example. The only thing with FlatBuffers is that it has quite a verbose API, but it's still less verbose than serializing manually, and it's sometimes worth it.
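To make the size argument concrete, here is a small sketch that measures one batch of numbers three ways: as JSON text, as raw float32 bytes, and as the same values widened to float64. The count of 512 values is an assumption chosen only so the float32 payload comes out at exactly 2 KB, matching the 2 KB versus 4 KB figures mentioned above; it is not the app's real message layout.

```typescript
// Assumption: 512 float values per message (for example, flattened landmark
// coordinates). 512 is chosen so the float32 payload is exactly 2 KB.
const VALUE_COUNT = 512;

const values = new Float32Array(VALUE_COUNT);
for (let i = 0; i < VALUE_COUNT; i++) {
  values[i] = Math.sin(i); // arbitrary non-round numbers, like real coordinates
}

// JSON: every number becomes decimal text, plus commas and brackets.
const jsonBytes = new TextEncoder().encode(JSON.stringify(Array.from(values))).length;

// Binary float32: 4 bytes per value.
const float32Bytes = values.byteLength;

// The same values widened to float64: 8 bytes per value.
const float64Bytes = Float64Array.from(values).byteLength;

console.log({ jsonBytes, float32Bytes, float64Bytes });
```

The float32 and float64 sizes are fixed by the element width (2048 and 4096 bytes here); the JSON size depends on how many digits each number prints with, but for non-round values it is several times larger than either binary form.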
So I wanted to show you, if I can actually find it... yeah. For example, when I want to create a FlatBuffer, there's actually a lot of code here. I mean, this is auto-generated, but it's been a while. So face message is where I actually create it. Encoding into a FlatBuffer is quite a basic API. It's not as simple as Protocol Buffers, but serializing manually, which is what I did initially to avoid adding a dependency, meant a lot of messing around with int arrays and data views, copying data back and forth and casting types. Moving to FlatBuffers was great because it reduced the complexity and also handled more complicated cases. For example, if you want to serialize (let me just uncomment this code so we can see the syntax highlighting), say, a string, you have to save the length of the string in the buffer and then put the string in, so that on the other side you know how far to read to get the string back out of the buffer. So there's a data structure you need to design, and you might forget the structure or not document it very well. That's why using a serialization library like FlatBuffers is nicer. If you need performance, go for FlatBuffers, but Protocol Buffers is more commonly used. Next, MediaPipe is the library that generates the face from your webcam. The website is here, and there are loads of features: face mesh, pose detection from your webcam, and hand tracking. This is actually what runs Google Lens and Google Translate, if you've seen them replace text in real time with translated text from the camera, like in the Android apps. They're using MediaPipe under the hood. And they don't just use it on the edge, on mobile devices; they also use it in Google Cloud to detect whether some YouTube videos have unsafe content. So definitely have a look at MediaPipe.
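The manual approach described here, storing a string's length before its bytes so the reader knows where it ends, can be sketched with `DataView` and `TextEncoder`. The wire format (a little-endian uint32 byte length followed by UTF-8 bytes) is a hypothetical layout invented for this illustration; it is not the app's original hand-written format, nor what FlatBuffers produces.

```typescript
// Hypothetical wire format, for illustration only:
// [uint32 byte length, little-endian][UTF-8 bytes of the string]
function encodeString(text: string): Uint8Array {
  const utf8 = new TextEncoder().encode(text);
  const buffer = new ArrayBuffer(4 + utf8.length);
  const view = new DataView(buffer);
  view.setUint32(0, utf8.length, true); // length prefix: tells the reader where the string ends
  new Uint8Array(buffer, 4).set(utf8); // string bytes straight after the prefix
  return new Uint8Array(buffer);
}

function decodeString(bytes: Uint8Array): { text: string; bytesRead: number } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const length = view.getUint32(0, true); // read the prefix first
  const utf8 = bytes.subarray(4, 4 + length); // then exactly that many bytes
  return { text: new TextDecoder().decode(utf8), bytesRead: 4 + length };
}

const encoded = encodeString("witty iguana");
const decoded = decodeString(encoded);
console.log(decoded.text, decoded.bytesRead);
```

Note that the prefix stores the UTF-8 byte count, not the JavaScript string length; the two differ for non-ASCII text. Remembering details like this across every message type is the bookkeeping a schema-based library takes over.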
My intention in showing all of these is to give you a set of tools you might play with in the future, so you know what to look for and what kinds of things are available. Some people might know them already. The most complicated part was playing around with three.js. In the app you can use your WASD keys to move your face around. If you try that, you can go all the way down to the bottom of the screen, and you can change your color and your username as well. The rendering API for three.js is very powerful and, according to the documentation, easy to use, but there are some issues. For example, if you are rendering text, you create a text object, set the color, set the text content, put it into the scene and then render the scene. All those steps are explicit lines of code. But if you want to change the color of the text or the text content, you have to delete the old text, create new text and put it back in the scene, because the API doesn't allow you to replace the text content. Let's see if I can find this code. This is the most complicated file in the whole project. Before I show you the rendering code, this is the machine learning code, which is so simple: it's only 47 lines. It downloads the machine learning files from the CDN, and it uses MediaPipe. Basically, you set options and then you set a callback. This is just my app's callback, but you pass a callback to `onResults`. Then, when you send an image to this Holistic object, the callback is called. So it's an asynchronous API: you set the callback, and you get the results there. The results contain the pose, for your hands and your elbows, and your face landmarks. There are quite a lot of landmarks here for your eyes, your nose, your whole face, and also your hands.
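The callback pattern described here (register a results callback once, then send frames, and results arrive through the callback) can be sketched in plain TypeScript. The `CallbackDetector` interface, `HolisticLikeResult` type and `FakeDetector` class are all hypothetical stand-ins modeled loosely on the `onResults`/send flow from the talk; they are not MediaPipe's actual type definitions, and the fake does no real inference.

```typescript
// Hypothetical interface, loosely modeled on the onResults/send flow in the talk.
interface CallbackDetector<Input, Result> {
  onResults(callback: (result: Result) => void): void;
  send(input: Input): Promise<void>;
}

// Hypothetical result shape: optional landmark lists.
interface Landmark { x: number; y: number; z: number }
interface HolisticLikeResult {
  faceLandmarks?: Landmark[];
  poseLandmarks?: Landmark[];
}

// Registers the app's handler once; the returned function sends each frame.
function startDetection<Input, Result>(
  detector: CallbackDetector<Input, Result>,
  handle: (result: Result) => void,
): (input: Input) => Promise<void> {
  detector.onResults(handle);
  return (input) => detector.send(input);
}

// Fake detector for illustration: "detects" one face landmark per input.
class FakeDetector implements CallbackDetector<string, HolisticLikeResult> {
  private callback: (result: HolisticLikeResult) => void = () => {};

  onResults(callback: (result: HolisticLikeResult) => void): void {
    this.callback = callback;
  }

  async send(input: string): Promise<void> {
    // A real model would run inference on an image here.
    this.callback({ faceLandmarks: [{ x: input.length, y: 0, z: 0 }] });
  }
}

const received: HolisticLikeResult[] = [];
const sendFrame = startDetection(new FakeDetector(), (result) => {
  received.push(result);
});
void sendFrame("frame-1");
console.log(received.length);
```

Keeping all result handling in one callback means the rendering code never needs to know when inference finished; it just reacts to whatever data arrives.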
This is why it's called Holistic: it takes into account not just your face, not just your hands, not just your pose, but everything. It's really simple to use, and you should play around with it. There was actually a memory leak, which we found in it, but it's fixed now, so don't worry, the app you're using won't leak memory. For a while, though, the app's memory usage would grow from about 100 megabytes to 2 GB and crash the tab, because each tab in Chrome has a maximum of 2 GB of memory it can use. So it hit 2 GB and was unable to continue. But that's not what I wanted to show you; I was going to show you the rendering. This is the most complicated file in the whole application, with more than 500 lines, and the easiest thing in it is the key controls. Right. So maybe you're wondering how to support someone moving around in this. What we do is keep the state of the offset of the person's face within the whole area. Then, when someone presses a key, we move it: if they press arrow down, we move the face down, and if they press left, we move it left. We also want to support the WASD keys, and people pressing Shift as well, in all directions. Now, there are actually two ways to do it. We could just react to each keydown event, but then people would have to press the key again and again. So instead, when someone presses a key down, we add it to a list of pressed keys, and when they release it, we remove it from the list. The advantage is that you can handle someone pressing up and left at the same time, and then we move the user diagonally towards the top left.
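The pressed-key approach described above can be sketched as a `Set` of held keys plus a function that turns the current set into a movement direction once per frame. The key-to-direction table and the y-up convention are assumptions for illustration, not the app's actual code. Lower-casing `event.key` is one way to treat Shift+W the same as W, since Shift changes the reported key to "W".

```typescript
// Keys currently held down. Adding on keydown and removing on keyup means
// several keys (e.g. ArrowUp + ArrowLeft) can be active at the same time.
const pressedKeys = new Set<string>();

function handleKeyDown(key: string): void {
  pressedKeys.add(key.toLowerCase()); // "W" (with Shift) and "w" map to the same entry
}

function handleKeyUp(key: string): void {
  pressedKeys.delete(key.toLowerCase());
}

// Assumed mapping: arrow keys and WASD, with y pointing up.
const DIRECTIONS: Record<string, [number, number]> = {
  arrowup: [0, 1], w: [0, 1],
  arrowdown: [0, -1], s: [0, -1],
  arrowleft: [-1, 0], a: [-1, 0],
  arrowright: [1, 0], d: [1, 0],
};

// Called once per frame: combines every held key into one direction.
function movementVector(): { dx: number; dy: number } {
  let dx = 0;
  let dy = 0;
  for (const key of pressedKeys) {
    const dir = DIRECTIONS[key];
    if (dir) {
      dx += dir[0];
      dy += dir[1];
    }
  }
  // Math.sign stops ArrowUp + W from moving twice as fast.
  return { dx: Math.sign(dx), dy: Math.sign(dy) };
}

// In the browser (not run here):
// window.addEventListener("keydown", (e) => handleKeyDown(e.key));
// window.addEventListener("keyup", (e) => handleKeyUp(e.key));

handleKeyDown("ArrowUp");
handleKeyDown("ArrowLeft");
console.log(movementVector()); // { dx: -1, dy: 1 }
```

Because direction is computed from state rather than from individual events, releasing one key of a diagonal simply leaves the other direction active.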
Otherwise, if you only handle individual keydown or keypress events, for example "arrow left was pressed", you never get a single event for up and left at the same time. So you can't move people diagonally; they would have to move left and then move up. That was a long-winded explanation of how to do controls in a game or an interactive application like this. And here's the complexity with rendering text. I'm using the Ubuntu font here. I was initially going to use Comic Sans, but I decided on Ubuntu for this one. Let's see where this is used. It's used in a few places, and basically before we render, we remove the text so that it gets re-rendered. This is because the API is not that flexible: it's powerful, but it's not that friendly. There's loads more I could go into, but I stuck to the basic problems I faced. So let's go to the next thing. three.js is similar to Babylon.js, but it's more popular now, so I tried it instead. I haven't used Babylon myself. Next, Tailwind CSS. Tailwind is basically why this website looks the way it does, and I quite like the way it looks. It doesn't look like a sophisticated app, but it's simple and it's got some nice colors, which I chose from Tailwind. This is the Tailwind website, and you can see how easy it is to use; here's the color palette, and these are all really nice colors. What I found is that they're just really nice CSS properties to use, and it's much less confusing and overwhelming than Bootstrap. I find Bootstrap quite complicated and overwhelming, but Tailwind gives you simple primitives that make you really productive in your app UI quickly. Okay. This next part is specifically about my company.
I work here, I've worked here since April this year, and we're growing. I'm actually not a JavaScript developer. I only did this because I'd played with TypeScript and JavaScript before, and I wanted to make an app that people can easily use without having to install a mobile app, for example. So for my demo I used a website, but I actually work on the Flutter SDK, and hopefully we're going to do a release tomorrow. You can already use it; there's a release out now, but the next release is going to add push notification support for Flutter. Before that, I wanted to say what I used Ably for in this project. If you look here, I'm sending the face data between users using Ably. Ably is a reliable and scalable messaging service. Just as you use libraries like MediaPipe or FlatBuffers, you install the Ably library into your app and create an Ably account, and you don't have to host any servers to send messages back and forth. You just connect to Ably using the library and send messages, and your other users receive those messages. Specifically, in the next Flutter release, you can send a message to someone else, and if their app isn't open or is in the background, they will get a push notification, so they're notified even if the app isn't connected to Ably at the time. What I wanted to say is that Ably is for highly reliable and scalable traffic. For unreliable data, like video traffic, where you don't care whether each piece of data arrives because it's throwaway (if you miss a frame, you can forget about it), WebRTC might be better suited, using, for example, its data channels. Or if you have video and audio, you can use the media channels. In the future there's also a new protocol, a new API for web browsers called WebTransport, and you can have a look at that.
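For throwaway data like face frames, one common receiver-side technique is to keep only the newest frame per sender and drop anything older, since a late frame has already been superseded. The sketch below assumes each frame carries a sender id and a per-sender sequence number; the `FaceFrame` shape and `LatestFrameStore` class are hypothetical, not the app's real message format. On an unreliable, unordered channel (for example a WebRTC data channel created with `{ ordered: false, maxRetransmits: 0 }`), frames can arrive late or out of order, which is exactly when this check matters.

```typescript
// Hypothetical message shape: the sender increments `sequence` for every frame.
interface FaceFrame {
  senderId: string;
  sequence: number;
  landmarks: Float32Array;
}

// Keeps only the newest frame per sender; late or duplicate frames are dropped
// because a newer pose has already replaced them.
class LatestFrameStore {
  private latest = new Map<string, FaceFrame>();

  /** Returns true if the frame was accepted, false if it was stale. */
  accept(frame: FaceFrame): boolean {
    const current = this.latest.get(frame.senderId);
    if (current && frame.sequence <= current.sequence) {
      return false; // older than (or same as) what we already have
    }
    this.latest.set(frame.senderId, frame);
    return true;
  }

  get(senderId: string): FaceFrame | undefined {
    return this.latest.get(senderId);
  }
}

const store = new LatestFrameStore();
const empty = new Float32Array(0);
store.accept({ senderId: "a", sequence: 2, landmarks: empty }); // accepted
store.accept({ senderId: "a", sequence: 1, landmarks: empty }); // stale, dropped
console.log(store.get("a")?.sequence); // 2
```

Nothing is retransmitted or queued: a missing frame is simply skipped, and the next one that arrives replaces the face pose.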
It's basically like WebSockets and WebRTC combined: WebRTC is for unreliable traffic, WebSockets is for reliable traffic, and WebTransport lets you configure both at the same time. For the app I made here, there are lots of features I'd like to work on, like adding voice calling so we can actually talk to each other, adding rooms, refactoring that rendering code, and adding 3D cartoon characters. If you're interested, we can think about it and work on it together. This slide says "WebSockets", but that should actually be WebTransport. WebTransport can be used like WebSockets, but with support for multiple streams, unidirectional streams, out-of-order delivery, and both reliable and unreliable transport. You can look at that link. It's very new and not actually available for apps to use yet; I'm pretty sure it's still in an origin trial in Chrome. Yeah. If you want to connect with me, my LinkedIn and Twitter are on my website, which is orth.uk, I think. Is that my last slide? Yeah, that's my last slide. While we're here, I might as well show you three.js, if you haven't seen it. If you go to threejs.org, you can see amazing demos, for example this one, the electric bike by VanMoof. Obviously three.js is quite heavy, so sometimes you're downloading a lot of files and it takes a while. My internet is not that good. Let's see what's going on here. We are downloading things; I guess it's just my internet, which is really bad. The files are not that big... 2.1 megabytes. Oh, that's quite big. There are even MP3 files being downloaded. Wow, that's quite a lot of MP3 files. Yeah. So while that loads: MediaPipe is the machine learning one, check it out at mediapipe.dev. FlatBuffers is the serialization library for high performance, in C#, C, Go and Java. VanMoof is still loading. Tailwind, and this is our website, ably.com.
It's for reliable messages that you want to arrive extremely quickly, and also at scale. If you want lots of devices connecting and you don't want complex databases to handle all the state and connections, Ably is the one to go for. And what else is there? We looked at Next.js. This VanMoof demo is taking a while, so let's go somewhere else. three.js is amazing; this is why I wanted to use it and play around with it. There are even racing games built with three.js. Yeah. Thanks for listening, and let me know if you have any questions. I think from all the talks today we have learned so much: we had testing, then we had data and how you can degrade, and then you went through so many nice libraries that you found. It's quite a lot to take in, but it's pleasant learning. You're a Flutter guy, so what was your take on all these JavaScript technologies, and what did you learn using them? Yeah. I think what I really liked was that with TypeScript it was so easy to learn a lot of these things, because the types were already there. When I want to see an argument, it's just there. I remember before, when I was doing JavaScript stuff, you didn't know if you were using things the right way, but in TypeScript the compiler forces you to do it the right way. I also just really enjoyed the diverse range of libraries, from machine learning to rendering. I didn't mention some of the libraries I used, but I used one for name generation: for the username, the app creates a random name like "witty iguana", "refreshed adverse termite" or "confused piranha". It's just so nice that there's already a library with these lists of names. It's amazing how many libraries have been built so that you don't have to reinvent the wheel, and how much time that saves you. It's the same with Prisma: you don't really have to go and develop all the backend yourself.
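The talk relied on an existing name-generation library; the idea behind it can be sketched as picking one word from each of a few lists. The short word lists and the `randomUsername` function below are illustrative assumptions, much smaller than a real library's dictionaries. The random source is injectable so the output can be reproduced in tests.

```typescript
// Illustrative word lists; a real name-generation library ships far larger ones.
const ADJECTIVES: readonly string[] = ["witty", "adverse", "confused", "brave", "sleepy"];
const ANIMALS: readonly string[] = ["iguana", "termite", "piranha", "otter", "heron"];

// `random` must return a number in [0, 1), like Math.random.
function randomUsername(random: () => number = Math.random): string {
  const pick = (items: readonly string[]): string =>
    items[Math.floor(random() * items.length)];
  return `${pick(ADJECTIVES)} ${pick(ANIMALS)}`;
}

console.log(randomUsername()); // e.g. "brave otter"
```

Passing a fixed function such as `() => 0.5` makes the result deterministic ("confused piranha" with these lists), which is handy for tests or for deriving a stable name from a seeded generator.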
You can do it very quickly, because time is very important these days. Daniel, what are your thoughts? First of all, I have to say I loved the talk. I could relate to you in so many moments. I started programming with Flutter, so I miss Dart sometimes. I've worked with Python and machine learning; actually, my final project at school was developing a neural network to identify skin cancer. And lastly, gRPC: right now at Lagi we work with gRPC in Kotlin, so that was super cool to see. It was very interesting to see all the tools and libraries you used to build this amazing project. I can say that using three.js is not easy, especially if you don't come from JavaScript. It's a really tricky library to use, especially when you're new to all of this on the front end, and using Next.js with all its optimizations together with three.js is a crazy thing. Congrats on that. I have to say it was very educational, and I'm definitely going to check out some of the libraries you talked about. And again, Tailwind is a great library for styling as well. I think you picked a great stack, and it was a great talk. Thank you very much. What would you like to do beyond what you've developed? What improvements would you like to make? Yeah, I think the three.js stuff is the most exciting, because it's about wrapping my head around the render loop and getting it to work correctly, without skipping frames or rendering more frames than necessary. If you render another frame using the same data, that's just wasting resources, but if you're not rendering quickly enough, it's going to be laggy. I think I got the performance just okay, but I want to render something cooler, like 3D cartoon characters. That would be quite cool. That's my main interest, but maybe also adding voice calling using WebRTC, just to play with WebRTC.
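One common way to avoid rendering the same data twice, as described above, is a "dirty flag": new data marks the scene as changed, and the per-frame tick only draws when something actually changed. The `OnDemandRenderer` class below is a hypothetical sketch of that idea, not the app's render loop; in a browser, `tick` would typically be driven by `requestAnimationFrame`. It addresses wasted frames only, not rendering too slowly.

```typescript
// Hypothetical on-demand renderer: draws only when new data arrived since the
// last drawn frame, so identical frames are never rendered twice.
class OnDemandRenderer {
  private dirty = false;

  constructor(private readonly draw: () => void) {}

  // Call whenever new face data arrives (e.g. a message from another user).
  markDirty(): void {
    this.dirty = true;
  }

  // Call once per display frame.
  tick(): void {
    if (!this.dirty) return; // nothing changed: skip the expensive draw
    this.dirty = false;
    this.draw();
  }
}

let draws = 0;
const renderer = new OnDemandRenderer(() => {
  draws++;
});
renderer.tick(); // no new data: skipped
renderer.markDirty();
renderer.tick(); // draws once
renderer.tick(); // same data again: skipped
console.log(draws); // 1

// In the browser (not run here):
// const loop = () => { renderer.tick(); requestAnimationFrame(loop); };
// requestAnimationFrame(loop);
```

Several updates arriving between two display frames collapse into a single draw, which keeps the work bounded by the screen's refresh rate.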
Do you have any other questions? I have one, but if you have one, I can ask mine after yours. I do have one. You said you plan on developing rooms and that kind of thing. I assume you're going to build user registration and authentication. What do you plan to use for your database? Yeah, I haven't actually thought about user registration at all. I was thinking a user could just type in a string and join the room with that name, which shouldn't require any user data. But if you want to kick people and that kind of thing, it does become slightly more complicated. I'll definitely give Prisma a go, though, because I haven't used it, but I've heard a lot about it and seen it on Hacker News as well; a lot of people are really interested in it. Are you up for it? Just hit me up, I would love to join the project. I was actually checking the GitHub, and it looks really interesting. We work with people who do front-end stuff, and a lot of the time we have React Native, we have Ionic. You know where I'm heading, right? I'm going to ask you the million-dollar question: would you use React Native or Ionic instead of Flutter? Yeah, Ionic is kind of outdated and didn't get as popular as React Native. But React Native versus Flutter is a very hot topic. I'd say there's a lot more hype around Flutter now, because React Native came first, and its hype kind of went away once people started using it for real businesses and applications. Earlier this year, Google announced Flutter being used in Toyota cars and even for the Ubuntu installer screen, and it's going to be used more throughout Ubuntu. So I'm going to go with the hype for Flutter. Having said that, I find the Dart language itself is not as nice to use as TypeScript; after using it for a while now, it's sometimes a bit annoying.
But it's much better than... Can I ask you about migrating from Flutter 1 to Flutter 2? Is it really painful, if your project is on Flutter version 1, to migrate it to version 2? That's one thing the Flutter and Dart teams are really good at: the migration tools. Okay. I haven't actually had to migrate any projects, so I've had no issues. But for the null safety feature, they also had a tool to help you migrate to non-nullable by default, where, if you want something to be nullable, you have to mark it as optional. That wasn't a pain; it was fine. Of the tools you've told us about, is there anything that, in your eyes, would work better in Flutter than with the web stack you spoke about? You mean for the app I made? Yes, if you were going to build the same app using Flutter. Is there something you think would be better and not as limited as it is on the web? Yeah. I haven't actually considered building my app in Flutter, but there are definitely some libraries I'm using now that I don't think exist in Flutter yet, and sometimes I'll find a Flutter library that hasn't been maintained, whereas in JavaScript the equivalent library has millions of installs and downloads. Some libraries just aren't there yet. And I'm not sure about the 3D side of Flutter either. Someone created a port of three.js, but it's not compatible with Flutter 2; it was published in 2013. Okay. Do you have any other questions? No. The only thing I'd like to point out is that I totally understand the feeling. I started programming with Flutter a few years ago, and I remember, I think it was before Flutter Web even went to beta, everything was still very janky. There weren't any libraries.
So most of the things you wanted to do, you had to do by hand. But nowadays, especially after Flutter 2, I've seen the hype grow so much. Back when I was working with Flutter, people were always saying, "Oh, why don't we migrate to React Native? It's cooler, it's faster, there are more libraries." And I see their point: you have the whole JavaScript ecosystem. But Flutter and the Flutter team have been doing a great job developing the framework, especially when it comes to integrations. You have the Firebase SDK and everything; if I'm not mistaken, it was already written in Dart, so it was so much easier to use. And there are the lints that came along, the animation optimizations, the CLI, Flutter Doctor and everything else that helps you develop. Those things weren't there when I started trying out React Native, and I missed them so much. So I still believe Flutter has a way to go and many places where it can get much better, but it has already come so far, and it can go much further. Good to know. Yeah, that's all my questions. I think there was one question: are the calls encrypted on the fly? Sorry, what was the question? There was only one question, from David B, and it was: are the calls encrypted on the fly? The data goes over a secure WebSocket, so WSS. It's encrypted using normal TLS, just like when you visit any website. There you go, David. And that's it for us today. It was quite exciting. There's no meetup in August, so we'll see you in September, a bit more relaxed. I hope you still have time for some summer holidays. And don't forget, we have a Prisma workshop at CityJS. Sign up; it's free, and you can learn about Prisma. And the rest of you, see you at the next one. Have a good one. Bye bye.