Video details

React Native EU 2021: Wojciech Kwiatek - Creating a VoIP app in React Native - the beginner's guide

React Native
10.12.2021
English

This talk was presented during the React Native EU 2021 - the largest community conference in the world focused exclusively on React Native.
Abstract: How do you make your mobile... call? Is it easy to create another Skype or WhatsApp?
This talk will walk you through the various aspects of creating "the calling app". It starts with what VoIP actually is, goes through some theory of WebRTC connections, and ends with iOS CallKit versus the Android way of handling calls in React Native. It will give you a sneak peek of what is required to build a production app that handles outgoing and incoming calls, and what kind of infrastructure you need. The talk is based on experience gained building a phone app for sales and support teams around the world.
Wojciech is a JavaScript developer, trainer, and mentor. Started programming commercially with Node.js, then switched to the front-end world and, most recently, to mobile. Over the past few years, went through many JS frameworks and companies, building different pieces of software. For many years involved in JavaScript training, preparing materials and leading both open-source and commercial workshops. Now bridging the gap between tech and business.
Twitter: https://twitter.com/WojciechKwiatek Github: https://github.com/wkwiatek
Additional Links:
Link to slides: https://www.slideshare.net/WojciechKwiatek3/react-native-eu-2021-creating-a-voip-app-in-react-native-the-beginners-guide
Link to the main React Native packages mentioned in the talk:
- React Native WebRTC: https://github.com/react-native-webrtc/react-native-webrtc
- React Native CallKeep: https://github.com/react-native-webrtc/react-native-callkeep

Transcript

This conference is brought to you by Callstack, React and React Native development experts.
Hello and welcome, everyone. My name is Wojciech Kwiatek, and today I'll talk about creating VoIP applications in React Native. I'll cover the basics: what VoIP apps are, how to start building one, and what problems you can run into while developing such an app. But first things first. I'm the CTO at Challenge, where we are building a business phone app. I've been a JavaScript developer for roughly ten years now, and I'm also a trainer and mentor in this field. As for the technologies I've been involved in, I started with jQuery, Ember, and Knockout.js, went through Angular 2 and, of course, React, and more recently React Native. I'm not only coding right now; I'm also trying to bridge the gap between tech and business. So I'm also the person who checks whether business ideas are really doable and what kind of technology can help the business grow. You can find me on GitHub and Twitter; here are the links. Let's start with the topic.
First of all, what actually is a VoIP app? When we talk about VoIP apps, you probably think of something like Zoom, Messenger, Teams, WhatsApp, or Skype, the main apps used to connect two peers together. Basically, when you call someone, with audio or video, it's done via VoIP (Voice over Internet Protocol). It doesn't necessarily have to be a phone number at the other end; it may just be a connection between two apps, which is what most of them do. They also have mobile apps, and that's the kind of app we'll try to build. So, what ingredients do we need to create such an app?
I would highlight three things, which I consider the main ones. The first is access to the hardware: the microphone or the camera. We need permission from the user, we need to get the audio or video stream from the user, and then we can process it. That's the very basic thing: somehow get the stream. The second thing is a connection between the two endpoints. By endpoints I mean two or more users, because it doesn't have to be just two; you may have a group call or something like that. So we need a connection between the participants. The third thing, which applies especially to native apps, is that you need some way to tell the phone's API that a call is happening. Think of Hangouts, for example: if someone calls you, the incoming call is shown to you and your phone rings. So we need to connect to the phone's call API to show that a call is coming in.
Let's start with the first one. We need a set of permissions. I put it in as a separate topic because recently Android and iOS have been trying to protect users and give them visibility of what's going on with their devices. A few years ago on Android, you got permission for practically everything, and you could do whatever you wanted. Those times are gone. Now, on both systems, we need to ask for specific permissions: if we want to use the microphone, the user has to give us access. Why is this such a problem? First, because the guidelines, especially on iOS, require your app to work even without the permission. So even if we have a calling app and the user doesn't give it access to the microphone, it still needs to work. It may show a message saying that the app doesn't have access to the microphone and asking the user to grant it, so that they can make calls. But it still needs to be a working app. The second thing is that the permission screen on Android can differ from device to device in how it looks and behaves. So sometimes you need to educate users a bit more about how to grant access, because it may not be as simple as a dialog saying that the app would like to access your microphone. Of course, all of these permissions can be revoked: the user can take them away from a given app at any time. So bear in mind that you should always have a fallback, especially one that explains to the user what's going on.
Once we have the permissions, getting the audio stream is pretty straightforward. But to get the audio stream, we need some kind of solution, and thanks to WebRTC we have one. Let's first talk about WebRTC in JavaScript. WebRTC has been with us for a long time, about ten years, I guess. However, this year WebRTC 1.0 became an official W3C standard. This is a good step, because previously, when using WebRTC, you could hit a wall when it came to interoperability: there could be different implementations of the same thing in, for example, Chrome and Firefox, so the APIs could behave differently. After the 1.0 version, that should not be a problem at all, so you probably won't need adapters, polyfills, and the like to make WebRTC work in every major browser. It's also important to note that practically every browser, even Safari on iOS, supports it, so you can use it across most of today's browsers. But let's talk a little about what the connection looks like.
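The permission handling and stream acquisition described above can be sketched in TypeScript. This is a minimal sketch, not code from the talk: `micStateFrom`, `getAudioStream`, `MediaDevicesLike`, and the message texts are hypothetical. The three result strings are the values React Native's `PermissionsAndroid.request` resolves to, and `getUserMedia({ audio: true, video: false })` is the standard call, exposed as `navigator.mediaDevices` in browsers and as `mediaDevices` in react-native-webrtc. The devices object is passed in as a parameter so the fallback logic can be exercised without a phone.

```typescript
// Result strings returned by React Native's PermissionsAndroid.request.
type AndroidPermissionResult = "granted" | "denied" | "never_ask_again";

// What the UI should show; the app must keep working without the microphone.
type MicState =
  | { canCall: true }
  | { canCall: false; message: string; openSettings: boolean };

function micStateFrom(result: AndroidPermissionResult): MicState {
  if (result === "granted") return { canCall: true };
  if (result === "denied") {
    // The system dialog can be shown again later, e.g. on the next "Call" tap.
    return { canCall: false, message: "Allow microphone access to make calls.", openSettings: false };
  }
  // "never_ask_again": only the Settings app can restore access now.
  return { canCall: false, message: "Turn on microphone access in Settings to make calls.", openSettings: true };
}

// Structural subset shared by navigator.mediaDevices (browser) and
// mediaDevices from react-native-webrtc.
interface MediaDevicesLike<S> {
  getUserMedia(constraints: { audio: boolean; video: boolean }): Promise<S>;
}

type StreamResult<S> =
  | { ok: true; stream: S }
  | { ok: false; permissionDenied: boolean; error: unknown };

async function getAudioStream<S>(devices: MediaDevicesLike<S>): Promise<StreamResult<S>> {
  try {
    // A voice call only needs the microphone.
    const stream = await devices.getUserMedia({ audio: true, video: false });
    return { ok: true, stream };
  } catch (error) {
    // In browsers, a refused permission rejects with a DOMException named "NotAllowedError".
    const name = (error as { name?: unknown } | null | undefined)?.name;
    return { ok: false, permissionDenied: name === "NotAllowedError", error };
  }
}
```

On Android you could first pass the result of `PermissionsAndroid.request(PermissionsAndroid.PERMISSIONS.RECORD_AUDIO)` to `micStateFrom`, and call `getAudioStream` only when `canCall` is true; otherwise show the message instead of starting the call.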
What you usually find on the Internet is that WebRTC is a peer-to-peer connection, and that's true, of course, but there's a bit more behind it. To have a peer-to-peer connection, the two peers first need to discover each other, and that discovery is usually done through an external server. These external servers are called STUN and TURN servers. They are different technologies serving the same goal: STUN helps a peer discover its public address, and TURN relays the traffic when a direct connection isn't possible. I won't go into the details of STUN and TURN; I'll attach links at the end so you can read more about them. The idea is that WebRTC supports both of them, and both exist to enable the connection between two peers, so that they can talk to each other. However, before you send media between two peers, you need to set up a connection, and that's what we'll look at now. Okay, here is a high-level view of setting up a WebRTC connection. First, we need to get and share the local SDP (Session Description Protocol) descriptions. That means sharing metadata about the local media in a specified format. We don't have to care about the format much, because that's handled by WebRTC. What we need to know is that we have to send the metadata about our local stream from one device to the other: for example, the resolution and which codecs are supported. We need to push that information to the other party. The second thing is making the peers discoverable, which is about the servers we just talked about. We need to get the information about the other peer and share information about ourselves with it; these are called ICE candidates. We gather our candidates and send them to the other party. Once we have these two things, we have a WebRTC connection, meaning we know how to send the data.
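WebRTC leaves the signaling format up to you, so one option is a small JSON protocol carrying the two kinds of data described above: session descriptions (the offer and answer SDP) and ICE candidates. The message shape and the names `encodeSignal` and `decodeSignal` are assumptions for illustration; the candidate fields mirror the standard `RTCIceCandidateInit` dictionary (`candidate`, `sdpMid`, `sdpMLineIndex`).

```typescript
// Mirrors the fields of the standard RTCIceCandidateInit dictionary.
type IceCandidateInit = { candidate: string; sdpMid: string | null; sdpMLineIndex: number | null };

// Hypothetical wire format for the signaling channel.
type SignalingMessage =
  | { type: "offer"; sdp: string }
  | { type: "answer"; sdp: string }
  | { type: "candidate"; candidate: IceCandidateInit };

function encodeSignal(message: SignalingMessage): string {
  return JSON.stringify(message);
}

// Validate incoming messages so malformed input never reaches the peer connection.
function decodeSignal(raw: string): SignalingMessage {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("signaling message must be a JSON object");
  }
  const { type, sdp, candidate } = data as Record<string, unknown>;
  if (typeof sdp === "string") {
    if (type === "offer") return { type: "offer", sdp };
    if (type === "answer") return { type: "answer", sdp };
  }
  if (type === "candidate" && typeof candidate === "object" && candidate !== null) {
    const { candidate: line, sdpMid, sdpMLineIndex } = candidate as Record<string, unknown>;
    if (typeof line === "string") {
      return {
        type: "candidate",
        candidate: {
          candidate: line,
          sdpMid: typeof sdpMid === "string" ? sdpMid : null,
          sdpMLineIndex: typeof sdpMLineIndex === "number" ? sdpMLineIndex : null,
        },
      };
    }
  }
  throw new Error(`unrecognized signaling message: ${raw}`);
}
```

Using a discriminated union on `type` means the receiving code can switch on the message kind and the compiler knows which fields are present in each branch.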
The last part is actually sending the data we get from the device to every user. Okay, I'll share some code here. This is not a working snippet; it's more an idea of how it could look, although the APIs here are the real ones. First, we have a configuration object with the URL of a public STUN server, which means you don't have to run your own server for that. You can use Google's, which is pretty safe, although later on, for your own app, you will probably want your own server. For the example, we take Google's server and create an RTCPeerConnection with it. This is the object we attach all the information to. Then we create an offer from the peer connection, set this offer as the local description of the peer connection, and send it through a signaling channel. I named it that because the signaling channel is kind of a mock here: signaling is not defined by the WebRTC specification. That means the signaling channel can be anything. In this case, it's a WebSocket connection to a Node server, and the Node server manages the signaling, so the channel has a send method and an on-message handler. So here we set the local description and put the offer into the signaling channel. Then we go to the receiver. We create a peer connection the same way, and we add a listener for the message with the offer. On the receiver side, we take the offer and set it as the remote description. Once the remote description is set, we create the answer, set it as the local description, and send the answer back.
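Below is a hedged TypeScript sketch of the flow described here, the caller's offer and the receiver's answer, together with the ICE candidate exchange and the wait for the `connected` state that the talk covers next. It is not the speaker's slide code. The member names (`createOffer`, `createAnswer`, `setLocalDescription`, `setRemoteDescription`, `addIceCandidate`, `onicecandidate`, `onconnectionstatechange`, `connectionState`) are the standard `RTCPeerConnection` ones, and `stun:stun.l.google.com:19302` is Google's public STUN server; `PeerLike`, `Signal`, `createPeer`, `startCall`, `forwardCandidates`, `onSignal`, and `whenConnected` are hypothetical. The peer connection is typed as a small interface so the logic can run against a mock; a real `RTCPeerConnection` (from the browser or react-native-webrtc) provides these members, though depending on your typings you may need a type assertion.

```typescript
// Structural subset of RTCPeerConnection used below.
interface Description { type: string; sdp?: string }
interface Candidate { candidate: string; sdpMid?: string | null; sdpMLineIndex?: number | null }
interface PeerLike {
  createOffer(): Promise<Description>;
  createAnswer(): Promise<Description>;
  setLocalDescription(description: Description): Promise<void>;
  setRemoteDescription(description: Description): Promise<void>;
  addIceCandidate(candidate: Candidate): Promise<void>;
  onicecandidate: ((event: { candidate: Candidate | null }) => void) | null;
  onconnectionstatechange: (() => void) | null;
  readonly connectionState: string;
}

// Hypothetical signaling messages; any transport works (e.g. a WebSocket to a Node server).
type Signal =
  | { type: "offer"; sdp: string }
  | { type: "answer"; sdp: string }
  | { type: "candidate"; candidate: Candidate };
type Send = (message: Signal) => void;

// Google's public STUN server; production apps usually run their own STUN/TURN.
const configuration = { iceServers: [{ urls: "stun:stun.l.google.com:19302" }] };

// e.g. createPeer(RTCPeerConnection)
function createPeer<P>(PeerConnection: new (config: typeof configuration) => P): P {
  return new PeerConnection(configuration);
}

// Caller: create an offer, keep it as the local description, send it to the other peer.
async function startCall(pc: PeerLike, send: Send): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  send({ type: "offer", sdp: offer.sdp ?? "" });
}

// Both sides: forward local ICE candidates; a null candidate only marks the end of gathering.
function forwardCandidates(pc: PeerLike, send: Send): void {
  pc.onicecandidate = (event) => {
    if (event.candidate) send({ type: "candidate", candidate: event.candidate });
  };
}

// Both sides: handle whatever arrives from the other peer.
async function onSignal(pc: PeerLike, message: Signal, send: Send): Promise<void> {
  if (message.type === "offer") {
    // Receiver: the offer becomes the remote description, the answer the local one.
    await pc.setRemoteDescription(message);
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    send({ type: "answer", sdp: answer.sdp ?? "" });
  } else if (message.type === "answer") {
    // Caller: the answer becomes the remote description.
    await pc.setRemoteDescription(message);
  } else {
    await pc.addIceCandidate(message.candidate);
  }
}

// Resolves once connectionState reaches "connected" (media can flow); rejects on "failed".
function whenConnected(pc: PeerLike): Promise<void> {
  return new Promise((resolve, reject) => {
    const check = (): void => {
      if (pc.connectionState === "connected") resolve();
      else if (pc.connectionState === "failed") reject(new Error("peer connection failed"));
    };
    pc.onconnectionstatechange = check;
    check(); // in case the state changed before the handler was attached
  });
}
```

Both peers call `forwardCandidates` and route every incoming message through `onSignal`; only the caller calls `startCall`. A single `onSignal` entry point mirrors how one WebSocket message handler would dispatch everything the other side sends.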
So basically, the receiver gets the offer, sets it as the remote description, creates the answer, sets it as the local description, and sends the answer back to the sender. On the sender side, we have an event listener for the message with the answer, where we set it as the remote description. That was the first part: negotiating the resolutions, the codecs, and so on, that is, the metadata about what we can send and what we can receive. In the other part, both sender and receiver add a listener for the icecandidate event, and this is where the STUN server does its work. When we get an ICE candidate, we send it over to the other peer, and when a peer receives it, it adds the ICE candidate to its peer connection. After these two steps, the networking is set up, so we are able to send media. We have an event listener for connection state changes, and once the connection state changes to connected, we're good to go: we can send the media.
Things are getting complicated. And we've only looked at the very beginning of the conversation: we just set up the connection between the peers with WebRTC, and we haven't even sent any media yet. To be honest, WebRTC is a very powerful tool, but it's definitely not easy. When it comes to VoIP, there's something called the Session Initiation Protocol (SIP). It's a dedicated protocol, and there are tools built on top of it, which also work on top of WebRTC, that make things a bit simpler. What I mean by that: SIP can set up a session between endpoints, which is what we just did between two peers with pure WebRTC. It can negotiate the media part, which, once again, is exactly what we did.
It can manage the session: it can terminate it and adjust parameters during the session. You can substitute a track, for example if you want to transfer the call to someone else, or if you want to use a different track because you've put your headphones on. All of this is already handled by existing tools. There are JavaScript libraries for that: JsSIP and SIP.js. SIP.js is actually a fork of JsSIP, and both provide a kind of abstraction over WebRTC, so you don't need to care about every single thing you would do with raw WebRTC. What's more, they take care of signaling: the signaling server, or whatever it was in the example, is now handled by the SIP server. That's also a downside, because you have to use a SIP server, but the upside is that it's already done. There are servers like Kamailio, FreeSWITCH, and more; I'm just naming some popular ones. They take care of the signaling, the negotiation, and so on. So I would say that if you want to build calling into your application, you will probably end up with SIP on top of WebRTC. If you want to give it a try, I strongly encourage you to start with pure WebRTC and see where it goes, then move to SIP and see how it differs and how much more helpful it is.
So the question is whether SIP is enough. I would say that for the whole application it's not everything yet, because what we've discussed covers only sending the media. Even though we started with WebRTC, then the session initiation part, then the signaling, it's all still about sending media. You have a whole bunch of other things that you'd want to send via a REST API, via GraphQL, or even via another WebSocket.
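To make the session-management side of SIP more concrete, here is a deliberately simplified state machine of a call as seen by the caller, following SIP's INVITE, response, and BYE flow (RFC 3261). It ignores ACK, CANCEL, timers, authentication, and redirects, and the names `CallState`, `SipEvent`, and `nextState` are hypothetical. Libraries such as JsSIP and SIP.js track this state for you, so in practice you would react to their events rather than write this yourself.

```typescript
// Simplified caller-side view of a SIP call.
type CallState = "idle" | "calling" | "ringing" | "established" | "terminated";

type SipEvent =
  | { kind: "sendInvite" }               // caller sends INVITE carrying an SDP offer
  | { kind: "response"; status: number } // response from the callee or a proxy
  | { kind: "reInvite" }                 // mid-call INVITE, e.g. to swap a track
  | { kind: "bye" };                     // either side hangs up

function nextState(state: CallState, event: SipEvent): CallState {
  if (event.kind === "sendInvite") return state === "idle" ? "calling" : state;
  if (event.kind === "response") {
    // Responses only matter while the INVITE is pending.
    if (state !== "calling" && state !== "ringing") return state;
    if (event.status === 180) return "ringing";   // 180 Ringing
    if (event.status < 200) return state;         // other provisional responses, e.g. 100 Trying
    if (event.status < 300) return "established"; // 2xx: call accepted (the caller then sends ACK)
    return "terminated";                          // 3xx to 6xx, e.g. 486 Busy Here
  }
  if (event.kind === "reInvite") {
    // Renegotiates media within the existing dialog; the call stays up.
    return state;
  }
  // BYE ends an established call.
  return state === "established" ? "terminated" : state;
}
```

The re-INVITE branch is where features like swapping the audio track or transferring a call live: the session is renegotiated without being torn down.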
Bear in mind that everything we're talking about here, including signaling, is only about the call itself: just the video or audio call.
Okay, we talked a bit about WebRTC, but where is React Native? Thanks to the React Native WebRTC library, all of this also applies to React Native, so you can use the same APIs in a React Native app. What's cool is that this package can also register the WebRTC APIs as globals in the JavaScript environment. So, for example, if a library like JsSIP uses global APIs such as window.RTCPeerConnection, it can work just as it does in a regular web application. It requires some configuration, as far as I remember, but after that it's pretty smooth. So you should definitely use it when it comes to WebRTC.
Okay, so now it's time to write the app. And that's true: after all these steps, we can make a connection between two instances of the app, between two users, and send voice and video. All the technologies we've talked about so far should be enough. But there's one problem on mobile: what if the app goes to the background? We don't really have this problem with web apps, because there you usually open a tab, do something, even have a call, and then close the tab. On mobile, you put the phone in your pocket, and you still want it to ring, as all the other apps do. When someone calls you on WhatsApp, your phone rings just like for any other call. That's what we want to achieve in our app too, and for that we need two things. The first one is push notifications.
Here, too, there is a slight difference between Android and iOS. On iOS, there are two kinds of notifications: regular push notifications and VoIP push notifications. The VoIP ones are meant to be used only for signaling a call. Going back to the example: if someone calls you on WhatsApp, WhatsApp uses a VoIP push notification to tell your device that there's an incoming call. On Android, on the other hand, there are just notifications, and you are allowed to create such a call screen from any notification. Why is this important? Because on iOS you definitely should use VoIP pushes, since the priority of such notifications is much higher, and starting with iOS 13 they are strictly bound to the call screen.
So let's talk about the call screen. With iOS 10, in 2016, Apple introduced something called CallKit. CallKit is a set of APIs that lets you communicate with the phone's call system. By that I mean you can show the native call screen with your app's data, and that's what the main competitors and companies are using right now. It became even more prominent with iOS 13, because since then you are forced to use VoIP notifications together with the call screen: every time a VoIP notification arrives in an iOS app, it has to be reported as a call, so the call screen pops up. So you should follow the rule: use VoIP push notifications only for incoming calls, and use regular iOS push notifications for everything else. On Android, you have notification priorities instead. We manage this with the priorities of notifications: some of them show the call screen, while others become ordinary notifications.
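The push rules above can be sketched in two parts, one for the server and one for the app. Everything here is an illustrative assumption rather than a complete push setup: `planPush`, `IncomingCallPush`, `CallUI`, `onIncomingCallPush`, and `onAnswerCall` are hypothetical names. The facts it encodes are that on iOS, VoIP pushes are sent with the `apns-push-type: voip` header to the app's bundle ID with a `.voip` suffix, while regular notifications use `alert`, and that FCM lets you mark Android messages as high or normal priority. On the app side, `CallUI` stands in for the native call screen; with react-native-callkeep (discussed next in the talk) it could be backed by `RNCallKeep.displayIncomingCall`, and `onAnswerCall` could be wired to CallKeep's `answerCall` event.

```typescript
// ---- Server side: choosing how to deliver each push (simplified) ----
type Platform = "ios" | "android";
type PushKind = "incoming_call" | "message" | "missed_call";

interface PushPlan {
  // iOS: apns-push-type header value and APNs topic.
  apnsPushType?: "voip" | "alert";
  apnsTopic?: string;
  // Android: FCM message priority (spelled "high"/"normal" here; check your FCM API's exact casing).
  fcmPriority?: "high" | "normal";
}

function planPush(platform: Platform, kind: PushKind, bundleId: string): PushPlan {
  const isCall = kind === "incoming_call";
  if (platform === "ios") {
    // Since iOS 13, each VoIP push must be reported to CallKit as a call,
    // so VoIP pushes are reserved for incoming calls only.
    return isCall
      ? { apnsPushType: "voip", apnsTopic: `${bundleId}.voip` }
      : { apnsPushType: "alert", apnsTopic: bundleId };
  }
  // Android has a single notification type; urgency is expressed through priority.
  return { fcmPriority: isCall ? "high" : "normal" };
}

// ---- App side: turning an incoming-call push into the native call screen ----
// Hypothetical payload the backend puts into the push.
interface IncomingCallPush { callId: string; from: string; callerName?: string }

// Minimal surface of the native call UI (e.g. backed by RNCallKeep.displayIncomingCall).
interface CallUI {
  displayIncomingCall(uuid: string, handle: string, callerName: string): void;
}

// Calls shown on the native screen but not answered yet, keyed by UUID.
const pendingCalls = new Map<string, IncomingCallPush>();

function onIncomingCallPush(push: IncomingCallPush, ui: CallUI): void {
  pendingCalls.set(push.callId, push);
  // Report immediately, before any network work: iOS expects the call to be
  // reported while the VoIP push is being handled.
  ui.displayIncomingCall(push.callId, push.from, push.callerName ?? push.from);
}

// Wire to the "answerCall" event, which provides the call's UUID.
function onAnswerCall(callUUID: string, connect: (call: IncomingCallPush) => void): boolean {
  const call = pendingCalls.get(callUUID);
  if (call === undefined) return false;
  pendingCalls.delete(callUUID);
  connect(call); // e.g. start the WebRTC or SIP session for this call
  return true;
}
```

Splitting the decision this way keeps the platform rules on the server, while the app only has to show the call screen first and connect the media once the user answers.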
Okay, and when it comes to CallKit, there's a library called React Native CallKeep. CallKeep manages both CallKit on iOS and ConnectionService on Android, which means we can show the call screen on both Android and iOS using this package. It's probably the only package on the market that handles this without you touching the native code. So I strongly encourage you to take a look at it for handling CallKit and ConnectionService on Android. This is what an incoming call looks like on Android: we have our own screen, then the incoming call notification, which comes strictly from Android. In this case, we answer the call, and that's Android's call screen. That's what you can achieve with the CallKeep library.
Okay, so I bet we've now covered all the pieces needed to create such an app. Of course, this is a very quick overview of what's going on, because there's a lot more underneath. But I hope you got some basic knowledge of where to start and what these apps look like under the hood. A quick recap. First, think about permissions, because there's usually more to it than just popping up the permission dialog: you need to handle what happens when the user denies the permission. Second, the preferred way, and maybe the only way, is WebRTC. You need to incorporate WebRTC into your native app, and you may or may not use SIP and a SIP server. It depends on your use case: for some simple things, pure WebRTC will be enough, but if you need the whole signaling and negotiation covered, SIP is the way to go. Third, CallKit and ConnectionService are very important when it comes to working in the background, especially for incoming calls.
If you want to handle incoming calls properly, you need to use CallKit and VoIP push notifications on iOS. On Android, there are just regular push notifications, and you have ConnectionService, which lets you show incoming or outgoing calls using the native screen. And there's a package called React Native CallKeep to handle both of them. Last but not least, when you start working on some kind of VoIP application, check whether you really need to build it from scratch, because there are tools and SDKs on the market, like Twilio. We are also entering this market with Challenge. So maybe some parts of your VoIP application are already done, and even if such a solution is paid, building your VoIP application on top of it might be cheaper. Thank you very much. I'll also attach links to some additional resources. Hopefully see you next year in person. Bye.