Video details

Create Smart Angular Apps Using Azure Computer Vision | Ankit Sharma | angularday 2021


We will create an optical character recognition (OCR) application using Angular and the Azure Computer Vision Cognitive Service. Computer Vision is an AI service that analyses content in images. We will use the OCR feature of Computer Vision to detect the printed text in an image. The application will extract the text from the image and detect the language of the text. This app will support 25 different languages.
Angularday 2021 happened online on 12th November 2021.


Now up, we have more amazing talks ready for you. So the next one will be a talk with a very interesting title: Create Smart Angular Apps Using Azure Computer Vision, by Ankit Sharma. Please, everyone, welcome Ankit. Hey, Ankit. Welcome. How are you doing? Hi, Thomas. I'm doing good. How are you doing? Also good. Very curious about your talk, actually. But please, I kind of forgot. It's kind of my fault. So let me just take a step back and introduce you again. So Ankit is a software development engineer at Cisco, a Google Developer Expert for Angular, a Microsoft MVP, an author, a speaker and a passionate programmer. I would say that's a very impressive bio. How do you find the time to do all these things? Actually, I find some time mostly on weekends and holidays. And actually I love to do the community work. And I always believe that as tech enthusiasts, it is our responsibility to share our knowledge with others. So I am privileged that I got a good job at a good company, and I am exposed to enterprise-level applications. But everybody does not have that privilege. So whatever I am learning, I want to share with everyone so that they can also learn from my experience, and they can also grow in the development world, or the IT world. That is my main motive. And regarding the time, I mostly do it on weekends, whenever I find time. So that is how I plan things. Sounds amazing. And I do believe that this kind of sharing can inspire other developers to share even more, so that we can all level up and deliver even better stuff for our users. So please, let's not delay any further, and let's hop straight into the talk to learn more about Angular apps and Computer Vision. Here you go. Yeah. Let me share my screen. All right. Let me know once you see my screen. So, Julia, does this work? It's coming through perfect. So let's just start. So am I good to go? Yes, please. Yeah. Thank you, Thomas. Hello, everyone.
My name is Ankit, and today we are going to learn how to create a smart Angular application using Azure Computer Vision. So let me introduce myself. I am a software development engineer three at Cisco Systems India. I am an author and a speaker. I am a Google Developer Expert for Angular and a Microsoft MVP for Developer Technologies. I also write frequently at my personal blog about .NET technologies, and on the UI side I write about Angular and related JavaScript frameworks. So let us first understand: what is Computer Vision? Computer Vision is an AI service which analyzes the content in images. It is very easy to integrate with our application with the help of a simple REST API call. The Computer Vision service can run in the cloud as well as on premises within a container, such as Docker. Computer Vision provides us with three services, or three APIs, as we can see. One is called Optical Character Recognition, in short OCR. Then we have an Image Analysis API, and then we have a Spatial Analysis API. Let us understand these APIs in detail. So the OCR API, or the OCR service, is used to extract text from an image. It uses deep-learning-based models and works with text on a variety of surfaces and backgrounds. Microsoft has specified some image requirements for the OCR API. The format of the image must be JPEG, PNG, BMP, PDF or TIFF only; we cannot use an image of any other type. The dimensions of the image should be between 50 x 50 and 10,000 x 10,000 pixels, and the file size must be less than 50 megabytes. As for the Image Analysis API, this is a service which is used to extract visual features from images. It can determine whether an image contains adult content, find a specific brand or object, or find human faces. The image requirements for the Image Analysis API are that the format should be JPEG, PNG, GIF or BMP, the dimensions must be greater than 50 x 50 pixels, and the file size must be less than four MB.
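The OCR image constraints just listed can be summarized in a small validation helper. This is an illustrative sketch, not part of the Azure SDK; the constant and function names are my own:

```typescript
// OCR image requirements as described in the talk (Azure Computer Vision OCR API).
const OCR_ALLOWED_FORMATS = ["jpeg", "png", "bmp", "pdf", "tiff"];
const OCR_MIN_DIM = 50;                  // minimum pixels per side
const OCR_MAX_DIM = 10_000;              // maximum pixels per side
const OCR_MAX_BYTES = 50 * 1024 * 1024;  // 50 MB limit

function meetsOcrRequirements(
  format: string,
  width: number,
  height: number,
  sizeBytes: number
): boolean {
  const formatOk = OCR_ALLOWED_FORMATS.includes(format.toLowerCase());
  const dimsOk =
    width >= OCR_MIN_DIM && height >= OCR_MIN_DIM &&
    width <= OCR_MAX_DIM && height <= OCR_MAX_DIM;
  return formatOk && dimsOk && sizeBytes <= OCR_MAX_BYTES;
}
```

Checking up front like this avoids a round trip that the service would reject anyway.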
So it is very important to note that if we exceed the file size, we'll get an API error, because we cannot transfer a file larger than four MB to the Image Analysis API. Finally, we have the Spatial Analysis service. The Spatial Analysis service is used to ingest streaming video from cameras, extract insights, and generate events to be used by other systems. This service detects the presence and movement of people in a video. So Spatial Analysis is mostly used to analyze a video. We can use it to detect the presence of a face mask on people's faces. We can use this API on video streamed via CCTV footage, and we can use it in a store, or maybe some kind of shop, where we can analyze whether the people entering the shop are wearing face masks or not. So these kinds of features we can use with Spatial Analysis. To use the Azure Cognitive Services APIs, we are going to create a .NET back end, and on the UI side we will be using Angular. So the architecture of the application is that the back end, using .NET, is going to make an API call to Azure Computer Vision, and from the front end, that is Angular, we are going to upload a file, or I will say upload an image. The image will have some text written on it. We will send that image to our .NET back end. The back end will convert the image to bytes and send it to Computer Vision, which is hosted on the Azure cloud, and then it will analyze the content of the image and send us back the text which is present in the image. We will then format the text; we will learn how to format the text in a few minutes. And then we will send that text back to our Angular front end, where it will be displayed on the UI. So in short, we are going to upload an image having some text on it, and our application will extract the text as well as show us the language of that text. Since we are using a .NET back end, we have to use Visual Studio 2019.
And while installing Visual Studio 2019, we have to select the ASP.NET and web development workload. We also need an Azure subscription account. We can create a free Azure subscription account, and it's a kind of freemium service: some of the services will be free and some will be paid. For the first month, you can get a completely free subscription where you can use all the services up to $300. We are also going to install a NuGet package called Microsoft.Azure.CognitiveServices.Vision.ComputerVision. This library provides us access to the Microsoft Cognitive Services Computer Vision APIs. We can install it using the command which is shown over there, and you can also go to the NuGet gallery, search for the package and download it from there. So let us see the demo. Let me go to the Azure portal. So here I am at the Azure portal. The first step is to create an API service, so I am going to search for Computer Vision. Here you can see we are getting Computer Vision; click on that. Now we will click on Create. Okay, it is taking a few minutes to load. Now here it is saying Create Computer Vision. Here we have to fill out a few details. First of all, we are going to select the subscription name. Then we have to select a resource group. I will be using an existing resource group called Cog Services, which I have created already. If you do not have a resource group, you can click on the Create new link to create a new resource group. Then select the region which is nearest to you; for me, I will be selecting Central India. Then for the name, you have to give a universally unique name, so I will use Angular Day demo. Let's hope this name is available. Yeah, we get a green check; this means the name is available. Then pricing tiers. There are different pricing tiers, and I will be selecting the free tier for this demo.
You can select the pricing tier as per your requirement, and you can use this particular link to see the details of the pricing tiers. Then click through the next tabs. Network, everything is good; let's keep all these things as they are, there's no need to make any other changes. Finally, click on the Review + create tab and just review everything which you can see over here. You can see that the subscription, name, region, pricing, everything is good. Click on the Create button. Once we click on the Create button, the deployment of the Computer Vision service will start. It will take a few minutes to complete. Let's wait for a few minutes; you can see the progress over here in the notification bar, where it says deployment in progress. It will take a minute or so, not more than that. Okay, so deployment is complete. We'll click on the Go to resource button, and we will see the details of our Computer Vision API service. Here you can see the name which we have created, Angular Day demo. Now on the left menu there is a link called Keys and Endpoint. Click on that, and on this page you can see it's showing the keys and the endpoint as well. For now, I will leave this tab open, and when we get to the coding part, we will be needing these key and endpoint details, so we'll copy them at that moment. So let me navigate to the JSON structure of the response. Let us understand this. When we send an image to Azure Computer Vision and it sends a response back, it responds in the form of JSON, and this is the structure of the JSON. We have to understand this JSON structure so that we can manipulate the data as per our requirement. So let us understand this JSON structure. It is sending us the language. The language is sent as a code: en stands for English. Then textAngle, which shows us whether the text is rotated or not. Then orientation, which shows whether the orientation of the image is up or down. And then regions.
So how it works is that it will send you the regions and the bounding boxes of all the words shown in the image, along with the coordinates. So these are the coordinates. Now here you can observe that we have an object called lines. Inside lines we have an object called words, which is an array, and inside words we have all the words. So for example, this is one word, ONCE, then YOU, REPLACE, NEGATIVE. Now this particular line has ended; again, the next line. Similarly, for every line it will give us the details of the words and their coordinates. Now, how are we going to manipulate this data? First of all, we want to show the language. The language is shown as en, but we cannot show en, the code, to our end user. We have to show the full name, that is English or French or whatever language the text is in. For that we have to do some manipulation. The second manipulation we have to do is about the text. We are getting the text divided into words, but we cannot show this to the user. We have to show the text as it appears in the image, having line breaks at the same points. For that we have to iterate over this JSON, create the sentence structure in our code base, and then display it on the UI. So these are the two manipulations we have to do after we get this JSON from Azure Computer Vision. We'll see how we will do that. So, let me move to my back end code. This is the application. Currently I'm using .NET 5, because .NET 6 was released recently, I think a few days back, and I didn't get a chance to update the code; that's why it is .NET 5. But it will work the same in .NET 6 as well, since the core functionality is the same. So here we have created two models, one for the available languages and one for the language details. In the language details we have the name, native name and direction. This model is created so that we can bind the JSON data to it.
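The response shape described above can be sketched as TypeScript interfaces with a small sample payload. This is a simplified illustration; the bounding-box coordinate values below are made up, not taken from a real API call:

```typescript
// Simplified shape of the OCR JSON response: language code, text rotation,
// orientation, and regions -> lines -> words, each with a bounding box.
interface OcrWord { boundingBox: string; text: string; }
interface OcrLine { boundingBox: string; words: OcrWord[]; }
interface OcrRegion { boundingBox: string; lines: OcrLine[]; }
interface OcrResponse {
  language: string;    // e.g. "en" for English
  textAngle: number;   // how much the text is rotated
  orientation: string; // e.g. "Up" or "Down"
  regions: OcrRegion[];
}

// Illustrative sample (coordinates are invented for the example).
const sample: OcrResponse = {
  language: "en",
  textAngle: 0,
  orientation: "Up",
  regions: [{
    boundingBox: "21,16,304,451",
    lines: [{
      boundingBox: "28,16,288,41",
      words: [
        { boundingBox: "28,16,93,41", text: "ONCE" },
        { boundingBox: "130,16,78,41", text: "YOU" },
      ],
    }],
  }],
};
```

Having the nesting explicit like this makes the two manipulations (language lookup and sentence reconstruction) easier to follow.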
Then we have DTO models, that is, Data Transfer Object models. We will use these models to transfer the data to our UI. And our point of discussion is this controller. One more thing I forgot to mention: in this application, the ClientApp folder contains our UI code, that is, the Angular code. So this whole ClientApp folder has the UI code; the other part is our back end code. So let us understand the controllers part. The controller will have our API controller, and here you can see that we have created a constructor, and I'm providing the subscription key and endpoint. Now, where will we get those from? We'll get them from the Azure portal. Here you can see there are two keys shown. I will copy one key; you can use either of the two keys available. So there are two keys shown over there, you can use any one of them, and then the endpoint. This endpoint is already created. Copy that, and then we'll put it over here. So we have initialized the values. It is very important to note that you have to put the correct subscription key and correct endpoint, otherwise your API call will fail. So let us go to the bottom of this and let me explain the GetAvailableLanguages method. I discussed that we are getting JSON data, and in the JSON data we are getting the language code. But we cannot show the language code; we have to show the language name. To show the language name corresponding to a language code, we need a dictionary, or key-value pairs, which has all the language codes and their corresponding language names. So where can we get that? Azure Computer Vision does not have any API which provides us the key-value pairs for the language codes, so we are using the Microsoft Translator API endpoint. Microsoft provides a lot of Cognitive Services APIs; Microsoft Translator is one of them, and it allows us to translate text from one language to another.
So we are using the endpoint of Microsoft Translator, which is written over there, to fetch the list of language key-value pairs. Here we are creating an HTTP client, we are sending an async request, and then we are getting the data in the form of JSON. We are deserializing it, and then here we are using a foreach loop, and we are creating an object of type AvailableLanguageDto, which contains the language ID and the language name. So this is how we are creating our dictionary, which contains the key-value pairs of the language names and their corresponding codes, and then we return it. Next we have an Authenticate method. The Authenticate method takes our endpoint and the key, and it is used to authenticate our Computer Vision client using the endpoint and the key value which we are passing. And here we have the POST request. We'll be hitting this POST API from our UI, and inside it you can see that we are first checking if the image data is available or not. If the image data is available, we are calling the ReadTextFromStream method. Let's see what is happening inside that method. In the ReadTextFromStream method, we are passing the image data as a stream, and inside it, first of all, we are authenticating our client using the endpoint and the subscription key which we have defined. And then we are calling a method called RecognizePrintedTextInStreamAsync. This is the API method provided to us by Azure Computer Vision, which will recognize the printed text in a stream, as the name suggests, and it is an async method. It takes two parameters; the first one is true. This is a boolean parameter, used to detect the orientation of the image. If you remember, in our JSON we are also checking the orientation, whether it is up or down. So we are passing true to detect the orientation, and then we are passing our stream of image data.
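The dictionary-building step described above can be sketched like this. The payload below is a hard-coded sample in the shape of the Microsoft Translator v3 languages response (translation scope: name, nativeName, dir) rather than a live HTTP call, and the function name is my own:

```typescript
// Details for one language, as returned by the Translator languages endpoint.
interface LanguageDetails { name: string; nativeName: string; dir: string; }

// Flatten the code -> details map into the DTO list the talk describes:
// one { languageID, languageName } entry per language code.
function toLanguageDictionary(
  translation: Record<string, LanguageDetails>
): { languageID: string; languageName: string }[] {
  return Object.entries(translation).map(([code, details]) => ({
    languageID: code,
    languageName: details.name,
  }));
}

// Hard-coded sample response in place of the real HTTP request.
const sampleResponse = {
  translation: {
    en: { name: "English", nativeName: "English", dir: "ltr" },
    fr: { name: "French", nativeName: "Français", dir: "ltr" },
  },
};
const dictionary = toLanguageDictionary(sampleResponse.translation);
```

The UI can then look up "en" or "fr" in this list and display the human-readable name.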
It returns the result as an object of type OcrResult, and we return that result from this method. Now we have this OcrResult object. This OcrResult class is provided by the NuGet package that we have installed; that NuGet package gives us these ready-made classes which we can use for the data binding. Now this logic, the if-else part, is our second manipulation which I discussed, to reconstruct the sentences. Here you can see that if the count is greater than zero, which means the data is available, we will iterate over the lines, then we will iterate over the words, and then we'll use a StringBuilder to build the text. And once a word is finished, since we have to provide a space between the words, we append an empty space as well. Let's look at the JSON. Inside the JSON we have the words object, which is an array; inside it we have multiple words. So we will be iterating over that as the inner loop, and in the outer loop we will be iterating over the lines, which is also an array. That is how our core logic of sentence formation works. Now, if the count is not greater than zero, which means the data is empty and there is no text in the image, we'll just say that the language is not supported, or we'll just return an error. Finally, we'll send the detected text and the language as well. You can see the language is returned over here, which is a key. So we'll be sending the key from here, and using another API, which is this GetAvailableLanguages API, we are sending the key-value pair dictionary. On the UI we will use the key from this, and we will look up that dictionary to fetch the value. And then we have done some exception handling: if there is any error, we'll send the error message and we'll set the language to unknown. And then we'll return the OCR result details to our calling method. So that is how our back end, our API part, will work.
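The regions → lines → words iteration just described can be sketched as a simplified stand-alone function (in TypeScript for consistency with the UI code, not the talk's exact C# implementation):

```typescript
// Rebuild the readable text from the OCR response: join the words of each
// line with spaces, and insert a line break after every line so the output
// mirrors the layout of the original image.
function reconstructText(
  regions: { lines: { words: { text: string }[] }[] }[]
): string {
  const lines: string[] = [];
  for (const region of regions) {       // outer loop: regions -> lines
    for (const line of region.lines) {  // inner loop: lines -> words
      lines.push(line.words.map(w => w.text).join(" "));
    }
  }
  return lines.join("\n");
}
```

This is the same two-loop idea as the C# StringBuilder version: spaces between words, newlines between lines.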
Now, let us move to the UI part, the Angular part, to understand how the UI is configured. Just to reiterate, the UI code is available in the ClientApp folder, and for simplicity and ease of use, I have opened it in Visual Studio Code, which is a separate IDE. Here we have created two models for data binding: one is the available language model, and the other is the OCR result. It is the same as the DTO models we created on the back end: one holds the language ID and name, and the other the language and detected text. We also have a computer vision service, and this service makes API calls to our back end API. Here we have defined the base URL, which is api/OCR. Then we call our base URL to get the available languages, and then we call getTextFromImage, and we pass the image data over here, which is a kind of FormData we are getting. We will be calling both of these methods from our component. So let's move to the component part. This is our component; let us understand it. Inside the constructor we initialize a few values. First is the default status, which shows that the maximum size allowed for an image is four MB. So we are restricting uploads to a four MB limit, and that's why we have defined a variable called maxFileSize, which holds the number of bytes allowed. Now, inside ngOnInit, we are fetching the list of available languages. This calls the getAvailableLanguages method of our service, which is over here, and this invokes our get-available-languages endpoint, since this is an HTTP GET and we are invoking it from here; it fetches us the list of languages, the key-value pairs of the languages and their corresponding codes. So we are fetching that from here. Now we have an uploadImage method, where we do some checking. Here we have a bunch of if-else statements.
First of all, we check if the size of the uploaded image file is more than the allowed file size, which is maxFileSize; if so, we set the status, set isValidFile to false, and return. Similarly, if the file type is not an image, for example if the user has uploaded a text file, we are not allowing it either; we are allowing only image files. That is why we have a check for that as well, and we set the status to "please upload a valid image file" and set isValidFile to false. If all these checks pass, we finally go to the positive case scenario. We open a FileReader, read the file as a data URL, and then set the image preview, which will be shown on the UI, and then we show the image on the UI. So as the user uploads the image, they will also see the image preview. You will understand this with a bit more clarity once we show the demo. Now, this is the getText method. This getText method is called once the image upload is successful. Inside it we get the image data; we append the image data as a file, imageFile, and then we call getTextFromImage, the method from our computer vision service, and we pass the image data. Inside this function call, you can see that it is of type FormData and it is doing a POST call to our back end API. And on the back end you can see we have this POST method, so we'll be invoking this API. Now we'll get the data, we'll get a response, and in the response, here you can see, let me break the line for clarity, that we'll get the available language details, and in the available language details we'll look for the language in our available language array, and if the language is available, we'll show the detected text language as the language name.
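The size and type checks described above can be sketched as a small stand-alone function. The names are illustrative, not taken verbatim from the talk's source code:

```typescript
// 4 MB client-side limit, matching the Image Analysis file-size requirement.
const MAX_FILE_SIZE = 4 * 1024 * 1024;

// Validate an upload before sending anything to the back end:
// reject oversized files and non-image MIME types.
function validateUpload(
  file: { size: number; type: string }
): { valid: boolean; status: string } {
  if (file.size > MAX_FILE_SIZE) {
    return { valid: false, status: "The file size is more than the allowed limit." };
  }
  if (!file.type.startsWith("image/")) {
    return { valid: false, status: "Please upload a valid image file." };
  }
  return { valid: true, status: "" };
}
```

Only when this returns valid would the component go on to read the file with a FileReader and show the preview.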
So this is the place where we look up the language name corresponding to the key, in the available language details we fetched. Now, if the language is not available, this means the language is not supported and the back end has returned unknown; in that case we'll show the detected text language as unknown. And this is the loader which will be shown. Finally we have ngOnDestroy, to clear all the subscriptions. Let's go to the HTML. The HTML is very simple and straightforward. Here we have an input, and on the input's change event we call the uploadImage method, which checks the image size and the image type, and if it's a valid image, it shows us the preview over here using the image preview. And if everything is good, we have a button; on click it will call the getText method, which sends the image to the back end and fetches the results. Now let us run this application and see it in action, and then we'll walk through the code again to get a better understanding. So we'll click on the run button to run it. Let's wait for it to run and we'll see. Okay, so the build is successful. It's taking a minute to load. When the application is launched, there's a console open; here you can see the output from our application. So if any error happens while loading, it will be shown in the console. Okay, it is taking a minute. Let's give it some more time. Let me close this and try to run it again. Let's try to run it again. Sometimes it happens, but it will work. Okay, it is taking a bit longer, but let's wait. It should work. This is unfortunate, but it should work. I don't know what is loading. It's generating the browser application bundle. Okay, let's wait for a minute. It is saying timeout exception. Okay, let me try again. There is some issue, just give me a moment. It should work. I don't know why it is giving an exception here. Hey Ankit, I was just thinking, maybe we could try to debug this.
Is it possible to start the back end and the front end in separate consoles? No, actually this issue is coming from the back end. It has to launch; sometimes there is a bit of a delay. So that is fine, we can continue. All right, perfect. Yeah. The application has launched, and you can see on the nav bar here we have a Computer Vision link. We'll click on this and we will be navigated to our UI. So this is how our UI looks. On the left side we have a text box; the output will be shown over here. We'll show the detected text over here, and this is the place where we upload the image. Now here you can see we are showing the default status, which is: maximum size allowed for an image is four MB. Just click on Choose File. We will upload this file. Now here you can see it shows the uploaded image as well, so I am able to see a preview. Now click on Extract Text, and let's hope we'll see the result. So now, here you can see, let's analyze the output. First of all, we are getting the result here, and the detected text language is English. So this is working fine; that is one thing. Second thing: if you look at the text, this is a quote, and this is the author's name. The quote is in all upper case and the author's name is in normal case. The output is also in the same manner: the quote is in upper case and the author's name is in normal case. And here you can see we have all the line breaks at the correct points, as shown in the image. This is happening because of the checks we have done over here: we have created the lines from the words in our code logic. That is how we are getting the data. And on the UI, let us debug the code a bit. On the UI, here we are getting the result, and then here we are checking the available language from our dictionary, or lookup table, and if the language name is available, the detected text language is available, we are showing the language name, which is the detected text language. And this is then shown on our UI over here.
And this is shown on the UI over here, which is English. So the back end is returning me en; however, using the lookup table, the corresponding value for en is English, which is shown over here. So this is one positive case scenario. Let us check another positive case scenario, which is the French language. So this is a French language text. Click on Extract Text and we get the output. Here you can see the text is shown as it is, and we have the detected language as French. So these are the two positive case scenarios. Now let us check a few negative case scenarios. First I will upload a high resolution image. This image is nine megabytes, but we can see it is not allowing us to upload it, we are not able to see the preview, and we are getting a message: the file size is more than the allowed limit. So that is why we are not allowing more than four MB. Next is the Hindi language. Hindi is my mother tongue, but unfortunately it is not supported by Azure Computer Vision. So let's see what happens if we upload a text which is not supported. Now we are getting a message that this language is not supported, and since the language is not supported, the detected language is unknown. So how is this working? This is working from here: if the count is not greater than zero, which means the API has not returned anything, which means that the text is not detected, that's why we are showing that the language is not supported. And on the UI we set the language to unknown over here, because the language is not available in our response. And finally we'll see an image which has no text. You can see there is no text in the image. If I click on Extract Text, we'll see the exact same output: the language is not supported and unknown, because our API is not able to find any text here, since there is no text available. So we are unable to determine the language or the content. Now, there is one more check: if I upload a text file. This is a text file.
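The lookup just described, mapping the code the back end returns (e.g. en) to a display name and falling back to unknown, can be sketched as a small helper. The interface and function names are illustrative:

```typescript
// One entry of the language dictionary fetched in ngOnInit.
interface AvailableLanguage { languageID: string; languageName: string; }

// Resolve a language code to its display name; "unknown" when the code
// is not in the fetched list (the unsupported-language case from the demo).
function displayLanguage(code: string, available: AvailableLanguage[]): string {
  const match = available.find(l => l.languageID === code);
  return match ? match.languageName : "unknown";
}
```

This covers both demo paths: en resolves to English, while an unsupported code (like the Hindi example) falls through to unknown.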
If I upload this, you can see that we are getting an error: please upload a valid image file. So that is how we have verified all the possible scenarios, and that is how we have made our Angular application smart using Azure Computer Vision. We have uploaded an image with text on it, and we have sent that image as a stream to Azure Computer Vision. It has extracted the text from that image and sent us a response. We manipulated the JSON response and then showed the output on our UI to our end user. Now, before I finish my talk, I want to share a few resources for this talk. If you want to see the source code of this application, you can go to this Bitly link. And you can also go to my blog, Ankit Sharma Blogs, where you can learn more about .NET and Azure Computer Vision as well. And I also write on Angular, so you can learn a lot about Angular technology there too. So that is all from my end. Thank you all for joining. Hi Ankit. Thanks for a very cool talk. So first of all, before we jump into Q&A, folks, do not hesitate and please ask questions in the Q&A area so we can forward them to Ankit. And we already have the first question: what happens if it finds multiple languages in a single picture? Okay, so we have APIs for that as well. The API which I have shown currently does not support multiple languages, but there is another OCR API which supports multiple languages in the same image file. So we have to use that. As I showed, there are multiple services provided by Computer Vision. So if you are expecting that your customer will upload an image with multiple languages, then you can use that separate API. Sounds very good. So hopefully, Monica, this answers your question. Up next, we have another question: can we use the Computer Vision APIs directly from the Angular application, without the need to have a back end to rely on?
So this is actually something which I also wanted to ask, so thank you very much. Let's hear the answer. I will tell you the answer for that; I will show you the code and explain. So this is the one reason I don't want to keep this in the UI. I know there are ways around it; you can use Key Vault, some vaults, or maybe store this in some kind of file. But I don't want to put my subscription key and endpoint on the client side, for safety reasons. That is one thing. The second thing is, here you can see we are doing a lot of manipulation, and we have the Newtonsoft.Json APIs, with which we are easily deserializing the JSON data which we are getting. So these are the two reasons I prefer a .NET back end: there is some complexity involved in the deserialization, we are also using some loops to do some kind of manipulation, and we also have some secret data which we don't want to share with everyone. That is why I always prefer a back end. However, you can also use only the front end; there will not be any problem, but the effort will be more compared to using a back end. Thank you very much for the answer. So I hope that answers it. Now we have a question from Massimo: is this all doable on platforms other than Windows, like Mac or Linux, installing packages and the sort? And maybe this can also be extended: can we use different kinds of back ends, if there is some official support with libraries for, like, Node or Java or what have you? Yes, you are absolutely right. .NET is a cross-platform framework; we can use .NET on Mac as well. For Mac there is a separate IDE called Visual Studio for Mac; you can install it and use it. And Visual Studio Code is an optional IDE; you can directly open the ClientApp folder, and all your components are available over here in the same IDE. I just showed Visual Studio Code for simplicity; there is no actual need for Visual Studio Code.
You can do everything directly inside Visual Studio itself. And for Mac and Linux: for Mac we have the Visual Studio for Mac IDE, and for Linux we can use the .NET CLI. You can go to the official documentation and see how to use .NET on Linux as well. The bottom line is that .NET is a cross-platform framework: you can write code on any platform and run it on any platform. I think, Thomas, you also asked an extra question related to this. Only, yes, the other part of that was whether it's possible to also implement the back end using a different technology, if there is some official support, a library for Node, for Java and stuff like that. Yeah, we have support for a few back ends. Apart from .NET you can also use Python, and Node as well. I don't remember all the availability, but there are a few languages available; Python and Node I know of, so you can use them as a back end as well. Sounds great. So I hope this answers your question. And up next we have a really great question by Rado. So, the text is recognized, but what if the language is not recognized? An example of that could be that we are uploading a picture of a car, and there is a license plate which has letters and numbers, but it's not any kind of real language, right? So what will happen in these kinds of cases? Okay, so if the language is not recognized, in that case it will show the language as unknown, as per our configuration. With the configuration we have done, it will show unknown, because a number has no language; a number is languageless. But this is a very good scenario, and unfortunately I do not have an example to show. But maybe you can get the source code and try that out, and we'll see what it returns in this language field over here. I also need to check, to be honest, what will happen if I upload only digits or numbers, and what the language will be.
I think by default it will show English: if the text is recognized and the language is not recognized, then by default it will show English. That is what I believe, but I am not 100% sure at this point. But we could say that it's safe to assume that the text will be extracted. So we'll get the response with the extracted text from the image, but maybe the language ID will default to English, or maybe it will be unknown, but we will still get our text, so we can use it. We can definitely see the text, because if you are uploading a number, it can be recognized, and we will definitely see that. That sounds great. So I hope this answers your question. And one thing I would like to ask personally: I really liked that English quote about changing your thoughts, but I do not speak Hindi, and so I had no idea about the other one. So I would like to know what was in that other quote, in Hindi. Was it the same quote, just in Hindi, or was it a different one? No, it is a different quote; let me open it, I don't remember it as well. So it is saying, let me see, it is saying that you have to believe in yourself, and then only will you get your desired result. That sounds amazing. Self-belief will lead you to your desired result. Oh, that sounds amazing, and it really fits the overall topic of this talk, like achieving stuff and making the world a better place. Yeah. This is the English one, right? Cool. Please show us. Right, exactly. Yeah, you cannot go wrong with that one, for sure. So thank you very much, Ankit; it was a very interesting talk. We learned a lot, and hopefully we can see you at some future edition of Angular Day again. So have a great day and see you next time. Ciao. Thank you, Thomas. Thank you for your time. Thank you, everyone. Thank you. Bye.