Video details

Profiling Angular Applications | Minko Gechev

Angular
12.10.2020
English

Minko Gechev

In this video, we're going to focus on the runtime performance of Angular applications. First, we'll learn how to profile an app using Chrome DevTools. After that, we'll identify different patterns by looking into the profiler's output. For each one of them, we'll discuss strategies for improving runtime performance and their consequences.
Resources:
- https://www.youtube.com/watch?v=ybNj-id0kjY - Optimizing an Angular application
- https://web.dev/rail/ - RAIL model
- https://github.com/mgechev/optimizing-apps-angular-demo - App used for the demo in the video

Transcript

Hello, everyone, my name is Minko Gechev. I'm working on Angular at Google. Today I want to share with you a couple of insights on how you can optimize your application's runtime performance. First, we're going to look into how to diagnose a few common performance problems by using Chrome DevTools. I'll explain what flame charts are and how we can use them to find performance pitfalls. As the next step, we're going to discuss how to optimize our apps and make them faster. Finally, we're going to look into the JavaScript virtual machine runtime and explore how it could impact our app's performance.

I've been doing a lot of work in this space over the past couple of years. Often, at events or on the Internet, folks ask me: how can I make my application run faster? Well, the high-level answer to this question is pretty simple: just do less. This advice is valid not only in the context of Angular, but for any framework or programming language out there. To make our apps run faster, we should just do fewer things. At the end of 2018, I gave the talk "Optimizing an Angular Application", where I explained several practices that can make Angular do less, improving our app's performance. Things haven't changed much over the past few years, and these practices are still valid. In fact, I would recommend watching that video to get more value out of this current one. We can memoize calculations using pure pipes or by storing the results of calculations. We can skip change detection using OnPush or by running code outside of the Angular zone. And clearly, we can render fewer components using virtual scrolling or pagination. In this video, we're going to classify common performance problems into several categories, learn how to recognize them using the Chrome DevTools profiler, and apply these practices that we already know to speed up our apps.

Let us first look into how we can profile an application. For the video I have built this dashboard. Here we have a few different charts, a widget showing an overall score for the data, a table, and at the bottom just a bunch of links. To profile this app, we need to keep in mind the following three essential preconditions. While profiling the project, we need to ensure the app is running a production build. Running a production build is required because otherwise the CLI will not remove the code that Angular uses only during development to guard against common mistakes, such as circular bindings, for example. Next, we need to make sure we're not mangling the output of the CLI. This precondition is not as critical as the first one, but ensuring we have readable method and property names will help us identify the cause of the issues we find. Finally, we need to make sure we're profiling the app without any browser extensions enabled. Extensions can add extra noise to the profiler and even skew the results if they plug into the app's execution lifecycle. The easiest way to do this is to open the app in incognito mode.

All right, now let me prepare our demo app for profiling, making sure it follows these three preconditions. We can make sure we disable mangling by setting the NG_BUILD_MANGLE environment variable to false. After that, we need to invoke ng build with --prod to build the app in the production environment. Look at this beautiful output from the Angular CLI here. We are exceeding the maximum bundle budget. That is because we are using strict mode, so we have lower thresholds, and we also disabled mangling.
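As a quick aside before we start profiling: the pure-pipe flavor of memoization mentioned earlier could look roughly like the sketch below. This is a minimal illustration, not code from the demo app; `heavyScore` and the aggregation inside it are hypothetical stand-ins for any expensive calculation.

```ts
import { Pipe, PipeTransform } from '@angular/core';

// A pure pipe (pure is the default) re-runs transform() only when its
// input reference changes, so Angular effectively memoizes the result
// across change detection cycles.
@Pipe({ name: 'heavyScore' })
export class HeavyScorePipe implements PipeTransform {
  transform(values: number[]): number {
    // Stand-in for an expensive aggregation.
    return values.reduce((acc, v) => acc + Math.sqrt(v), 0);
  }
}
```

Used in a template as `{{ rows | heavyScore }}`, the calculation does not re-run as long as `rows` keeps the same reference.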
Because we disabled mangling, our bundles will be larger. Disabling mangling can also negatively impact the profiler's output, because the JavaScript virtual machine needs to parse more code, but it keeps the call names readable. Next, we can go to the dist directory and start a static file server. I really love using serve, since it is aware of the client-side routing, and when it starts a server, it automatically puts the URL of the app into the clipboard. Now, to preview the app, we can open an incognito window and paste the URL in the address bar. To profile the application, first go to the Performance tab and after that click on the record button. Then we can start interacting with the app to capture application usage scenarios in the profiler. Once we are ready, we can stop the profiling and preview the flame chart. Here it is necessary to notice that Chrome DevTools shows us the estimated frame rate over time. See how, where the rate is lower, there is a red line. Chrome DevTools follows the RAIL model: it indicates risks that the frame rate drops to a level that would not allow the UI to respond within 50 milliseconds to a user interaction.

As a next step, let us look at what flame graphs are and how we can read them. Here is an example of a flame graph. It visualizes the execution of a program over some time; each rectangle's size is proportional to the number of times the corresponding call ended up being part of the call stack during the profiler's sampling. Brendan Gregg, a performance engineer at Netflix, originally developed this method of visualizing a profiler's output.

All right. So now let us trace the execution of a program and sample it to preview it as a flame graph, to get a better understanding of this visualization. Here we have a few functions: A, which calls B and A1; B, which does some work and right after that calls D; D, which calls E; and the functions A1 and E. In A, we first call the function B, and right after that we call A1. At the beginning, we'll first call the function A. When the profiler takes a sample, it will find A in the call stack and record this fact. After that, A will call B, so we'll have A and B on the call stack in the next sample. Continuing, we'll get A, B, and D, and at the following sample, D invokes E. Once the execution completes, we'll pop E, D, and B off the call stack, and we're going to invoke A1. At this sample, B will have completed, and the profiler will capture A1 on the call stack. The primary purpose of the flame graph is to capture in how many samples a given function occurred. Since this could potentially be in a multithreaded environment, the order of execution is not something that we can express accurately with just a single graph. To improve the visualization, we can sort the samples in alphabetical order and merge the rectangles corresponding to a specific function call into one. We can see that we spent a decent amount of time in B, so there might be a place for optimization here.

Well, enough about flame graphs; now let us talk about flame charts, which is something different. When the Chrome DevTools team worked on their profiler, they decided to reuse the flame graph visualization because they found it particularly useful. However, since their main focus was the main JavaScript thread, they changed the format a little bit to also show the execution over time.
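For reference, the call structure from the flame-graph example above could be written as the sketch below; the `work` busy-loop is just a hypothetical placeholder for the "some work" the profiler samples.

```ts
function work(ms: number): void {
  // Busy-wait to simulate CPU work that samples can land in.
  const end = performance.now() + ms;
  while (performance.now() < end) {}
}

function e(): void { work(5); }
function d(): void { e(); }            // D calls E
function b(): void { work(10); d(); }  // B does some work, then calls D
function a1(): void { work(5); }
function a(): void { b(); a1(); }      // A calls B and, right after, A1

// Sampled stacks: [A] → [A,B] → [A,B,D] → [A,B,D,E] → [A,A1]
a();
```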
Let us look into the flame chart from the profiling we did just a few minutes ago. Notice the calls from the Angular runtime, for example refreshComponent, refreshView, etc. At the bottom, we can find the execution of the component's template function. When we select this call and drag the bottom bar up, here we can see a link to the template function's exact location within the formatted source file. Clicking on it will take us directly to the right spot. Here we can find all the instructions rendering this template. Clicking on the Bottom-Up tab, we can preview all the functions the template function called and see how much time we spent in them, which corresponds to the number of samples the profiler captured them in.

Now, let us use this knowledge to understand what triggers change detection and find redundant calls. Based on the many apps I've profiled, some of the most frequent redundant change detection triggers come from setTimeout, setInterval, and requestAnimationFrame. Often these calls are in third-party libraries, so it is not immediately apparent that they occur. Well, notice at the bottom here, before we even get into the Angular runtime, there is a rectangle that says Event: click. This event is what triggered the change detection cycle. The event maps directly to our click on the hamburger menu, toggling the side navigation. Scrolling down, we can see the detectChanges call that will later indirectly invoke the components' template functions. Zooming out, however, notice that we have many similar change detection calls, many more of them than the clicks we made. Zooming in, we can see a timer event. Judging by the equal intervals, we run change detection in here. This seems like a leaked setInterval. If this behavior was not intended, we can just wrap the call inside of NgZone's runOutsideAngular to remove the redundant change detection calls and optimize our app.

OK, well, as a next step, let us look into how we can detect long calls. Long calls could be particularly harmful to our application's performance, especially if they are in templates or lifecycle hooks that Angular invokes during change detection. Going back to the flame chart, we can see that we have a getter called aggregate at the bottom of one of the calls. Clicking on the Bottom-Up tab, we can find this piece of code's exact location in the source. To see if we're spending significant time in the aggregate getter as part of change detection, we can just go back to the top of the flame chart, click on any of the calls there, and just explore the Bottom-Up tab again. Here we can see that we have spent over 50 percent of the execution time only in the aggregate getter. Well, that is a lot of time. Here we have a couple of options in order to optimize the code. Clearly we can use memoization, for example, and since the call occurs in the template, we can even use a pure pipe. All of these approaches are definitely valid. At the same time, however, the call seems to be quite expensive, so even if we apply memoization or pure pipes, we'll still have to perform the calculation at least once, which will hurt the initial performance and initial rendering of our app. What we could do instead is move the calculation into a Web Worker. Let us go to the terminal and just run ng generate web-worker, specifying the worker's name. Now open the worker file and let us replace its contents. Here I'm using a snippet, but let me quickly go through the code.
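The snippet is along these lines; a minimal sketch of the idea rather than the exact demo code, with the aggregation body as a hypothetical placeholder.

```ts
/// <reference lib="webworker" />

// Each request carries an id so the response can be matched to the
// message that produced it.
addEventListener('message', ({ data }) => {
  const { id, values } = data as { id: number; values: number[] };

  // Placeholder for the expensive aggregation we moved off the main thread.
  const result = values.reduce((acc, v) => acc + Math.sqrt(v), 0);

  // Post the result back, associated with the original message.
  postMessage({ id, result });
});
```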
We declare a message listener, and in the callback we get the message ID and the data over which we're going to perform the calculation. We use the ID just to ensure we return the result associated with the correct worker message. At the bottom of the function, we just post the result back, associating it with the message we received earlier. To use the worker, I'm going to create a very simple service. This way we can quickly mock it and cache different calls. Here we first instantiate the worker; after that we add an event listener to process the response with the calculated result. And at the bottom, we send a message to the worker, before that ensuring that there are no other pending calls. Finally, we can just update the getter to use the service, which communicates with the worker. First, we're going to inject it into the constructor of the home component. After that, we'll invoke its calculate method, passing the required parameters. If we get a number, we're just going to return it. Alternatively, we want to return the string "Calculating", since, well, this is an asynchronous calculation. Here we rely on the fact that Angular will run change detection when the microtask queue of the browser is empty. This way, the aggregate getter will return the numeric value at a later change detection call, and we're going to just make sure that we have a consistent state of the view. We can now preview the result. Notice that we get the "Calculating" label for a bit, until it changes to the correct result in just a few milliseconds.

Let us now look into the final pattern that we're going to describe in today's video. In this scenario, we have a really large component tree with many cheap calculations; for example, very simple templates and lifecycle hooks without any heavy calculations. Here is one such flame chart. We can see that there is still a frame drop that can impact the user experience, but most calls here are taking less than one millisecond. So what could we do? When Angular runs change detection, it will start from the parent component and check its children after that. It is also essential to notice that, depending on the change detection strategy, some components could be cheaper to check than others. Having a parent component with many children using OnPush could be relatively cheap, as long as a change in the children's inputs doesn't trigger change detection in them. In contrast, however, if many children are using the default change detection strategy, the execution could be much slower. A refactoring could be used here to improve the performance: just create a new parent component that uses OnPush and move as many of the components using the default change detection strategy under it as its children. This way, we're going to prevent change detection from running in entire component subtrees and have faster execution, since we're going to do less. However, keep in mind that this could bring improvements during change detection, but not necessarily at initial rendering. Angular still has to render all the components, and the more components we have, well, the slower the rendering would be. The way to fix this is to render fewer components. Virtual scrolling is a way to achieve this: if we have thousands of items in a list, virtual scrolling could help us render fewer components. Pagination is clearly another alternative. A more advanced strategy is implementing on-demand rendering, depending on what is currently visible in the viewport. For the purpose, we can use the IntersectionObserver API.
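A minimal sketch of that last idea, assuming a hypothetical `appVisible` directive (not code from the demo app): it emits once when its host element enters the viewport, so a parent can flip an `*ngIf` and render heavy content on demand.

```ts
import { Directive, ElementRef, EventEmitter, OnDestroy, OnInit, Output } from '@angular/core';

@Directive({ selector: '[appVisible]' })
export class VisibleDirective implements OnInit, OnDestroy {
  // Fires once when the host element becomes visible in the viewport.
  @Output() appVisible = new EventEmitter<void>();
  private observer?: IntersectionObserver;

  constructor(private host: ElementRef<HTMLElement>) {}

  ngOnInit(): void {
    this.observer = new IntersectionObserver(entries => {
      if (entries.some(entry => entry.isIntersecting)) {
        this.appVisible.emit();
        this.observer?.disconnect();
      }
    });
    this.observer.observe(this.host.nativeElement);
  }

  ngOnDestroy(): void {
    this.observer?.disconnect();
  }
}
```

In a template this could look like `<div appVisible (appVisible)="show = true"><heavy-widget *ngIf="show"></heavy-widget></div>`, where `heavy-widget` stands in for whatever is expensive to render.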
Well, the chances are that you will be able to speed up your application's runtime performance if you're following all the practices that we already mentioned, especially during the initial rendering. However, there are occasions when the JavaScript virtual machine runtime could bring some extra weight and make things more difficult. Instead of interpreting all the source code we provide, the JavaScript virtual machine compiles it to native code to improve performance. This technique is known as just-in-time compilation, or JIT. Often it relies on assumptions about the source code, and when these assumptions turn out to be incorrect, the VM needs to deoptimize the code. Well, we have optimized the internals of Angular well for such situations, but JIT on its own can bring extra cost during execution, especially for cold code that hasn't been compiled yet.

Well, now let us visualize this in practice. To do that, we need to enable an experimental setting in Chrome DevTools: go to the gear icon, select Experiments, and enable "Timeline: V8 Runtime Call Stats on Timeline". Enabling the setting will require a restart of DevTools. Now, when we go to Performance and profile the app, we're going to see something interesting. Let us zoom in on the first part of the timeline. When we magnify further, we're going to see many Compile and Parse calls in the flame chart. These are all places where the JavaScript VM compiles code during the code's execution. Until this happens, some functions could take five or even ten times the time they will take after the JavaScript virtual machine compiles them. We can see that when we move towards the end of the timeline: notice how we have almost zero Compile calls and all the functions are taking much less time. There is one Compile call later on; because the JavaScript virtual machine performs JIT on demand, and this function hasn't been called in the past, it just needs to compile it right here.

Well, that was pretty much everything I have for today. I hope this presentation clarifies what's happening under the hood of your app's runtime and how you can diagnose difficult performance issues. We went through three main patterns: identifying redundant change detection triggers, detecting and optimizing expensive calls using Web Workers, and refactoring applications with large component hierarchies. In the end, we peeked into the JavaScript virtual machine runtime and saw how function calls could be way more expensive before the JavaScript virtual machine compiles them. Thank you very much for watching this video. See you next time. And happy coding!