Video details

"Of JavaScript Ahead-Of-Time Compilation Performance" by Manuel Serrano (Strange Loop 2022)

JavaScript
10.24.2022
English

JavaScript is particularly difficult to implement efficiently because most of its expressions have all sorts of different meanings that involve all sorts of different executions that are not distinguished by any syntactic or type annotation. The common intuition is that only JIT compilers can handle it efficiently because they can rely on heuristic-based strategies that require having the program and the data on hand. But static (AOT) compilers can speculate and use heuristics too! To demonstrate that, we have designed Hopc, an AOT compiler for JavaScript, based on the following assumption:
The most likely data structure a program will use is the one for which the compiler is able to produce its best code.
Thus, contrary to most AOT compilers, Hopc does not rely on complex static analyses to optimize programs. It simply tries to generate its best code that it protects with guards. In this talk, we will present its main optimizations and an in-depth performance analysis.
Manuel Serrano, Senior Researcher, Inria
Manuel Serrano is a researcher at Inria, the French research institute in computer science. Throughout his career, he has been committed to implementing and releasing open source software. First, he created the Bigloo compiler, an optimizing compiler for the Scheme programming language that is still widely in use. For a decade, he has been focusing on creating new programming languages for the web. He started with Hop, an extension of Scheme that enabled programmers to implement web applications using that functional programming language. Hop was one of the first projects to consider JavaScript as a target of another high-level programming language. For about two years, he has allocated all his research resources to designing and developing Hop.js, a multitier extension of JavaScript. Manuel Serrano also contributes to open-source projects. In particular, he created and has maintained over the years Flyspell.el, the Emacs on-the-fly spell checker.
------ Sponsored by: ------
Stream is the #1 Chat API for custom messaging apps. Activate your free 30-day trial to explore Stream Chat. https://gstrm.io/tsl

Transcript

My name is Manuel Serrano. As you can probably guess from my accent, I'm coming from France, where I'm a researcher at Inria, which is a French research institute in computer science. For a couple of years I've been working on JavaScript, and in particular on designing and implementing a static compiler for this language. And this is what I will present today. So I will start slowly, not assuming that you know all the details and all the subtleties that come with this language. We'll try to make a smooth introduction, and then I will present some of the results we have obtained. So, JavaScript. You see the title: a nightmare for compilers. And I will try to show you why. Most of the things I will present in this introduction are well known, but still I may assume that some of you are not so familiar with all these subtleties. So let me start with this example, and for a while let's assume that we are a C or a Java programmer. Even if you are not exactly familiar with all the techniques compilers deploy and all the optimizations, you may have a rather precise idea of the code you will get if you try to compile this with, let's say, GCC -O2. Probably you know that you will get something like that, because you know that i will be in a register. You know that the loop will just be incrementing a register, comparing something, reading from a contiguous memory area, and jumping back to the loop. So far so good. Now let's assume that we are no longer a C programmer, but a JavaScript programmer. The story looks the same, because the syntax is the same. But what is going on actually? Well, it all depends, because if a is an array, as before, the semantics of the program is exactly the same as the semantics of the C program. But the problem with JavaScript is that a might behave differently, and in particular a might still be an array.
But you can remove properties from an array, and you can store properties in the prototype chain of that array. And then, when you are executing the loop, when you reach a[1], you no longer find the value in the array itself; you have to follow the prototype chain. So you see, in the C code you were just reading from a contiguous memory area; in this code, it's no longer the same story, because you have to follow the prototype chain. So the code that you will generate will obviously be much more complicated. But things can be even worse, because a might be something else than an array. a might be an object. And this is perfectly fine as long as this object has properties named after the integers we are using for the array access. And this is correct because the JavaScript semantics tells you that when you are accessing something from an object, you first convert the property into a string and you make a lookup in something like a hash table. So this program is perfectly fine. And you see, I have paid attention not to have contiguous object properties here, because I go from 10 to 2. So really, forget about this idea of reading from a contiguous memory area; it will be a totally different matter. But I can do even worse, because if I change my program slightly and replace i++ with i += 1 (you know, in JavaScript i++ and i += 1 are not the same), then i might even be something else than an integer; it might be a string. And in that context I will concatenate things to my string, making an index from that value. So you see, the idea we had with C, where it was pretty obvious that the implementation of the loop was just a bunch of assembly instructions: with JavaScript, clearly, forget about that. It will be a totally different story, a totally different problem for the compiler. And the difficulty we have here is that the syntax of the language tells us very little and so does not provide significant help to the compiler.
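To make this concrete, here is an illustrative sketch (not from the talk; the helper names are mine) of how the very same loop syntax can drive very different executions depending on what a is:

```javascript
// The same syntactic loop, several very different executions.
function sum(a) {
  let s = 0;
  for (let i = 0; i < 3; i++) s += a[i]; // 3 hardcoded to keep the sketch short
  return s;
}

// 1. A dense array: reads from contiguous storage, like C.
console.log(sum([1, 2, 3])); // 6

// 2. An array with a hole: a[1] is found by walking the prototype chain.
const holey = [1, , 3];
Array.prototype[1] = 10;
console.log(sum(holey)); // 14 (1 + 10-from-the-prototype + 3)
delete Array.prototype[1];

// 3. A plain object with numeric property names: a hash-table lookup.
console.log(sum({ 0: 1, 1: 2, 2: 3 })); // 6

// 4. With i += 1 and a string index, i concatenates: keys "0", "01", ...
function sum2(a) {
  let s = 0;
  for (let i = "0"; i < 3; i += 1) s += a[i];
  return s;
}
console.log(sum2({ "0": 1, "01": 2 })); // 3
```

Every one of these calls goes through the same source-level loop; the compiler cannot tell them apart syntactically.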
Let me keep digging a little more deeply into the specificities of JavaScript. If we have this piece of code, it's pretty obvious how GCC will compile it. Basically it will generate code where accessing a value from a pointer is just one assembly instruction. Pretty obvious, pretty straightforward. You don't even need a smart compiler to generate that. What about JavaScript? Well, same syntax as C, but the story is now radically different. Here you see an excerpt of the specification of the language. If you want to execute o.p, you have to follow these semantic steps. And the first thing you notice is that in this semantic function you have several steps to execute, several tests, several conditions to verify, several branches that you have to take depending on the value you get. The thing that is even more complex is that, at the very first line here, the first thing the semantics tells us is that if you want to get the property, you have to invoke another semantic function even before doing the dispatch shown up there. What is this other semantic function that you have to execute? Well, this is the one that tells you how you have to follow the prototype chain. So this is actually implementing a loop: you check, do I have my property in that object, yes or no? If it is there, I return it to the other function that will do the testing. Otherwise I follow the prototype chain. But you see here, this function is using yet another function. So what is this other function? Well, this is the one. And this one tells you: yes, make the lookup in the hash table, and if you find the value, you see there is this weird syntax somewhere around here, which is: allocate an object and return that object. That is, when you are accessing a property from an object, the semantics tells you that first you have to make some heap allocation, and of course the compiler will have to somehow manage not to execute all these steps.
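As a rough illustration of those semantic steps, here is a naive sketch of property lookup as a prototype-chain walk (my own simplification; it ignores accessors, proxies, symbols, and exotic objects):

```javascript
// A naive sketch of what o[p] must do: walk the prototype chain
// until an own property is found, otherwise yield undefined.
function get(o, p) {
  const key = String(p);                // property keys are converted to strings
  let cur = o;
  while (cur !== null) {
    if (Object.prototype.hasOwnProperty.call(cur, key)) {
      return cur[key];                  // found: return the own property
    }
    cur = Object.getPrototypeOf(cur);   // not found: follow the prototype chain
  }
  return undefined;                     // end of the chain
}

const proto = { x: 1 };
const obj = Object.create(proto);
obj.y = 2;
console.log(get(obj, "y")); // 2 (own property)
console.log(get(obj, "x")); // 1 (found on the prototype)
console.log(get(obj, "z")); // undefined
```

Real engines shortcut this loop with inline caches and hidden classes, but semantically this is the work every property access implies.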
But once again, remember, in C accessing a property was just one assembly instruction. Here, you can imagine all the paths that we have to follow in order to make an efficient property access. Mitigations for that problem were discovered a while ago and they are used in all JavaScript implementations. They get rid of most of these steps. But still, we are pretty far away from the performance of C when accessing an object. Maybe the last example I would like to show you today: function calls. Here we have C code where we have an external function f and we invoke it with two arguments, 1 and 2; we get the result and we add 3. Okay, in C, pretty simple. You just pack your arguments on the stack. If you are using an x86 platform, you just push the arguments on the stack, you make the call, you get back the result, you add 3, you return, you're done. Pretty simple, pretty straightforward. What happens with JavaScript? Well, JavaScript is dynamically typed, so the first thing that you have to verify is: is f really a function? Am I allowed to call f? If I'm not, I have to throw an exception. So exceptions are involved here. Let's assume that f is a function. The next thing you have to verify is: is f a function accepting two arguments? Because it might be that f accepts just one or zero arguments, and in some cases you will have to drop the extra arguments that you are passing to the call. On the contrary, it might be that f is expecting more than two arguments, and then you will have to provide the missing arguments. When I say you will have to, it means that the runtime system will have to. So it means that you will have to do something at run time. If f happens to be using the special arguments value, it means that you will have to heap-allocate a value representing all the arguments that you are passing to this function, and you will have to deliver that value, which has a dynamic extent, to the function, which will be able to use it as it wishes.
Then you will have to assign the this value, and then eventually you will make the call. So you see, once again, we have the simplicity of C, which is just pushing something onto the stack or into registers depending on your platform, while with JavaScript it's a totally different story. And once again, the problem is that the structure of the program does not really help the compiler; the syntax itself does not provide sufficient information for us to be able to implement that efficiently. Okay, so, ahead-of-time compilation of this crazy language. Because I'm not assuming that all of you are exactly and precisely familiar with all the techniques we use when we implement a compiler, here is a very brief introduction to how we usually implement programming languages. On the very left part you have interpreters, and typically this is what you find for Python; the CPython implementation is just an interpreter. Interpreters are used for fast development cycles, because you execute as you go with your program: you read a program, you execute. So it's very convenient, convenient for making some experiments. Fast development cycle, as I said. But the downside is that you have very slow executions. At the very right part of my drawing, you have ahead-of-time compilers, static compilers, which a couple of years ago were just called compilers; today we have to say AOT compilers. Typically this is what you get for languages such as OCaml, GCC, Rust, all these kinds of languages. Slow development cycle, because compiling one of these programs statically usually takes a while. If you have a large C code base and you compile it with GCC -O3, you will notice that it takes a while. Complex implementation, because developing such a compiler is a complex task. But the good side is that it delivers good performance for static languages. In the middle you have things that are a little bit interpreter, a little bit static compiler.
You have typically bytecode compilers, such as the Java compiler here. The purpose of the bytecode compiler is to simplify the execution a little bit by implementing some operations, such as, for instance, resolving variable names, maybe preparing the construction of the execution stack, but nothing too fancy and nothing too complicated. And then you have JIT compilers, and JIT compilers are typically what you get in all the commercial JavaScript implementations. JIT compilers allow fast development cycles, because as a user you do not even notice that the compiler exists. You just start your program, and it runs and handles everything for you. It's a super complex implementation, because you have to implement an interpreter, a compiler, and something that is able to replace the code that you are interpreting with the code generated by the compiler. So it's really extra complexity, and it's quite convenient for dynamic languages in particular, because the JIT compiler is able to do some profiling, discovering during the execution what is going on and what is the best code it can generate. What we are trying to do is build this Hop compiler, which is a static compiler for JavaScript that does not take the same approach as JIT compilers. So this compiler exists, it's called Hop.js, it's an opportunistic compiler, and you will see what I mean by that later on. It supports full-fledged JavaScript; it is almost compliant with the latest version of JavaScript. It is also almost compliant with Node.js, which means that you could potentially use this compiler as a replacement for Node.js. It optimizes the core language, but there are some parts of the language that it does not optimize yet.
So it means that for these parts you will not get good performance; typically some library functions are not well optimized, nor are some data structures, such as floating-point numbers, which the compiler does not optimize so far. So the first question is: does it really make sense to try to compile a language such as JavaScript statically? Well, the motivations for this work are numerous. The first one is that on some platforms you are not allowed to write to executable memory. So on these platforms you cannot rely on a JIT compiler. Typically, if you want to build your own JavaScript implementation for iOS, for instance, you are not allowed to change the executable memory. So you cannot do it, forget about it. You have either to use an interpreter or a static compiler. Then you have platforms which have very limited memory, platforms such as microcontrollers, for which you cannot consider embedding a JIT compiler in the platform because there is not enough memory. Then you have executions that are very short, typically cloud services, or even imagine, for instance, a shell command. Imagine that you want to implement ls in JavaScript. Maybe just running ls is too fast, too quick, so the JIT compiler will not even have the time to optimize ls before the execution is completed. So for this kind of execution, a static compiler might help. And also, as a researcher, this is an intriguing question, because all the currently fast implementations are using JIT compilers. But does it mean that it is not possible to build a static compiler for such a language? Or how far can we go in this attempt at building an optimizing ahead-of-time compiler for JavaScript? So all of these are a little bit of my motivation. Then the second question is how to attack the problem, which seems difficult.
Well, first of all, we will adapt the techniques of JIT compilers to static compilation, and I will show you more precisely what that means. Then we will reuse all the things we know about compilers and all the things we know about optimization. That is, we will do a little bit of typing, a little bit of data flow analysis, a little bit of control flow analysis. And what we will do a lot of is optimistic heuristics. And this is the point that I will develop in the rest of this presentation. So, optimistic compilation. The idea is pretty simple, pretty straightforward. Consider a function such as this one, the fibonacci function. Well, the idea is just that we will generate several versions of such a function: one version specialized and optimized for a dedicated type, and a generic version, and we will just make a dispatch, selecting according to the type which version to use. And if we are lucky enough that the arguments are of the optimized type, we branch to the efficient version and we are done. So why is that significantly better? Because in the optimized version we will be able to use dedicated operators that do not strictly follow the JavaScript semantics but are much more efficient. And also because, using the type inference, we will be able, for instance here and there on the recursive calls, not to go through the dispatch but to branch directly to the optimized version. So the idea is pretty simple, and more or less it's doing statically what a JIT compiler does: selecting the type and generating several versions. So the question is: how do we select the type for which we want to generate this version? And here the idea is absolutely straightforward. In the past, when we were doing type inference, we were always trying to find the most generic type representing all possible executions. In this context we are not interested at all in these types.
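Before turning to how the type is selected, the two-version scheme can be sketched at the source level. This mimics in plain JavaScript the shape Hopc generates internally; the function names and the guard are mine, not from the talk:

```javascript
// Two-version scheme for fibonacci, sketched at the source level.

// Optimized version: assumes n is an integer, so the recursive calls
// branch directly here, bypassing the dispatch.
function fibInt(n) {
  if (n < 2) return n;
  return fibInt(n - 1) + fibInt(n - 2); // no dispatch on recursive calls
}

// Generic version: follows the full JavaScript semantics
// (coercions of strings, objects with valueOf, and so on).
function fibGeneric(n) {
  if (n < 2) return n;
  return fibGeneric(n - 1) + fibGeneric(n - 2);
}

// The dispatch: one cheap type guard at the entry point.
function fib(n) {
  return Number.isInteger(n) ? fibInt(n) : fibGeneric(n);
}

console.log(fib(10));   // 55, through the optimized path
console.log(fib("10")); // 55, through the generic path ("10" - 1 coerces)
```

The bet is that almost all calls take the guarded fast path; the generic version is only there to preserve the full semantics.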
The only types we are interested in are the types for which the compiler is able to generate good code. So instead of trying to infer the most general type, we will just try to find out the type for which the compiler is able to generate the best code. And this is the type we will select, this is the type we will pick, and for the rest we don't care. So how does it work? It's extremely straightforward. We walk through the program, and for each expression we apply a rule. And the rule tells us: for this variable used in this expression, if it were of that type, would I be able to generate good code or bad code? For each expression we apply all the possible rules that we have. And at the end of the day, we just check what has been produced by the rules and we select the most efficient type. So what do these rules look like? They are of this kind. When you have a property access from an object, x[e]: if x were an object, then we would be able to generate better code, because we would be able to avoid a type check; even more so if, in addition, we knew that e was an integer. If x were an array, we would be able to generate even better code, because we have dedicated operations for accessing arrays with integers. If the property happens to be length, then if the value were an array or a string, we would be able to generate even better code, because we have dedicated cases for these two special cases. If we see an assignment, as strings are read-only, we know that we can forget about strings. This is why we use this minus-infinity weight here, saying that we are not interested in having x be a string, sorry, but we are interested in having x be an array. And we keep going with all these rules. We have plenty of them; I just presented a bunch of them, but we have many more, you see. We have rules of this kind where we say: if this is a bitwise operation and the operand happens to be an integer, then we will be able to generate good code.
Otherwise we will have to go through a dispatch, so we are not interested in the other cases. So let me show you an example of how this works. Let's consider this function that just reverses an array of values, and let's apply the rules blindly, one after the other. We have an expression of this kind in the program, a.length. So by applying rule one we say, okay, a could be an object, so the compiler increments the weight for the object type. But by applying rule two, since we also have dedicated rules for arrays and for strings, we increment those weights as well. We have an access of this kind, so by rule one we increment the object type for this variable a. The type inference tells us that i is an integer, so we increase the weight of array and string as the possible type for a. And we keep going like that, and at the end we observe that, okay, the best possible type is a being an array. And you see, it is absolutely simple, because it's just one pass: you just accumulate the weights, and at the end you pick the most probable, or the most optimizable, one. And then when you are done with this type, you just generate the code as I presented before. You have your dispatch, your dedicated function, the generic one, and on the two possible calls, either you are able to branch directly to the optimized version or you go through the generic version, and that's it. And this is how we implement this sort of opportunistic compilation. We have just generated the best possible code, and we have made a bet, in some sense; we have guessed that this is the code that is likely to be used. This is one of the things we have in the compiler. But this is not all. JavaScript is a very big language. The core language is complex, because you have all these prototype things, you have inheritance, which is complicated, the language keeps growing, you have an extended set of libraries. So for a very small academic team, it's not really possible to implement all these aspects of JavaScript efficiently.
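The one-pass weight accumulation described above can be sketched as follows. The rules and the weight values here are invented for illustration and are not Hopc's actual rules:

```javascript
// Illustrative sketch of the weight-accumulation pass. Each occurrence
// record describes one use of a variable; each rule adds weight to the
// candidate types the compiler can optimize for.
function selectType(occurrences) {
  const weights = { object: 0, array: 0, string: 0 };
  for (const occ of occurrences) {
    if (occ.kind === "index" && occ.indexIsInt) {
      weights.object += 1;          // o[p]: avoids one type check
      weights.array += 2;           // a[i]: dedicated fast access
      weights.string += 2;          // s[i]: dedicated fast access
    } else if (occ.kind === "length") {
      weights.array += 2;           // dedicated length for arrays
      weights.string += 2;          // ... and for strings
    } else if (occ.kind === "indexAssign") {
      weights.array += 2;
      weights.string = -Infinity;   // strings are read-only: rule them out
    }
  }
  // Pick the most optimizable candidate.
  return Object.keys(weights).reduce((a, b) =>
    weights[a] >= weights[b] ? a : b);
}

// Occurrences collected from the array-reversing function of the talk:
// it reads a.length, reads a[i], and assigns a[j].
const occ = [
  { kind: "length" },
  { kind: "index", indexIsInt: true },
  { kind: "indexAssign" },
];
console.log(selectType(occ)); // "array"
```

The assignment rule knocks string out with its minus-infinity weight, and array accumulates the highest score, so the specialized version is generated for arrays.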
So what we did, instead of taking JavaScript and generating native code, is take JavaScript and generate Scheme code. This may seem a little weird, but actually Scheme is the original inspiration for JavaScript. Scheme is actually very close to JavaScript, except that Scheme is much simpler and it has a cleaner semantics. Many of the dynamic aspects which are very complicated to deal with in JavaScript are absent from Scheme. So Scheme can be considered an interesting intermediate language for JavaScript. We have been doing that, and we reuse a Scheme compiler that generates C code. So actually the full compilation chain is: we go from JavaScript to Scheme, Scheme to C, and C to assembly code or object code. Okay, now let me show you just another example of this idea of opportunistic compilation when Scheme is involved. If you are not familiar with Scheme, don't be too afraid of this piece of code; I will show you the very same one in JavaScript on the next slide. Here we are defining a function, get-ctor, which will return a function, ctor, which is defined here. It creates an object. And the only interesting part here is that this function uses two free variables, cnt and base. This one is a floating-point number; this one is an integer. This is the only part which is interesting. And here you see we are assigning a new value to cnt, which means that we are mutating this free variable. Okay, sorry, my arrow is a little bit weird. How is that implemented, or what is happening at run time when we execute this piece of code? Well, here you see, this is the assembly code for this function over there. This is the value of the floating-point number, this is the value for the cnt variable, and this is the object representing this function, this ctor function, in memory. This is very classical; this is the standard way of compiling Scheme. A function is a data structure that contains its free variables.
So this is a heap-allocated structure. Okay, enough for Scheme. Now the very same thing in JavaScript. Exactly the same program, exactly the same meaning. I have my ctor function, I return the ctor function, I have two free variables in this function, and I assign cnt. So exactly the same, exactly the same program. What is the difference with JavaScript? Well, in JavaScript you need this extra memory and this extra object, because in JavaScript a function is a regular object. You can assign properties, or you can store values arbitrarily, as you want, in your function. So you see, in Scheme we had just this object, and in JavaScript we have the overhead, which is all this new object that has appeared. So clearly you can see that JavaScript will not be as efficient as Scheme, by no means. But what do we do with that? Well, the idea is that, okay, in the general case a JavaScript function, a JavaScript closure, will be a heavyweight object. But in many cases we know what we will be doing with the function, and in particular for these kinds of cases, which are extremely frequent: you are just invoking the forEach function on an array and you are passing here an arrow function. So the idea is that it would be a shame to allocate a full-fledged JavaScript function for this arrow function, because we are just invoking the JavaScript library function forEach, which is not doing anything special with that function except calling it. So the way we compile that is that we generate this Scheme code: we assume that forEach is actually the forEach of the Array library of JavaScript, and then, instead of building a true JavaScript function, we just allocate on the stack a Scheme function, a lightweight Scheme function, this function here, and we call, for this function, js-array-maybe-forEach. And how is this function implemented? It just checks: is the receiver a regular array that is not overriding forEach?
And if it is, which is almost always the case, it just directly invokes that function without allocating anything. And if it is not, then it goes through the slow path. So we are somehow making a bet here. We are guessing that yes, the user is not overriding forEach on arrays, and yes, he is using forEach with arrays; and if he is, then we have very efficient code, as efficient as the Scheme code, and if not, then yes, we will follow the slow path. So this is another example of how we use this idea of opportunistic, or optimistic, compilation in a static context. Good. So what about performance? When you are dealing with JavaScript and JavaScript efficiency, it is impossible to get any paper accepted if you are not presenting the official JavaScript benchmarks. So you have to go through JetStream, Octane, you name it. These benchmarks are known to be flawed, to have biases that you don't want to deal with, but still you have to present these values. So here we are for Hop, compared to V8 and SpiderMonkey and the other players. Hop is in yellow, and you see that on this test suite it seems to perform not so well, but actually there are reasons for that. First, as I mentioned before, Hop is not optimizing floating-point numbers, so all the benchmarks that use floating-point numbers have poor performance with Hop. And then you have other benchmarks for which there are special patterns, and if you miss these patterns, then your performance is awful, and Hop is not able to optimize all of them: some because we don't know how to optimize them, and some because we simply don't care. I will show you two examples of these patterns. Let's consider DeltaBlue. Here in the code you see this pattern: this Direction is used as an enum, but it is implemented as an object, and it is used everywhere.
So if the compiler is not able to detect that this is a constant, that this is an allocated object that is never modified, only assigned here, and make a replacement, that is, a constant propagation, then the type inference is not able to do a good job and your performance will be doomed. Fortunately, for this very case, Hop is able to optimize that. But there are other cases, slightly similar to this one, where Hop is not able to do a good job. Let me show you another example, my favorite one: crypto-sha1. As you can imagine, it's a function implementing a crypto function used in a crypto library. And in this code you see this function, and it's an interesting one, because what it does is it creates an array, and here, in a loop, it executes this bitwise-or operation. But you can notice that the array bin has not been initialized before. So actually this code is totally broken, because if someone assigns something in the Array prototype object, your crypto library will be totally corrupted. So this code makes no sense at all. So yes, Hop is not able to optimize that and it has poor performance on that benchmark. But really, is that a problem? We are not optimizing this broken code, and you have this kind of problem in many of these Octane, JetStream, and so on, benchmarks. So what we did is we designed another test suite, for which we got our programs from different sources. We took programs coming directly from NPM, NPM packages that have no dependencies, and we used these programs in our tests. We imported some programs from other languages, typically from Python and from Scheme. And so we designed a large set of new benchmarks; I present just some of them here. And you see, because we are interested in comparing JIT performance and ahead-of-time compiler performance, it's not only peak performance that we are interested in; we are also interested in understanding how the performance evolves as the complexity of the problem evolves.
So here, on the horizontal axis, the data set, the input of the program, is becoming more complex, and so the execution lasts longer, and you see how the comparison between the different implementations evolves depending on the complexity of the problem. So you see, for this program, Hop performs similarly to the other JIT compilers. Here, basic is a BASIC interpreter; this one, sorry, I have to put my glasses on, is a Scheme interpreter. And bague is just an implementation of a small game using recursive functions and integer manipulation. To keep going with some other examples: moment is the most popular package on NPM, and it's a package dealing with dates. The runtime implementation of dates in Hop is not so great; we are not very good at implementing the date functions, so it is not performing so well, but it's just twice slower than SpiderMonkey, for instance. So it's not that good, but it's not that bad either. Here is a program parsing the command line, and this one computing unique identifiers. This program is interesting: it's a version of the richards program from Octane, but where we have introduced a proxy object each time we create a value, and you see that Hop outperforms all the other implementations. And we know why: because we have a special optimization for proxies. But the interesting point here is that even for a very dynamic feature, in a static context we are able to deliver good performance. So that's it for execution times. As for the promise of having more efficient memory consumption, here you can observe that yes, indeed, Hop really provides better performance when we are talking about memory, because you don't have to embed the JIT compiler in your heap when you are running your program. And we observe that for both the virtual size and the resident size. And I think I'm done. So, in conclusion: we have built this static compiler, it fully supports JavaScript, it uses a lot of optimistic and
opportunistic techniques, and these are described in the paper we are publishing. It is not as good as the most efficient JIT compilers yet, but it delivers competitive performance; that is, most of the time we are in the range where we are as fast as, or two to two and a half times slower than, JIT compilers, but nothing worse. In terms of memory footprint we are much more efficient, because we are not embedding the compiler. And I thank you for your attention, and I don't know if we have time for questions. [Q&A] No, we don't, no. I think what you are referring to is profile-guided optimization in a JIT compiler? I don't know, because it requires an infrastructure, and the Hop.js system is not ready for that. Profile-guided optimization? Yes, definitely, yes, and Hop.js is doing a little bit of that, but I did not mention it in this presentation, because so far the results I have obtained are slightly disappointing: with profile-guided optimization, more or less, we are able to reduce the generated code size, but we are not accelerating executions. And my guess for that is that with profile-guided optimization maybe you can remove some of the guards that you have, or your dispatch, but these are generally optimized by the processor, by the branch predictor of the processor. So there is something that I don't really understand here, but yes, this should be interesting, but not so far. No, we don't have the whole program, because it compiles modules. So inside the modules, yes, we are able to do some global analysis, and yes, I think we are able to do a better type analysis, in particular because we are using abstract interpretation for executing everything. But in the end, a JIT that is able to do some precise profiling will also probably be able to discover the types that we are discovering, so it's not so clear that it is a huge advantage. Maybe, but maybe not.
The next step, I think, for me, is to try to reuse the types that are generated by the TypeScript compiler, because this is a different setting and probably more friendly to static compilation. So this is the next thing I will experiment with. I don't really know. You know, with these languages, JavaScript or maybe Python, there is this performance-cliff phenomenon. I don't know if you are familiar with that: the idea that as long as you are in a subset where the compiler is able to understand what you are doing, you have good performance, but if you change it a slight bit somewhere, all of a sudden your performance may degrade by a 1000x ratio for a reason that you, as a programmer, cannot understand. For Hop I'm experimenting with something slightly different. In JavaScript you have these different modes; you have "use strict", and in the past Google has tried, what was the name, "use strong" or something like that, which was even stricter than "use strict". And for Hop I'm using something similar. I have a dedicated mode where some of the JavaScript features are removed, are not allowed. If you use this mode, it is syntactically JavaScript, the data structures are compatible with JavaScript, but there are some operations that you are not allowed to execute, and then you are guaranteed to have better performance; you are sacrificing a little bit of the dynamicity of the language, but this is compatible with the rest. So some of your modules might be in that mode, and others in regular JavaScript mode, and you combine them, and that's okay. So this is the approach I'm following. Yeah, I don't know; in theory I could say yes, I will spend time and I will succeed in optimizing that. I don't know, maybe there are difficulties that I'm not even suspecting. Honestly, I cannot answer that question. I'm not even sure that this is doable, to be totally honest. I think so, but I cannot guarantee that. Okay, thank you very much. Thank you for your attention.