Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer
Suyoung Lee, HyungSeok Han, Sang Kil Cha, and Sooel Son, KAIST
JavaScript (JS) engine vulnerabilities pose significant security threats affecting billions of web browsers. While fuzzing is a prevalent technique for finding such vulnerabilities, there have been few studies that leverage the recent advances in neural network language models (NNLMs). In this paper, we present Montage, the first NNLM-guided fuzzer for finding JS engine vulnerabilities. The key aspect of our technique is to transform a JS abstract syntax tree (AST) into a sequence of AST subtrees that can directly train prevailing NNLMs. We demonstrate that Montage is capable of generating valid JS tests, and show that it outperforms previous studies in terms of finding vulnerabilities. Montage found 37 real-world bugs, including three CVEs, in the latest JS engines, demonstrating its efficacy in finding JS engine bugs.
Hello, I'm Suyoung Lee from KAIST, and I'm going to present my paper, Montage, a neural network language model-guided JavaScript engine fuzzer. This is joint work with HyungSeok Han, Sang Kil Cha, and my advisor, Sooel Son.

Web browsers are becoming more and more popular. In 2018, almost four billion people in the world used web browsers every day. In a web browser such as Chrome, there is a component called the JS engine, which executes the JavaScript code of a web page. Let's assume there is a vulnerability in this engine. Once a benign user accesses a malicious web page, the adversary will be able to execute arbitrary code on behalf of the user by exploiting the vulnerability. Therefore, a JS engine vulnerability is a critical security threat. Moreover, according to Zerodium, when a JS engine vulnerability is chained with a local privilege escalation, the exploit costs up to five hundred thousand U.S. dollars.

To find such JS engine bugs, previous researchers have employed an approach called fuzzing. JS engine fuzzers repeatedly generate arbitrary JS code and test whether the generated code triggers any bug. So how can these fuzzers generate JS code? There are two different types of fuzzers in terms of generating JS code: mutation-based and generation-based fuzzers. Mutation-based fuzzers such as LangFuzz randomly combine AST subtrees extracted from seed JS tests. On the other hand, generation-based fuzzers such as jsfunfuzz simply apply JS grammar rules from scratch. However, both of them are basically not that different: they randomly combine AST subtrees or randomly apply JS grammar rules. For example, let's assume this is the current JS AST. A fuzzer now wants to append a new building block to this node, and there are three building blocks which are appendable to this node. Then fuzzers like LangFuzz or jsfunfuzz simply select one from these candidates and append it to the node. In other words, they randomly select their building blocks as long as the JS grammar allows it.

So we thought we could do much better than this. Our intuition is that if we have a set of bug-triggering JS code, we can extract common patterns from the set; then we can leverage these patterns to create another bug-triggering JS test. For instance, in the previous scenario, we can now refer to the extracted patterns to know which building block is more likely to trigger a JS engine bug. Therefore, we can select the next building block based on these patterns.

To investigate whether such patterns exist in bug-triggering JS code, we first analyzed functional and syntactical commonalities in JS engine vulnerabilities. To find functional commonalities, we collected and analyzed the patches of 50 ChakraCore CVEs. Among the 150 patched files, 18 percent revised the GlobOpt.cpp file and 14 percent revised the JavascriptArray.cpp file. In other words, 18 percent of the CVEs are related to JIT optimization and 14 percent are related to JavaScript arrays. For syntactical commonalities, we compared AST subtrees extracted from two sets. We collected 2,038 JS tests from the ChakraCore repository as of August 2016 and extracted subtrees from this set. We also collected 67 PoCs triggering ChakraCore CVEs reported after August 2016. Then we checked whether the subtrees from the second set exist in the first set. Surprisingly, ninety-five percent of the subtrees extracted from the second set already existed in the first set. For instance, this is a snippet of JS code in the first set, and this is a PoC triggering a CVE reported after that.
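The subtree-overlap measurement in this preliminary study can be made concrete with a small sketch. The following is a minimal illustration, assuming a toy AST encoded as nested dicts carrying only node types; the names `fragmentize` and `overlap_ratio` are hypothetical, not the paper's actual code, and a real implementation would operate on full JS ASTs produced by a parser.

```python
# Sketch: measure how many depth-1 subtrees from a PoC set already
# appear in a regression-test corpus. All names here are illustrative.

def fragmentize(node):
    """Yield depth-1 subtrees: each node plus only its children's types."""
    frag = (node["type"], tuple(c["type"] for c in node.get("children", [])))
    yield frag
    for child in node.get("children", []):
        yield from fragmentize(child)

def overlap_ratio(poc_asts, corpus_asts):
    """Fraction of PoC fragments already present in the corpus."""
    corpus_frags = {f for ast in corpus_asts for f in fragmentize(ast)}
    poc_frags = [f for ast in poc_asts for f in fragmentize(ast)]
    hits = sum(f in corpus_frags for f in poc_frags)
    return hits / len(poc_frags) if poc_frags else 0.0

# Example: a tiny AST for `var x = 1;`
ast = {"type": "VariableDeclaration",
       "children": [{"type": "VariableDeclarator",
                     "children": [{"type": "Identifier", "children": []},
                                  {"type": "Literal", "children": []}]}]}
print(overlap_ratio([ast], [ast]))  # 1.0: every fragment is shared
```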
As you can see, these two are syntactically very similar. Based on the two observations from the preliminary study, we designed our new fuzzer, Montage. To address the functional commonalities, Montage leverages the functionality of existing JS tests that previously triggered JS engine bugs by mutating them. To address the syntactical commonalities, Montage models the relationships between AST subtrees.

Montage works as follows. Given a set of JS tests, it first preprocesses each JS test into a sequence of fragments. Then it trains a neural network language model on the preprocessed sequences. Finally, under the guidance of the trained model, it mutates a given JS test to generate a bug-triggering JS test.

So let's first take a look at the preprocessing step. We have JS code here and the corresponding AST. From this AST, Montage extracts a fragment, which is a subtree of depth one. Particularly, it extracts fragments in the preorder manner: so the next fragment is this one, then this one, then this one. As a result, the AST on the left side is represented by the sequence of fragments on the right side.

So why is this modeling novel? Let's assume that we want to predict the next fragment. Then, starting from the root node, this representation enables modeling the hierarchical composition relationships between fragments. In other words, Montage can predict the next fragment based on all the preceding fragments. Unlike this representation, several previous studies model code as a sequence of tokens, as shown on the right side. However, according to our evaluation, most of the JS code created by a token-level model is invalid: among one hundred thousand JS tests, only 0.58 percent of the code created by the token-level model was valid. This difference stems from the capability of selecting a valid next fragment. Let's assume that, given this fragment, Montage wants to predict the next one. Since we represent JS code with fragments, we know that the next fragment should start with a variable declaration, like this one. Therefore, with this representation, Montage can always append valid fragments. At the same time, since this is a sequential form of representation, it can directly train any prevalent language model, including an LSTM model and a Markov model. Among the many language models, Montage employs an LSTM model, which captures long-term dependencies between fragments. Basically, given a sequence of preceding fragments, the model is trained to predict the probability distribution of the next fragment.

After training, Montage mutates a JS test by leveraging the trained LSTM model. You can see a seed JS test on the right side and the corresponding AST on the left side; for simplicity, we only show a snippet which corresponds to the shaded statement. From this seed, Montage randomly selects and removes a subtree, and prepares the sequence of fragments that represents the current AST. Then it queries the trained model with this sequence, and the model outputs the probability distribution of the next fragment. From the top fragments, Montage randomly selects one fragment and appends it to the AST. As in the preprocessing step, Montage repeats this process in the preorder manner: it appends the next fragment here, and the next one here, here, and here, as shown in the AST. It keeps appending fragments until there are no more non-terminal symbols left.
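To make the fragment representation and the model-guided mutation concrete, here is a minimal sketch under the same toy nested-dict AST encoding as above. The trained LSTM is abstracted as a callable `model` that returns (fragment, probability) pairs; `TOP_K`, the function names, and the type-only fragments are illustrative assumptions, not Montage's actual implementation.

```python
import random

TOP_K = 64  # size of the candidate pool; an assumed value

def preorder_fragments(node):
    """Serialize an AST into its fragment sequence, in preorder."""
    frag = (node["type"], tuple(c["type"] for c in node.get("children", [])))
    seq = [frag]
    for child in node.get("children", []):
        seq.extend(preorder_fragments(child))
    return seq

def next_fragment(model, prefix, required_type):
    """Pick a valid next fragment: query the model with the fragment
    sequence so far, keep the top-k candidates whose root type matches
    the non-terminal being expanded, and sample one at random."""
    ranked = sorted(model(prefix), key=lambda fp: -fp[1])[:TOP_K]
    valid = [frag for frag, _ in ranked if frag[0] == required_type]
    return random.choice(valid) if valid else None

def regenerate(model, prefix, root_type):
    """Rebuild a removed subtree top-down, expanding each non-terminal
    in preorder until none remain. A real implementation would also cap
    the recursion depth and carry concrete token values in fragments."""
    frag = next_fragment(model, prefix, root_type)
    if frag is None:  # no valid candidate: fall back to a leaf node
        return {"type": root_type, "children": []}, prefix
    node = {"type": frag[0], "children": []}
    prefix = prefix + [frag]
    for child_type in frag[1]:  # expand each placeholder child in order
        child, prefix = regenerate(model, prefix, child_type)
        node["children"].append(child)
    return node, prefix

# Example with a dummy "model" that always proposes one leaf fragment.
dummy = lambda prefix: [(("Literal", ()), 1.0)]
tree, _ = regenerate(dummy, [], "Literal")
print(tree)  # {'type': 'Literal', 'children': []}
```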
After the mutation, Montage outputs JS code as shown on the right side.

To evaluate our fuzzer Montage, we collected 33.5K unique JS test files from the repositories of four major JS engines as of January 2017. The fuzzers then ran against ChakraCore 1.4.1, which was released in February 2017, so that JS tests triggering unpatched bugs in this version of ChakraCore do not exist in our training set. We compared Montage with three different state-of-the-art fuzzers: CodeAlchemist, jsfunfuzz, and IFuzzer. We ran five trials of our fuzzing campaign, each of which lasted for seventy-two hours. The table reports the median number of unique crashes and bugs found by each fuzzer. As you can see, Montage outperformed the other state-of-the-art fuzzers, and the differences were statistically significant.

We also compared two different representations of JS code: modeling JS code as a sequence of fragments versus as a sequence of tokens. Token RNN here means mutating JS code by leveraging an LSTM model trained on JS code tokens. As you can see, Montage found many more bugs compared to the token RNN; in fact, only 0.58 percent of the JS tests generated by the token RNN were executed without errors. To evaluate the effectiveness of the LSTM model, we compared Montage with random fragment assembly, that is, without the guidance of any model. The results show that the fuzzer finds more bugs when it is guided by the LSTM model.

We further analyzed the JS code created by Montage. This graph shows how many fragments are appended to compose a new subtree. As the red line shows, to trigger 90 percent of the security bugs found by Montage, it appends up to 52 fragments. In other words, given a sequence, the model captures long-term dependencies between fragments and effectively predicts the next fragment.

Finally, we ran Montage against the latest JS engines and found thirty-seven previously unknown bugs, including two and one security bugs found from ChakraCore and JavaScriptCore, respectively. Twenty-six of them were patched at the time of writing, and notably, Microsoft rewarded us with five thousand dollars for a bug found in ChakraCore.

In conclusion, we proposed Montage, the first NNLM-guided JS engine fuzzer, and demonstrated its efficacy. We proposed a novel approach of modeling JS code as a sequence of fragments, on which any prevalent language model can be trained without modification. Montage outperformed state-of-the-art fuzzers on an old version of ChakraCore, and it found thirty-seven previously unreported bugs in the latest engines. To support future research, we make our code public at this address. For more details of Montage, please refer to our paper. Thank you for your attention.
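For readers who want a feel for how a campaign like the one above is driven, the harness boils down to a loop like the following sketch: execute each generated test with a JS engine shell and treat signal-terminated runs as crashes. The engine path, the timeout, and the crash check are assumptions for illustration; this is not Montage's released tooling.

```python
import subprocess, tempfile, os

ENGINE = "./out/ch"  # e.g., a ChakraCore shell binary (hypothetical path)

def run_test(js_code, timeout=10):
    """Return the signal number if the engine crashed, else None."""
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
        f.write(js_code)
        path = f.name
    try:
        proc = subprocess.run([ENGINE, path], capture_output=True,
                              timeout=timeout)
        # A negative return code means the process died on a signal,
        # e.g., -11 for SIGSEGV.
        return -proc.returncode if proc.returncode < 0 else None
    except subprocess.TimeoutExpired:
        return None  # hangs are not counted as crashes in this sketch
    finally:
        os.unlink(path)
```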