USENIX Security '21 - UNIFUZZ: A Holistic and Pragmatic Metrics-Driven Platform for Evaluating Fuzzers
Yuwei Li, Zhejiang University; Shouling Ji, Zhejiang University/Zhejiang University NGICS Platform; Yuan Chen, Zhejiang University; Sizhuang Liang, Georgia Institute of Technology; Wei-Han Lee, IBM Research; Yueyao Chen and Chenyang Lyu, Zhejiang University; Chunming Wu, Zhejiang University/Zhejiang Lab, Hangzhou, China; Raheem Beyah, Georgia Institute of Technology; Peng Cheng, Zhejiang University NGICS Platform/Zhejiang University; Kangjie Lu, University of Minnesota; Ting Wang, Pennsylvania State University
A flurry of fuzzing tools (fuzzers) have been proposed in the literature, aiming at detecting software vulnerabilities effectively and efficiently. To date, it is however still challenging to compare fuzzers due to the inconsistency of the benchmarks, performance metrics, and/or environments for evaluation, which buries the useful insights and thus impedes the discovery of promising fuzzing primitives. In this paper, we design and develop UNIFUZZ, an open-source and metrics-driven platform for assessing fuzzers in a comprehensive and quantitative manner. Specifically, UNIFUZZ to date has incorporated 35 usable fuzzers, a benchmark of 20 real-world programs, and six categories of performance metrics. We first systematically study the usability of existing fuzzers, find and fix a number of flaws, and integrate them into UNIFUZZ. Based on the study, we propose a collection of pragmatic performance metrics to evaluate fuzzers from six complementary perspectives. Using UNIFUZZ, we conduct in-depth evaluations of several prominent fuzzers including AFL , AFLFast , Angora , Honggfuzz , MOPT , QSYM , T-Fuzz  and VUzzer64 . We find that none of them outperforms the others across all the target programs, and that using a single metric to assess the performance of a fuzzer may lead to unilateral conclusions, which demonstrates the significance of comprehensive metrics. Moreover, we identify and investigate previously overlooked factors that may significantly affect a fuzzer's performance, including instrumentation methods and crash analysis tools. Our empirical results show that they are critical to the evaluation of a fuzzer. We hope that our findings can shed light on reliable fuzzing evaluation, so that we can discover promising fuzzing primitives to effectively facilitate fuzzer designs in the future.
View the full USENIX Security '21 Program at https://www.usenix.org/conference/usenixsecurity21/technical-sessions