Dissecting American Fuzzy Lop: A FuzzBench Evaluation (key points)

paper: https://www.s3.eurecom.fr/docs/fuzzing22_fioraldi_report.pdf

Conclusions from two experiments (mainly based on FuzzBench)

  • Our conclusion after this experiment is that AFL, and follow-up fuzzers like AFL++, should provide an option to disable hitcounts. AFL++ provides many different options, and users are advised to run an instance of each variant when doing parallel fuzzing, a common use case in real-world setups. The fact that, in our experiments, hitcounts have shown highly variable behavior suggests that users should include a variant without hitcounts when doing parallel or ensemble fuzzing, as in OSS-Fuzz.
  • The conclusion we can draw from this experiment is that it would be a mistake to underestimate the impact of the novelty search. In particular, researchers proposing new approaches that also modify this aspect should carefully evaluate – in isolation – the benefit of a different mechanism to decide if an input is interesting, as AFL’s novelty search provides a strong baseline.
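
To illustrate the two mechanisms these conclusions refer to, here is a minimal Python sketch of hitcount bucketing and a novelty check. The bucket boundaries follow AFL's documented count classes (1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+). The `NoveltySearch` class, its `use_hitcounts` flag, and the dict-based trace are hypothetical simplifications for illustration, not AFL's actual shared-memory bitmap code.

```python
from typing import Dict


def bucket(count: int) -> int:
    """Map a raw edge hit count (assumed to fit in 8 bits) to a bucket bit."""
    if count == 0:
        return 0
    if count == 1:
        return 1
    if count == 2:
        return 2
    if count == 3:
        return 4
    if count <= 7:
        return 8
    if count <= 15:
        return 16
    if count <= 31:
        return 32
    if count <= 127:
        return 64
    return 128


class NoveltySearch:
    """Hypothetical novelty check: an input is interesting if it shows
    an (edge, bucket) pair that no earlier input has shown."""

    def __init__(self, use_hitcounts: bool = True) -> None:
        self.use_hitcounts = use_hitcounts
        self.seen: Dict[int, int] = {}  # edge id -> OR of bucket bits seen

    def is_interesting(self, trace: Dict[int, int]) -> bool:
        interesting = False
        for edge, count in trace.items():
            if self.use_hitcounts:
                bits = bucket(count)
            else:
                # Plain edge coverage: only "hit" versus "not hit".
                bits = 1 if count > 0 else 0
            known = self.seen.get(edge, 0)
            if bits & ~known:
                interesting = True
                self.seen[edge] = known | bits
        return interesting
```

With hitcounts enabled, an input that executes an already-known edge five times instead of once lands in a new bucket and is kept; with hitcounts disabled, the same input is discarded because the edge itself is not new.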

Some afl-fuzz techniques the paper plans to evaluate

  • Hitcounts: Hitcounts are adopted by other fuzzers today, but AFL was the first to introduce this concept. Despite its wide adoption, the impact of this optimization (over plain edge coverage) has never been measured in isolation on a large set of targets.
  • Novelty search vs. maximization of a fitness: While AFL considers every new discovered hitcount as interesting, both other early fuzzing solutions and more recent tools instead only consider testcases that maximize a given metric as interesting. For instance, VUZZER uses the sum of all the weights of the executed basic blocks.
    • In this experiment, we benchmark the AFL approach versus a fitness maximization and the combination of the two approaches, as proposed by VUZZER.
    • We expect the novelty search to outperform both of the competing algorithms.
  • Corpus culling: The prioritization of small and fast testcases in the AFL corpus selection algorithm trades speed against the fuzzing of more complex testcases, which often correspond to complex program states.
    • In this experiment, we want to assess the difference in using the AFL corpus culling mechanism versus using the entire corpus.
    • We expect faster growth in coverage over time and, potentially, more bugs triggered in the same time window.
  • Score calculation: The performance score used to calculate how many times to mutate and execute the input in the havoc and splice stages is derived from many variables, mainly testcase size and execution time.
    • In this experiment, we want to measure the delta between the AFL solution and the baseline, represented by a constant and a random score.
  • Corpus scheduling: The FIFO policy used by AFL is only one of the possible policies that a fuzzer can adopt to select the next testcase
    • Thus, we evaluate AFL versus a modified version that implements the baseline, random selection, and the opposite approach, a LIFO scheduler.
  • Splicing as stage vs. splicing as mutation: Splicing refers to the operation that merges two different testcases into a new one
    • We modified the AFL codebase to implement splicing as a mutation operator to compare the two.
  • Trimming: Trimming the testcases allows the fuzzer to reduce the size of the input files and consequently give priority to small inputs, under the assumptions that large inputs introduce a slowdown in the execution and the mutations would be less likely to modify an important portion of the binary structure
    • Despite the fact that this algorithm can bring the two important benefits described above, we argue that reducing the size of the testcases could lead to a loss of state coverage. Additionally, the trimming phase could become a bottleneck for slow targets.
    • Therefore, in our evaluation we plan to compare the default version of AFL against a modified one, where we disabled trimming.
  • Collisions: As explained in section III-F, the AFL approach to instrument the source code of the target programs consists of assigning an identifier for each basic block at compile-time.
    • We want to benchmark this feature as the collision-free variant is simpler than the original implementation with pc-guard, raising the question why random identifiers are used in AFL
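
The collision question in the last item can be made concrete with a small Python sketch. It contrasts AFL's classic edge index, computed from per-block identifiers as `cur ^ (prev >> 1)` into a 64 KiB map, with a scheme that simply hands each edge its own sequential index. The function names and the dict-based inputs are hypothetical, and the sequential scheme is only an assumption about what a collision-free variant looks like, not the paper's implementation.

```python
from typing import Dict, Hashable, List, Tuple

MAP_SIZE = 1 << 16  # classic AFL coverage map size (64 KiB)

Edge = Tuple[Hashable, Hashable]


def afl_edge_index(prev_id: int, cur_id: int) -> int:
    """Classic AFL edge index: current block ID XOR (previous block ID >> 1)."""
    return (cur_id ^ (prev_id >> 1)) % MAP_SIZE


def count_collisions(edges: List[Edge], block_ids: Dict[Hashable, int]) -> int:
    """Count edges whose map index is already taken by a different edge."""
    used = set()
    collisions = 0
    for src, dst in edges:
        idx = afl_edge_index(block_ids[src], block_ids[dst])
        if idx in used:
            collisions += 1
        else:
            used.add(idx)
    return collisions


def collision_free_indices(edges: List[Edge]) -> Dict[Edge, int]:
    """Assumed collision-free scheme: one sequential index per distinct edge."""
    return {edge: i for i, edge in enumerate(edges)}


# Hypothetical block IDs chosen so that two different edges share an index:
# A->B gives 4 ^ (2 >> 1) = 5 and C->D gives 5 ^ (0 >> 1) = 5.
block_ids = {"A": 2, "B": 4, "C": 0, "D": 5}
edges = [("A", "B"), ("C", "D")]
```

In this toy input the two edges are indistinguishable in the hashed map, while the sequential scheme keeps them apart at the cost of needing to know every edge at instrumentation time.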