In this blog post we present PASTIS, a Python framework for ensemble fuzzing, developed at Quarkslab.
Introduction
PASTIS is an open-source fuzzing framework that combines various software testing techniques within the same workflow to perform collaborative fuzzing, also known as ensemble fuzzing. At the moment it supports Honggfuzz and AFL++ as grey-box fuzzers and TritonDSE as a white-box fuzzer. The following video (in French with English subtitles) gives insight into the principles of PASTIS:
In May 2023 PASTIS participated in a fuzzer competition sponsored by Google in the context of the 16th International Workshop on Search-Based and Fuzz Testing (SBFT) co-located with ICSE 2023, the 45th International Conference on Software Engineering, one of the longest running and most prestigious software engineering venues.
Our collaborative fuzzing approach won first place, tied with aflrustrust, in the bug discovery category, which ranks the fuzzers that find the highest number of unique bugs. The paper, published in the research track of the workshop, presents the contributions of this work.
PASTIS is now open-sourced under the Apache License 2.0. You can find it on the GitHub repository.
In this blog post we present an overview of the framework and a simple guide to start using it in your projects.
Overview
Software testing is crucial to uncover bugs and vulnerabilities. To that end, multiple automated testing techniques like fuzzing are used. This approach has been extensively studied in the literature and improved over the last few years. Fuzzing relies on executing as many iterations as possible of a target program over different inputs generated with pseudo-random mutations and possibly with the help of a structure model or grammar. Both execution and input generation algorithms have been improved over time to explore deeper program states.
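To make the fuzzing loop described above concrete, here is a minimal sketch in Python. It is not how Honggfuzz or AFL++ work internally: the target function, the single-byte mutation and all names are hypothetical, and there is no coverage feedback. It only illustrates the idea of running a target over many pseudo-randomly mutated inputs.

```python
import random
from typing import List


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Return a copy of data with one randomly chosen byte replaced."""
    if not data:
        return bytes([rng.randrange(256)])
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def toy_target(data: bytes) -> None:
    """Hypothetical target: 'crashes' when the first byte is 0x42."""
    if data[:1] == b"\x42":
        raise RuntimeError("crash")


def fuzz(seed: bytes, iterations: int, rng: random.Random) -> List[bytes]:
    """Run the target on mutated inputs and collect the crashing ones."""
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            toy_target(candidate)
        except RuntimeError:
            crashes.append(candidate)
    return crashes


if __name__ == "__main__":
    found = fuzz(b"AAAA", 20000, random.Random(0))
    print(f"{len(found)} crashing inputs found")
```

Real grey-box fuzzers add coverage feedback on top of this loop, so that inputs reaching new code are kept and mutated further.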
Dynamic Symbolic Execution (DSE) is another approach to software testing. It is a formal technique also used for program exploration and testing. Advances performed in this research area made it a functional approach used in state-of-the-art software testing tools. The DSE principle is to precisely model each instruction's side-effects to track input propagation in the program and express branching conditions as first-order logic formulas.
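The following toy sketch illustrates the exploration principle only. It assumes a hypothetical target whose branch conditions are written by hand as Python predicates, and it uses a brute-force search over a small integer domain in place of an SMT solver. A real DSE engine such as TritonDSE derives these conditions from instruction semantics instead.

```python
from typing import Callable, List, Optional, Set, Tuple

Predicate = Callable[[int], bool]
Path = List[Tuple[Predicate, bool]]


def traced_program(x: int) -> Path:
    """Hypothetical target: returns the branch conditions evaluated on
    input x as (predicate, outcome) pairs, in execution order."""
    path: Path = []
    c1 = lambda v: v % 7 == 3
    path.append((c1, c1(x)))
    if c1(x):
        c2 = lambda v: v * 3 + 1 == 1039
        path.append((c2, c2(x)))
    return path


def solve(constraints: Path, domain: range) -> Optional[int]:
    """Brute-force stand-in for an SMT solver: first value in domain
    for which every predicate has the requested outcome."""
    for v in domain:
        if all(pred(v) == want for pred, want in constraints):
            return v
    return None


def explore(seed: int, domain: range) -> Set[int]:
    """Negate each branch of every new path to derive further inputs."""
    worklist, seen_paths, inputs = [seed], set(), set()
    while worklist:
        x = worklist.pop()
        path = traced_program(x)
        signature = tuple(outcome for _, outcome in path)
        if signature in seen_paths:
            continue
        seen_paths.add(signature)
        inputs.add(x)
        for i in range(len(path)):
            # Keep the prefix, flip the i-th branch outcome.
            flipped = path[:i] + [(path[i][0], not path[i][1])]
            new_input = solve(flipped, domain)
            if new_input is not None:
                worklist.append(new_input)
    return inputs


if __name__ == "__main__":
    print(sorted(explore(0, range(1024))))
```

Starting from the input 0, the sketch ends up with one input per feasible path, including the one that satisfies the nested equality, which random mutation would be unlikely to hit.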
While fuzzing is empirically effective, it tends to cover shallower states. In comparison, DSE is slower but is theoretically able to cover deeper states by solving complex branch conditions or complex code constructs.
The goal is to combine grey-box fuzzing and DSE to leverage their respective strengths and reach better coverage than either of these approaches on its own, or at least obtain the same coverage faster. The challenges are threefold. First, one needs to deal with the implementation discrepancies of the various engines, such as input formats and execution speed. Second, input generation throughput is a challenge, as input flooding might alter the normal behavior of engines. The last challenge is to combine the engines asynchronously so that none of them blocks or slows down the others.
We propose a combination of fuzzing and DSE into an ensemble fuzzing framework called PASTIS that helps in circumventing the discrepancies between the engines' inner workings.
Our approach combines heterogeneous test engines by solely sharing test cases (inputs). Each engine then decides whether to drop it or not. If the input triggers a new program behavior regarding a given engine's coverage metric the input is kept, otherwise it is discarded. Being significantly slower than fuzzing, DSE should replay each input it receives at a satisfying speed to update its coverage and decide whether to keep the input. We designed an ensemble fuzzer combining grey-box fuzzing and white-box fuzzing (DSE) built around a broker that performs seed sharing and aggregates the resulting corpus and data.
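The keep-or-drop decision described above can be sketched as follows. This is an illustration under assumptions, not the code of any PASTIS driver: the ToyEngine class, the toy_replay function and the string-based coverage items are all hypothetical.

```python
from typing import Callable, FrozenSet, List, Set


def toy_replay(data: bytes) -> FrozenSet[str]:
    """Hypothetical replay: returns the set of 'branches' hit by data."""
    hit = {"entry"}
    if data.startswith(b"A"):
        hit.add("branch_A")
        if len(data) > 4:
            hit.add("branch_long")
    return frozenset(hit)


class ToyEngine:
    """Keeps an incoming seed only if it adds coverage for this engine."""

    def __init__(self, replay: Callable[[bytes], FrozenSet[str]]) -> None:
        self.replay = replay
        self.coverage: Set[str] = set()
        self.corpus: List[bytes] = []

    def on_seed(self, data: bytes) -> bool:
        new_items = self.replay(data) - self.coverage
        if not new_items:
            return False  # nothing new for this engine: discard
        self.coverage |= new_items
        self.corpus.append(data)
        return True


if __name__ == "__main__":
    engine = ToyEngine(toy_replay)
    for seed in (b"zzz", b"yyy", b"Abc", b"AAAAAA"):
        print(seed, "kept" if engine.on_seed(seed) else "dropped")
```

Because each engine applies its own metric, the same seed may be kept by one engine and dropped by another.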
PASTIS benefits from Honggfuzz and AFL++, two widely used and effective grey-box fuzzers. PASTIS also takes advantage of TritonDSE, our recently released Python framework for dynamic symbolic execution.
Architecture
PASTIS is composed of two main components: a broker and a set of engines or fuzzing agents.
The broker, called pastis-broker, is the main interface with the user. It is implemented in Python and ensures all communications between the available engines. It is built using a library called libpastis which handles all the communications.
The communication protocol is based on the message-queuing framework ZMQ, which is interoperable with almost all existing programming languages. However, the most interesting feature it provides is over-the-network communication. This allows PASTIS to be run over multiple machines.
An engine in PASTIS is any fuzzer or DSE tool wrapped in a thin Python module, called Driver (also built using libpastis). This module implements a series of callbacks that allow communication with the broker. The broker sends the engines the target, settings, and seeds. The engines, on the other hand, send the generated inputs and telemetry. Each engine handles coverage using its metric, adding or discarding an incoming seed according to its own rules. This approach allows sharing of seeds easily. The broker is in charge of aggregating the inputs produced by the engines and sharing them.
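The aggregation and sharing role of the broker can be pictured with the in-memory sketch below. It does not use libpastis or ZMQ, and the ToyBroker class, its method names and the hash-based deduplication are assumptions made for illustration only.

```python
import hashlib
from typing import Callable, Dict, List


class ToyBroker:
    """Aggregates inputs and forwards each new one to the other engines."""

    def __init__(self) -> None:
        self.engines: Dict[str, Callable[[bytes], None]] = {}
        self.corpus: Dict[str, bytes] = {}

    def register(self, engine_id: str, on_seed: Callable[[bytes], None]) -> None:
        self.engines[engine_id] = on_seed

    def submit(self, sender: str, data: bytes) -> bool:
        """Store data if unseen and share it with every engine but the sender."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.corpus:
            return False  # already aggregated: do not flood the engines
        self.corpus[digest] = data
        for engine_id, on_seed in self.engines.items():
            if engine_id != sender:
                on_seed(data)
        return True


if __name__ == "__main__":
    broker = ToyBroker()
    inbox: Dict[str, List[bytes]] = {"aflpp": [], "honggfuzz": [], "tritondse": []}
    for name in inbox:
        broker.register(name, inbox[name].append)
    broker.submit("aflpp", b"seed-1")
    print(inbox)
```

In PASTIS the callbacks are invoked over the network rather than in-process, but the flow is the same: engines submit inputs, the broker aggregates them and passes them on.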
Engines
The three fuzzing engines supported right now are Honggfuzz, AFL++, and TritonDSE (pastis-honggfuzz, pastis-aflpp, and pastis-tritondse, respectively). PASTIS implements a driver for each fuzzer.
The figure below summarizes the architecture of PASTIS. It shows the main interactions between the fuzzers and their respective wrappers. All inter-communications are performed through filesystem monitoring (inotify on Linux).
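As an illustration of the filesystem-monitoring idea, the sketch below detects files that appear in a directory. It uses plain polling with the standard library as a portable stand-in for inotify, and the function name and the file name are hypothetical. It is not the mechanism implemented by the PASTIS drivers.

```python
import os
import tempfile
from typing import List, Set


def poll_new_files(directory: str, seen: Set[str]) -> List[str]:
    """Return paths of regular files that appeared since the last call.

    The seen set is updated in place so each file is reported once.
    """
    fresh = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name not in seen and os.path.isfile(path):
            seen.add(name)
            fresh.append(path)
    return fresh


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    seen_names: Set[str] = set()
    first = poll_new_files(workdir, seen_names)    # directory is empty
    with open(os.path.join(workdir, "id_000001"), "wb") as fh:
        fh.write(b"AAAA")                          # a fuzzer writes an input
    second = poll_new_files(workdir, seen_names)   # the new file is reported
    third = poll_new_files(workdir, seen_names)    # nothing new this time
```

A wrapper built this way would call the function periodically and forward each new file to the broker. inotify removes the need for polling by delivering an event when a file is created.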
Quick example
The FSM demo is a tiny program implementing a state machine that contains a bug. It shows how to combine the various approaches into a collaborative fuzzing campaign within the PASTIS framework.
The code in fsm.c reads "packets" from stdin. Each packet is a struct composed of an ID (16 bits) and a data integer (32 bits). Depending on the ID and the data, the FSM switches states. You can download it from here.
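For readers who want to craft their own seed files, the sketch below builds packets with Python's struct module. The byte layout is an assumption: it supposes little-endian fields with no padding between the 16-bit ID and the 32-bit data, which may not match how fsm.c actually lays out its struct. The helper names are hypothetical.

```python
import struct
from typing import List, Tuple

# Assumption: little-endian, unsigned 16-bit ID then unsigned 32-bit data,
# with no padding ("<" disables alignment). Check fsm.c before relying on it.
PACKET = struct.Struct("<HI")


def make_packet(packet_id: int, data: int) -> bytes:
    """Serialize one (ID, data) packet."""
    return PACKET.pack(packet_id, data)


def parse_packets(blob: bytes) -> List[Tuple[int, int]]:
    """Split a byte string into (ID, data) packets, ignoring a trailing
    partial packet."""
    usable = len(blob) - len(blob) % PACKET.size
    return [PACKET.unpack_from(blob, off) for off in range(0, usable, PACKET.size)]


if __name__ == "__main__":
    seed = make_packet(1, 0xDEADBEEF) + make_packet(2, 42)
    print(seed.hex(), parse_packets(seed))
```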
After installing PASTIS, we need to build our target. For this example, we only have to run make. Keep in mind that the target is compiled using the compilers provided by Honggfuzz and AFL++, hfuzz-clang and afl-clang, respectively. This will instrument the target for both fuzzers. This is not necessary for TritonDSE as it processes the target binary without any instrumentation. Below we show the commands to do this:
$ tar xvf fsm-demo.tar.gz
$ cd fsm-demo
$ make
$ ls bin
fsm.afl fsm.hf fsm.tt
After compilation, it is just a matter of launching the broker and each engine. Note that the broker receives three parameters. The first one points to the folder with the three versions of the target binary. The second one points to the folder with the initial corpus. The last one points to the workspace used by PASTIS, where it will save new inputs, crashes, hangs, logs, and stats.
pastis-broker --bins bin --seed initial --workspace output
By default, PASTIS shares the generated inputs with all the running engines. That is, the input generated by one engine is added to the corpus of the other engines. Depending on the target this can be beneficial or not. This can be changed using the --mode option.
Once the broker starts running, you'll see the output below on your screen, which indicates that it detected all three binaries.
2023-05-15 19:28:04 [ BROKER ] [INFO] new binary detected [LINUX, X86_64]: bin/fsm.afl
2023-05-15 19:28:04 [ BROKER ] [INFO] new binary detected [LINUX, X86_64]: bin/fsm.tt
2023-05-15 19:28:04 [ BROKER ] [INFO] new binary detected [LINUX, X86_64]: bin/fsm.hf
2023-05-15 19:28:04 [ BROKER ] [INFO] Add seed initial.seed in pool
2023-05-15 19:28:04 [ BROKER ] [INFO] start broking
The broker will wait until at least one engine connects. Launching the engines is just a matter of running three commands (in three different shell sessions):
# Shell #1
pastis-aflpp online
# Shell #2
pastis-honggfuzz online
# Shell #3
pastis-triton online
After a few seconds, all the engines are connected to the broker and working, as shown in the screenshot below (the broker is on the left):
It is worth noting that PASTIS can run on different machines. This means that the broker as well as each engine can run on a different machine. For those interested in trying, it's just a matter of adding the command-line option --host <IP-OF-THE-BROKER> to each engine (it's possible to specify the port with --port <PORT>, the default one is 5555). For example, for the AFL++ engine the command would be: pastis-aflpp online --host <IP-OF-THE-BROKER>.
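When scripting a multi-machine campaign, the engine command lines can be assembled programmatically. The helper below is a hypothetical convenience, not part of PASTIS. It only relies on the online subcommand and the --host and --port options mentioned above, and the IP address is a placeholder from the documentation range.

```python
import shlex
from typing import List, Optional


def engine_command(engine: str,
                   host: Optional[str] = None,
                   port: Optional[int] = None) -> List[str]:
    """Build the argument list used to start an engine in online mode."""
    cmd = [engine, "online"]
    if host is not None:
        cmd += ["--host", host]
    if port is not None:
        cmd += ["--port", str(port)]
    return cmd


if __name__ == "__main__":
    broker_ip = "192.0.2.10"  # placeholder address, replace with the broker's IP
    for name in ("pastis-aflpp", "pastis-honggfuzz"):
        print(" ".join(shlex.quote(arg) for arg in engine_command(name, host=broker_ip)))
```

The resulting list can be passed as-is to subprocess.Popen on the machine that hosts the engine.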
We also provide a Docker image for those who want to try it without installing the dependencies. You can find it here.
Documentation
PASTIS is documented. The documentation explains how to install and run it, and includes a demo and the Python API reference. It also includes instructions on how to add a new fuzzer.
Conclusion
This blog post presented PASTIS v0.1.1, a Python framework for ensemble fuzzing. PASTIS is one of the many projects developed at Quarkslab as part of our efforts to improve and ease our daily tasks on binary analysis and vulnerability research. We are now glad to open-source it so others can benefit from it.
The framework is experimental, so any feedback or contributions are greatly appreciated!
We would like to thank DGA-MI, which initially funded this work. We also want to warmly thank all past contributors to the project: Acid, djo and Richard.