Introducing TritonDSE: A framework for dynamic symbolic execution in Python

Posted Tue 02 May 2023
Authors Robin David, Christian Heitman
Category Program Analysis
Tags binary analysis, reverse-engineering, symbolic execution, white-box fuzzing, Triton, open-source, release, tool, 2023

We present TritonDSE, a new tool by Quarkslab. TritonDSE is a Python library, built on top of Triton, that provides easy and customizable Dynamic Symbolic Execution capabilities for binary programs.

Introduction

TritonDSE is a Python library built atop the existing Dynamic Symbolic Execution(DSE) framework Triton to provide more high-level program exploration and analysis primitives. The whole exploration can be instrumented using a hook mechanism that allows the user to run custom code on various events, like address, mnemonic, new input generated, each iteration, a branch to be solved, etc. It can be seen as a symbolic unicorn-like framework as it is not an off-the-shelf program, but a toolkit to build dedicated and specific analyses. Still, it is able to perform some exploration on its own and provides ways to customize it. It was partly designed to build a whitebox fuzzer now integrated into PASTIS. The framework is still experimental, thus any feedback or issue reports are appreciated.

Why not use Triton directly?

Triton is a DSE library providing all the necessary elements to analyze traces with concrete or symbolic information and also to generate and solve path constraints. It is written in C++ (Core and API) and it has bindings for Python. It works on all the major operating systems and supports the main architectures: x86, x86_64, ARM v7, and ARM v8. Yet, it is a low-level library. This means that it provides its users with all the required components to perform DSE tasks, however, it is the user who has to take care of the rest. That is, to load the binary in memory, load shared libraries, handle syscalls and more especially feed every instruction to execute symbolically to the engine. This can be a lot of work.

TritonDSE tries to address all these problems and adds extra functionality such as program exploration capabilities right out of the box. It works by performing an elementary loading of a given program and starting to explore it from its entry point. At the moment solely ELF and Linux are supported, but further development can lead to the support of more platforms.

TritonDSE provides the following features:

Loader mechanism (based on LIEF, cle, or custom ones)
Memory segmentation
Coverage strategies (block, edge, path)
Pointer coverage
Automatic input injection on stdin, argv
Input replay with QBDI
Input scheduling (customizable)
Sanitizer mechanism
Basic heap allocator
Some libc symbolic stubs

TritonDSE is now open-sourced under Apache License 2.0. You can find it on the Github repository.

Overview

TritonDSE allows users to load a full binary and start analyzing it right away. That means it is ready to be run (emulated through Triton) from its entry point, or any other address set by the user. It is possible to add hooks on many different events, such as when a given address is hit, on a given mnemonic, on memory accesses, and so on. This allows for a quick analysis of the program in just a few lines of Python.

It is possible to load a raw binary as well, i.e. a binary without a format, such as the case of firmware. In this case, users can manually describe the different sections a given firmware has, where they start and finish, and even set permissions for them.

TritonDSE comes with a memory segmentation feature that allows to set permissions, such as Read, Write and Execute, on memory regions. These are directly loaded from the binary, however, they can also be set manually.

TritonDSE also provides a probe mechanism that enables the attachment of modules during the exploration process. These modules can hook various events, allowing the user to implement, for instance, custom sanitizers.

The most interesting feature that TritonDSE provides is its program exploration capabilities. Under this use case, users load the target binary and provide a set of initial seeds. TritonDSE will use these seeds to run the program, collect path constraints during the execution, and generate new inputs. Each input corresponds to a branch condition that was not taken in the parent input. For instance, let's suppose we start with only one seed. When we run the program using this seed as input, the program will manipulate the bytes from the input and take decisions based on them. That is, it will make checks using if statements, and depending on the result, it will take the then or the else branch. TritonDSE collects all those branches and negates them to generate an input that exercises the opposite direction (if in the original input, a then branch was taken, in the derived seed generated by TritonDSE the else branch will be taken). There will be branches for which it is not feasible to yield the opposite result due to contradictory restrictions. This way, and by repeating this process (that is, retro-feeding the newly generated inputs) TritonDSE can explore a program. Therefore, you can use TritonDSE to explore a program to help you in your vulnerability research tasks. You can combine this exploration with classic fuzzing tools, such as AFL++ and Honggfuzz, to improve your results.

Moreover, TritonDSE implements different coverage strategies, like Block, Edge, or Path. These strategies allow the user to customize the exploration, providing a balance between accuracy and speed. Block is the most basic coverage strategy. A basic block is considered covered simply if it is executed (that is, if TritonDSE manages to generate an input that exercises that particular basic block). On the other hand, Edge considers both the source and destination of a branch. Therefore, if a basic block can be reached from multiple locations, it will be marked as covered only when all pairs of source-destination were covered. Finally, Path considers all the possible ways to get to a given point in a program and this point will be considered covered when all of them have been executed.

To summarize, TritonDSE not only provides great binary program analysis capabilities right away, but it is also designed to be highly customizable and easy to use.

Quick Example

Let's use a simple crackme, shown below, to display TritonDSE's basic program exploration features:

#include <stdio.h>
#include <stdlib.h>

char *serial = "\x06\x24\x3d\x26\x3b\x38\x16\x07\x11";

int check_input(char *input) {
    int i = 0;

    while (i < 9) {
        if (((input[i] - 1) ^ 0x55) != serial[i])
            return 1;
        i++;
    }

    return 0;
}

int main(int argc, char *argv[]) {
    if (argc != 2) {
        return -1;
    }

    if (!check_input(argv[1])) {
        printf("Win\n");
    }

    return 0;
}

This program receives input from the command line through argv. When provided with the correct input, it will display Win.

To automatically solve this crackme, we use the following script:

import logging

from tritondse import CompositeData
from tritondse import Config
from tritondse import CoverageStrategy
from tritondse import ProcessState
from tritondse import Program
from tritondse import Seed
from tritondse import SeedFormat
from tritondse import SymbolicExecutor
from tritondse import SymbolicExplorator


logging.basicConfig(level=logging.INFO)


def pre_exec_hook(se: SymbolicExecutor, state: ProcessState):
    logging.info(f"[PRE-EXEC] Processing seed: {se.seed.hash}, \
                    ({repr(se.seed.content.argv)})")


# Load the program (LIEF-based program loader).
prog = Program("./crackme")

# Load the configuration.
config = Config(coverage_strategy=CoverageStrategy.PATH, debug=True,
                pipe_stdout=True, seed_format=SeedFormat.COMPOSITE)

# Create an instance of the Symbolic Explorator
dse = SymbolicExplorator(config, prog)

# Create a starting seed, representing argv.
seed = Seed(CompositeData(argv=[b"./crackme", b"AAAAAAAAAAAAAAA"]))

# Add seed to the worklist.
dse.add_input_seed(seed)

# Add callbacks.
dse.callback_manager.register_pre_execution_callback(pre_exec_hook)

# Start exploration!
dse.explore()

This script will execute the target symbolically starting with AAAAAAAAAAAAAAA as input. It will collect the branches that depend on the input, invert them, and produce a new input, which will be added to the corpus. It will repeat this process until it can no longer yield an input that covers new code.

The code is straightforward. It loads the program and sets the configuration for the SymbolicExplorator. Then, it creates a seed and adds it to the corpus. There are two types of seeds: Composite and Raw. The first allows the user to fine-tune the input to inject. In this case, it allows the specification of the value of argv (it can also be used to specify files and variables). The Raw format, as expected, is just a sequence of bytes that are directly passed to the program (useful in cases where the program reads from stdin). Notice that we also make use of the hooking mechanism. Here we use it to display the seed hash and its content just before the program starts (you can read more about hooks here). Another point to notice is that we have not set up a hook on printf, TritonDSE does it for us, as it comes with support for basic libc functions.

The following is a snippet of the output. Notice the two new inputs generated (using the Z3 SMT solver).

...
INFO:root:Starting emulation
INFO:root:[PRE-EXEC] Processing seed: e2f673d0fd7980a2bdad7910f0f6da7a, ([b'./crackme', b'AAAAAAAAAAAAAAA'])
INFO:root:configure pstate: time_inc:1e-05  solver:Z3  timeout:5000
INFO:root:hit 0x1085: hlt instruction stop.
INFO:root:Emulation done [ret:0]  (time:0.01s)
INFO:root:Instructions executed: 59  symbolic branches: 1
INFO:root:Memory usage: 113.93Mb
INFO:root:Seed e2f673d0fd7980a2bdad7910f0f6da7a generate new coverage
INFO:root:pc:0/1 | Query n°1, solve:4efcfc1fc8 (time: 0.02s) [SAT]
INFO:root:New seed model a69a64322c94c4f52f5679145e478f0a_0064_CC_4efcfc1fc8.tritondse.cov dumped [NEW]
INFO:root:Corpus:1 Crash:0
INFO:root:Seed Scheduler: worklist:1 Coverage objectives:1  (fresh:0)
INFO:root:Coverage instruction:59 covitem:1
INFO:root:Emulation: 0m0s | Solving: 0m0s | Elapsed: 0m0s
...

A few lines below we can see how it generates the input that solves the crackme:

...
INFO:root:Pick-up seed: a54a3bd5261e4cab786836561fece562_0064_CC_95abb74fac.tritondse.cov (fresh: False)
INFO:root:Initialize ProcessState with thread scheduling: 200
INFO:root:Starting emulation
INFO:root:[PRE-EXEC] Processing seed: a54a3bd5261e4cab786836561fece562, ([b'./crackme', b'TritonDSEAAAAAA'])
INFO:root:configure pstate: time_inc:1e-05  solver:Z3  timeout:5000
Win
...

This was just a simple example of how to load and explore a program very intuitively and in just a couple of lines of code. TritonDSE can load and handle complex binaries and handle x86/x86_64 and ARM32 architectures. Currently, it is used a whitebox fuzzer integrated into PASTIS.

Documentation

TritonDSE is well documented, here you will find how to get started, the basic Python API and the advanced one, and even exercises that will let you get familiar with its concepts, which type of problems can be solved and how to solve them. There are Jupyter Notebooks as well.

Conclusion

In this blog post, we presented TritonDSE v0.1.2, a Python library providing exploration capabilities for binary programs. This is one of the many projects that we developed in Quarkslab as part of our efforts to improve and ease our daily tasks on binary analysis and vulnerability research. We are now glad to open-source it so others can benefit from it as well.

Stay tuned for more news on TritonDSE!

If you would like to learn more about our security audits and explore how we can help you, get in touch with us!

Table of contents

Introduction

Overview

Quick Example

Documentation

Conclusion