Slaying Dragons with QBDI

This article aims to present a simple use of our Dynamic Binary Instrumentation framework QBDI which has recently been publicly released following a talk at 34C3. We will resolve, step by step, a CTF challenge by analyzing an obfuscated binary using QBDI, thus showcasing some of the nice features it offers. This blog post was written last year during my internship at Quarkslab, where I discovered the wonderful (but not so simple) world of Dynamic Binary Instrumentation.

Dynamic Binary Instrumentation

Dynamic Binary Instrumentation (DBI) is a way to analyze a running binary through the use of injected instrumentation code. It has many use cases like, performance analysis, deobfuscation / unpacking, binary tracing and more! But it can be quite difficult to find a simple yet interesting example to demonstrate its use. This article showcases QBDI and its Frida bindings by solving a challenge from CSAW CTF 2015 called Wyvern 500. There are already a lot of good writeups of this challenge using other DBI tools, mainly because it is a great application example .

Wyvern 500

We first run the binary to get an overall idea of what this crackme looks like:

+-----------------------+
|    Welcome Hero       |
+-----------------------+

[!] Quest: there is a dragon prowling the domain.
    brute strength and magic is our only hope. Test your skill.

Enter the dragon's secret: ?

[-] You have failed. The dragon's power, speed and intelligence were greater.

As expected, it is asking for a password that we do not have. Our goal is to avoid performing a full reverse engineering of its implementation. So the only disassembly we will look at must help us to have a global vision of the binary, and especially to identify the input function. We want to understand the binary just enough to be able to forge and inject passwords while instrumenting it.

Wyvern Overview

00000000040e120 <main>:
  40e120:   55                      push   rbp
  40e121:   48 89 e5                mov    rbp,rsp
  40e124:   48 81 ec c0 01 00 00    sub    rsp,0x1c0
  40e12b:   c7 45 fc 00 00 00 00    mov    DWORD PTR [rbp-0x4],0x0
  40e132:   b8 e0 01 61 00          mov    eax,0x6101e0
  40e137:   89 c1                   mov    ecx,eax
  40e139:   b8 7c e6 40 00          mov    eax,0x40e67c
  40e13e:   89 c6                   mov    esi,eax
  40e140:   48 89 cf                mov    rdi,rcx
  40e143:   48 89 8d b0 fe ff ff    mov    QWORD PTR [rbp-0x150],rcx
  40e14a:   e8 21 2e ff ff          call   400f70 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>

  [...] Redacted [...]

  40e1d1:   48 8b 15 e8 1f 20 00    mov    rdx,QWORD PTR [rip+0x201fe8]        # 6101c0 <stdin@@GLIBC_2.2.5>
  40e1d8:   48 8d 8d f0 fe ff ff    lea    rcx,[rbp-0x110]
  40e1df:   be 01 01 00 00          mov    esi,0x101
  40e1e4:   48 89 cf                mov    rdi,rcx
  40e1e7:   48 89 85 80 fe ff ff    mov    QWORD PTR [rbp-0x180],rax
  40e1ee:   48 89 8d 78 fe ff ff    mov    QWORD PTR [rbp-0x188],rcx
  40e1f5:   e8 46 2d ff ff          call   400f40 <fgets@plt>

  [...] Redacted [...]

This is the disassembly of the first instructions of the Wyvern main() down to the fgets() call. We don't need much than that to solve the challenge. Now that we are aware that the input comes from a simple fgets(), we can proceed with the instrumentation!

Slay the Dragon with QBDI

We only need one of the most basic features of a DBI to solve the challenge: tracing the executed basic blocks. It is usually done to analyze performance and help identifying pieces of code that need to be fixed in order to improve the software responsiveness. But this is not the reason why we monitor the basic blocks here: As experimented by Jonathan Salwan a few years ago, we can indeed solve some challenges by counting the number of instructions executed (or in our case basic blocks, as this is enough here and will make the analysis significantly faster).

How are we supposed to do this using QBDI? One of the easiest ways to achieve this is using our user-friendly bindings for Frida. The combination of Frida and QBDI results in a scriptable and granular instrumentation allowing us to easily control the execution and quickly count instructions or basic blocks.

Frida and QBDI: a Great Combo

To make things simple, we use Frida in order to inject QBDI into a running process and orchestrate the instrumentation ran by QBDI. To solve the crackme, we brute-force the password character by character, counting basic blocks each time we try a new password. We iterate over the first characters until a new path is discovered (which translates to more basic blocks being executed). And as soon as a new path is taken, we will iterate the over second character and so on and so forth until all characters are discovered in a typical Hollywood H4ck3r style!

This part describes the code that will be running inside the binary to instrument it.

First of all, we will hook the main function with Frida that will then be instrumented by QBDI.

var MainAddress = DebugSymbol.fromName("main").address

Interceptor.attach(MainAddress, {
    onEnter: function(args){
        // Detach interceptor so QBDI will not instrument Frida hook
        Interceptor.detachAll()
        // Run SolveWyvern()
        SolveWyvern()
        // We will never execute the native code, everything goes through the DBI
        WaitForever = recv('Wait', null)
        WaitForever.wait()
    },
});

Now we have a small script that lets us instrument the main function, but before diving into the SolveWyvern() function, which is the real payload here, let's do some fixes to allow us to send the input easily.

We know from earlier that the program calls fgets() to get its input from the user. We will patch fgets() with NOPs and fill the input buffer that is supposed to hold the password. We can do this using Frida recv() function and store the freshly acquired password inside the destination buffer.

// QBDI
const qbdi = require('/usr/local/share/qbdi/frida-qbdi.js'); // import QBDI bindings
qbdi.import(); // Set bindings to global environment

// Get main address from debug symbols
var mainAddress = DebugSymbol.fromName("main").address
// call to fgets, in the main function
var fgetsAddressUsage = mainAddress.add(0xd5);

// Hook main using Frida
Interceptor.attach(mainAddress, {
    onEnter: function(args){
        // Detach interceptor so QBDI will not instrument Frida hook
        Interceptor.detachAll()
        // Nop the fgets call, so we can manually fill the buffer later
        Memory.protect(fgetsAddressUsage, 0x5, "rwx");
        Memory.writeByteArray(fgetsAddressUsage, [0x90, 0x90, 0x90, 0x90, 0x90]);
        // Run SolveWyvern()
        Password = null
        RecvPass = recv('Password', function onMessage(Pass) { Password = Pass["payload"] });
        RecvPass.wait();
        SolveWyvern(Password)
        // We will never execute the native code, everything goes through the DBI
        WaitForever = recv('Wait', null)
        WaitForever.wait()
    },

});

What about SolveWyvern()? This function is the instrumentation part and will deal with QBDI. First of all, we need to instantiate a new QBDI virtual machine. We then decide to instrument only the Wyvern binary (and not other libraries it may use).

We set a first callback on the old location of fgets() we just NOPped. As soon as the instrumentation is supposed to execute the instruction located at this address, it will input the password received from the host controller (a python script that will be detailed later). During this instrumentation phase, we pick up the buffer location in RDI (which was supposed to be the first argument of fgets()) and write the password directly to the location pointed by RDI!

The second callback is called each time a basic block is executed and just increases a counter.

Now that everything is set up we can call the main function, wait until the end and send the number of basic blocks executed to the control script.

function SolveWyvern(Password){
    var userData = {Counter: 0};
    // Initialize QBDI
    var vm = new QBDI();
    var state = vm.getGPRState();
    var stack = vm.allocateVirtualStack(state, 0x100000);
    // Instrument wyvern only
    vm.addInstrumentedModule("bin");

    // This callback is used to count the number of basic blocks executed
    var BasicBlockCallback = vm.newVMCallback(function(vm, evt, gpr, fpr, data) {
        data.Counter++;
        return VMAction.CONTINUE;
    });
    vm.addVMEventCB(VMEvent.BASIC_BLOCK_ENTRY, BasicBlockCallback, userData);

    var PatchFGetsCB = vm.newInstCallback(function(vm, gpr, fpr, data) {
        // Get buffer passed via RDI and write to it instead of using fgets, so we can save time
        var Buff = gpr.getRegister("RDI");
        Memory.writeUtf8String(Buff, Password);
        return VMAction.CONTINUE;
    });
    vm.addCodeAddrCB(fgetsAddressUsage, InstPosition.PREINST, PatchFGetsCB, NULL);

    // call main until after the check has been performed
    vm.run(mainAddress, 0x40E29C);
    // Send BB executed
    send(userData.Counter);
    userData.Counter = 0;
}

Drive the Instrumentation with Python

Frida has Python bindings that are really useful to control an application without using the Frida REPL and automatize repetitive actions (like solving this challenge). We first talk about the way we can test a single password sent from a Python script.

Frida can spawn processes in a suspended state and let us load Javascript code (as the one we described earlier) in the process, ready to instrument the binary. In order to import QBDI in our instrumentation script (or any other nodejs module), we need to compile it:

ax@Axi0mS:~/r3k1 $ frida-compile Wyvern.js -o Compiled.js

Now that we are ready to inject our script, let's do it with this Python snippet (which is a pretty standard usage of Frida):

import frida
# Compile your instrumentation script using Frida: `frida-compile Wyvern.js -o Compiled.js`
f = open("Compiled.js", "r")
Script_JS = f.read()
f.close()

# On message callback, compare the current number of executed basic blocks. Update MaxBB if higher.
def onMessage(message, data):
    print("Number of basic blocks executed: ", message["payload"])
    print("Press a key to exit")

if __name__ == '__main__':

    Password = "Quarkslab\n"
    # Spawn process and init
    PID = frida.spawn(("./bin", ""))
    session = frida.attach(PID)
    # enable jit to make the execution faster
    session.enable_jit()
    script = session.create_script(Script_JS)
    # define callback
    script.on('message', onMessage)
    # load and resume
    script.load()
    frida.resume(PID)
    script.post({'type': 'Password', 'payload': Password})
    input()
    frida.kill(PID)
    frida.shutdown()

A closer look reveals that Frida not only takes care of the loading of the instrumentation script, but also of the communication between our control script and the instrumented remote process.

And here we go:

ax@ax:~/r3k1 $ python3.6 Phase1.py
Number of basic blocks executed:  2979
Press a key to exit

This allows us to test a password directly and check the number of basic blocks executed. We can use this to guess the size of the password by modifying the script a little bit:

import frida
import threading
# Compile your instrumentation script using Frida: `frida-compile Wyvern.js -o Compiled.js`
f = open("Compiled.js", "r")
Script_JS = f.read()
f.close()

eventCbk = threading.Event()
# On message callback, compare the current number of executed basic blocks. Update MaxBB if higher.
def onMessage(message, data):
    print("Number of basic blocks executed: ", message["payload"])
    eventCbk.set()

if __name__ == '__main__':
    for i in range (0,40):
        Password = "?"*i+"\n"
        # Spawn process and init
        PID = frida.spawn(("./bin", ""))
        session = frida.attach(PID)
        # enable jit to make the execution faster
        session.enable_jit()
        script = session.create_script(Script_JS)
        # define callback
        script.on('message', onMessage)
        # load and resume
        script.load()
        frida.resume(PID)
        print("[-] Testing password of size: ", len(Password[:-1]), "\t", end='')
        script.post({'type': 'Password', 'payload': Password})
        eventCbk.wait()
        eventCbk.clear()
        frida.kill(PID)
    frida.shutdown()

Which outputs the following:

ax@ax:~/r3k1 $ python3 Phase2.py
[-] Testing password of size:  0    Number of basic blocks executed:  2979
[-] Testing password of size:  1    Number of basic blocks executed:  2979
[REDACTED]
[-] Testing password of size:  27   Number of basic blocks executed:  2979
[-] Testing password of size:  28   Number of basic blocks executed:  3829
[-] Testing password of size:  29   Number of basic blocks executed:  2979
[REDACTED]
[-] Testing password of size:  38   Number of basic blocks executed:  2979
[-] Testing password of size:  39   Number of basic blocks executed:  2979

As you can see, when we are trying a 28 characters long string, we are instrumenting more basic blocks meaning we guessed the size! Now we can do the same checking for each character of the password!

Bruteforce Optimizations

As you may know, spawning a new process consumes a lot of resources. We can apply a trick that allows to reuse the same process multiple times, since the native code is never executed. Thanks to QBDI virtual machine we can call main an infinite number of times without problems. Well, actually, there is a specific array that needs to be reset between each test to be successful. So we just keep a copy of it, and rewrite it when we are done instrumenting.

Note: this array has been found by monitoring memory accesses performed during the password validation. This can be achieved very easily and efficiently using QBDI fast memory accesses recorder, which allows to analyze values read from / written to memory after each basic blocks execution.

// QBDI
const qbdi = require('/usr/local/share/qbdi/frida-qbdi.js'); // import QBDI bindings
qbdi.import(); // Set bindings to global environment

// Get main address from debug symbols
var mainAddress = DebugSymbol.fromName("main").address
// call to fgets, in the main function
var fgetsAddressUsage = mainAddress.add(0xd5);

// Hook main using Frida
Interceptor.attach(mainAddress, {
    onEnter: function(args){
        // Detach interceptor so QBDI will not instrument Frida hook
        Interceptor.detachAll()
        // Nop the fgets call, so we can manually fill the buffer later
        Memory.protect(fgetsAddressUsage, 0x5, "rwx");
        Memory.writeByteArray(fgetsAddressUsage, [0x90, 0x90, 0x90, 0x90, 0x90]);
        // Run SolveWyvern()
        SolveWyvern()
        // We will never execute the native code, everything goes through the DBI
        WaitForever = recv('Wait', null)
        WaitForever.wait()
    },

});


function SolveWyvern(){
    // Initialize QBDI
    var vm = new QBDI();
    var state = vm.getGPRState();
    var stack = vm.allocateVirtualStack(state, 0x100000);
    // Instrument wyvern only
    vm.addInstrumentedModule("bin");
    var userData = {Counter:0};

    // This callback is used to count the number of basic blocks executed
    var BasicBlockCallback = vm.newVMCallback(function(vm, evt, gpr, fpr, data) {
        data.Counter++;
        return VMAction.CONTINUE;
    });
    vm.addVMEventCB(VMEvent.BASIC_BLOCK_ENTRY, BasicBlockCallback, userData);

    var PatchFGetsCB = vm.newInstCallback(function(vm, gpr, fpr, data) {
        // Get buffer passed via RDI and write to it instead of using fgets, so we can save time
        var Buff = gpr.getRegister("RDI");
        Memory.writeUtf8String(Buff, Password);
        return VMAction.CONTINUE;
    });
    vm.addCodeAddrCB(fgetsAddressUsage, InstPosition.PREINST, PatchFGetsCB, NULL);


    originalDataPtr = ptr(0x6102F8);
    // This trick allows us to reuse the same process multiple time, by saving the state of this array, and restoring it after each try
    originalData = Memory.readByteArray(originalDataPtr, 30);
    // We will kill the process when we are done with it
    while (true){
        var Password = null;
        // Get password from python
        data = recv('Password', function onMessage(Pass) { Password = Pass["payload"] });
        data.wait();
        // Run until the check has been performed (we do not need to go further)
        vm.run(mainAddress, 0x40E29C);
        // write the array back
        Memory.writeByteArray(originalDataPtr, originalData);
        send(userData.Counter);
        userData.Counter = 0;
    }
}

We can then apply some fixes to the Python code and find the password without reversing all the binary! What we need to do is simply iterating each password character over a charset, and move on to the next password character when the number of basic blocks executed is higher.

import frida
import threading
# Compile your instrumentation script using Frida: `frida-compile Wyvern.js -o Compiled.js`
f = open("Compiled.js", "r")
Script_JS = f.read()
f.close()

MAX_LENGTH = 128
MaxBB = 0
Charset = "_0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

eventCbk = threading.Event()
# On message callback, compare the current number of executed basic blocks. Update MaxBB if higher.
def onMessage(message, data):
    global MaxBB
    CurrentBB = int(message["payload"])
    if CurrentBB > MaxBB:
        MaxBB = CurrentBB
    eventCbk.set()


def WyvernSpawner():
    # Spawn process and init
    PID = frida.spawn(("./bin", ""))
    session = frida.attach(PID)
    # enable jit to make the execution faster
    session.enable_jit()
    script = session.create_script(Script_JS)
    # define callback
    script.on('message', onMessage)
    # load and resume
    script.load()
    return PID, script


def TestPassword(script, Password):
    global MaxBB
    CurrentBB = MaxBB
    script.post({'type': 'Password', 'payload': Password})
    eventCbk.wait()
    eventCbk.clear()
    return CurrentBB != MaxBB

if __name__ == '__main__':
    PID, script = WyvernSpawner()
    frida.resume(PID)
    TestPassword(script, "InitMaxBB\n")
    for i in range (0,MAX_LENGTH):
        Password = "?"*i+"\n"
        print("\r[-] Testing password of size: ", len(Password[:-1]), "\t", end='')
        if TestPassword(script, Password):
            Length = len(Password[:-1])
            print("\n[+] Password size is: ", Length)
            break

    Tries = 0
    Password = ["?"]*Length + ["\n"]

    # bruteforce password, character by character based on a reduced charset
    for i in range(0, Length):
        for j in Charset:
            Tries += 1
            Password[i] = j
            PasswordStr = "".join(Password)
            print("\r[-] Testing: ", PasswordStr[:-1], end='')
            if TestPassword(script, PasswordStr):
                break

    print("\r[+] Password is: ", PasswordStr, "\nCracked in %i attempts" % (Tries))


    frida.kill(PID)
    frida.shutdown()
ax@ax:~/r3k1 $ python3 Phase3.py
[-] Testing password of size:  28
[+] Password size is:  28
[+] Password is:  dr4g0n_or_p4tric1an_it5_LLVM

Cracked in 586 attempts

You can even see how efficient it is watching the following animation!

Conclusion

By combining the powers of QBDI and Frida, and with a bit of scripting and using only basic blocks counting, we managed to recover the password on this specific challenge.

We showcased our Frida bindings, but we also have Python bindings , and a lot more to discover... Give QBDI a try!

Thanks to Charles and C├ędric, who gave me the chance to work on QBDI and thanks to all QuarksLab colleagues who proofread this article.

Comments