Posted Tue 31 March 2026
Authors Laurent Laubin, Sami Babigeon, Christian Heitman
Category Reverse-Engineering
Tags reverse-engineering, QBDI, Triton, symbolic execution, ctf, instrumentation, 2026

In this blog post, we show how QBDI and TritonDSE can be used to attack a complex C++ binary implementing a VM.

Introduction

The 3rd edition of the Jeanne d'Hack CTF took place on January 30–31, 2026, and I had the opportunity to beta test the reverse-engineering challenges. The challenges are built around a main binary that exposes a kind of API for interacting with the user (printing content on screen, asking questions with a set of choices, etc.) in an interactive console-mode game. Each level comes as a separate library. This blog post presents two solutions, one with TritonDSE and the other with QBDI, for the last level (level_4.so), which implements a custom VM.

     +==============================================================================+
     | The Dragon's Lair                                                            |
     +==============================================================================+
     |                                                |===-~___               _,-'  |
     |                 -==\\                         `//~\\   ~~~~`---.___.-~'      |
     |             ______-==|                         | |  \\           _-~`        |
     |       __--~~~  ,-/-==\\                        | |   `\        ,'            |
     |    _-~       /'    |  \\                      / /      \      /              |
     |  .'        /       |   \\                   /' /        \   /'               |
     | /  ____  /         |    \`\.__/-~~ ~ \ _ _/'  /          \/'                 |
     |/-'~    ~~~~~---__  |     ~-/~         ( )   /'        _--~`                  |
     |                  \_|      /        _)   ;  ),   __--~~                       |
     |                    '~~--_/      _-~/-  / \   '-~ \                           |
     |                   {\__--_/}    / \\_>- )<__\      \                          |
     |                   /'   (_/  _-~  | |__>--<__|      |                         |
     |                  |0  0 _/) )-~     | |__>--<__|     |                        |
     |                  / /~ ,_/       / /__>---<__/      |                         |
     |                 o o _//        /-~_>---<__-~      /                          |
     |                 (^(~          /~_>---<__-      _-~                           |
     |                              /__>--<__/     _-~                              |
     |                             |__>--<__|     /                  .---_          |
     |                             |__>--<__|    |                 /' _--_~\        |
     |                             |__>--<__|    |               /'  /    ~\`\      |
     |                              \__>--<__\    \            /'  //       ||      |
     |                               ~-__>--<_~-_  ~--____---~' _/'/       /'       |
     |                                  ~-_~>--<_/-__       __-~ _/                 |
     |                                     ~~-'_/_/ /~~~~~~~__--~                   |
     |                                            ~~~~~~~~~~                        |
     |                                                                              |
     +==============================================================================+
     | You venture deeper into the cave system. The air grows warmer                |
     | with each step, and a faint red glow appears ahead.                          |
     |                                                                              |
     | As you round a corner, the tunnel opens into a massive cavern.               |
     | Your heart stops. In the center lies an enormous dragon,                     |
     | its golden scales reflecting the dim light.                                  |
     |                                                                              |
     +==============================================================================+
     | [0] Try to sneak past quietly.                                               |
     | [1] Attack while it's sleeping.                                              |
     | [2] Back away slowly.                                                        |
     +------------------------------------------------------------------------------+

Analysis

After playing the fourth level a few times, it appears that we always die when facing the dragon. There aren't many options, and all our actions seem to lead to the same result. Let's open this library in a disassembler. Following the xrefs to window_frame leads us to something strange:

.text:000000000000C93B                 call    _create_choices
.text:000000000000C940                 mov     [rbp+var_18], rax
.text:000000000000C944                 mov     rax, [rbp+var_18]
.text:000000000000C948                 lea     rdx, aStrikeAtItsHea ; "Strike at its heart."
.text:000000000000C94F                 mov     rsi, rdx
.text:000000000000C952                 mov     rdi, rax
.text:000000000000C955                 call    _choices_add
.text:000000000000C95A                 mov     rax, [rbp+var_18]
.text:000000000000C95E                 lea     rdx, aAimForItsEyes ; "Aim for its eyes."
.text:000000000000C965                 mov     rsi, rdx
.text:000000000000C968                 mov     rdi, rax
.text:000000000000C96B                 call    _choices_add
.text:000000000000C970                 mov     rax, [rbp+var_18]
.text:000000000000C974                 lea     rdx, aLookForAWeakne ; "Look for a weakness."
.text:000000000000C97B                 mov     rsi, rdx
.text:000000000000C97E                 mov     rdi, rax
.text:000000000000C981                 call    _choices_add
.text:000000000000C986                 mov     rax, [rbp+var_48]
.text:000000000000C98A                 mov     rdi, rax
.text:000000000000C98D                 call    sub_C658
.text:000000000000C992                 test    eax, eax
.text:000000000000C994                 setz    al
.text:000000000000C997                 test    al, al
.text:000000000000C999                 jz      short loc_C9B1
.text:000000000000C99B                 mov     rax, [rbp+var_18]
.text:000000000000C99F                 lea     rdx, aUseTheThuUmThe ; "Use the Thu'um - the Voice."
.text:000000000000C9A6                 mov     rsi, rdx
.text:000000000000C9A9                 mov     rdi, rax
.text:000000000000C9AC                 call    _choices_add
.text:000000000000C9B1
.text:000000000000C9B1 loc_C9B1

While playing, I never managed to see the fourth choice. Obviously, there is a constraint checked in sub_C658. We could quickly take a look at it (spoiler: it checks that the player's save file contains JDHACK), but it would be better to avoid losing time; after all, it's a CTF, man! Let's just patch the library to force the fourth choice to be available:

.text:000000000000C6A2                 cmp     dl, al
.text:000000000000C6A4                 jz      short loc_C6AD
.text:000000000000C6A6                 mov     eax, 1           ; we will replace this 1 by 0
.text:000000000000C6AB                 jmp     short loc_C6BC
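
The patch itself can be sketched in a few lines of Python. This is only an illustration: it assumes the instruction at 0xC6A6 uses the 5-byte encoding B8 imm32 (consistent with the next instruction sitting at 0xC6AB) and that the .text address equals the file offset, which should be checked with readelf -l before touching the real file. The helper name patch_mov_eax_imm is made up.

```python
def patch_mov_eax_imm(blob: bytearray, offset: int, new_imm: int) -> None:
    """Rewrite the imm32 of a `mov eax, imm32` (opcode 0xB8) located at `offset`."""
    if blob[offset] != 0xB8:
        raise ValueError(f"unexpected opcode 0x{blob[offset]:02x} at 0x{offset:x}")
    blob[offset + 1:offset + 5] = new_imm.to_bytes(4, "little")


# Demo on a fake buffer. On the real target, `blob` would be
# bytearray(open("level_4.so", "rb").read()) and `offset` 0xC6A6
# (ASSUMPTION: file offset == .text address for this library).
demo = bytearray(b"\x90\xb8\x01\x00\x00\x00\xeb\x0f")
patch_mov_eax_imm(demo, 1, 0)
print(demo.hex())
```

The opcode check is there so that a wrong offset fails loudly instead of silently corrupting the library.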

With this patch, the hidden choice becomes available and brings us to an input box where we are asked to enter words:

+------------------------------------------------------------------+
|Choose your words carefully Dovah                                 |  
|> abcdefghijklm                                                   | 
+------------------------------------------------------------------+

Of course, if we enter some random content, we are told that Language is Knowledge, Knowledge is Power. It's worth noting that this string is not in the binary; maybe it's obfuscated or something else...?

Looking at the disassembly, we quickly identify that the fourth choice brings us into sub_C6BE. This function starts by initializing a memfile stream over the content of a std::string (a global initialized from init_array, using the content at lib_base_address+0x1B8B8), which looks like... some blah blah from dragons?

>>> # content extracted from level_4.so+0x1B8B8
>>> blahblah = """
Dah Osos Ruvaak Oblaan Dah Osos Ruvaak ...  Suleyk Osos Onik Oblaan 
"""
>>> words = blahblah.split(' ')
>>> set(words)
{'Sahqon', 'Uth', 'Daal', 'Staadnau', 'Osos', "Zu'u", 'Nol', 'Vahrukt', 'Werid', 'Gahrot', 'Tah', 'Ruvaak', 'Evgir', 'Oblaan', 'Onik', 'Bodiis', 'Hadrim', "Thu'um", 'Ul', 'Thur', 'Dein', 'Dah', 'Feim', 'Nahlot', 'Suleyk', 'Bormah', 'Dinok', 'Qethsegol', 'Ahst', 'Ahrk'}
>>> len(set(words))
30
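
One caveat when reproducing this: with a triple-quoted string that starts and ends with a newline, split(' ') can leave tokens such as '\nDah' in the list. A small sketch using str.split() with no argument (which splits on any whitespace) together with collections.Counter; the sample text is a shortened, made-up stand-in for the real content at 0x1B8B8.

```python
from collections import Counter

# Shortened stand-in for the string at level_4.so+0x1B8B8 (NOT the real content).
sample = """
Dah Osos Ruvaak Oblaan Dah Osos Ruvaak Suleyk Osos Onik Oblaan
"""

words = sample.split()   # splits on any whitespace and drops empty tokens
freq = Counter(words)    # word -> number of occurrences
print(len(freq), "distinct words")
print(freq.most_common(3))
```

A vocabulary this small, repeated many times, is a first hint that the words may encode something more structured than prose.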

Another thing that triggered my curiosity was the various xrefs to strings like yy_get_next_buffer, yy_scan_bytes, etc. Most of them are referenced in a big switch case contained in sub_17878. This definitely looks like a flex parser, which can also be confirmed with some error strings:

$ strings ./level_4.so | grep -Ei "flex|yy_|yytext|yytname|fatal"
fatal flex scanner internal error--no action found
fatal flex scanner internal error--end of buffer missed
fatal error - scanner input buffer overflow
input in flex scanner failed
out of dynamic memory in yy_get_next_buffer()
flex scanner push-back overflow
out of dynamic memory in yy_create_buffer()
out of dynamic memory in yy_scan_buffer()
out of dynamic memory in yy_scan_bytes()
bad buffer in yy_scan_bytes()

At first, I thought the next step would be to dig into this parser, which, honestly, did not really appeal to me... Playing with some GDB sessions, I quickly realized that it just takes the dragon's blah blah, which is static content, to initialize a buffer (in fact a std::vector). So there is absolutely no need to look at this parser; the real deal appears to be in the function called just after (sub_D434). Once again, using a GDB session, we can confirm that all the logic happens in this function, which asks for the flag (Choose your words carefully Dovah), validates the input, and shows Language is Knowledge, Knowledge is Power if it's invalid.

Basically, the function sub_D434 is just a loop, calling a much more interesting function: sub_D5FA. The call graph for this function is very typical of a VM opcode dispatcher:

[Figure: VM dispatcher]

Before digging into the various handlers to reverse this VM, let's try to find where the input is read. We can suppose the API exposed by the main binary will still be used, so let's catch the read syscall:

gef➤  catch syscall read
...
[#0] Id 1, Name: "jdhack-rpg", stopped 0x475b0d in ?? (), reason: BREAKPOINT
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  bt
#0  0x0000000000475b0d in ?? () ; jdhack-rpg
#1  0x0000000000407099 in ?? ()
#2  0x00000000004074b7 in ?? ()
#3  0x0000000000407688 in ?? ()
#4  0x0000000000403664 in ?? ()
#5  0x00007ffff7fdb70f in ?? () ; level_4.so+0xE70F 
...

.text:000000000000E65C ; __int64 __fastcall sub_E65C(_QWORD, _QWORD)
.text:000000000000E65C sub_E65C        proc near               ; CODE XREF: sub_D5FA+58A↑p
...
.text:000000000000E70A                 call    _window_prompt           ; <--- read syscall breakpoint is triggered somewhere here      
.text:000000000000E70F                 mov     [rbp+var_20], rax

This confirms what we supposed: the flag reading, which is done in sub_E65C, is called from the dispatcher we identified in sub_D5FA. Let's put a breakpoint at level_4.so+0xE70F:

   0x7ffff7fdb702                  call   0x7ffff7fd8430
   0x7ffff7fdb707                  mov    rdi, rax
   0x7ffff7fdb70a                  call   0x7ffff7fd8d40
●→ 0x7ffff7fdb70f                  mov    QWORD PTR [rbp-0x20], rax
   0x7ffff7fdb713                  mov    rax, QWORD PTR [rbp-0x20]
   0x7ffff7fdb717                  mov    QWORD PTR [rbp-0x18], rax
   0x7ffff7fdb71b                  jmp    0x7ffff7fdb738
   0x7ffff7fdb71d                  mov    rax, QWORD PTR [rbp-0x18]
   0x7ffff7fdb721                  mov    rsi, rax
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "jdhack-rpg", stopped 0x7ffff7fdb70f in ?? (), reason: BREAKPOINT
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── trace ────
[#0] 0x7ffff7fdb70f → mov QWORD PTR [rbp-0x20], rax
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  hexdump byte $rax
0x000000000050d100     73 6f 6d 65 20 62 6c 61 68 20 66 6f 72 20 64 72    some blah for dr
0x000000000050d110     61 67 6f 6e 73 00 00 00 00 00 00 00 00 00 00 00    agons...........
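
Since we keep switching between disassembler offsets and runtime addresses, a tiny helper saves some mental arithmetic. The base is derived from frame #5 of the backtrace above (0x7ffff7fdb70f is level_4.so+0xE70F); it is only valid for this GDB session, and the function names are mine.

```python
RUNTIME_RET = 0x7FFFF7FDB70F  # frame #5 in the backtrace
MODULE_OFF = 0xE70F           # same instruction in the disassembler

BASE = RUNTIME_RET - MODULE_OFF  # load address of level_4.so in this session


def rebase(offset: int) -> int:
    """Disassembler offset -> runtime address."""
    return BASE + offset


def unrebase(address: int) -> int:
    """Runtime address -> disassembler offset."""
    return address - BASE


print(hex(BASE))            # 0x7ffff7fcd000
print(hex(rebase(0xD5FA)))  # runtime address of the dispatcher sub_D5FA
```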

In this type of problem, it is often useful to know how the input is used, and symbolic execution can be very helpful in doing so.

TritonDSE emulation of the VM coredump

TritonDSE is a Python library, built on top of Triton, that provides easy DSE capabilities for binary programs.

In our context, the binary is an ELF x86_64 library, importing functions from the main binary that loads it. The library is also linked to libstdc++, importing various C++ components, and TritonDSE does not provide emulation routines for C++. So how do we deal with this?

To that end, let me introduce you to a new feature recently merged in TritonDSE: starting emulation from a GDB coredump \o/.

In our context, we will start the emulation from a coredump generated just after the input of the flag, so exactly where we put our last breakpoint in level_4.so+0xE70F.

Small note: to avoid losing time trying to understand why the coredump looks buggy and misses some memory pages, don't forget to read the man pages 😡 By default, GDB will honor the VM_DONTDUMP flag (see the man page). You can, for example, change this behaviour with:

gef➤ !echo 0xF > /proc/$(pidof target_binary)/coredump_filter
gef➤ generate-core-file jdhack_level4_before_startvm.dump
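
For reference, the value written to coredump_filter is a bitmask. The sketch below decodes it using the bit meanings documented in the core(5) man page; the helper name describe_filter is just for illustration.

```python
# Bit meanings as documented in core(5).
COREDUMP_FILTER_BITS = {
    0: "anonymous private mappings",
    1: "anonymous shared mappings",
    2: "file-backed private mappings",
    3: "file-backed shared mappings",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
    7: "private DAX pages",
    8: "shared DAX pages",
}


def describe_filter(mask: int) -> list[str]:
    """Return the mapping types selected by a coredump_filter bitmask."""
    return [name for bit, name in COREDUMP_FILTER_BITS.items() if (mask >> bit) & 1]


print(describe_filter(0xF))   # the value used above
print(describe_filter(0x33))  # the default documented in core(5)
```

With 0xF, file-backed mappings are included, which is what puts the code of the loaded libraries into the dump.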

Starting the emulation in TritonDSE from the coredump is quite easy. Basically, we just have to use the new CoredumpLoader, set up some initialization stuff, add a hook to dump instructions (mainly to verify that the emulation is OK), symbolize the input we want to track, and start the emulation. This is as easy as this small snippet of code:

from tritondse import CoredumpLoader
from tritondse import logging
from tritondse import SymbolicExecutor
from tritondse import ProcessState
from tritondse import Config
from tritondse import Seed
from tritondse import CompositeData
from triton import Instruction

# Base address of level_4.so in the coredump, derived from the breakpoint at level_4.so+0xE70F
BASEADR = 0x7ffff7fdb70f - 0xE70F

def trace_inst(se: SymbolicExecutor, pstate: ProcessState, inst: Instruction):
     # We stop emulation when the loop function (sub_D434) exits
     if inst.getAddress() == BASEADR + 0xC752:
         se.abort()
     # We dump the address and the instruction, and flag with [S] any instruction that manipulates symbolized data
     print(f"0x{inst.getAddress():x}: {inst.getDisassembly()} {'[S]' if inst.isSymbolized() else ''}")

p = CoredumpLoader("./jdhack_level4_before_startvm.dump")

config = Config(workspace="ws", workspace_reset = True)
seed = Seed(CompositeData(variables={"flag":b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"}))   

executor = SymbolicExecutor(config, seed)
executor.load(p)

# we symbolize the input buffer in RAX, with a flag that should not contain any valid chars from the flag
# remember it is read from stdin, we can expect values between 0x20 and 0x7F, or eventually linefeed or tab ...
executor.inject_symbolic_variable_memory(executor.pstate.cpu.rax,"flag",b"\x1F"*16)

executor.cbm.register_post_instruction_callback(trace_inst)

executor.run()

The emulation runs quite smoothly, as the coredump contains all the code for the various dependency libraries. For example, we immediately reach some code manipulating the input:

0x7ffff7fdb70f: mov qword ptr [rbp - 0x20], rax   ; rax = flag
0x7ffff7fdb713: mov rax, qword ptr [rbp - 0x20] 
0x7ffff7fdb717: mov qword ptr [rbp - 0x18], rax 
0x7ffff7fdb71b: jmp 0x7ffff7fdb738 
0x7ffff7fdb738: mov rax, qword ptr [rbp - 0x18] 
0x7ffff7fdb73c: movzx eax, byte ptr [rax] [S]     ; check if the first char of the flag is a null byte
0x7ffff7fdb73f: test al, al [S]
0x7ffff7fdb741: jne 0x7ffff7fdb71d [S]            
0x7ffff7fdb71d: mov rax, qword ptr [rbp - 0x18] 
...

But after a lot of instructions, more than 1.5 million, we reach the following error:

0x46e8b0: mov rcx, qword ptr [rsi] 
0x46e8b3: mov rdx, qword ptr [rsi + r8 - 8] 
0x46e8b8: mov qword ptr [rdi], rcx 
0x46e8bb: mov qword ptr [rdi + r8 - 8], rdx 
Instruction not supported: 0x46e8c0: vzeroupper

From the coredump file, we can identify that the address 0x46e8bb belongs to the main binary jdhack-rpg. Indeed, this binary is statically linked, and believe it or not, this code matches a function known as __strncpy_avx2.

OK, but what can we do when we're faced with an unsupported instruction? Well, there are several options:

We can simply skip the unsupported instruction by initializing the Config with the optional argument skip_unsupported_instruction = True. Sometimes, the unsupported instruction has no side effect on the symbolic content being tracked.
We could put a hook on this instruction, or even on __strncpy_avx2, using the routines provided with TritonDSE. Indeed, emulating the whole internals of the libc copy functions is not really useful; a routine only needs to propagate the symbolic state.
We could also implement a handler for the faulty instruction in Triton and submit a PR; we'll be grateful :D
Or maybe... we could just inspect the current CPU state and see whether we need to go any further!

This last option is easy with an interactive Python session: just rerun the script with python -i ./solve-tritondse.py:

0x40239c: mov rdx, rbx 
0x40239f: mov rsi, r12 
0x4023a2: mov rdi, rax 
0x4023a5: add r13d, 1 
0x4023a9: call 0x401078 
0x401078: jmp qword ptr [rip + 0xfdfe2] 
...
0x46e8b0: mov rcx, qword ptr [rsi] 
0x46e8b3: mov rdx, qword ptr [rsi + r8 - 8] 
0x46e8b8: mov qword ptr [rdi], rcx 
0x46e8bb: mov qword ptr [rdi + r8 - 8], rdx 
Instruction not supported: 0x46e8c0: vzeroupper

>>> executor.pstate.memory.read_string(executor.pstate.read_register('rsi'))
' is Power.'
>>> # looks like we are in the middle of a string, a few instructions above
>>> # we can see rsi is initialized with r12
>>> executor.pstate.memory.read_string(executor.pstate.read_register('r12'))
'Language is Knowledge, Knowledge is Power.'

This is the string shown for an invalid flag, the one we could not find in the binary because it is embedded in the VM. So basically, we have already been through the flag validation! It took 36 sec on my laptop.

Let's take a look at the symbolic state and the identified constraints.

>>> constraints = [c.getBranchConstraints()[0] for c in executor.pstate.get_path_constraints()]
>>> for c in constraints:print(c)
... 
{'isTaken': True, 'srcAddr': 140737353987905, 'dstAddr': 140737353987869, 'constraint': (= ref!48 (_ bv0 1))}
...
{'isTaken': True, 'srcAddr': 140737353987415, 'dstAddr': 140737353987424, 'constraint': (= ref!1755393 (_ bv1 1))}
{'isTaken': False, 'srcAddr': 4646171, 'dstAddr': 4646214, 'constraint': (= ref!2518109 (_ bv0 1))}
{'isTaken': True, 'srcAddr': 4646687, 'dstAddr': 4646589, 'constraint': (= ref!2518188 (_ bv0 1))}
{'isTaken': False, 'srcAddr': 4646595, 'dstAddr': 4646217, 'constraint': (= ref!2518196 (_ bv1 1))}
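
The srcAddr and dstAddr fields are printed in decimal, which is not very convenient for jumping back to the disassembler. A small sketch to convert them: the base comes from the breakpoint at level_4.so+0xE70F, while LEVEL4_SIZE is a rough assumption of mine (the library has data at least up to offset 0x1B8B8), not a value taken from the ELF headers.

```python
LEVEL4_BASE = 0x7FFFF7FDB70F - 0xE70F  # from the breakpoint at level_4.so+0xE70F
LEVEL4_SIZE = 0x20000                  # ASSUMPTION: rough upper bound, not read from the ELF


def describe(addr: int) -> str:
    """Render an address as level_4.so+offset when it falls in the library, else as hex."""
    off = addr - LEVEL4_BASE
    if 0 <= off < LEVEL4_SIZE:
        return f"level_4.so+0x{off:x}"
    return f"0x{addr:x}"


for src in (140737353987905, 140737353987415, 4646171):
    print(src, "->", describe(src))
```

Addresses that fall outside the library, such as 4646171, are left as plain hex; here they land in the statically linked main binary.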

The first constraint in this snippet is at address 0x7ffff7fdb741 (= 140737353987905). Remember the first lines of emulated code, where we said it was checking whether the first char was null? We can verify this by negating the constraint and asking the solver for a model:

>>> # we need to retrieve the astctx to add the not node
>>> astctx = executor.pstate.tt_ctx.getAstContext()
>>> executor.pstate.tt_ctx.getModel(astctx.lnot(constraints[0]['constraint']))
{0: flag[0]:8 = 0x0}
>>> executor.pstate.tt_ctx.getModel(astctx.lnot(constraints[1]['constraint']))
{0: flag[1]:8 = 0x0}
...

Ok cool, so can we solve everything so easily?

>>> for c in constraints:
...     c = c['constraint']
...     m = executor.pstate.tt_ctx.getModel(astctx.lnot(c))
...     if len(m) > 0:
...         print(f"{c} => {m}")
...
(ref_48 == 0x0) => {0: flag[0]:8 = 0x0}
(ref_512 == 0x0) => {1: flag[1]:8 = 0x0}
(ref_976 == 0x0) => {2: flag[2]:8 = 0x0}
(ref_1440 == 0x0) => {3: flag[3]:8 = 0x0}
(ref_1904 == 0x0) => {4: flag[4]:8 = 0x0}
(ref_2368 == 0x0) => {5: flag[5]:8 = 0x0}
(ref_2832 == 0x0) => {6: flag[6]:8 = 0x0}
(ref_3296 == 0x0) => {7: flag[7]:8 = 0x0}
(ref_3760 == 0x0) => {8: flag[8]:8 = 0x0}
(ref_4224 == 0x0) => {9: flag[9]:8 = 0x0}
(ref_4688 == 0x0) => {10: flag[10]:8 = 0x0}
(ref_5152 == 0x0) => {11: flag[11]:8 = 0x0}
(ref_5616 == 0x0) => {12: flag[12]:8 = 0x0}
(ref_6080 == 0x0) => {13: flag[13]:8 = 0x0}
(ref_6544 == 0x0) => {14: flag[14]:8 = 0x0}
(ref_7008 == 0x0) => {15: flag[15]:8 = 0x0}
(ref_12003 == 0x1) => {9: flag[9]:8 = 0x94, 10: flag[10]:8 = 0xe0, 11: flag[11]:8 = 0x57, 12: flag[12]:8 = 0x47, 13: flag[13]:8 = 0xdd, 14: flag[14]:8 = 0x5b, 15: flag[15]:8 = 0x3d, 8: flag[8]:8 = 0x56}       
(ref_28816 == 0x1) => {0: flag[0]:8 = 0xa}
(ref_86534 == 0x1) => {1: flag[1]:8 = 0xa}
(ref_144252 == 0x1) => {2: flag[2]:8 = 0xa}
(ref_201970 == 0x1) => {3: flag[3]:8 = 0xa}
(ref_259688 == 0x1) => {4: flag[4]:8 = 0xa}
(ref_317406 == 0x1) => {5: flag[5]:8 = 0xa}
(ref_375117 == 0x1) => {6: flag[6]:8 = 0xa}
(ref_432828 == 0x1) => {7: flag[7]:8 = 0xa}
(ref_490539 == 0x1) => {8: flag[8]:8 = 0xa}
(ref_548250 == 0x1) => {9: flag[9]:8 = 0xa}
(ref_605961 == 0x1) => {10: flag[10]:8 = 0xa}
(ref_663672 == 0x1) => {11: flag[11]:8 = 0xa}
(ref_721383 == 0x1) => {12: flag[12]:8 = 0xa}
(ref_779094 == 0x1) => {13: flag[13]:8 = 0xa}
(ref_836806 == 0x1) => {14: flag[14]:8 = 0xa}
(ref_894518 == 0x1) => {15: flag[15]:8 = 0xa}
(ref_1028625 == 0x1) => {0: flag[0]:8 = 0x0}
(ref_1039462 == 0x1) => {0: flag[0]:8 = 0x63}
(ref_1076354 == 0x1) => {1: flag[1]:8 = 0x0}
(ref_1087188 == 0x1) => {1: flag[1]:8 = 0x30}
(ref_1124080 == 0x1) => {2: flag[2]:8 = 0x0}
(ref_1134917 == 0x1) => {2: flag[2]:8 = 0x4e}
(ref_1171809 == 0x1) => {3: flag[3]:8 = 0x0}
(ref_1182647 == 0x1) => {3: flag[3]:8 = 0x67}
(ref_1219539 == 0x1) => {4: flag[4]:8 = 0x0}
(ref_1230375 == 0x1) => {4: flag[4]:8 = 0x72}
(ref_1267267 == 0x1) => {5: flag[5]:8 = 0x0}
(ref_1278105 == 0x1) => {5: flag[5]:8 = 0x61}
(ref_1314997 == 0x1) => {6: flag[6]:8 = 0x0}
(ref_1325834 == 0x1) => {6: flag[6]:8 = 0x54}
(ref_1362726 == 0x1) => {7: flag[7]:8 = 0x0}
(ref_1373566 == 0x1) => {7: flag[7]:8 = 0x53}
(ref_1410458 == 0x1) => {8: flag[8]:8 = 0x0}
(ref_1421295 == 0x1) => {8: flag[8]:8 = 0x5f}
(ref_1458187 == 0x1) => {9: flag[9]:8 = 0x0}
(ref_1469023 == 0x1) => {9: flag[9]:8 = 0x4a}
(ref_1505915 == 0x1) => {10: flag[10]:8 = 0x0}
(ref_1516751 == 0x1) => {10: flag[10]:8 = 0x30}
(ref_1553643 == 0x1) => {11: flag[11]:8 = 0x0}
(ref_1564477 == 0x1) => {11: flag[11]:8 = 0x76}
(ref_1601369 == 0x1) => {12: flag[12]:8 = 0x0}
(ref_1612206 == 0x1) => {12: flag[12]:8 = 0x40}
(ref_1649098 == 0x1) => {13: flag[13]:8 = 0x0}
(ref_1659935 == 0x1) => {13: flag[13]:8 = 0x4b}
(ref_1696827 == 0x1) => {14: flag[14]:8 = 0x0}
(ref_1707661 == 0x1) => {14: flag[14]:8 = 0x69}
(ref_1744553 == 0x1) => {15: flag[15]:8 = 0x0}
(ref_1755393 == 0x1) => {15: flag[15]:8 = 0x4e}

Basically, the solver gives us a solution almost immediately for most of the constraints. And obviously, as we have negated each of them individually, the results seem to conflict, for example flag[0]==0xA, flag[0]==0x0 and flag[0]==0x63. This is expected, since the binary performs various sanity checks on each byte of the flag, and we can easily filter the results:

>>> results = [executor.pstate.tt_ctx.getModel(astctx.lnot(c['constraint'])) for c in constraints]
>>> flag = 16*[0]
>>> for d in results:
...     if len(d) == 1:
...         pos, model = next(iter(d.items()))
...         value = model.getValue()
...         # skip the NUL and linefeed sanity checks
...         if value not in (0, 10):
...             flag[pos] = value
...
>>> bytes(flag)
b'c0NgraTS_J0v@KiN'

Putting everything together enables us to do some timing:

$ python solve-tritondse.py
Instruction not supported: 0x46e8c0: vzeroupper
Emulation took 0:00:35.339372
Solving ...
flag =b'c0NgraTS_J0v@KiN' found in 0:00:36.755562

Attentive readers may wonder about the constraint identified by ref_12003, which gives different values for the upper 8 bytes of the flag. The code this constraint comes from is at address 0x7ffff7ab6f76. This code is in the libc, more specifically in __libc_free() (cf https://elixir.bootlin.com/glibc/glibc-2.42/source/malloc/malloc.c#L3531): a sanity check done by the allocator to prevent double frees. The various reallocations involving multiple copies of the flag also have an impact on the symbolized memory buffer. To avoid this, we could have hooked the main allocator functions (mainly malloc and free) thanks to TritonDSE routines... But hey man, we are in CTF mode, we can also just ignore this constraint :-)

Could we solve it faster?

Thanks to the previous work, we already know that the bytes of the flag are checked independently of each other. This makes the challenge a perfect candidate for a bruteforce approach. Let's see how we could have done this with QBDI, our Dynamic Binary Instrumentation framework.

Remember: it's a CTF, we want to go fast and the main goal is the flag! The QBDI Python bindings are the perfect candidate for this: let's just create a venv, pip install pyqbdi and start hacking. The classical way to attack this use case with pyQBDI would be to load the library level_4.so in the Python process with ctypes. But in this particular case, if we simply do this:

>>> import ctypes
>>> ctypes.cdll.LoadLibrary("./level_4.so")
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    ctypes.cdll.LoadLibrary("./level_4.so")
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.14/ctypes/__init__.py", line 552, in LoadLibrary
    return self._dlltype(name)
           ~~~~~~~~~~~~~^^^^^^
  File "/usr/lib/python3.14/ctypes/__init__.py", line 433, in __init__
    self._handle = self._load_library(name, mode, handle, winmode)
                   ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.14/ctypes/__init__.py", line 473, in _load_library
    return _dlopen(name, mode)
OSError: ./level_4.so: undefined symbol: choices_dispose

Indeed, the library requires some functions exported by the main binary. This binary is statically linked, so we cannot easily turn it into a library. The easiest way to solve this is to create a minimalistic library exporting the required functions:

$ nm -D level_4.so | grep ' U ' | awk '{print $2}' | grep -v "@" | awk '{print "int " $1 "() {return 0;}"}'
int choices_add() {return 0;}
int choices_dispose() {return 0;}
int create_choices() {return 0;}
int fadein_image() {return 0;}
int fadeout_image() {return 0;}
int get_image() {return 0;}
int window_clear() {return 0;}
int window_frame() {return 0;}
int window_getch() {return 0;}
int window_msg() {return 0;}
int window_prompt() {return 0;}
int window_wait() {return 0;}
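
The same stub file can also be produced from Python, which is handy if the solver script should rebuild everything by itself. A minimal sketch: the symbol list is copied from the output above, and the function name make_stub_source is mine.

```python
# Undefined symbols of level_4.so, copied from the nm output above.
SYMBOLS = [
    "choices_add", "choices_dispose", "create_choices",
    "fadein_image", "fadeout_image", "get_image",
    "window_clear", "window_frame", "window_getch",
    "window_msg", "window_prompt", "window_wait",
]


def make_stub_source(symbols: list[str]) -> str:
    """Emit one do-nothing C function per symbol, just to satisfy the loader."""
    return "".join(f"int {name}() {{return 0;}}\n" for name in symbols)


source = make_stub_source(SYMBOLS)
print(source, end="")
```

Writing source to a .c file and compiling it as a shared library gives the loader the symbols it asks for; the stub bodies never matter since the interesting calls get hooked.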

We don't care about the prototypes of the real functions: we will hook the required calls with QBDI, and this library is just here to help the loader! Just compile this with gcc -shared -fPIC -o fake-jdhack-minimalist.so fake-jdhack-minimalist.c, and then we can load the targeted library using ctypes:

>>> import ctypes
>>> # we use the constructor to be able to give the RTLD_GLOBAL parameters
>>> # allowing other libraries in the same process to resolve those symbols.
>>> ctypes.CDLL("./fake-jdhack-minimalist.so", mode=ctypes.RTLD_GLOBAL)
<CDLL './fake-jdhack-minimalist.so', handle 55c360edf410 at 0x7fb2de8be660>
>>> ctypes.cdll.LoadLibrary("./level_4.so")
<CDLL './level_4.so', handle 55c360ee01e0 at 0x7fb2de9fbb10>

From there, we have everything needed to call the CTF VM with QBDI. Let's look at a small snippet to count the number of instructions with a fixed input.

First, as we already saw, we need to load both libraries. We take this opportunity to retrieve the base address of each module using the pyqbdi.getCurrentProcessMaps() helper function:

import pyqbdi
import ctypes

def load_lib():
    # the lib is importing various function from the main binary,
    # so we just create a fake lib doing nothing except exporting the symbol, to make the loader happy...
    zefakelib = ctypes.CDLL("./fake-jdhack-minimalist.so", mode=ctypes.RTLD_GLOBAL)

    zelib = ctypes.cdll.LoadLibrary("./level_4.so")

    baseadr_fakelib = min([m.range.start for m in pyqbdi.getCurrentProcessMaps() if m.name=="fake-jdhack-minimalist.so"])
    baseadr_level = min([m.range.start for m in pyqbdi.getCurrentProcessMaps() if m.name=="level_4.so"])

    return {"fakelib":zefakelib, "level":zelib, "baseadr_fakelib":baseadr_fakelib, "baseadr_level":baseadr_level}
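
pyqbdi.getCurrentProcessMaps() does the parsing for us. Purely to illustrate what "base address" means here, this is a rough stdlib-only equivalent for Linux that scans text in the /proc/self/maps format. The sample lines are invented (only the base is chosen to be consistent with the run shown later, where sub_C6BE ends up at 0x7f70efce26be), and the function name is mine.

```python
import os
from typing import Optional


def base_address(maps_text: str, module_name: str) -> Optional[int]:
    """Lowest start address among the mappings backed by `module_name`."""
    starts = []
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 6:          # anonymous mapping: no path column
            continue
        if os.path.basename(fields[5]) == module_name:
            starts.append(int(fields[0].split("-")[0], 16))
    return min(starts) if starts else None


# Invented sample in /proc/<pid>/maps format.
SAMPLE = """\
7f70efcd6000-7f70efce0000 r--p 00000000 08:01 1234 /tmp/ctf/level_4.so
7f70efce0000-7f70efcf0000 r-xp 0000a000 08:01 1234 /tmp/ctf/level_4.so
7f70f0000000-7f70f0021000 rw-p 00000000 00:00 0
"""

print(hex(base_address(SAMPLE, "level_4.so")))
```

Taking the minimum matters because a library is usually mapped as several segments with different permissions.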

Next, we define two hooks that will be used to hook our fake library (yes, we hook the hook :-)):

We need to intercept the call to window_prompt, the function from the binary API that lets the user input the flag. Remember the previous part: this is where we generated the coredump. This hook simply sets RAX to the address of a memory buffer into which we copy the flag value we want to check.
We also hook the call to window_msg to avoid losing time in some printf-like call, so this hook does nothing.

A little QBDI trick here: we can skip the execution of an instruction by returning pyqbdi.SKIP_INST from a hook; however, this is not possible for an instruction that modifies the instruction pointer. In our case, instead of searching for all the CALL instructions to the imported functions that we want to skip, we hook the first instruction of the function and replace it with a RET. This can be done either by changing RIP to the address of a RET instruction (but if the binary is modified, you will probably need to fix the address), or by manually simulating a RET: popping the address on top of the stack and writing it to the RIP register:

def emul_ret(gpr):
    # simulate a RET 
    ret_adr = pyqbdi.readRword(gpr.rsp)
    gpr.rsp += 8
    gpr.rip = ret_adr
    return pyqbdi.BREAK_TO_VM  # we changed RIP !

def hook_window_prompt(vm, gpr, fpr, data):
    # write the flag in the buffer
    gpr.rax = data["buffer"]
    pyqbdi.writeMemory(gpr.rax, data["flag"])
    return emul_ret(gpr)

def hook_window_msg(vm, gpr, fpr, data):
    # do nothing, just used to avoid a lot of garbage on screen..
    return emul_ret(gpr)
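
To make the effect of emul_ret concrete, here is a toy model of the same three steps on a fake stack, with no QBDI involved. FakeGPR, the stack layout and the addresses are all invented for the demonstration.

```python
import struct


class FakeGPR:
    """Stand-in for the QBDI GPR state: only the two registers we need."""

    def __init__(self, rsp: int, rip: int):
        self.rsp = rsp
        self.rip = rip


def emul_ret_model(gpr: FakeGPR, stack: bytearray, stack_base: int) -> None:
    """Same logic as emul_ret, reading from a bytearray instead of process memory."""
    (ret_adr,) = struct.unpack_from("<Q", stack, gpr.rsp - stack_base)  # read top of stack
    gpr.rsp += 8                                                        # pop it
    gpr.rip = ret_adr                                                   # and jump there


stack_base = 0x7000
stack = bytearray(64)
gpr = FakeGPR(rsp=stack_base + 16, rip=0x401000)   # rip: first instruction of the callee
struct.pack_into("<Q", stack, 16, 0x40239C)        # return address a CALL would have pushed
emul_ret_model(gpr, stack, stack_base)
print(hex(gpr.rip), hex(gpr.rsp))
```

After the call, execution resumes right after the original CALL with the stack pointer restored, as if the callee had returned immediately.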

Another hook is required to count instructions. This time, we obviously want the instruction to execute, so we return pyqbdi.CONTINUE:

def cb_count_instruction(vm, gpr, fpr, data):   
    data["nb_instructions"] += 1 
    return pyqbdi.CONTINUE

And finally, we just have to put everything together: load the libraries, initialize QBDI's VM, declare which modules we want to instrument with addInstrumentedModuleFromAddr, install our hooks with addCodeAddrCB and addCodeCB, and finally call the function sub_C6BE that we want to instrument:

cb_data = load_lib()

# Our target function is at baseaddress+0xC6BE.
func_ptr = cb_data['baseadr_level'] + 0xC6BE

# create a QBDI VM
vm = pyqbdi.VM()

# Allocate a stack for the QBDI VM
state = vm.getGPRState()
stack = pyqbdi.allocateVirtualStack(state, 0x1000000)

print(f"[**] library loaded and qbdi vm initialized \\o/ target function at 0x{func_ptr:x}")

# we want to instrument level_4.so
vm.addInstrumentedModuleFromAddr(func_ptr)

# we put a hook on the hooks ^^ 
# first, we get the function addresses thanks to ctypes, and then add the callback before executing the instruction
hooked_window_prompt_adr = ctypes.cast(cb_data['fakelib'].window_prompt, ctypes.c_void_p).value
hooked_window_msg_adr = ctypes.cast(cb_data['fakelib'].window_msg, ctypes.c_void_p).value

vm.addCodeAddrCB(hooked_window_msg_adr, pyqbdi.PREINST, hook_window_msg, cb_data)
vm.addCodeAddrCB(hooked_window_prompt_adr, pyqbdi.PREINST, hook_window_prompt, cb_data)

# obviously we need to instrument the fakelib to be able to put a hook on it
vm.addInstrumentedModuleFromAddr(hooked_window_prompt_adr)

# in our hook on window_prompt we provide a buffer with the flag content, let's allocate it now
cb_data["buffer"] = pyqbdi.allocateMemory(32)
cb_data["flag"] = bytearray(b'0123456789ABCDEF\x00')

# and add a hook on each instruction to count instructions
vm.addCodeCB(pyqbdi.PREINST, cb_count_instruction, cb_data)

cb_data["nb_instructions"] = 0
vm.call(func_ptr, [])

print(f"[**] {cb_data['nb_instructions']} instructions executed for the flag {cb_data['flag']}")

And here we go:

$ python test-onerun-qbdi.py
[**] library loaded and qbdi vm initialized \o/ target function at 0x7f70efce26be
[**] 2343272 instructions executed for the flag bytearray(b'0123456789ABCDEF\x00')
$ # we modify the input flag by fixing the valid first char
$ python test-onerun-qbdi.py
[**] library loaded and qbdi vm initialized \o/ target function at 0x7ff9c2db56be
[**] 2343273 instructions executed for the flag bytearray(b'c123456789ABCDEF\x00')

In case of E_CURIOSITY_OVERFLOW, a quick way to identify where the two runs differ is to replace cb_count_instruction with a hook that dumps each instruction:

def cb_count_instruction(vm, gpr, fpr, data):
    # For debug: print the instruction and its offset inside the executable
    instAnalysis = vm.getInstAnalysis()
    print("0x{:x}: {}".format(instAnalysis.address - data['baseadr_level'], instAnalysis.disassembly))

    #data["nb_instructions"] += 1 
    return pyqbdi.CONTINUE

And then diff both outputs:

$ python test-onerun-qbdi.py 0123456789ABCDEF > /tmp/test1.log
$ python test-onerun-qbdi.py c123456789ABCDEF > /tmp/test2.log
$ diff -y  /tmp/test1.log /tmp/test2.log | grep "|" -B2 -A2
0xe555: test  al, al                      0xe555: test al, al
0xe557: je  0x7                           0xe557: je    0x7
0xe560: mov eax, 0x0                    | 0xe559: mov   eax, 0x1
                                        > 0xe55e: jmp   0x5
0xe565: mov dword ptr [rbp - 0x1c], eax   0xe565: mov   dword ptr [rbp - 0x1c], eax

In a decompiler, we can see that this code is part of a VM handler, which pops two values from the VM stack, compares them, and pushes the result of the test back on the stack:

__int64 __fastcall sub_E49E(__int64 a1, __int64 a2)
{
  // ...
  v6 = *(_DWORD *)std::vector<int>::back(a1);
  std::vector<int>::pop_back(a1);
  v5 = *(_DWORD *)std::vector<int>::back(a1);
  std::vector<int>::pop_back(a1);
  v4 = (unsigned __int8)std::function<bool ()(int,int)>::operator()(a2, v5, v6) != 0;
  return std::vector<int>::push_back(a1, &v4);
}

That makes sense for a stack-based VM: whatever arithmetic operation is used, the check ends with a CMP instruction.
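The trace comparison above can also be done directly in Python. The following is only a minimal sketch using the standard library: the helper name first_divergence is ours, and the two inline traces are a hand-copied excerpt of the diff shown earlier. In practice the two iterables would be the open log files, which works on multi-million-line traces because the lines are compared lazily, in lockstep.

```python
from itertools import zip_longest

def first_divergence(trace_a, trace_b):
    """Return (line_index, line_a, line_b) for the first differing line, or None.

    Both arguments can be any iterable of strings (lists, open files, ...).
    """
    for index, (line_a, line_b) in enumerate(zip_longest(trace_a, trace_b)):
        if line_a != line_b:
            return index, line_a, line_b
    return None

# Hypothetical excerpts standing in for /tmp/test1.log and /tmp/test2.log
trace_bad_flag = [
    "0xe555: test al, al",
    "0xe557: je 0x7",
    "0xe560: mov eax, 0x0",
    "0xe565: mov dword ptr [rbp - 0x1c], eax",
]
trace_good_first_char = [
    "0xe555: test al, al",
    "0xe557: je 0x7",
    "0xe559: mov eax, 0x1",
    "0xe55e: jmp 0x5",
    "0xe565: mov dword ptr [rbp - 0x1c], eax",
]

divergence = first_divergence(trace_bad_flag, trace_good_first_char)
print(divergence)
```

Only the first mismatch is reported, which is usually all we need here: it gives the offset of the conditional branch to look at in the decompiler.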

Bruteforcing the flag

The task is quite easy: we count the number of instructions executed with a "full false flag" (a flag filled with a byte that cannot be typed in the terminal, for example 0x1F), and then, for each byte of the flag, we try all the values between 0x20 and 0x7F. If the number of instructions executed differs from the first run, we have found a valid char for this byte of the flag. Nothing fancy here, with one important detail: we spawn a dedicated new process for each run. Indeed, if we call the VM function twice in the same process, there is a high chance that some code in the C++ runtime will take a different path depending on the previous call. Trying to catch the various side effects would add complexity; the easiest way to guarantee a deterministic run is to start the VM from a fresh process each time.

We chose to use Python's Process class from multiprocessing, and to make the processes communicate through Pipe. The code is available here.
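To give an idea of the overall structure, here is a simplified sketch of such a bruteforce loop. It is not the actual solve-qbdi.py: the QBDI run is replaced by a toy oracle (fake_instruction_count, with a made-up 4-byte SECRET and a made-up base count), and every function name is hypothetical. We also assume a POSIX host and explicitly select the "fork" start method so that the sketch runs as a single script; the real solver would load the libraries and start the QBDI VM inside each worker.

```python
import multiprocessing as mp

SECRET = b"c0Ng"       # toy secret, stands in for the checks done by the real VM
BASELINE_BYTE = 0x1F   # cannot be typed in the terminal, so it is never valid

def fake_instruction_count(candidate: bytes) -> int:
    """Toy stand-in for one QBDI run: one extra instruction per correct byte."""
    return 1000 + sum(c == s for c, s in zip(candidate, SECRET))

def worker(conn, candidate: bytes) -> None:
    """Runs in its own process and sends (candidate, count) back through the pipe."""
    conn.send((candidate, fake_instruction_count(candidate)))
    conn.close()

def count_in_fresh_processes(ctx, candidates):
    """Start one process per candidate, then collect every result."""
    jobs = []
    for candidate in candidates:
        recv_conn, send_conn = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=worker, args=(send_conn, candidate))
        proc.start()
        send_conn.close()  # the parent only reads
        jobs.append((proc, recv_conn))
    results = []
    for proc, recv_conn in jobs:
        results.append(recv_conn.recv())
        proc.join()
        recv_conn.close()
    return results

def recover_flag(length: int, batch_size: int = 16) -> bytes:
    ctx = mp.get_context("fork")  # assumption: POSIX host
    false_flag = bytes([BASELINE_BYTE] * length)
    baseline = count_in_fresh_processes(ctx, [false_flag])[0][1]
    flag = bytearray(false_flag)
    for pos in range(length):
        for start in range(0x20, 0x80, batch_size):
            # Each trial changes a single byte of the "full false flag"
            batch = []
            for value in range(start, min(start + batch_size, 0x80)):
                trial = bytearray(false_flag)
                trial[pos] = value
                batch.append(bytes(trial))
            hits = [c for c, count in count_in_fresh_processes(ctx, batch)
                    if count != baseline]
            if hits:
                flag[pos] = hits[0][pos]
                break  # move on to the next byte
    return bytes(flag)

if __name__ == "__main__":
    print(recover_flag(len(SECRET)))
```

Processing the candidates in batches of 16 keeps the number of simultaneously running processes bounded, and stopping at the first batch that contains a hit avoids testing the rest of the range for that byte.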

$ python solve-qbdi.py
[**] First run gives 2343271 instructions
[**] ================== flag[0] ==================
[**] Batch for range between 32 and 48
[**] Batch for range between 48 and 64
[**] Batch for range between 64 and 80
[**] Batch for range between 80 and 96
[**] Batch for range between 96 and 112
[**] Found valid char 'c'
[**] ================== flag[1] ==================
[**] Batch for range between 32 and 48
[**] Batch for range between 48 and 64
[**] Found valid char '0'
[**] ================== flag[2] ==================
...
[**] Batch for range between 64 and 80
[**] Found valid char 'N'
[**] flag=b'c0NgvaTS_E6v@KiN' found in 0:03:24.060316

This test was run on a modern laptop with 22 cores, which allows the bruteforce to run processes concurrently. Even so, we are still about 6 times slower than TritonDSE, which solved the challenge in only 36 seconds. But to be completely fair, if we change the configuration of TritonDSE to continue execution when running into an invalid instruction, the emulation time would increase to 1 minute and 44 seconds. Indeed, we are running the full VM in QBDI, and luckily for TritonDSE, the second part of the VM code, which triggers this invalid instruction, only deals with the output telling the user that the flag is wrong. So it can be stopped without any side effect on the result.

Anyway, the answer to our initial question, for this use case, and even with the same "coverage", is that bruteforcing is not faster than identifying constraints and solving them!

An attentive reader will have noticed that we have a different flag than the one provided by TritonDSE. Indeed, bytes 4, 9 and 10 are different. And yet, this flag also validates in the binary. Let's take a few more minutes to look into why.

Bonus: VM disassembly with TritonDSE

In the context of a CTF, we would not need to make any more effort: we've already flagged! But since this blog post is mainly intended as a tutorial for TritonDSE and QBDI, let's imagine that we have spent some time reversing the different handlers. We will now look at how to easily implement a disassembler for this VM.

Remember that in the initial reversing part, we identified the function sub_D5FA as the main dispatcher. While looking at this function, we can quickly identify key structures, like the VM stack (a std::vector<int>) or the VM program counter pointer. The various handlers are also quite easy to reverse, for example the one for opcode 0x01:

.text:000000000000D675 loc_D675:                               ; CODE XREF: sub_D5FA+52↑j
.text:000000000000D675                                         ; DATA XREF: .rodata:jpt_D64C↓o
.text:000000000000D675                 mov     rax, [rbp+var_2B8] ; jumptable 000000000000D64C case 1
.text:000000000000D67C                 mov     rdi, rax
.text:000000000000D67F                 call    sub_DD9C     ; check if stack vector is not empty and if so returns stack.pop_back()
.text:000000000000D684                 jmp     loc_DBAF

After some reversing effort, we identified the various opcodes:

OPCODE= {
    0:'PUSH ',
    1:'POP',
    2:'LOAD',
    3:'STORE',
    4:'EXCH',
    5:'DUP',
    6:'ADD',
    7:'SUB',
    8:'MUL',
    9:'DIV',
    10:'XOR',
    11:'AND',
    12:'OR',
    13:'EQU',
    14:'NOT',
    15:'LSS',
    16:'GTR',
    17:'LEQ',
    18:'GEQ',
    19:'NEG',
    20:'JMP ',
    21:'JFALSE ',
    22:'JTRUE ',
    23:'RET',
    24:'SYSCALL ',
    25:'HALT'
}

With this dictionary, and a few lines of Python added to the main loop of our previous TritonDSE snippet, we can dump the current VM instruction, as well as the VM state. We can even tell whether each value on the stack is symbolized or not:

def trace_inst(se: SymbolicExecutor, pstate: ProcessState, inst: Instruction):
  if inst.getAddress()==BASEADR+0xC752:
    # VM emulation ended !
    se.abort()

  if inst.getAddress()==BASEADR+0x0D623:
    # read VM PC
    ctx_adr = pstate.memory.read_qword(pstate.cpu.rbp-0x2B8)
    vm_pc = pstate.memory.read_qword(ctx_adr+0x30)
    opcode = pstate.cpu.rax
    arg = ''
    if opcode in [0,20,21,22,24]:                  # are there some args to read ?
      # RDI still points to what we want \o/
      arg = f"{pstate.memory.read_dword(pstate.cpu.rdi+4):08x}"

    # A std::vector starts in memory with a pointer to the first element, followed by a pointer one past the last element
    stack_first = pstate.memory.read_qword(ctx_adr)
    stack_end = pstate.memory.read_qword(ctx_adr+8)

    # we create the output and prefix symbolic values with a '!'
    stack_content = "STACK=["
    while stack_first!=stack_end:
      symbolic_value = pstate.is_memory_symbolic(stack_first,4)
      stack_content += f"{' !' if symbolic_value else ' '}{pstate.memory.read_dword(stack_first):04x}"
      stack_first+=4
    stack_content += " ]"

    print(f"{vm_pc:04x} {OPCODE[opcode]}{arg} | {stack_content}")

To help follow the various characters of the flag, let the emulation run with the following seed: b"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F". This will help us to visually track the current char being processed. Running the emulation now shows:

$ python -i solve-tritondse.py
00d5 DUP | STACK=[ 0007 !0010 ]
00d6 PUSH 0000000a | STACK=[ 0007 !0010 !0010 ]    ; 0x10 is the concrete value of flag[0] we put in the seed
00d7 SUB | STACK=[ 0007 !0010 !0010 000a ]
00d8 JFALSE 000000eb | STACK=[ 0007 !0010 !0006 ]  ; check if flag[0]==0xA
00d9 POP | STACK=[ 0007 !0010 !0006 ]
...
010c PUSH 00000254 | STACK=[ 000a ]
010d LOAD | STACK=[ 000a 0254 ]
010e LOAD | STACK=[ 000a 00b9 ]
010f JFALSE 0000024d | STACK=[ 000a !0010 ]
0110 PUSH 00000054 | STACK=[ 000a !0010 ]
0111 XOR | STACK=[ 000a !0010 0054 ]
0112 PUSH 00000037 | STACK=[ 000a !0044 ]
0113 EQU | STACK=[ 000a !0044 0037 ]               ; flag[0] ^ 0x54 == 0x37 ?
...
0121 LOAD | STACK=[ 000a 0254 ]
0122 LOAD | STACK=[ 000a 00ba ]
0123 JFALSE 0000024d | STACK=[ 000a !0011 ]
0124 PUSH 00000059 | STACK=[ 000a !0011 ]
0125 MUL | STACK=[ 000a !0011 0059 ]
0126 PUSH 000010b0 | STACK=[ 000a !05e9 ]
0127 EQU | STACK=[ 000a !05e9 10b0 ]               ; flag[1] * 0x59 == 0x10B0 ?
...
01d5 LOAD | STACK=[ 000a 0254 ]
01d6 LOAD | STACK=[ 000a 00c3 ]
01d7 JFALSE 0000024d | STACK=[ 000a !001a ]
01d8 PUSH 00000030 | STACK=[ 000a !001a ]
01d9 DIV | STACK=[ 000a !001a 0030 ]
01da PUSH 00000001 | STACK=[ 000a !0000 ]
01db EQU | STACK=[ 000a !0000 0001 ]               ; flag[10] / 0x30 == 1 ?
...

The various constraints on the flag are now easy to see. And the last constraint shown, on flag[10], explains why there is more than one valid flag: the division by 0x30 is satisfied by more than one value.
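As a quick sanity check, the three constraints visible in this excerpt can be enumerated by hand. The sketch below is ours and only covers these three bytes: the constants are the ones that appear on the VM stack in the trace (0x37, 0x10b0 and 0x1), and we assume that the VM's DIV is an integer division, which is consistent with 0x1a / 0x30 giving 0 in the trace.

```python
# Constraints read from the trace excerpt, as predicates on a single flag byte.
CONSTRAINTS = {
    0: lambda c: c ^ 0x54 == 0x37,
    1: lambda c: c * 0x59 == 0x10B0,
    10: lambda c: c // 0x30 == 1,  # assumption: DIV is an integer division
}

def candidates(check):
    """Return every printable ASCII character accepted by the predicate."""
    return [chr(c) for c in range(0x20, 0x7F) if check(c)]

solutions = {index: candidates(check) for index, check in CONSTRAINTS.items()}

for index, chars in solutions.items():
    print(f"flag[{index}]: {len(chars)} candidate(s): {''.join(chars)}")
```

The XOR and MUL checks each accept a single character ('c' and '0', matching the first two bytes of the flag found with QBDI), whereas the DIV check accepts every byte from 0x30 to 0x5F. A solver and a bruteforce are therefore free to pick different, equally valid values for that position.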

Conclusion

In this blogpost, we illustrated two different reverse engineering approaches on an obfuscated binary, using QBDI and TritonDSE, two open source tools developed and maintained by my amazing team.

In this specific use case, using TritonDSE and symbolic execution allowed us to quickly extract meaningful constraints with minimal effort, and to solve the challenge in under a minute. On the other hand, using QBDI demonstrated a more pragmatic, black-box approach. Even without fully understanding the VM internals, we exploited an observable side effect to recover the flag through brute force. While slower, this method remains robust and is often easier to implement under time pressure, especially when dealing with VMs.

Beyond solving a challenge, the goal of this post was to provide concrete and reproducible examples of how our tools can be used in real-world reverse engineering scenarios. Whether it is for symbolic reasoning, emulation from snapshots, or lightweight instrumentation, both TritonDSE and QBDI offer powerful capabilities that are worth integrating into your toolbox. Hopefully, this walk-through gives you a solid starting point to experiment with these techniques on your own targets.

If you would like to learn more about our security audits and explore how we can help you, get in touch with us!
