How to perform snapshot-based coverage-guided fuzzing on Windows kernel components using Rewind, a tool we have just published on Github.
I always enjoyed doing kernel vulnerability research specially on Windows kernel. The process invariably involves a mix of static and dynamic analysis that can quickly become tedious. In fact, the common cycle debug / crash / reboot / reset all breakpoints is slow and painful. When you want to do some fuzzing, the setup is even more complicated, as it often requires you to setup one or several virtual machines plus a kernel debugger and craft some ghetto scripts to handle crash detection…
Using snapshots with virtual machines helps but it’s slow. Even if the restoration of the snapshot only take 1 second, it means that you will only be able to execute at most one test case per second. These numbers are not good if you plan to do some fuzzing.
During 2018, Microsoft introduced a new set of API named Windows Hypervisor Platform (WHVP). These API allow to setup a partition (what is called a VM in Hyper-V lingua) with some virtual processors and to have a control on the VM exits occurring in the virtual machine. It’s almost like having your own VM-exit handler in userland. Quite handy to do useful things, for example Simpleator (user-mode emulator for Windows applications) or applepie (hypervisor implementation for Bochs).
So I started to play with WHVP and made a first PoC allowing me to execute some shellcode in a Hyper-V partition. It was written in Python and quite slow. This first PoC evolved quite quickly to some kind of snapshot-based tracer. I wanted to have something to bootstrap the virtual CPU and quite easy to setup. So since I was already using a kernel debugger to play with my target, I decided to use kernel dumps made with WinDbg as my snapshot. With that I just needed to setup a partition with a virtual CPU. The virtual CPU context is set with the context taken from the dump. Whenever the virtual CPU needs a physical page I use the ones from the dump.
With this I was able to fork the state of the dump into a partition and then resume execution. It allowed me to easily trace the execution of my target function. By modifying the arguments and reverting the memory state of the partition it was also really easy to fuzz the target.
Since these small VMs are tailored for a specific usage of a target function, they are quite small and the snapshot restoration is way faster than using the entire memory of the initial VM. This way we can achieve several thousand executions per second.
The tool implements two possibilities to obtain the coverage. The first one leverages the classical TF (Trap Flag) to have INT1 interruptions on every instruction. It requires to modify the target and it’s slow. I would have preferred to use MONITOR trap flag, but WHVP doesn’t offer this possibility.
In order to have proper performances (required for fuzzing), I decided to reduce the precision of the coverage and add a mode where you only know when an instruction is executed for the first time.
To do that I patch the executable pages fetched from the snapshot with 0xcc bytes that are decoded as the INT3 instruction. When the CPU will execute these patched instructions the hypervisor will trap the exception and rewrite the instructions with the original code.
It’s like having a unique software breakpoint set on every instruction. It works 95% of the time, but on particular pieces of code (ones with jump tables for example) it will fail because data will be replaced.
To overcome this, one option would be to disassemble the code before mapping it and only patch what is needed (maybe next time).
During my experiment I encountered several limitations when using WHVP. VM exits are slow, like really slow. VirtualBox source code has some interesting comments (look for them at the end of the file). I guess it's because a lot of structures are copied from the hypervisor to the kernel and finally to the userland to be able to handle these VM exits.
So to have proper performance you really need to limit VM exits. It is really limiting if you want to use Hyper-V as a tracing hypervisor (since it requires a lot of VM exits).
During the same time I started to use Bochs (specially the instrumentation part) to check if the traces obtained by the tool were correct. Bochs was some kind of oracle to see if I had divergent traces.
Bochs is faster than WHVP when doing full traces and you also have the benefits of intercepting memory accesses.
I decided to add Bochs as another backend. whvp was not a proper name for the tool anymore and I settled on rewind.
Rewind is a PoC for a snapshot-based coverage-guided fuzzer targeting Windows kernel components. It is available on Github: https://github.com/quarkslab/rewind/.
The idea is to start from a snapshot of a live running system. This snapshot is composed of the physical memory pages along with the state of the CPU.
This state is used to setup the initial state of a virtual CPU. By leveraging on-demand paging, only the pages needed for the execution of the target function are read from the snapshot.
As a result we obtain a dedicated virtual machine with only the physical memory pages useful for the execution of the target function.
As of now 2 backends are supported:
- WHVP backend leverages WHVP (Windows Hypervisor Platform) API to provide access to a Hyper-V partition. See https://docs.microsoft.com/en-us/virtualization/api/hypervisor-platform/hypervisor-platform for more details.
- bochs backend leverages the Bochs emulator.
A KVM backend is in development and should be available soon.
Rewind provides two main features:
- the ability to trace an arbitrary function;
- the ability to fuzz an arbitrary function.
It also provides a basic TUI to report useful information regarding the fuzzing.
It has been tested on Windows and Linux (only Bochs backend for Linux for now).
rewind was designed around my own workflow when I’m conducting security assessments for kernel drivers on the Windows platform.
The first step is to install the target software inside a virtual machine. Since I'm using a mix of static and dynamic analysis I will also setup a kernel debugger.
After opening some random drivers in IDA, I’ll quickly begin to target some functions. To do that I usually put some breakpoints with windbg and combined with ret-sync I can start to play.
That’s where rewind is useful. Instead of editing random buffers in memory and single-stepping and annotating the IDB to have a rough idea of what’s is going on, I’ll take a snapshot with windbg and use rewind instead.
It will ease the process a lot. Having a snapshot offers a lot of advantages. Everything is deterministic. You can replay ad nauseum a function call. You can launch a fuzzer if the target function looks interesting. You can even close the VM since it’s not needed anymore.
Real world example
I decided to check if rewind was able to find real world vulnerabilities. The main use case of rewind is to be able to trace or fuzz a function encountered during a kernel driver security assessment. So I need a vulnerability in a driver (in an ioctl would be perfect).
In November 2019, Google P0 released some kind of 0-day in cng.sys (https://bugs.chromium.org/p/project-zero/issues/detail?id=2104):
The Windows Kernel Cryptography Driver (cng.sys) exposes a \\Device\\CNG device to user-mode programs and supports a variety of IOCTLs with non-trivial input structures. It constitutes a locally accessible attack surface that can be exploited for privilege escalation (such as sandbox escape). We have identified a vulnerability in the processing of IOCTL 0x390400, reachable through the following series of calls: 1. cng!CngDispatch 2. cng!CngDeviceControl 3. cng!ConfigIoHandler_Safeguarded 4. cng!ConfigFunctionIoHandler 5. cng!_ConfigurationFunctionIoHandler 6. cng!BCryptSetContextFunctionProperty 7. cng!CfgAdtReportFunctionPropertyOperation 8. cng!CfgAdtpFormatPropertyBlock The bug resides in the cng!CfgAdtpFormatPropertyBlock function and is caused by a 16-bit integer truncation issue.
If you are interested to see how you can use rewind to find this vulnerability, I encourage you to follow the tutorial provided in the repository (in the examples directory).
Many things remain to be done.
On a short term I plan to add kvm as a backend (preliminary tests show that it is faster than WHVP). I also have some ideas to improve the performance of the fuzzer loop. I will also add support for other types of snapshots (raw memory dumps for example).
- This software is in a very early stage of development and an ongoing experiment.
- Sometimes the tracer is unable to trace the target function (most common issue is invalid virtual CPU state).
- When using hit coverage mode, the tracer will misbehave on some functions (it is the case with some switch tables). The reason is that each byte is replaced by software breakpoints (including data if they are present in an executable page). A better way to do that would be to obtain the list of all the basic blocks from a disassembler for example.
- The target function will be executed with just one virtual processor, you have no support for hardware so it’s probable something will be wrong if you trace hardware related functions.
- This tool is best used for targetting specific functions.
- To have best performances, minimize VM exits and modified pages because they can be really costly and will increase the time needed to execute the function.
- Don’t use Hyper-V to do snapshots. Windows Hyper-V VMs are “enlightened”, meaning that they are using paravirtualization, which is currently not handled
- Some symbols are not resolved properly.
This tool is currently developed and sponsored by Quarkslab under the Apache 2.0 license.
Hail to @yrp604, @0vercl0k, Alexandre Gazet for their help, feedback and thoughts. Thanks also to all my colleagues at Quarkslab!