Microsoft is currently working on Xtended Flow Guard (XFG), an evolved version of Control Flow Guard (CFG), their own control flow integrity implementation. XFG works by restricting indirect control flow transfers based on type-based hashes of function prototypes. This blog post is a deep dive into how the MSVC compiler generates those XFG function prototype hashes.
Introduction
In 2014, Microsoft introduced a Control Flow Integrity (CFI) solution called Control Flow Guard (CFG). CFG has been extensively studied in the past. Over time, a number of ways to bypass CFG were devised; some of these bypasses relied on implementation issues (such as the integration with JIT compilers, or the availability of sensitive APIs that were subject to abuse), and those were eventually addressed. One design issue remained, however: CFG didn't offer any granularity over the valid call targets. Any protected indirect call was allowed to call any valid call target. In large binaries, valid call targets could easily number in the thousands, giving attackers plenty of flexibility to bypass CFG by chaining valid C++ virtual functions (see for example the exploitation technique known as Counterfeit Object-oriented Programming (COOP)).
Fast forward a few years. Microsoft has been working on an improved version of CFG, called Xtended Flow Guard (XFG). XFG offers a finer-grained CFI, by restricting indirect calls/jumps through type signature checks. The key concept behind XFG is that a type signature-based hash is assigned at compile time to those functions which can be the destination of an indirect call/jump. Then, at XFG-instrumented indirect call sites, a hash check is performed: only functions with the expected signature hash are allowed.
Some weeks ago, researcher Connor McGarr published a blog post named "Exploit Development: Between a Rock and a (Xtended Flow) Guard Place: Examining XFG", explaining how XFG works, as well as its potential weaknesses. This sparked my curiosity, so I decided to fire up IDA Pro and WinDbg to understand how XFG hashes are generated.
As of this writing, XFG is present in Windows 10 Insider Preview builds, under the Dev Channel. In order to compile programs with XFG support, you need Visual Studio 2019 Preview.
The analysis in this blog post is based on the following versions of the binaries from Visual Studio 2019 Preview, version 16.8.0 Preview 2.1:
c1.dll version 19.28.29213.0
c2.dll version 19.28.29213.0
This blog post focuses on how XFG hashes are generated for C source code. Although the hashing algorithm for C++ code looks similar at first glance, we haven't looked into its specifics. Since this is a rather long article, the content is divided into several sections: first, we start with a quick primer on XFG hashes. Then, we analyze how functions are hashed, followed by a detailed view of how different C types are hashed. Finally, we inspect some final transformations that are applied to the computed hashes, and we conclude with a hands-on hash calculation exercise.
A short primer on XFG hashes
Let's start with a very simple C program defining a function pointer type named FPTR ([1]), which declares a function taking two float arguments and returning another float. Function main declares a function pointer variable named fptr, of type FPTR, which is set to the address of function foo ([2]), whose prototype matches the FPTR type. Finally, at [3], the function to which fptr points is called, passing values 1.00001 and 2.00002 as parameters.
#include <stdio.h>
typedef float (* FPTR)(float, float);            /* [1] */
float foo(float val1, float val2){
    printf("I received float values %f and %f\n", val1, val2);
    return (val2 - val1);
}
int main(int argc, char **argv){
    FPTR fptr = foo;                             /* [2] */
    printf("Calling function pointer...\n");
    fptr(1.00001, 2.00002);                      /* [3] */
    return 0;
}
We compile the source code above from the x64 Native Tools Command Prompt for VS 2019 Preview with the following command line. Notice that we are using the /guard:xfg flag to enable XFG.
> cl /Zi /guard:xfg example1.c
The disassembly of the resulting main function is shown below:
main ; int __cdecl main(int argc, const char **argv, const char **envp)
main
main var_18 = qword ptr -18h
main var_10 = qword ptr -10h
main arg_0 = dword ptr 8
main arg_8 = qword ptr 10h
main
main mov [rsp+arg_8], rdx
main+5 mov [rsp+arg_0], ecx
main+9 sub rsp, 38h
main+D lea rax, foo
main+14 mov [rsp+38h+var_18], rax
main+19 lea rcx, aCallingFunctio ; "Calling function pointer...\n"
main+20 call printf
main+25 mov rax, [rsp+38h+var_18]
main+2A mov [rsp+38h+var_10], rax
main+2F mov r10, 99743F3270D52870h
main+39 movss xmm1, cs:__real@40000054
main+41 movss xmm0, cs:__real@3f800054
main+49 mov rax, [rsp+38h+var_10]
main+4E call cs:__guard_xfg_dispatch_icall_fptr
main+54 xor eax, eax
main+56 add rsp, 38h
main+5A retn
main+5A main endp
We can see at main+0x2F that the R10 register is set to the expected type-based hash (0x99743F3270D52870) for the function pointer call that follows at main+0x4E. The function to be called through the function pointer is foo, and we can verify that its prototype hash (given by the 8 bytes preceding the beginning of the function) matches the expected one, meaning that function foo is a valid target for the indirect call at main+0x4E. To be precise, the prototype hash located 8 bytes before the foo function (0x99743F3270D52871) matches the expected hash we have seen in the R10 register (0x99743F3270D52870) except for bit 0:
.text:0000000140001008 dq 99743F3270D52871h
foo
foo ; =============== S U B R O U T I N E ================================
foo ; float __fastcall foo(float val1, float val2)
foo foo proc near ; DATA XREF: main+D
foo
foo arg_0 = dword ptr 8
foo arg_8 = dword ptr 10h
foo
foo movss [rsp+arg_8], xmm1
foo+6 movss [rsp+arg_0], xmm0
foo+C sub rsp, 28h
foo+10 cvtss2sd xmm0, [rsp+28h+arg_8]
foo+16 cvtss2sd xmm1, [rsp+28h+arg_0]
foo+1C movaps xmm2, xmm0
foo+1F movq r8, xmm2
foo+24 movq rdx, xmm1
foo+29 lea rcx, _Format ; "I received float values %f and %f\n"
foo+30 call printf
foo+35 movss xmm0, [rsp+28h+arg_8]
foo+3B subss xmm0, [rsp+28h+arg_0]
foo+41 add rsp, 28h
foo+45 retn
foo+45 foo endp
This discrepancy is harmless: at the very beginning of the XFG dispatch function (ntdll!LdrpDispatchUserCallTargetXFG), bit 0 of R10 is set, so the expected hash and the function hash are identical by the time they are compared:
LdrpDispatchUserCallTargetXFG LdrpDispatchUserCallTargetXFG proc near
LdrpDispatchUserCallTargetXFG ; __unwind { // LdrpICallHandler
LdrpDispatchUserCallTargetXFG or r10, 1
LdrpDispatchUserCallTargetXFG+4 test al, 0Fh
LdrpDispatchUserCallTargetXFG+6 jnz short loc_180094337
LdrpDispatchUserCallTargetXFG+8 test ax, 0FFFh
LdrpDispatchUserCallTargetXFG+C jz short loc_180094337
LdrpDispatchUserCallTargetXFG+E cmp r10, [rax-8]
LdrpDispatchUserCallTargetXFG+12 jnz short loc_180094337
LdrpDispatchUserCallTargetXFG+14 jmp rax
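The checks in the listing above can be modeled in a few lines of C. This is only a sketch of the visible fast path: the function name and its parameters are hypothetical, and a false result merely means that the code branches to loc_180094337, whose behavior is not shown in this listing.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the fast path of the XFG dispatch function.
 *   target        - address held in RAX (the indirect call target)
 *   stored_hash   - the qword located at target - 8
 *   expected_hash - the value the call site loaded into R10
 * Returns true when the listing would reach "jmp rax".
 */
static bool xfg_fast_path_taken(uint64_t target, uint64_t stored_hash,
                                uint64_t expected_hash)
{
    expected_hash |= 1;                   /* or   r10, 1            */
    if ((target & 0x0F) != 0)             /* test al, 0Fh   / jnz   */
        return false;
    if ((target & 0xFFF) == 0)            /* test ax, 0FFFh / jz    */
        return false;
    return expected_hash == stored_hash;  /* cmp  r10, [rax-8]      */
}
```

Assuming foo starts right after the hash stored at 0x140001008 (that is, at 0x140001010), calling xfg_fast_path_taken(0x140001010, 0x99743F3270D52871, 0x99743F3270D52870) returns true, because setting bit 0 of the expected hash makes both values equal.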
Hashing function types
The MSVC compiler is composed of two stages: a front end and a back end. The front end is language-specific: it reads in source code, lexes, parses, does semantic analysis and emits an IL (intermediate language). The back end is specific to the target architecture: it reads the IL generated by the front end, it performs optimizations and generates code for a given architecture.
The generation of the function prototype hash is left to the language front end. This means that when compiling C code, the C front end (c1.dll) is in charge of generating the prototype hash, while when compiling C++ code, the C++ front end (c1xx.dll) is charged with this task.
Once the prototype hash has been produced by the corresponding language front end, some final transformations are performed by the compiler back end (the x64 back end in our case, c2.dll). In the following sections we'll detail every step of the creation of the prototype hashes while compiling C code.
When compiling C source code with the /guard:xfg flag, the compiler front end calls the c1!XFGHelper__ComputeHash_1 function in order to calculate the prototype hash of a function being processed.
The c1!XFGHelper__ComputeHash_1 function creates an object of type XFGHelper::XFGHasher, which is in charge of collecting type information for the function being processed, and producing the prototype hash, based on the collected type information. The XFGHelper::XFGHasher uses an instance of std::vector to store all the type information that will be hashed. The following methods, some of which belong to the related XFGHelper::XFGTypeHasher class, are called throughout the process of building the hash:
XFGHelper::XFGHasher::add_function_type()
XFGHelper::XFGHasher::add_type()
XFGHelper::XFGHasher::get_hash()
XFGHelper::XFGTypeHasher::compute_hash()
XFGHelper::XFGTypeHasher::hash_indirection()
XFGHelper::XFGTypeHasher::hash_tag()
XFGHelper::XFGTypeHasher::hash_primitive()
After initializing an instance of XFGHelper::XFGHasher, the XFGHelper__ComputeHash_1 function calls XFGHelper::XFGHasher::add_function_type(), passing as parameters the instance of XFGHelper::XFGHasher and a Type_t object containing the type information about the function being hashed.
XFGHelper__ComputeHash_1 XFGHelper__ComputeHash_1 proc near
XFGHelper__ComputeHash_1
XFGHelper__ComputeHash_1 arg_0 = qword ptr 8
XFGHelper__ComputeHash_1 arg_8 = qword ptr 10h
XFGHelper__ComputeHash_1 arg_10 = qword ptr 18h
[...]
XFGHelper__ComputeHash_1+79 xorps xmm0, xmm0
XFGHelper__ComputeHash_1+7C movdqu cs:xfg_hasher, xmm0 ; zero inits xfg_hasher
[...]
XFGHelper__ComputeHash_1+B1 mov rdx, rbp ; rdx = Type_t containing function information
XFGHelper__ComputeHash_1+B4 lea rbp, xfg_hasher
XFGHelper__ComputeHash_1+BB mov rcx, rbp
XFGHelper__ComputeHash_1+BE call XFGHelper::XFGHasher::add_function_type(Type_t const *,XFGHelper::VirtualInfoFromDeclspec)
XFGHelper__ComputeHash_1+C3 mov rdx, rsi ; rdx = function->return_type (struct Type_t *)
XFGHelper__ComputeHash_1+C6 mov rcx, rbp ; this
XFGHelper__ComputeHash_1+C9 call XFGHelper::XFGHasher::add_type(Type_t const *) ; (step 5)
Function XFGHelper::XFGHasher::add_function_type will retrieve 4 pieces of information about the function being hashed, and after returning from XFGHelper::XFGHasher::add_function_type one more piece of information is added via a call to XFGHelper::XFGHasher::add_type, as we can see at XFGHelper__ComputeHash_1+C9 in the disassembly listing above. These pieces of information are stored in the std::vector owned by the XFGHelper::XFGHasher instance:
4 bytes indicating the number of parameters of the function;
8 bytes per function parameter, holding the hash of the type of said parameter;
1 byte indicating whether the function is variadic or not (i.e. if it takes a variable number of arguments);
4 bytes specifying the calling convention used by the function;
8 bytes holding the hash of the return type of the function.
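To make the layout of these five components concrete, here is a minimal C sketch that serializes them in the order listed above. The byte_buf type and the helper names are hypothetical stand-ins for the std::vector used by the compiler, the values are written in little-endian order (as an x64 host would store them), and the sketch assumes the variadic flag is stored as 0 or 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size stand-in for the std::vector of bytes. */
typedef struct {
    uint8_t data[256];
    size_t  len;
} byte_buf;

/* Appends the low nbytes of value in little-endian order. */
static void put_le(byte_buf *b, uint64_t value, size_t nbytes)
{
    for (size_t i = 0; i < nbytes && b->len < sizeof b->data; i++)
        b->data[b->len++] = (uint8_t)(value >> (8 * i));
}

/* Serializes the five prototype components; returns the total length. */
static size_t build_prototype_blob(byte_buf *b,
                                   uint32_t param_count,
                                   const uint64_t *param_hashes,
                                   int is_variadic,
                                   uint32_t calling_convention,
                                   uint64_t return_hash)
{
    b->len = 0;
    put_le(b, param_count, 4);               /* 1: number of parameters   */
    for (uint32_t i = 0; i < param_count; i++)
        put_le(b, param_hashes[i], 8);       /* 2: one hash per parameter */
    put_le(b, is_variadic ? 1 : 0, 1);       /* 3: variadic flag          */
    put_le(b, calling_convention, 4);        /* 4: calling convention     */
    put_le(b, return_hash, 8);               /* 5: hash of return type    */
    return b->len;
}
```

For a function with two parameters, the resulting buffer is 4 + 2*8 + 1 + 4 + 8 = 33 bytes long.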
Component 1: Number of parameters
The XFGHelper::XFGHasher::add_function_type function starts by adding a DWORD to the std::vector indicating the number of parameters of the function. Notice that this number can be influenced by the function accepting a variable number of arguments, or having virtual information from __declspec (I suspect that this may be some reused code from the XFG implementation for C++, and thus it doesn't really apply to C code, although I haven't confirmed it). In short, the number of parameters considered here will be the real number of parameters declared in the function prototype, minus 1 if the function takes a variable number of arguments, minus 1 again if the function has virtual information from __declspec.
XFGHelper::XFGHasher::add_function_type+18 mov rsi, [rdx+10h] ; rsi = function_info->FunctionTypeInfo
XFGHelper::XFGHasher::add_function_type+1C mov rbx, rcx
XFGHelper::XFGHasher::add_function_type+1F mov rcx, rsi ; this
XFGHelper::XFGHasher::add_function_type+22 movzx r14d, r8b
XFGHelper::XFGHasher::add_function_type+26 mov r15, rdx
XFGHelper::XFGHasher::add_function_type+29 call FunctionTypeInfo_t::RealNumberOfParameters(void)
XFGHelper::XFGHasher::add_function_type+2E mov rcx, rsi ; this
XFGHelper::XFGHasher::add_function_type+31 mov r9d, eax ; r9 = real_number_of_params
XFGHelper::XFGHasher::add_function_type+34 call FunctionTypeInfo_t::IsVarArgsFunction(void)
XFGHelper::XFGHasher::add_function_type+39 mov rdx, [rbx+8]
XFGHelper::XFGHasher::add_function_type+3D lea rbp, [r9-1] ; rbp = real_number_of_params - 1
XFGHelper::XFGHasher::add_function_type+41 test al, al ; is variadic function?
XFGHelper::XFGHasher::add_function_type+43 mov rcx, rbx
XFGHelper::XFGHasher::add_function_type+46 cmovz rbp, r9 ; if not variadic, rbp = real_number_of_params
XFGHelper::XFGHasher::add_function_type+4A test r8b, r8b ; does it have virtual info from __declspec?
XFGHelper::XFGHasher::add_function_type+4D lea r9, [rsp+48h+arg_14]
XFGHelper::XFGHasher::add_function_type+52 lea r8, [rsp+48h+arg_10]
XFGHelper::XFGHasher::add_function_type+57 lea eax, [rbp-1] ; number of params = rbp - 1
XFGHelper::XFGHasher::add_function_type+5A cmovz eax, ebp ; if no virtual info from __declspec, number of params = rbp
XFGHelper::XFGHasher::add_function_type+5D mov [rsp+48h+arg_10], eax ; value to add = number of params (dword)
XFGHelper::XFGHasher::add_function_type+5D ; [step 1]
XFGHelper::XFGHasher::add_function_type+61 call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
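The two conditional moves in the listing above boil down to the following arithmetic. This is a small sketch with a hypothetical function name; the three inputs correspond to the return values of RealNumberOfParameters and IsVarArgsFunction, and to the VirtualInfoFromDeclspec argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the parameter count that gets hashed. */
static uint32_t hashed_param_count(uint32_t real_param_count,
                                   bool is_variadic, bool has_virtual_info)
{
    uint32_t n = real_param_count;
    if (is_variadic)
        n -= 1;     /* lea rbp, [r9-1]  / cmovz rbp, r9  */
    if (has_virtual_info)
        n -= 1;     /* lea eax, [rbp-1] / cmovz eax, ebp */
    return n;
}
```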
Component 2: Type hash of each parameter
Next, XFGHelper::XFGHasher::add_function_type enters a loop in which it computes a hash of the type of each function parameter, adding each type hash (8 bytes) to the std::vector.
There's special handling for a couple of edge cases ((type & 0x10f) == 0x103, (type & 0x103) == 0x101), but most parameter types fall through to loc_180105541. At that location, the Type_t object representing the type of the parameter being processed is cleaned of qualifiers (such as const (0x800) and volatile (0x40)) if needed (call to Type_t::clearModifiersAndQualifiers) and then the 8-byte hash of the parameter type is added to the std::vector, via the call to XFGHelper::XFGHasher::add_type that we can see below at XFGHelper::XFGHasher::add_function_type+CC. If you're wondering how exactly XFGHelper::XFGHasher::add_type computes a hash for a given Type_t, you'll find the details later, under the "Hashing types" section.
Finally, if there are more parameters to hash, it jumps back to the beginning of the loop.
XFGHelper::XFGHasher::add_function_type+6E loc_1801054F6:
XFGHelper::XFGHasher::add_function_type+6E mov rax, [rsi] ; rax = &function_info->params
XFGHelper::XFGHasher::add_function_type+71 mov rcx, [rax+rdi*8] ; rcx = function_info->params[i] (Type_t)
XFGHelper::XFGHasher::add_function_type+75 mov edx, [rcx] ; edx = params[i].type
XFGHelper::XFGHasher::add_function_type+77 mov eax, edx
XFGHelper::XFGHasher::add_function_type+79 and eax, 10Fh
XFGHelper::XFGHasher::add_function_type+7E cmp eax, 103h ; params[i].type & 0x10f == 0x103 ?
XFGHelper::XFGHasher::add_function_type+83 jnz short loc_18010552C
XFGHelper::XFGHasher::add_function_type+85 cmp edx, 8103h ; params[i].type == 0x8103 ?
XFGHelper::XFGHasher::add_function_type+8B jz short loc_18010554E
XFGHelper::XFGHasher::add_function_type+8D mov r8d, [rcx+4]
XFGHelper::XFGHasher::add_function_type+91 lea edx, [rax-1]
XFGHelper::XFGHasher::add_function_type+94 mov rcx, [rcx+8]
XFGHelper::XFGHasher::add_function_type+98 btr r8d, 1Fh
XFGHelper::XFGHasher::add_function_type+9D call Type_t::createType(Type_t const *,uint,mod_t,bool)
XFGHelper::XFGHasher::add_function_type+A2 jmp short loc_18010554B
XFGHelper::XFGHasher::add_function_type+A4 ; --------------------------------------------------------------
XFGHelper::XFGHasher::add_function_type+A4
XFGHelper::XFGHasher::add_function_type+A4 loc_18010552C:
XFGHelper::XFGHasher::add_function_type+A4 and edx, 103h
XFGHelper::XFGHasher::add_function_type+AA cmp edx, 101h ; params[i].type & 0x103 == 0x101 ?
XFGHelper::XFGHasher::add_function_type+B0 jnz short loc_180105541
XFGHelper::XFGHasher::add_function_type+B2 call Type_t::decayFunctionType(void)
XFGHelper::XFGHasher::add_function_type+B7 jmp short loc_18010554B
XFGHelper::XFGHasher::add_function_type+B9 ; --------------------------------------------------------------
XFGHelper::XFGHasher::add_function_type+B9
XFGHelper::XFGHasher::add_function_type+B9 loc_180105541:
XFGHelper::XFGHasher::add_function_type+B9 mov edx, 8C0h ; discards qualifiers 0x800 (const) | 0x80 | 0x40 (volatile)
XFGHelper::XFGHasher::add_function_type+BE call Type_t::clearModifiersAndQualifiers(mod_t)
XFGHelper::XFGHasher::add_function_type+C3
XFGHelper::XFGHasher::add_function_type+C3 loc_18010554B:
XFGHelper::XFGHasher::add_function_type+C3 ; XFGHelper::XFGHasher::add_function_type+B7↑j
XFGHelper::XFGHasher::add_function_type+C3 mov rcx, rax
XFGHelper::XFGHasher::add_function_type+C6
XFGHelper::XFGHasher::add_function_type+C6 loc_18010554E:
XFGHelper::XFGHasher::add_function_type+C6 mov rdx, rcx ; struct Type_t *
XFGHelper::XFGHasher::add_function_type+C9 mov rcx, rbx ; this
XFGHelper::XFGHasher::add_function_type+CC call XFGHelper::XFGHasher::add_type(Type_t const *) ; adds hash of params[i] type
XFGHelper::XFGHasher::add_function_type+CC ; [step 2]
XFGHelper::XFGHasher::add_function_type+D1 inc rdi
XFGHelper::XFGHasher::add_function_type+D4 cmp rdi, rbp ; counter < number_of_params ?
XFGHelper::XFGHasher::add_function_type+D7 jb short loc_1801054F6 ; if so, loop
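The branching at the top of the loop can be summarized with the C sketch below. The enum and function names are hypothetical and only label which code path a given type value takes in the listing; they make no claim about what each edge case means. The second helper assumes that Type_t::clearModifiersAndQualifiers simply clears the bits of the given mask from the qualifiers field, which I haven't verified. Note that the masks need explicit parentheses in C, since == binds tighter than &.

```c
#include <stdint.h>

/* Hypothetical labels for the four paths visible in the loop. */
typedef enum {
    PARAM_VIA_CREATE_TYPE,      /* Type_t::createType() at +9D           */
    PARAM_HASHED_AS_IS,         /* type == 0x8103, straight to add_type  */
    PARAM_VIA_DECAY_FUNCTION,   /* Type_t::decayFunctionType() at +B2    */
    PARAM_VIA_CLEAR_QUALIFIERS  /* clearModifiersAndQualifiers(0x8C0)    */
} param_path;

static param_path classify_param(uint32_t type)
{
    if ((type & 0x10F) == 0x103)
        return type == 0x8103 ? PARAM_HASHED_AS_IS : PARAM_VIA_CREATE_TYPE;
    if ((type & 0x103) == 0x101)
        return PARAM_VIA_DECAY_FUNCTION;
    return PARAM_VIA_CLEAR_QUALIFIERS;
}

/* Assumption: the 0x8C0 mask (0x800 | 0x80 | 0x40) is cleared bitwise. */
static uint32_t clear_param_qualifiers(uint32_t qualifiers)
{
    return qualifiers & ~0x8C0u;
}
```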
Component 3: Variadic function
The next step is adding a single byte to the std::vector, indicating whether the function accepts a variable number of arguments or not. In most cases, when the function does not contain virtual information from __declspec, the following code path is taken:
XFGHelper::XFGHasher::add_function_type+D9 mov rcx, rsi ; this = functioninfo
XFGHelper::XFGHasher::add_function_type+DC call FunctionTypeInfo_t::IsVarArgsFunction(void)
XFGHelper::XFGHasher::add_function_type+E1 mov r8b, al ; r8b = is_var_args_function
XFGHelper::XFGHasher::add_function_type+E4 test r14b, r14b ; contains virtual info from __declspec?
XFGHelper::XFGHasher::add_function_type+E7 jz short loc_1801055EB
[...]
XFGHelper::XFGHasher::add_function_type+163 loc_1801055EB:
XFGHelper::XFGHasher::add_function_type+163 mov rdx, [rbx+8]
XFGHelper::XFGHasher::add_function_type+167 lea r9, [rsp+48h+arg_10+1]
XFGHelper::XFGHasher::add_function_type+16C mov byte ptr [rsp+48h+arg_10], r8b ; value to add = is_var_args_function (byte)
XFGHelper::XFGHasher::add_function_type+16C ; [step 3]
XFGHelper::XFGHasher::add_function_type+171 mov rcx, rbx
XFGHelper::XFGHasher::add_function_type+174 lea r8, [rsp+48h+arg_10]
XFGHelper::XFGHasher::add_function_type+179 call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
Component 4: Calling convention
Finally, XFGHelper::XFGHasher::add_function_type adds a 4-byte value to the std::vector, indicating the calling convention used by the function. There are not a lot of calling conventions on the Intel x64 architecture (unlike its x86 counterpart): the default x64 calling convention passes integer arguments in registers RCX, RDX, R8, and R9, while floating point arguments are passed through XMM0-XMM3. This default calling convention is internally represented by the value 0x201, but since it is masked with & 0x0F before saving it to the std::vector (see disassembly below), you will most likely see a DWORD with value 0x00000001 for the calling convention.
For the record, although the MSVC x64 compiler typically ignores specifiers such as __cdecl and __stdcall, there's at least one way to obtain a value different from 0x201 for the calling convention: the __vectorcall calling convention is internally represented by value 0x208, meaning that after being masked with & 0x0F, a DWORD with value 0x00000008 will be written to the std::vector.
The code in charge of adding the calling convention data to the std::vector is shown below.
XFGHelper::XFGHasher::add_function_type+17E mov eax, [r15+4] ; eax = function_info->calling_convention
XFGHelper::XFGHasher::add_function_type+182 lea r9, [rsp+48h+arg_14]
XFGHelper::XFGHasher::add_function_type+187 mov rdx, [rbx+8]
XFGHelper::XFGHasher::add_function_type+18B lea r8, [rsp+48h+arg_10]
XFGHelper::XFGHasher::add_function_type+190 and eax, 0Fh ; eax = calling_convention & 0xF
XFGHelper::XFGHasher::add_function_type+193 mov rcx, rbx
XFGHelper::XFGHasher::add_function_type+196 mov [rsp+48h+arg_10], eax ; value to add = calling_convention & 0xF (size = dword)
XFGHelper::XFGHasher::add_function_type+196 ; [step 4]
XFGHelper::XFGHasher::add_function_type+19A call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
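The masking step is easy to check by hand. In the sketch below, the macro and function names are hypothetical; only the values 0x201, 0x208 and the 0x0F mask come from the analysis above.

```c
#include <stdint.h>

/* Internal values observed for the two calling conventions discussed. */
#define CC_INTERNAL_DEFAULT_X64 0x201u
#define CC_INTERNAL_VECTORCALL  0x208u

/* Returns the DWORD that ends up in the data to be hashed. */
static uint32_t hashed_calling_convention(uint32_t internal_value)
{
    return internal_value & 0x0Fu;   /* and eax, 0Fh */
}
```

With these values, the default convention yields 0x00000001 and __vectorcall yields 0x00000008.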
Component 5: Hash of return type
The fifth and final component of the data that will be used to obtain the function prototype hash is not retrieved within XFGHelper::XFGHasher::add_function_type; instead, it is added right after returning from it. As you can see in the code below, XFGHelper__ComputeHash_1 calls XFGHelper::XFGHasher::add_type, which computes an 8-byte hash for the Type_t representing the return type, and adds those 8 bytes of the hash to the std::vector.
XFGHelper__ComputeHash_1+BE call XFGHelper::XFGHasher::add_function_type(Type_t const *,XFGHelper::VirtualInfoFromDeclspec)
XFGHelper__ComputeHash_1+C3 mov rdx, rsi ; rdx = function->return_type (struct Type_t *)
XFGHelper__ComputeHash_1+C6 mov rcx, rbp ; this
XFGHelper__ComputeHash_1+C9 call XFGHelper::XFGHasher::add_type(Type_t const *) ; (step 5)
Final step: hashing the collected prototype data
If the function contains virtual information from __declspec, an additional 8-byte type hash is generated from that information and added to the std::vector. However, I wasn't able to hit this special case during my tests; as stated before, virtual information probably doesn't apply to C code.
Regardless of the presence or absence of virtual information from __declspec, the XFGHelper__ComputeHash_1 function finishes by calling the XFGHelper::XFGHasher::get_hash function:
XFGHelper__ComputeHash_1+CE test rbx, rbx ; contains virtual info from __declspec?
XFGHelper__ComputeHash_1+D1 jz short loc_1801052EF
[...]
XFGHelper__ComputeHash_1+103 loc_1801052EF:
XFGHelper__ComputeHash_1+103 mov rcx, rbp ; this
XFGHelper__ComputeHash_1+106 mov rbx, [rsp+38h+arg_0]
XFGHelper__ComputeHash_1+10B mov rbp, [rsp+38h+arg_8]
XFGHelper__ComputeHash_1+110 mov rsi, [rsp+38h+arg_10]
XFGHelper__ComputeHash_1+115 add rsp, 30h
XFGHelper__ComputeHash_1+119 pop rdi
XFGHelper__ComputeHash_1+11A jmp XFGHelper::XFGHasher::get_hash(void)
XFGHelper__ComputeHash_1+11A XFGHelper__ComputeHash_1 endp
XFGHelper::XFGHasher::get_hash hashes the type data that has been collected in the std::vector. The hashing algorithm of choice is SHA256, and as we can observe below at XFGHelper::XFGHasher::get_hash+5F, it only returns the first 8 bytes of the resulting SHA256 digest:
XFGHelper::XFGHasher::get_hash(void) public: unsigned __int64 XFGHelper::XFGHasher::get_hash(void)const proc near
[...]
XFGHelper::XFGHasher::get_hash(void)+18 mov dl, 3 ; algorithm_ids[3] == CALG_SHA_256
XFGHelper::XFGHasher::get_hash(void)+1A lea rcx, [rsp+58h+hHash] ; phHash
XFGHelper::XFGHasher::get_hash(void)+1F call HashAPIWrapper::HashAPIWrapper(uchar)
XFGHelper::XFGHasher::get_hash(void)+24 nop
XFGHelper::XFGHasher::get_hash(void)+25 mov r8, [rbx+8]
XFGHelper::XFGHasher::get_hash(void)+29 sub r8, [rbx] ; dwDataLen
XFGHelper::XFGHasher::get_hash(void)+2C xor r9d, r9d ; dwFlags
XFGHelper::XFGHasher::get_hash(void)+2F mov rdx, [rbx] ; pbData
XFGHelper::XFGHasher::get_hash(void)+32 mov rcx, [rsp+58h+hHash] ; hHash
XFGHelper::XFGHasher::get_hash(void)+37 call cs:__imp_CryptHashData
XFGHelper::XFGHasher::get_hash(void)+3D test eax, eax
XFGHelper::XFGHasher::get_hash(void)+3F jnz short loc_180105822
[...]
XFGHelper::XFGHasher::get_hash(void)+4A loc_180105822:
XFGHelper::XFGHasher::get_hash(void)+4A mov r8d, 20h ; ' ' ; unsigned int
XFGHelper::XFGHasher::get_hash(void)+50 lea rdx, [rsp+58h+sha256_digest] ; unsigned __int8 *
XFGHelper::XFGHasher::get_hash(void)+55 lea rcx, [rsp+58h+hHash] ; this
XFGHelper::XFGHasher::get_hash(void)+5A call HashAPIWrapper::GetHash(uchar *,ulong)
XFGHelper::XFGHasher::get_hash(void)+5F mov rbx, qword ptr [rsp+58h+sha256_digest] ; *** only returns first 8 bytes of SHA256 hash
XFGHelper::XFGHasher::get_hash(void)+64 mov rcx, [rsp+58h+hHash] ; hHash
XFGHelper::XFGHasher::get_hash(void)+69 call cs:__imp_CryptDestroyHash
XFGHelper::XFGHasher::get_hash(void)+6F test eax, eax
XFGHelper::XFGHasher::get_hash(void)+71 jnz short loc_180105854
[...]
XFGHelper::XFGHasher::get_hash(void)+7C loc_180105854:
XFGHelper::XFGHasher::get_hash(void)+7C mov rax, rbx
XFGHelper::XFGHasher::get_hash(void)+7F mov rcx, [rsp+58h+var_10]
XFGHelper::XFGHasher::get_hash(void)+84 xor rcx, rsp ; StackCookie
XFGHelper::XFGHasher::get_hash(void)+87 call __security_check_cookie
XFGHelper::XFGHasher::get_hash(void)+8C add rsp, 50h
XFGHelper::XFGHasher::get_hash(void)+90 pop rbx
XFGHelper::XFGHasher::get_hash(void)+91 retn
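The truncation performed at get_hash+5F can be sketched as follows. The function name is hypothetical; it takes a 32-byte SHA256 digest that has already been computed elsewhere (the compiler obtains it through its HashAPIWrapper class) and reads the first 8 bytes as a little-endian qword, which is what the mov rbx, qword ptr [...] instruction does on x64.

```c
#include <stdint.h>

/* Reads the first 8 bytes of a SHA256 digest as a little-endian qword. */
static uint64_t first_qword_le(const uint8_t digest[32])
{
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--)
        value = (value << 8) | digest[i];
    return value;
}
```

As an illustration unrelated to XFG, the SHA256 digest of the ASCII string "abc" starts with the bytes ba 78 16 bf 8f 01 cf ea, so this function would return 0xEACF018FBF1678BA for it.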
Hashing types
So far we know that a function prototype hash is built based on 5 pieces of information. Three of them are plain values (number of parameters, a boolean value indicating if the function is variadic, and a number representing the calling convention in use), but the other two components are type hashes themselves (type hash for each function parameter, and hash of the return type). In this section we'll see how types (represented internally by the compiler with a Type_t object) are hashed.
Types are hashed within the XFGHelper::XFGHasher::add_type function. It calls XFGHelper__GetHashForType, which returns an 8-byte hash of the type, and then that 8-byte hash is stored in the std::vector via a call to std::vector::_Insert_range().
.text:00000001801056A0 public: void XFGHelper::XFGHasher::add_type(class Type_t const *) proc near
.text:00000001801056A0 arg_0 = qword ptr 8
.text:00000001801056A0 arg_8 = byte ptr 10h
.text:00000001801056A0
.text:00000001801056A0 push rbx
.text:00000001801056A2 sub rsp, 30h
.text:00000001801056A6 mov rbx, rcx
.text:00000001801056A9 mov rcx, rdx ; rcx = Type_t
.text:00000001801056AC call XFGHelper__GetHashForType
.text:00000001801056B1 mov rdx, [rbx+8]
.text:00000001801056B5 lea r9, [rsp+38h+arg_8]
.text:00000001801056BA lea r8, [rsp+38h+arg_0]
.text:00000001801056BF mov [rsp+38h+arg_0], rax ; value to add = hash (qword)
.text:00000001801056C4 mov rcx, rbx
.text:00000001801056C7 call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
.text:00000001801056CC add rsp, 30h
.text:00000001801056D0 pop rbx
.text:00000001801056D1 retn
Let's see how XFGHelper__GetHashForType generates an 8-byte hash for a given Type_t. First of all, it checks if the hash for the given type already exists in a cache that it holds, via the call to std::_Tree::_Emplace() that we can observe at XFGHelper__GetHashForType+AF. If that is the case, it simply returns the cached type hash; this way it avoids recomputing the hash of a type that has already been processed.
On the other hand, if the type hash is not found in the cache, it proceeds to compute it from scratch by calling XFGHelper::XFGTypeHasher::compute_hash, which builds a std::vector with the type data to be hashed, and finally calls XFGHelper::XFGHasher::get_hash, which as we already know from the previous section, produces a SHA256 digest of the data contained in the std::vector and returns only the first 8 bytes of that digest.
XFGHelper__GetHashForType XFGHelper__GetHashForType proc near
[...]
XFGHelper__GetHashForType+A3 lea r9, [rbp+arg_8]
XFGHelper__GetHashForType+A7 lea r8, [rbp+Type_t]
XFGHelper__GetHashForType+AB lea rdx, [rbp+xfg_type_hasher]
XFGHelper__GetHashForType+AF call std::_Tree<std::_Tmap_traits<Type_t const *,unsigned __int64,std::less<Type_t const *>,std::allocator<std::pair<Type_t const * const,unsigned __int64>>,0>>::_Emplace<Type_t const * &,int>(Type_t const * &,int &&)
XFGHelper__GetHashForType+B4 mov rbx, qword ptr [rbp+xfg_type_hasher]
XFGHelper__GetHashForType+B8 cmp byte ptr [rbp+xfg_type_hasher+8], 0 ; hash for type was found in cache?
XFGHelper__GetHashForType+BC jz short loc_18010544D ; if so, just return the cached hash
XFGHelper__GetHashForType+BE xor edi, edi ; otherwise, compute the hash of the type
XFGHelper__GetHashForType+C0 xorps xmm0, xmm0
XFGHelper__GetHashForType+C3 movdqu [rbp+xfg_type_hasher], xmm0
XFGHelper__GetHashForType+C8 and [rbp+var_10], rdi
XFGHelper__GetHashForType+CC mov [rbp+var_8], 1
XFGHelper__GetHashForType+D0 mov rdx, [rbp+Type_t] ; struct Type_t *
XFGHelper__GetHashForType+D4 lea rcx, [rbp+xfg_type_hasher] ; this
XFGHelper__GetHashForType+D8 call XFGHelper::XFGTypeHasher::compute_hash(Type_t const *)
XFGHelper__GetHashForType+DD nop
XFGHelper__GetHashForType+DE cmp [rbp+var_8], dil
XFGHelper__GetHashForType+E2 jz short loc_180105434
XFGHelper__GetHashForType+E4 lea rcx, [rbp+xfg_type_hasher] ; this
XFGHelper__GetHashForType+E8 call XFGHelper::XFGHasher::get_hash(void)
[...]
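The lookup-then-compute pattern seen in the listing above can be sketched in C as follows. Everything here is a simplified, hypothetical model: the compiler uses a std::map keyed by Type_t pointers, while this sketch uses a small linear array, and demo_compute is a placeholder that stands in for compute_hash followed by get_hash without doing any real hashing.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64

typedef struct {
    const void *type;   /* stands in for the Type_t const * key */
    uint64_t    hash;   /* cached 8-byte type hash              */
} cache_entry;

typedef struct {
    cache_entry entries[CACHE_SLOTS];
    size_t      used;
} type_hash_cache;

typedef uint64_t (*compute_fn)(const void *type);

/* Placeholder "hash" that also counts how often it is invoked. */
static int demo_compute_calls;
static uint64_t demo_compute(const void *type)
{
    demo_compute_calls++;
    return (uint64_t)(uintptr_t)type;
}

/* Returns the cached hash for type, computing and storing it on a miss. */
static uint64_t get_hash_for_type(type_hash_cache *cache, const void *type,
                                  compute_fn compute)
{
    for (size_t i = 0; i < cache->used; i++)
        if (cache->entries[i].type == type)
            return cache->entries[i].hash;      /* cache hit */

    uint64_t hash = compute(type);              /* cache miss */
    if (cache->used < CACHE_SLOTS) {
        cache->entries[cache->used].type = type;
        cache->entries[cache->used].hash = hash;
        cache->used++;
    }
    return hash;
}
```

Requesting the hash of the same type twice invokes the compute callback only once; the second request is served from the cache.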
These are the pieces of information that XFGHelper::XFGTypeHasher::compute_hash collects about a given type:
1 byte value derived from the type qualifiers (fetched from offset 4 of the Type_t object);
1 byte indicating what kind of type it is (pointer, union/struct/enum, or primitive type);
some type-specific data, depending on which of the three type groups from the previous item (pointer, union/struct/enum, or primitive type) the type belongs to.
We'll dig into the details of these three pieces of information in the following sub-sections.
Component 1: Type qualifiers
The first piece of information about a type is its qualifiers, which are stored as a DWORD at offset 4 of a Type_t object. In particular, information about the const (0x800) and volatile (0x40) qualifiers is combined into a single byte that is written to the std::vector. Bit 0 (the least significant bit) of this new byte indicates if the const qualifier is present, while bit 1 indicates if the volatile qualifier is present.
XFGHelper::XFGTypeHasher::compute_hash+1B call Type_t::getFirstNonArrayType(void)
XFGHelper::XFGTypeHasher::compute_hash+20 mov rcx, rdi ; this
XFGHelper::XFGTypeHasher::compute_hash+23 mov r8d, [rax+4] ; r8d = Type_t->qualifiers
XFGHelper::XFGTypeHasher::compute_hash+27 shr r8d, 0Bh
XFGHelper::XFGTypeHasher::compute_hash+2B and r8b, 1
XFGHelper::XFGTypeHasher::compute_hash+2F movzx r9d, r8b ; r9d = (Type_t->qualifiers >> 0xB) & 1 (has_const_qualifier)
XFGHelper::XFGTypeHasher::compute_hash+33 call Type_t::getFirstNonArrayType(void)
XFGHelper::XFGTypeHasher::compute_hash+38 lea r8, [rbp+arg_0]
XFGHelper::XFGTypeHasher::compute_hash+3C mov edx, [rax+4] ; edx = Type_t->qualifiers
XFGHelper::XFGTypeHasher::compute_hash+3F mov al, r9b ; al = has_const_qualifier
XFGHelper::XFGTypeHasher::compute_hash+42 or al, 2 ; al = has_const_qualifier | 2
XFGHelper::XFGTypeHasher::compute_hash+44 and dl, 40h ; dl = Type_t->qualifiers & 0x40 (has_volatile_qualifier)
XFGHelper::XFGTypeHasher::compute_hash+47 movzx ecx, al ; qualifiers_info = has_const_qualifier | 2
XFGHelper::XFGTypeHasher::compute_hash+4A mov rdx, [rbx+8]
XFGHelper::XFGTypeHasher::compute_hash+4E cmovz ecx, r9d ; if it doesn't have volatile qualifier, then
XFGHelper::XFGTypeHasher::compute_hash+4E ; qualifiers_info = has_const_qualifier
XFGHelper::XFGTypeHasher::compute_hash+52 lea r9, [rbp+arg_1]
XFGHelper::XFGTypeHasher::compute_hash+56 mov [rbp+arg_0], cl ; value to insert (size = byte)
XFGHelper::XFGTypeHasher::compute_hash+59 mov rcx, rbx
XFGHelper::XFGTypeHasher::compute_hash+5C call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
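The bit manipulation in the listing above boils down to a couple of lines. Here is a minimal Python sketch of it; the function and constant names are mine and are not symbols found in c1.dll.

```python
# Flag values in the qualifiers DWORD at offset 4 of Type_t, as seen in the listing.
QUAL_CONST = 0x800
QUAL_VOLATILE = 0x40


def qualifiers_byte(qualifiers: int) -> int:
    """Return the single byte that compute_hash appends for the type qualifiers."""
    has_const = (qualifiers >> 0xB) & 1                     # ends up in bit 0
    has_volatile = 1 if qualifiers & QUAL_VOLATILE else 0   # ends up in bit 1
    return has_const | (has_volatile << 1)


print(qualifiers_byte(QUAL_CONST))                  # const only -> 1
print(qualifiers_byte(QUAL_CONST | QUAL_VOLATILE))  # const volatile -> 3
```

Any other bits in the qualifiers DWORD are ignored by this step, which is why the `void *` type seen later (qualifiers value 0x200) still produces a byte with value 0.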
Component 2: Type group
If the type value stored in Type_t has 0x100 set, then it is a pointer. This is signaled by writing a byte with value 3 to the std::vector.
XFGHelper::XFGTypeHasher::compute_hash+61 test dword ptr [rdi], 100h ; *Type_t & 0x100 == 0 ?
XFGHelper::XFGTypeHasher::compute_hash+67 jz short loc_180105762
XFGHelper::XFGTypeHasher::compute_hash+69 mov rdx, [rbx+8] ; if not, it's a pointer
XFGHelper::XFGTypeHasher::compute_hash+6D lea r9, [rbp+arg_1]
XFGHelper::XFGTypeHasher::compute_hash+71 lea r8, [rbp+arg_0]
XFGHelper::XFGTypeHasher::compute_hash+75 mov [rbp+arg_0], 3 ; value to insert: POINTER_TYPE (3)
XFGHelper::XFGTypeHasher::compute_hash+79 mov rcx, rbx
XFGHelper::XFGTypeHasher::compute_hash+7C call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
If the type is not a pointer, the function then checks whether it is a union, a struct or an enum, by testing whether the type value stored in Type_t has any of the 0x600 bits set. Note that 0x600 is 0x200 | 0x400, where 0x200 identifies enum types and 0x400 identifies structs and unions. If this is the case, a byte with value 2 is written to the std::vector.
XFGHelper::XFGTypeHasher::compute_hash+8E loc_180105762:
XFGHelper::XFGTypeHasher::compute_hash+8E test dword ptr [rdi], 600h ; *Type_t & (0x400 | 0x200) == 0 ?
XFGHelper::XFGTypeHasher::compute_hash+94 jz short loc_180105790
XFGHelper::XFGTypeHasher::compute_hash+96 mov rdx, [rbx+8] ; if not, it's a union/struct/enum
XFGHelper::XFGTypeHasher::compute_hash+9A lea r9, [rbp+arg_1]
XFGHelper::XFGTypeHasher::compute_hash+9E lea r8, [rbp+arg_0]
XFGHelper::XFGTypeHasher::compute_hash+A2 mov [rbp+arg_0], 2 ; value to insert: UNION_STRUCT_OR_ENUM_TYPE (2)
XFGHelper::XFGTypeHasher::compute_hash+A6 mov rcx, rbx
XFGHelper::XFGTypeHasher::compute_hash+A9 call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
Finally, if the type is neither a pointer nor a union/struct/enum, the default case is taken. If the type is generic, then nothing is written to the std::vector (but this is an edge case, affecting only those types with value 0x1000 set, and the type identified with value 0x8103). Otherwise, for the vast majority of primitive types, a byte with value 1 is added to the std::vector.
XFGHelper::XFGTypeHasher::compute_hash+BC loc_180105790:
XFGHelper::XFGTypeHasher::compute_hash+BC mov rcx, rdi ; this
XFGHelper::XFGTypeHasher::compute_hash+BF call Type_t::isGeneric(void)
XFGHelper::XFGTypeHasher::compute_hash+C4 test al, al
XFGHelper::XFGTypeHasher::compute_hash+C6 jz short loc_1801057A2
XFGHelper::XFGTypeHasher::compute_hash+C8 mov byte ptr [rbx+18h], 0
XFGHelper::XFGTypeHasher::compute_hash+CC jmp short epilog
XFGHelper::XFGTypeHasher::compute_hash+CE loc_1801057A2:
XFGHelper::XFGTypeHasher::compute_hash+CE mov rdx, [rbx+8]
XFGHelper::XFGTypeHasher::compute_hash+D2 lea r9, [rbp+arg_1]
XFGHelper::XFGTypeHasher::compute_hash+D6 lea r8, [rbp+arg_0]
XFGHelper::XFGTypeHasher::compute_hash+DA mov [rbp+arg_0], 1 ; value to insert: PRIMITIVE_TYPE (1)
XFGHelper::XFGTypeHasher::compute_hash+DE mov rcx, rbx
XFGHelper::XFGTypeHasher::compute_hash+E1 call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
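The three checks described in this sub-section can be summarized as a small decision function. This is only a sketch: the constant names are the labels I used in the disassembly comments, and the `is_generic` parameter is a stand-in for the result of Type_t::isGeneric, whose logic is not reimplemented here.

```python
POINTER_TYPE = 3
UNION_STRUCT_OR_ENUM_TYPE = 2
PRIMITIVE_TYPE = 1


def type_group(type_value: int, is_generic: bool = False):
    """Return the type group byte, or None when nothing is written (generic types)."""
    if type_value & 0x100:              # pointers, function objects and arrays
        return POINTER_TYPE
    if type_value & (0x200 | 0x400):    # enums (0x200), structs and unions (0x400)
        return UNION_STRUCT_OR_ENUM_TYPE
    if is_generic:                      # caller-supplied result of Type_t::isGeneric
        return None
    return PRIMITIVE_TYPE


print(type_group(0x102))  # "general" pointer -> 3
print(type_group(0x40))   # void -> 1
```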
Component 3: Type-specific data
Hashing of pointer types
For pointer types, after writing a byte with value 3 to the std::vector, the XFGHelper::XFGTypeHasher::hash_indirection function is called. Keep in mind that the definition of pointer here is a bit broader, since it includes all those Type_t objects whose values have 0x100 set. Besides regular C pointers, that includes a kind of internal function object (referenced by function pointers), and arrays.
XFGHelper::XFGTypeHasher::compute_hash+81 mov rdx, rdi ; struct Type_t *
XFGHelper::XFGTypeHasher::compute_hash+84 mov rcx, rbx ; this
XFGHelper::XFGTypeHasher::compute_hash+87 call XFGHelper::XFGTypeHasher::hash_indirection
XFGHelper::XFGTypeHasher::compute_hash+8C jmp short epilog
As its name implies, function XFGHelper::XFGTypeHasher::hash_indirection adds the hash of the type referenced by a pointer to the std::vector. Its behavior varies depending on the type of pointer it's dealing with:
If it's either a function pointer (Type_t value of 0x106) or a "general" pointer with Type_t value 0x102 (used for pointers of most types, except for function pointers), it adds the hash of the Type_t referenced by the pointer by calling XFGHelper::XFGHasher::add_type, plus a byte with value 2. In the case of function pointers, the Type_t referenced by the pointer is a kind of internal function object with Type_t value of 0x101, which means that it's also handled within XFGHelper::XFGTypeHasher::hash_indirection.
XFGHelper::XFGTypeHasher::hash_indirection+15 mov ecx, [rdx] ; ecx = *Type_t
XFGHelper::XFGTypeHasher::hash_indirection+17 mov eax, ecx
XFGHelper::XFGTypeHasher::hash_indirection+19 and eax, 10Fh
[...]
XFGHelper::XFGTypeHasher::hash_indirection+25 sub eax, 1 ; case 0x102 (general pointer):
XFGHelper::XFGTypeHasher::hash_indirection+28 jz short loc_1801058E3
[...]
XFGHelper::XFGTypeHasher::hash_indirection+2F cmp eax, 3 ; case 0x106 (function pointer):
XFGHelper::XFGTypeHasher::hash_indirection+32 jz short loc_1801058E3
[...]
XFGHelper::XFGTypeHasher::hash_indirection+6B loc_1801058E3:
XFGHelper::XFGTypeHasher::hash_indirection+6B mov dil, 2 ; will be written to std::vector
XFGHelper::XFGTypeHasher::hash_indirection+6E jmp short loc_1801058F6
[...]
XFGHelper::XFGTypeHasher::hash_indirection+7E loc_1801058F6:
XFGHelper::XFGTypeHasher::hash_indirection+7E mov rdx, [rsi+8] ; rdx = ptr to the Type_t referenced by the pointer
XFGHelper::XFGTypeHasher::hash_indirection+7E ; (return type in the case of functions)
XFGHelper::XFGTypeHasher::hash_indirection+82 mov rcx, rbx ; this
XFGHelper::XFGTypeHasher::hash_indirection+85 call XFGHelper::XFGHasher::add_type
XFGHelper::XFGTypeHasher::hash_indirection+8A mov rdx, [rbx+8]
XFGHelper::XFGTypeHasher::hash_indirection+8E lea r9, [rsp+38h+arg_8+1]
XFGHelper::XFGTypeHasher::hash_indirection+93 lea r8, [rsp+38h+arg_8]
XFGHelper::XFGTypeHasher::hash_indirection+98 mov byte ptr [rsp+38h+arg_8], dil ; value to insert (size = byte)
XFGHelper::XFGTypeHasher::hash_indirection+9D mov rcx, rbx
XFGHelper::XFGTypeHasher::hash_indirection+A0 call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
If it's a function object (identified by a Type_t value of 0x101, typically referenced by a function pointer with Type_t value of 0x106), it adds the hash of the function prototype by calling the XFGHelper::XFGHasher::add_function_type function, whose inner workings we have already dissected, plus the hash of the return type of the function, plus a byte with value 1.
XFGHelper::XFGTypeHasher::hash_indirection+15 mov ecx, [rdx] ; ecx = *Type_t
XFGHelper::XFGTypeHasher::hash_indirection+17 mov eax, ecx
XFGHelper::XFGTypeHasher::hash_indirection+19 and eax, 10Fh
XFGHelper::XFGTypeHasher::hash_indirection+1E sub eax, 101h ; case 0x101 (function):
XFGHelper::XFGTypeHasher::hash_indirection+23 jz short loc_1801058E8
[...]
XFGHelper::XFGTypeHasher::hash_indirection+70 xor r8d, r8d
XFGHelper::XFGTypeHasher::hash_indirection+73 mov rcx, rbx
XFGHelper::XFGTypeHasher::hash_indirection+76 mov dil, 1 ; this is written to std::vector at the end of this function
XFGHelper::XFGTypeHasher::hash_indirection+79 call XFGHelper::XFGHasher::add_function_type(Type_t const *,XFGHelper::VirtualInfoFromDeclspec)
XFGHelper::XFGTypeHasher::hash_indirection+7E
XFGHelper::XFGTypeHasher::hash_indirection+7E loc_1801058F6:
XFGHelper::XFGTypeHasher::hash_indirection+7E ; XFGHelper::XFGTypeHasher::hash_indirection+6E↑j
XFGHelper::XFGTypeHasher::hash_indirection+7E mov rdx, [rsi+8] ; rdx = ptr to the Type_t referenced by the pointer
XFGHelper::XFGTypeHasher::hash_indirection+7E ; (return type in the case of functions)
XFGHelper::XFGTypeHasher::hash_indirection+82 mov rcx, rbx ; this
XFGHelper::XFGTypeHasher::hash_indirection+85 call XFGHelper::XFGHasher::add_type
XFGHelper::XFGTypeHasher::hash_indirection+8A mov rdx, [rbx+8]
XFGHelper::XFGTypeHasher::hash_indirection+8E lea r9, [rsp+38h+arg_8+1]
XFGHelper::XFGTypeHasher::hash_indirection+93 lea r8, [rsp+38h+arg_8]
XFGHelper::XFGTypeHasher::hash_indirection+98 mov byte ptr [rsp+38h+arg_8], dil ; value to insert (size = byte)
XFGHelper::XFGTypeHasher::hash_indirection+9D mov rcx, rbx
XFGHelper::XFGTypeHasher::hash_indirection+A0 call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
Finally, if it's an array (identified by Type_t value 0x103), it writes a QWORD with the number of elements in the array, plus the hash of the type of the array elements, plus a single byte with value 6.
XFGHelper::XFGTypeHasher::hash_indirection+15 mov ecx, [rdx] ; ecx = *Type_t
XFGHelper::XFGTypeHasher::hash_indirection+17 mov eax, ecx
XFGHelper::XFGTypeHasher::hash_indirection+19 and eax, 10Fh
[...]
XFGHelper::XFGTypeHasher::hash_indirection+2A sub eax, 1 ; case 0x103 (array passed by pointer):
XFGHelper::XFGTypeHasher::hash_indirection+2D jz short loc_1801058B2
[...]
XFGHelper::XFGTypeHasher::hash_indirection+3A loc_1801058B2:
XFGHelper::XFGTypeHasher::hash_indirection+3A lea eax, [rcx-4103h]
XFGHelper::XFGTypeHasher::hash_indirection+40 mov dil, 6 ; will be written to std::vector
XFGHelper::XFGTypeHasher::hash_indirection+43 test eax, 0FFFFBFFFh
XFGHelper::XFGTypeHasher::hash_indirection+48 jz short loc_1801058AC
XFGHelper::XFGTypeHasher::hash_indirection+4A mov rax, [rdx+10h] ; rax = number of elems in array
XFGHelper::XFGTypeHasher::hash_indirection+4E lea r9, [rsp+38h+arg_10]
XFGHelper::XFGTypeHasher::hash_indirection+53 mov rdx, [rbx+8]
XFGHelper::XFGTypeHasher::hash_indirection+57 lea r8, [rsp+38h+arg_8]
XFGHelper::XFGTypeHasher::hash_indirection+5C mov rcx, rbx
XFGHelper::XFGTypeHasher::hash_indirection+5F mov [rsp+38h+arg_8], rax ; value to insert: number of elems in array (size = qword)
XFGHelper::XFGTypeHasher::hash_indirection+64 call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
XFGHelper::XFGTypeHasher::hash_indirection+69 jmp short loc_1801058F6
[...]
XFGHelper::XFGTypeHasher::hash_indirection+7E loc_1801058F6:
XFGHelper::XFGTypeHasher::hash_indirection+7E mov rdx, [rsi+8] ; rdx = ptr to the Type_t referenced by the pointer
XFGHelper::XFGTypeHasher::hash_indirection+7E ; (return type in the case of functions)
XFGHelper::XFGTypeHasher::hash_indirection+82 mov rcx, rbx ; this
XFGHelper::XFGTypeHasher::hash_indirection+85 call XFGHelper::XFGHasher::add_type
XFGHelper::XFGTypeHasher::hash_indirection+8A mov rdx, [rbx+8]
XFGHelper::XFGTypeHasher::hash_indirection+8E lea r9, [rsp+38h+arg_8+1]
XFGHelper::XFGTypeHasher::hash_indirection+93 lea r8, [rsp+38h+arg_8]
XFGHelper::XFGTypeHasher::hash_indirection+98 mov byte ptr [rsp+38h+arg_8], dil ; value to insert (size = byte)
XFGHelper::XFGTypeHasher::hash_indirection+9D mov rcx, rbx
XFGHelper::XFGTypeHasher::hash_indirection+A0 call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
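To recap the three cases handled by hash_indirection, here is a rough Python sketch of the bytes each one contributes. It assumes that add_type and add_function_type each contribute an already computed hash (passed in here as byte strings), and it ignores the extra branch taken at hash_indirection+48 for some array types, which I did not analyze. All function and parameter names are hypothetical.

```python
import struct


def indirection_data(type_value: int, referenced_hash: bytes,
                     prototype_hash: bytes = b"", element_count: int = 0) -> bytes:
    """Bytes appended by hash_indirection, per the cases described above (sketch)."""
    kind = type_value & 0x10F
    if kind in (0x102, 0x106):       # "general" pointer or function pointer
        return referenced_hash + b"\x02"
    if kind == 0x101:                # function object: prototype, return type, 1
        return prototype_hash + referenced_hash + b"\x01"
    if kind == 0x103:                # array: element count (QWORD), element type, 6
        return struct.pack("<Q", element_count) + referenced_hash + b"\x06"
    raise ValueError(f"unhandled indirection kind: {kind:#x}")


# Example with a dummy 8-byte hash for the element type of a 4-element array.
print(indirection_data(0x103, b"\xaa" * 8, element_count=4).hex())
```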
Hashing of union/struct/enum types
When dealing with unions/structs/enums, after writing a byte with value 2 to the std::vector, function XFGHelper::XFGTypeHasher::compute_hash calls XFGHelper::XFGTypeHasher::hash_tag, passing as argument in RDX a pointer to a Symbol_t object containing the human-readable name of the union/struct/enum type.
XFGHelper::XFGTypeHasher::compute_hash+AE mov rdx, [rdi+10h] ; struct Symbol_t *
XFGHelper::XFGTypeHasher::compute_hash+B2 mov rcx, rbx ; this
XFGHelper::XFGTypeHasher::compute_hash+B5 call XFGHelper::XFGTypeHasher::hash_tag(Symbol_t *)
XFGHelper::XFGTypeHasher::hash_tag calls XFGHelper::XFGHasher::add_string, which adds the name of the union/struct/enum to the std::vector if it is a named one. Otherwise, if the union/struct/enum is anonymous, it adds the string "<unnamed>" to the std::vector. In both cases the string is added without its terminating null byte.
XFGHelper::XFGHasher::add_string public: void XFGHelper::XFGHasher::add_string(class Symbol_t *) proc near
XFGHelper::XFGHasher::add_string sub rsp, 38h
XFGHelper::XFGHasher::add_string+4 cmp byte ptr [rdx+11h], 4
XFGHelper::XFGHasher::add_string+8 jnz short loc_18010568B
XFGHelper::XFGHasher::add_string+A mov r8, [rdx]
XFGHelper::XFGHasher::add_string+D mov eax, [r8+10h]
XFGHelper::XFGHasher::add_string+11 shr eax, 16h
XFGHelper::XFGHasher::add_string+14 test al, 1 ; union/struct/enum is named?
XFGHelper::XFGHasher::add_string+16 jz short loc_180105674
XFGHelper::XFGHasher::add_string+18 lea r9, aUnnamed+9 ; ""
XFGHelper::XFGHasher::add_string+1F lea r8, aUnnamed ; "<unnamed>"
XFGHelper::XFGHasher::add_string+26
XFGHelper::XFGHasher::add_string+26 loc_180105666:
XFGHelper::XFGHasher::add_string+26 mov rdx, [rcx+8]
XFGHelper::XFGHasher::add_string+2A call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
XFGHelper::XFGHasher::add_string+2F add rsp, 38h
XFGHelper::XFGHasher::add_string+33 retn
XFGHelper::XFGHasher::add_string+34 ; ---------------------------------------------------------------------------
XFGHelper::XFGHasher::add_string+34
XFGHelper::XFGHasher::add_string+34 loc_180105674:
XFGHelper::XFGHasher::add_string+34 mov r8, [r8+8] ; r8 = union/struct/enum name
XFGHelper::XFGHasher::add_string+38 or r9, 0FFFFFFFFFFFFFFFFh
XFGHelper::XFGHasher::add_string+3C
XFGHelper::XFGHasher::add_string+3C loc_18010567C:
XFGHelper::XFGHasher::add_string+3C inc r9
XFGHelper::XFGHasher::add_string+3F cmp byte ptr [r8+r9], 0
XFGHelper::XFGHasher::add_string+44 jnz short loc_18010567C
XFGHelper::XFGHasher::add_string+46 add r9, r8 ; r9 points to end of string
XFGHelper::XFGHasher::add_string+49 jmp short loc_180105666
After that, there's a code branch in function XFGHelper::XFGTypeHasher::hash_tag that can add the string "<local>" to the data to be hashed under some condition. I didn't investigate this branch in depth, but it probably handles the case of locally-scoped unions/structs/enums.
XFGHelper::XFGTypeHasher::hash_tag+4D mov rbx, [rbx+18h]
XFGHelper::XFGTypeHasher::hash_tag+51 test rbx, rbx
XFGHelper::XFGTypeHasher::hash_tag+54 jnz short loc_180105A16
XFGHelper::XFGTypeHasher::hash_tag+56 jmp short loc_180105A76
XFGHelper::XFGTypeHasher::hash_tag+58 ; ---------------------------------------------------------------------------
XFGHelper::XFGTypeHasher::hash_tag+58
XFGHelper::XFGTypeHasher::hash_tag+58 loc_180105A5C:
XFGHelper::XFGTypeHasher::hash_tag+58 mov rdx, [rdi+8]
XFGHelper::XFGTypeHasher::hash_tag+5C lea r9, aLocal+7 ; ""
XFGHelper::XFGTypeHasher::hash_tag+63 lea r8, aLocal ; "<local>"
XFGHelper::XFGTypeHasher::hash_tag+6A mov rcx, rdi
XFGHelper::XFGTypeHasher::hash_tag+6D call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
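As a quick illustration of what add_string contributes, here is a minimal Python sketch. The function name is made up, the UTF-8 encoding is an assumption (the compiler simply copies the bytes of the name up to the null terminator), and the "<local>" branch is left out because its triggering condition is unknown to me.

```python
from typing import Optional


def tag_name_bytes(name: Optional[str]) -> bytes:
    """Bytes added for a union/struct/enum name; None models an anonymous type."""
    if name is None:
        return b"<unnamed>"          # 9 bytes, no null terminator
    return name.encode("utf-8")      # assumption: name bytes copied as-is


print(tag_name_bytes("_GUID"))
print(tag_name_bytes(None))
```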
Hashing of primitive types
When handling primitive types (those that have none of the 0x100, 0x200 and 0x400 bits set in their Type_t value), after writing a byte with value 1 to the std::vector, function XFGHelper::XFGTypeHasher::compute_hash calls XFGHelper::XFGTypeHasher::hash_primitive.
XFGHelper::XFGTypeHasher::hash_primitive is basically a big switch statement, mapping Type_t values to a different set of constants representing primitive types. The resulting constant (a single byte) is then added to the std::vector. For example, for the float type, represented by Type_t 0x26, this function adds a byte with value 0x0B to the std::vector.
XFGHelper::XFGTypeHasher::hash_primitive private: void XFGHelper::XFGTypeHasher::hash_primitive(class Type_t const *) proc near
XFGHelper::XFGTypeHasher::hash_primitive sub rsp, 38h
XFGHelper::XFGTypeHasher::hash_primitive+4 mov eax, [rdx]
XFGHelper::XFGTypeHasher::hash_primitive+6 mov r10, rcx
XFGHelper::XFGTypeHasher::hash_primitive+9 and eax, 1FFFh
XFGHelper::XFGTypeHasher::hash_primitive+E cmp eax, 40h ; '@'
XFGHelper::XFGTypeHasher::hash_primitive+11 ja loc_1801059D4
XFGHelper::XFGTypeHasher::hash_primitive+17 jz loc_1801059D0 ; case 0x40:
XFGHelper::XFGTypeHasher::hash_primitive+1D cmp eax, 1Ah
XFGHelper::XFGTypeHasher::hash_primitive+20 ja short loc_18010599E
[...]
XFGHelper::XFGTypeHasher::hash_primitive+6E loc_18010599E:
XFGHelper::XFGTypeHasher::hash_primitive+6E sub eax, 1Bh ; case 0x1B:
XFGHelper::XFGTypeHasher::hash_primitive+71 jz short loc_1801059CC
XFGHelper::XFGTypeHasher::hash_primitive+73 sub eax, 1 ; case 0x1C:
XFGHelper::XFGTypeHasher::hash_primitive+76 jz short loc_1801059C8
XFGHelper::XFGTypeHasher::hash_primitive+78 sub eax, 2 ; case 0x1E:
XFGHelper::XFGTypeHasher::hash_primitive+7B jz short loc_1801059C4
XFGHelper::XFGTypeHasher::hash_primitive+7D sub eax, 8 ; case 0x26 (float):
XFGHelper::XFGTypeHasher::hash_primitive+80 jz short loc_1801059C0
[...]
XFGHelper::XFGTypeHasher::hash_primitive+90 loc_1801059C0:
XFGHelper::XFGTypeHasher::hash_primitive+90 mov cl, 0Bh ; primitive_type = 0xB (float)
XFGHelper::XFGTypeHasher::hash_primitive+92 jmp short loc_1801059DE
[...]
XFGHelper::XFGTypeHasher::hash_primitive+AE loc_1801059DE:
XFGHelper::XFGTypeHasher::hash_primitive+AE mov rdx, [r10+8]
XFGHelper::XFGTypeHasher::hash_primitive+B2 lea r9, [rsp+38h+arg_9]
XFGHelper::XFGTypeHasher::hash_primitive+B7 mov [rsp+38h+arg_8], cl ; value to add: primitive_type
XFGHelper::XFGTypeHasher::hash_primitive+BB lea r8, [rsp+38h+arg_8]
XFGHelper::XFGTypeHasher::hash_primitive+C0 mov rcx, r10
XFGHelper::XFGTypeHasher::hash_primitive+C3 call std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
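The switch statement can be modeled as a lookup table. The sketch below is deliberately partial: it contains only the three mappings that appear in this post (float here, plus void and unsigned long long from the hands-on exercise later on). The 0x19 key is my inference from applying the 0x1FFF mask to the Type_t value 0x4019, and the names are hypothetical.

```python
# Partial table: (Type_t value & 0x1FFF) -> byte written by hash_primitive.
PRIMITIVE_CODES = {
    0x19: 0x88,  # unsigned long long (seen as Type_t 0x4019 for size_t)
    0x26: 0x0B,  # float
    0x40: 0x0E,  # void
}


def primitive_code(type_value: int) -> int:
    """Look up the primitive type byte; raises KeyError for unmapped types."""
    return PRIMITIVE_CODES[type_value & 0x1FFF]


print(hex(primitive_code(0x26)))    # float
print(hex(primitive_code(0x4019)))  # size_t on x64
```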
Final transformations to the hash
So far we have described in depth how the C compiler front end calculates the hash of a function prototype for XFG purposes. If we had to summarize it with some Python-like pseudo-code, we could say that the hash of a function is built this way:
hash = sha256(number_of_params +
              type_hash(params[0]) +
              type_hash(params[...]) +
              type_hash(params[n]) +
              is_variadic +
              calling_convention +
              type_hash(return_type)
             )[0:8]
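That pseudo-code can be turned into a small runnable helper. This is a minimal sketch that assumes the type hashes are already available as 8-byte values, and that uses the field sizes from the hands-on exercise later in this post (DWORD parameter count, single byte for the variadic flag, DWORD calling convention). Encoding a variadic function as a byte with value 1 is my assumption, since only the non-variadic case is shown in this post. The function names are hypothetical.

```python
import hashlib
import struct


def truncated_sha256(data: bytes) -> bytes:
    """First 8 bytes of the SHA256 digest."""
    return hashlib.sha256(data).digest()[:8]


def prototype_data(param_hashes, return_hash: bytes,
                   is_variadic: bool = False, calling_convention: int = 1) -> bytes:
    """Concatenate the five components of a function prototype."""
    data = struct.pack("<L", len(param_hashes))         # number of parameters
    data += b"".join(param_hashes)                      # type hash of each parameter
    data += struct.pack("<B", 1 if is_variadic else 0)  # assumption: 1 means variadic
    data += struct.pack("<L", calling_convention)       # calling convention
    data += return_hash                                 # type hash of the return type
    return data


def frontend_hash(param_hashes, return_hash: bytes, **kwargs) -> bytes:
    return truncated_sha256(prototype_data(param_hashes, return_hash, **kwargs))


# Dummy 8-byte type hashes, only to show the layout and sizes.
dummy = [b"\x11" * 8, b"\x22" * 8, b"\x33" * 8]
print(len(prototype_data(dummy, b"\x44" * 8)))  # 4 + 3*8 + 1 + 4 + 8 = 41
```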
XFG function hashes are a truncated version of a SHA256 digest (only the first 8 bytes are kept), and so their collision resistance is reduced compared to a full SHA256 hash, but we could expect different XFG hashes to reasonably keep the avalanche effect of hashing functions and look unrelated, right?
However, if you inspect a set of XFG hashes on a given binary (I picked ntdll.dll), you'll notice that they definitely don't seem to have 64 bits of entropy:
function 0x180001a30 -> prototype hash: 0x8d952e0d365aa071
function 0x180001b50 -> prototype hash: 0xe2198f4a3c515871
function 0x180001dc0 -> prototype hash: 0xbeac2e06165fc871
function 0x180001de0 -> prototype hash: 0xfaec0e7f70d92371
function 0x180001fc0 -> prototype hash: 0xc5d11eb750d75871
function 0x180002030 -> prototype hash: 0xe8bcaf9a10586871
function 0x180002040 -> prototype hash: 0xc3110f087e584871
function 0x1800020b0 -> prototype hash: 0xdbc1261858d2f871
function 0x1800023a0 -> prototype hash: 0xda690f3e36531a71
The reason behind this is that the truncated SHA256 hashes produced by the compiler front end (c1.dll) receive a final transformation by the compiler back end (c2.dll) before being actually written to the resulting object file. To be precise, the XfgIlVisitor::visit_I_XFG_HASH function in c2.dll applies two masks to the truncated SHA256 hashes:
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+5B mov rcx, 8000060010500070h
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+65 mov r13, 0FFFDBFFF7EDFFB70h
[...]
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+E9 mov rdx, [rax] ; rdx = 8 bytes of SHA256 hash
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+EC add rax, 8
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+F0 and rdx, r13 ; hash &= 0FFFDBFFF7EDFFB70h
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+F3 mov [rbx], rax
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+F6 or rdx, rcx ; hash |= 8000060010500070h
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+F9 mov ecx, r9d ; this
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+FC call XFG::TiSetHash(ulong,unsigned __int64,tagMOD *)
That is the reason why XFG hashes don't look completely random, despite being based on SHA256. I don't know why these masks are applied, though.
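To get a feel for how much these masks constrain the final value, the following Python sketch applies them and counts the bits they pin. The mask constants come straight from the listing above; the helper and variable names are mine.

```python
AND_MASK = 0xFFFDBFFF7EDFFB70   # bits cleared here are always 0 in the result
OR_MASK = 0x8000060010500070    # bits set here are always 1 in the result


def apply_backend_masks(hash_value: int) -> int:
    return (hash_value & AND_MASK) | OR_MASK


forced_zero = bin(~AND_MASK & 0xFFFFFFFFFFFFFFFF).count("1")
forced_one = bin(OR_MASK).count("1")
free_bits = bin(AND_MASK & ~OR_MASK).count("1")
print(f"bits forced to 0: {forced_zero}")
print(f"bits forced to 1: {forced_one}")
print(f"bits still taken from the SHA256 digest: {free_bits}")
print(f"low byte of any masked hash: {apply_backend_masks(0) & 0xFF:#x}")
```

The two masks together pin 20 of the 64 bits, so only 44 bits of the truncated SHA256 digest survive. The low byte of a masked hash is always 0x70, which matches the ntdll.dll sample above once you account for bit 0 being set in the copy of the hash stored before the function start.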
A hands-on hash calculation exercise
To verify that we have properly understood how XFG hashes are generated, let's try to calculate an XFG hash by hand. Let's say that we want to calculate the hash for a function with the following prototype:
void *memcpy(
void *dest,
const void *src,
size_t count
);
We need to find out the 5 pieces of data that compose a function prototype:
1. number of parameters;
2. type hash for each parameter;
3. is it a variadic function or not?;
4. calling convention;
5. type hash of the return type.
Components 1, 3 and 4 are trivial:
number of parameters -> DWORD with value 3;
is it a variadic function? -> byte with value 0;
calling convention -> default (DWORD with value 0x201 & 0xF == 0x1).
So let's compute the more complex parts: the type hash of each parameter, and the type hash of the return type.
Type hash of parameter 1
The type of the first parameter is void *. That type is represented by a Type_t with the following content:
00000102 00000200 [+ pointer to referenced Type_t]
We need to find out the 3 pieces of data to produce a type hash:
type qualifiers -> byte with value 0;
type group: it is a pointer -> byte with value 3;
type-specific data: it's a "general" pointer -> hash of referenced type (we have recursion here) + byte with value 2.
For the recursive calculation of the hash of the referenced type (void), the type is represented by a Type_t with the following contents:
00000040 00000000
The data we need is built as follows:
type qualifiers -> byte with value 0;
type group: it is a primitive type -> byte with value 1;
type-specific data: for Type_t 0x40 (void), XFGHelper::XFGTypeHasher::hash_primitive writes a byte with value 0x0E.
Type hash of parameter 2
The type of the second parameter is const void *. That type is represented by a Type_t with the following contents:
00000102 00000200 [+ pointer to referenced Type_t]
The data we need is built as follows:
type qualifiers -> byte with value 0;
type group: it is a pointer -> byte with value 3;
type-specific data: it's a "general" pointer -> hash of referenced type (we have recursion here) + byte with value 2.
For the recursive calculation of the hash of the referenced type (const void), the type is represented by a Type_t with the following contents:
00000040 00000800
The data we need is built as follows:
type qualifiers: it has the const qualifier -> encoded as a byte with value 1;
type group: it is a primitive type -> byte with value 1;
type-specific data: for Type_t 0x40 (void) -> XFGHelper::XFGTypeHasher::hash_primitive writes a byte with value 0x0E.
Type hash of parameter 3
The type of the third parameter is size_t. That type is represented by a Type_t with the following contents:
00004019 00000000
The data we need is built as follows:
type qualifiers -> byte with value 0;
type group: it is a primitive type -> byte with value 1;
type-specific data: for Type_t 0x4019 (unsigned long long) -> XFGHelper::XFGTypeHasher::hash_primitive writes a byte with value 0x88.
Type hash of return type
The return type is void *, same as the first parameter of the function, so here we just repeat what we obtained before.
type qualifiers -> byte with value 0;
type group: it is a pointer -> byte with value 3;
type-specific data: it's a "general" pointer -> hash of referenced type (we have recursion here) + byte with value 2.
For the recursive calculation of the hash of the referenced type (void):
type qualifiers -> byte with value 0;
type group: it is a primitive type -> byte with value 1;
type-specific data: for Type_t 0x40 (void), XFGHelper::XFGTypeHasher::hash_primitive writes a byte with value 0x0E.
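The recursion used for the four type hashes above can be captured with two tiny helpers. This is a sketch under the assumptions made throughout this walkthrough: each type hash is the first 8 bytes of the SHA256 digest of "qualifiers byte + type group byte + type-specific data". The helper names are hypothetical.

```python
import hashlib


def truncated_sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()[:8]


def primitive_type_hash(qualifiers: int, primitive_code: int) -> bytes:
    # qualifiers byte, type group 1 (primitive), byte chosen by hash_primitive
    return truncated_sha256(bytes([qualifiers, 1, primitive_code]))


def pointer_type_hash(qualifiers: int, referenced_hash: bytes) -> bytes:
    # qualifiers byte, type group 3 (pointer), referenced type hash, 2 ("general" pointer)
    return truncated_sha256(bytes([qualifiers, 3]) + referenced_hash + b"\x02")


void_ptr = pointer_type_hash(0, primitive_type_hash(0, 0x0E))        # void *
const_void_ptr = pointer_type_hash(0, primitive_type_hash(1, 0x0E))  # const void *
size_t_hash = primitive_type_hash(0, 0x88)                           # size_t
print(void_ptr.hex(), const_void_ptr.hex(), size_t_hash.hex())
```

Note how the const qualifier of `const void *` only changes the innermost byte sequence, yet yields a completely different 8-byte hash for the pointer type.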
Putting everything together
Let's assemble all the data together:
# Number of params
03 00 00 00
# type hash of param 1 (void *)
SHA256(
    00 # qualifiers
    03 # type group: pointer
    # type hash of referenced type (void)
    SHA256(
        00 # qualifiers
        01 # type group: primitive type
        0E # hash of primitive type: void -> 0x0E
    )[0:8]
    02 # regular pointer
)[0:8]
# type hash of param 2 (const void *)
SHA256(
    00 # qualifiers
    03 # type group: pointer
    # type hash of referenced type (const void)
    SHA256(
        01 # qualifiers: const
        01 # type group: primitive type
        0E # hash of primitive type: void -> 0x0E
    )[0:8]
    02 # regular pointer
)[0:8]
# type hash of param 3 (size_t)
SHA256(
    00 # qualifiers
    01 # type group: primitive type
    88 # hash of primitive type: unsigned long long -> 0x88
)[0:8]
# is variadic
00
# calling convention
01 00 00 00
# type hash of return value (void *)
SHA256(
    00 # qualifiers
    03 # type group: pointer
    # type hash of referenced type (void)
    SHA256(
        00 # qualifiers
        01 # type group: primitive type
        0E # hash of primitive type: void -> 0x0E
    )[0:8]
    02 # regular pointer
)[0:8]
The following Python code obtains the SHA256 digest of that data, and truncates it to its first 8 bytes to obtain a hash identical to the one emitted by the compiler front end. Finally, it applies the two masks of the compiler back end to obtain the XFG hash in its ultimate form:
import struct
import hashlib


def truncated_hash(data):
    # Type and prototype hashes keep only the first 8 bytes of the SHA256 digest
    return hashlib.sha256(data).digest()[0:8]


def apply_backend_masks(hash_value):
    # Masks applied by XfgIlVisitor::visit_I_XFG_HASH in c2.dll
    hash_value = hash_value & 0xFFFDBFFF7EDFFB70
    hash_value = hash_value | 0x8000060010500070
    return hash_value


def main():
    # number of params
    data = struct.pack('<L', 3)
    # type hash of first param (void *)
    data += truncated_hash(b'\x00\x03' + truncated_hash(b'\x00\x01\x0e') + b'\x02')
    # type hash of second param (const void *)
    data += truncated_hash(b'\x00\x03' + truncated_hash(b'\x01\x01\x0e') + b'\x02')
    # type hash of third param (size_t)
    data += truncated_hash(b'\x00\x01\x88')
    # is variadic
    data += struct.pack('<B', 0x0)
    # calling convention (default)
    data += struct.pack('<L', 0x201 & 0x0F)
    # type hash of return type (void *)
    data += truncated_hash(b'\x00\x03' + truncated_hash(b'\x00\x01\x0e') + b'\x02')
    print(f'Data to be hashed: {data} ({len(data)} bytes)')
    # the 8 truncated bytes are interpreted as a little-endian QWORD
    frontend_hash = struct.unpack('<Q', truncated_hash(data))[0]
    print(f'Hash generated by the frontend: 0x{frontend_hash:x}')
    final_hash = apply_backend_masks(frontend_hash)
    print(f'[*] Final XFG hash: 0x{final_hash:x}')


if __name__ == '__main__':
    main()
The output of that Python code is the following:
> python test.py
Data to be hashed: b'\x03\x00\x00\x00\xf5\x97x>[J`\xb0\x17\x80\xb8\xc0[\x1b\xd0\xd8#\x14\xb4\xba\x91\xc7\xf6j\x00\x01\x00\x00\x00\xf5\x97x>[J`\xb0' (41 bytes)
Hash generated by the frontend: 0x1da7d393d6b63a72
[*] Final XFG hash: 0x9da5979356d63a70
If we compile some code using a function pointer to call a function whose prototype matches the one that we have been discussing in this section, we can see that the XFG hash we calculated by hand perfectly matches the one generated by MSVC (see the value assigned to register R10 at main+0x8E in the disassembly below):
main+1C lea rax, my_memcpy
main+23 mov [rsp+78h+var_50], rax
[...]
main+6A lea rcx, aCallingFunctio ; "Calling function pointer...\n"
main+71 call printf
main+76 lea rcx, Str ; "a test"
main+7D call strlen
main+82 cdqe
main+84 mov rcx, [rsp+78h+var_50]
main+89 mov [rsp+78h+var_48], rcx
main+8E mov r10, 9DA5979356D63A70h
main+98 mov r8, rax
main+9B lea rdx, aATest_0 ; "a test"
main+A2 lea rcx, [rsp+78h+var_28]
main+A7 mov rax, [rsp+78h+var_48]
main+AC call cs:__guard_xfg_dispatch_icall_fptr
Conclusions
In this blog post I wanted to share all the details of how the MSVC compiler generates XFG hashes for C programs. Besides exploring the details of an upcoming exploit mitigation, the topic offers a chance to dig a little into compiler internals.
Please keep in mind that, for now, XFG is only found on Windows Insider Preview builds, so what we have described here may be subject to change before this CFI solution makes it into an official release of Windows 10.
Some questions remain unanswered for now, such as why the compiler back end applies two bit masks to the hashes generated by the front end, and why the hash is stored with bit 0 set before the function start, but kept with bit 0 unset at the XFG-instrumented call site.
Finally, it would be interesting to see how the C++ compiler front end (c1xx.dll) differs in the way it computes XFG hashes. A quick look at this binary suggests that the hashing algorithm is quite similar to the one used for the C language, but it is likely adapted to take C++ concepts such as inheritance and C++ type qualifiers and modifiers into account.