C++ is well known to be tedious to analyze: the use of both inheritance and polymorphism (i.e. virtual methods) makes the compiler generate indirect calls. Usually, this kind of assembly code forces the reverse engineer to execute the code in order to figure out the destination of a call. What we are really looking for is the VFT (Virtual Function Table). This table contains pointers to all the virtual methods of a specific class. This article shows how to retrieve this information to make the analysis of C++ software easier.

What's RTTI?

C++ allows the programmer to do introspection on an instance of a class. Even if this feature is very limited, it can give the name of a class, the location of its VFT and its hierarchy. This information is stored in the RTTI, which stands for RunTime Type Information. Introspection is required by dynamic_cast, typeid and the exception dispatcher.

dynamic_cast: this keyword performs a checked downcast. In other words, the relation between two classes is checked at runtime. If the check succeeds, it performs basically the same operations as a static_cast; otherwise, the pointer form returns a null pointer.

#include <iostream>

class Animal
{
  public:
    // Note the virtual method here. It is necessary so that Animal has RTTI information.
    virtual ~Animal(void) {}
};
class Cat : public Animal {};
class Dog : public Animal {};

int main(void)
{
  Animal* pAnimal = new Cat;

  std::cout
    << "pAnimal is a cat? "
    << (dynamic_cast<Cat *>(pAnimal) ? "true" : "false")
    << std::endl;

  std::cout
    << "pAnimal is a dog? "
    << (dynamic_cast<Dog *>(pAnimal) ? "true" : "false")
    << std::endl;

  delete pAnimal;

  return 0;
}

Result:

pAnimal is a cat? true
pAnimal is a dog? false

Note: if we perform a downcast using a reference instead of a pointer, dynamic_cast throws a std::bad_cast exception when there is no relation between the classes.
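
The reference form can be sketched as follows. This is a minimal illustration that redeclares the Animal, Cat and Dog classes from the example above; the IsDog helper is a hypothetical name introduced here.

```cpp
#include <cassert>
#include <typeinfo> // std::bad_cast

class Animal
{
  public:
    virtual ~Animal(void) {}
};
class Cat : public Animal {};
class Dog : public Animal {};

// Hypothetical helper: tells whether pAnimal points to a Dog,
// using the reference form of dynamic_cast.
bool IsDog(Animal* pAnimal)
{
  assert(pAnimal != nullptr); // a reference must designate a real object
  try
  {
    // There is no null reference, so a failed cast has to throw.
    Dog& rDog = dynamic_cast<Dog&>(*pAnimal);
    (void)rDog;
    return true;
  }
  catch (std::bad_cast const&)
  {
    return false;
  }
}
```

Calling IsDog on a Cat instance takes the catch branch, whereas the pointer form would simply have returned a null pointer.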

typeid: this operator gives access to the name of a type through std::type_info. It also defines an ordering between types (type_info::before), allowing structures like std::map<std::type_index, Object>; std::type_info itself cannot be copied, so it cannot be used directly as a key. The standard leaves the content of this information implementation-defined, so we need to be careful when using it.

#include <iostream>

class MyClass {};
class MyDerivedClass : public MyClass {};

template<typename T> void PrintTypeInformation(char const* pVarName, T const& t)
{
  std::cout
    << pVarName
    << ": type: "     << typeid(t).name()
    << ", raw type: " << typeid(t).raw_name()
    << std::endl;
}

int main(void)
{
  MyClass cls;
  MyDerivedClass *drv;
  int n;
  char c;
  __int64 l;
  double d;

#define PRINT_TYPE_INFORMATION(var) PrintTypeInformation(#var, var)

  PRINT_TYPE_INFORMATION(cls);
  PRINT_TYPE_INFORMATION(drv);
  PRINT_TYPE_INFORMATION(n);
  PRINT_TYPE_INFORMATION(c);
  PRINT_TYPE_INFORMATION(&l);
  PRINT_TYPE_INFORMATION(d);

  return 0;
}

Result:

cls: type: class MyClass, raw type: .?AVMyClass@@
drv: type: class MyDerivedClass *, raw type: .PAVMyDerivedClass@@
n: type: int, raw type: .H
c: type: char, raw type: .D
&l: type: __int64 *, raw type: .PA_J
d: type: double, raw type: .N

Note: the raw name is in fact the mangled (decorated) name; raw_name() is specific to Visual C++.
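
The ordering between types mentioned for typeid can be illustrated with a map keyed on types. This sketch assumes a C++11 compiler and uses std::type_index, the standard copyable wrapper around std::type_info. TypeRegistry, BuildRegistry and Describe are hypothetical names, and MyClass and MyDerivedClass are redeclared from the example above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

class MyClass {};
class MyDerivedClass : public MyClass {};

// The map orders its keys with std::type_index::operator<,
// which relies on std::type_info::before().
typedef std::map<std::type_index, std::string> TypeRegistry;

TypeRegistry BuildRegistry(void)
{
  TypeRegistry Registry;
  Registry[std::type_index(typeid(MyClass))]        = "base";
  Registry[std::type_index(typeid(MyDerivedClass))] = "derived";
  Registry[std::type_index(typeid(int))]            = "integer";
  assert(Registry.size() == 3); // distinct types give distinct keys
  return Registry;
}

// Looks up the label registered for the static type of t.
template<typename T> std::string Describe(TypeRegistry const& rRegistry, T const& t)
{
  TypeRegistry::const_iterator It = rRegistry.find(std::type_index(typeid(t)));
  if (It == rRegistry.end())
    return std::string("<unregistered>");
  return It->second;
}
```

Since none of these types is polymorphic, typeid(t) is resolved at compile time from the static type of t; no RTTI lookup on the object is involved here.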

Exception dispatcher: this feature makes it possible to catch an exception of a derived class through a handler written for one of its base classes.

#include <iostream>
#include <string>
#include <algorithm>
#include <Windows.h>

class MyException
{
  public:
    MyException(std::string const& rMsg) : m_Msg(rMsg) {}
    std::string GetMessage(void) const { return m_Msg; }
  private:
    std::string m_Msg;
};

class MyWindowsException : public MyException
{
  public:
    MyWindowsException(DWORD ErrorCode) : MyException(ConvertErrorCodeToString(ErrorCode)) {}
    static std::string ConvertErrorCodeToString(DWORD ErrorCode)
    {
      HLOCAL hLocal = nullptr;
      if (FormatMessageA(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_FROM_HMODULE | FORMAT_MESSAGE_ALLOCATE_BUFFER,
            LoadLibraryA("ntdll.dll"), ErrorCode, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), reinterpret_cast<LPSTR>(&hLocal), 0x0, nullptr) == 0x0)
        return "<unknown error>";
      std::string Result = reinterpret_cast<LPCSTR>(LocalLock(hLocal));
      LocalFree(hLocal);
      return Result;
    }
};

int main(void)
{
  try
  {
    throw MyWindowsException(STATUS_FLOAT_OVERFLOW);
  }

  catch (MyException const& rExcept)
  {
    std::cerr << "Exception caught! \"" << rExcept.GetMessage() << "\"" << std::endl;
  }
  return 0;
}

Result:

Exception caught! "{EXCEPTION}
Floating-point overflow.
"

These features can make C++ software hard to understand. The good news is that, in order to work, they require a lot of metadata to be stored in the executable. The next part describes the internal structures used to store it.

Structure definitions

It's dangerous to go alone, take this:

[Figure: a big picture of the RTTI layout in memory]

RTTI structures are hardcoded in the Visual C++ compiler. Here are the most important ones:

The _TypeDescriptor structure is very important for identifying an object, since it carries the mangled name of the class (field name), right after a pVFTable field that points to the VFT of type_info itself. That's why the name usually starts with ".?AV", which means "a C++ class". These structures are stored in the ".data" section.
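
To illustrate how such names can be read, here is a rough sketch that turns a raw name into a readable one. It only relies on the ".?AV" (class) and ".?AU" (struct) prefixes and on the "@"-separated, innermost-first scope list terminated by "@@". Templates, anonymous namespaces and back-references are deliberately left out, and DescribeRawName is a hypothetical helper, not part of any toolchain.

```cpp
#include <cassert>
#include <string>

// Turns ".?AVbad_cast@std@@" into "class std::bad_cast".
// Returns an empty string for anything this sketch does not handle.
std::string DescribeRawName(std::string const& rRawName)
{
  std::string Kind;
  if (rRawName.compare(0, 4, ".?AV") == 0)      Kind = "class";
  else if (rRawName.compare(0, 4, ".?AU") == 0) Kind = "struct";
  else return std::string();

  std::string::size_type const Size = rRawName.size();
  if (Size < 7 || rRawName.compare(Size - 2, 2, "@@") != 0)
    return std::string();
  // A '?' after the prefix introduces a construct we do not decode.
  if (rRawName.find('?', 4) != std::string::npos)
    return std::string();

  // Scopes are listed innermost first, e.g. "bad_cast@std".
  std::string const Scopes = rRawName.substr(4, Size - 6);
  std::string Qualified;
  std::string::size_type Start = 0;
  for (;;)
  {
    std::string::size_type const At = Scopes.find('@', Start);
    std::string const Part = Scopes.substr(
      Start, At == std::string::npos ? std::string::npos : At - Start);
    // Empty parts are malformed, leading digits are back-references.
    if (Part.empty() || (Part[0] >= '0' && Part[0] <= '9'))
      return std::string();
    Qualified = Qualified.empty() ? Part : Part + "::" + Qualified;
    if (At == std::string::npos)
      break;
    Start = At + 1;
  }
  assert(!Qualified.empty());
  return Kind + " " + Qualified;
}
```

With the raw names printed earlier, ".?AVMyClass@@" becomes "class MyClass", while ".H" and ".PAVMyDerivedClass@@" are rejected because they do not describe a class or a struct.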

typedef const struct _s__RTTICompleteObjectLocator {
  unsigned long signature;
  unsigned long offset;
  unsigned long cdOffset;
  _TypeDescriptor *pTypeDescriptor;
  __RTTIClassHierarchyDescriptor *pClassDescriptor;
} __RTTICompleteObjectLocator;

A pointer to this structure is stored at VFT - sizeof(void*), i.e. just before the first virtual method. Its pClassDescriptor field is the next step to retrieve the hierarchy of the class.

typedef const struct _s__RTTIClassHierarchyDescriptor {
  unsigned long signature;
  unsigned long attributes;
  unsigned long numBaseClasses;
  __RTTIBaseClassArray *pBaseClassArray;
} __RTTIClassHierarchyDescriptor;

This structure gives the number of base classes and a pointer to the array that lists them.

#pragma warning (disable:4200)
typedef const struct _s__RTTIBaseClassArray {
  __RTTIBaseClassDescriptor *arrayOfBaseClassDescriptors [];
} __RTTIBaseClassArray;
#pragma warning (default:4200)

This structure tells where the descriptors of the base classes are located.

typedef const struct _s__RTTIBaseClassDescriptor {
  _TypeDescriptor *pTypeDescriptor;
  unsigned long numContainedBases;
  _PMD where;
  unsigned long attributes;
} __RTTIBaseClassDescriptor;

Finally, this structure gives access to the _TypeDescriptor structure of a base class, which provides, for instance, the name of that class.
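
Chaining the structures together can be sketched as follows. To stay runnable outside of a Visual C++ binary, the sketch walks a hand-built mock instead of real compiler output: the struct names, the fixed-size name field and the g_* objects are all assumptions made for illustration, with 32-bit style plain pointers (64-bit images, as far as we know, store 32-bit image-relative offsets instead). It also assumes that the first entry of the base class array describes the class itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified, hypothetical mirrors of the structures above.
struct TypeDesc       { const void* pVFTable; void* spare; char name[32]; };
struct BaseClassDesc  { const TypeDesc* pTypeDescriptor; unsigned long numContainedBases; };
struct ClassHierarchy { unsigned long signature, attributes, numBaseClasses;
                        const BaseClassDesc* const* pBaseClassArray; };
struct ObjectLocator  { unsigned long signature, offset, cdOffset;
                        const TypeDesc* pTypeDescriptor;
                        const ClassHierarchy* pClassDescriptor; };

// Given the address of a VFT, returns the raw names found in its hierarchy.
std::vector<std::string> ListHierarchy(const void* const* pVft)
{
  assert(pVft != nullptr);
  // The slot just before the first virtual method points to the locator.
  const ObjectLocator* pCol = static_cast<const ObjectLocator*>(pVft[-1]);
  const ClassHierarchy* pChd = pCol->pClassDescriptor;

  std::vector<std::string> Names;
  for (unsigned long i = 0; i < pChd->numBaseClasses; ++i)
    Names.push_back(pChd->pBaseClassArray[i]->pTypeDescriptor->name);
  return Names;
}

// Mock of what could be emitted for "class Cat : public Animal".
static const TypeDesc       g_AnimalTd  = { nullptr, nullptr, ".?AVAnimal@@" };
static const TypeDesc       g_CatTd     = { nullptr, nullptr, ".?AVCat@@" };
static const BaseClassDesc  g_CatBcd    = { &g_CatTd, 1 };
static const BaseClassDesc  g_AnimalBcd = { &g_AnimalTd, 0 };
static const BaseClassDesc* const g_CatBases[] = { &g_CatBcd, &g_AnimalBcd };
static const ClassHierarchy g_CatChd    = { 0, 0, 2, g_CatBases };
static const ObjectLocator  g_CatCol    = { 0, 0, 0, &g_CatTd, &g_CatChd };
// Slot 0 holds the locator pointer; the VFT proper starts at slot 1.
static const void* const    g_CatVftStorage[] = { &g_CatCol, nullptr };
static const void* const*   g_pCatVft = &g_CatVftStorage[1];
```

Calling ListHierarchy(g_pCatVft) walks VFT[-1], then pClassDescriptor, then each base class descriptor, and collects the two raw names of the mock.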

How to retrieve classes hierarchy from structures

There are multiple ways to retrieve this information. You can, for instance:

* scan for the assembly pattern "mov [ecx/esi], offset class_vft", which initializes the VFT in the constructor,
* scan for an array of pointers to methods,
* put your ideas in the comment section.
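
As an illustration of the first idea, here is a naive byte-level sketch for 32-bit x86 code. It assumes the encodings C7 01 imm32 for "mov dword ptr [ecx], imm32" and C7 06 imm32 for "mov dword ptr [esi], imm32", and keeps only the immediates that fall inside an address range supplied by the caller (for instance the section expected to hold the VFTs). FindVftCandidates and its parameters are hypothetical; since nothing is disassembled, false positives are to be expected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns every imm32 written through [ecx] or [esi] that lies in
// [RangeStart, RangeEnd). The scan is a plain linear sweep over the bytes.
std::vector<std::uint32_t> FindVftCandidates(
  std::vector<unsigned char> const& rCode,
  std::uint32_t RangeStart, std::uint32_t RangeEnd)
{
  assert(RangeStart <= RangeEnd);
  std::vector<std::uint32_t> Result;
  for (std::size_t i = 0; i + 6 <= rCode.size(); ++i)
  {
    bool const IsMovEcx = rCode[i] == 0xC7 && rCode[i + 1] == 0x01;
    bool const IsMovEsi = rCode[i] == 0xC7 && rCode[i + 1] == 0x06;
    if (!IsMovEcx && !IsMovEsi)
      continue;

    // The immediate is stored in little-endian order.
    std::uint32_t Imm = 0;
    for (std::size_t b = 0; b < 4; ++b)
      Imm |= static_cast<std::uint32_t>(rCode[i + 2 + b]) << (8 * b);

    if (Imm >= RangeStart && Imm < RangeEnd)
      Result.push_back(Imm);
  }
  return Result;
}
```

Each returned value is only a candidate VFT address; checking that it is preceded by a plausible locator pointer would be a reasonable way to filter the list.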

We decided to do pattern matching on ".?AV" to locate the name field of _TypeDescriptor and thus retrieve the __RTTICompleteObjectLocator.
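
The first half of this approach can be sketched on a raw byte buffer. The sketch assumes that the name field directly follows two pointer-sized fields (pVFTable and spare), so that the descriptor starts 2 * PointerSize bytes before the ".?AV" marker; FindTypeDescriptors and the buffer-based interface are hypothetical and unrelated to the script discussed below.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the offset, inside rImage, of each candidate type descriptor.
std::vector<std::size_t> FindTypeDescriptors(
  std::vector<unsigned char> const& rImage, std::size_t PointerSize)
{
  assert(PointerSize == 4 || PointerSize == 8);
  static const unsigned char Marker[] = { '.', '?', 'A', 'V' };
  std::size_t const NameOffset = 2 * PointerSize; // skip pVFTable and spare

  std::vector<std::size_t> Result;
  std::vector<unsigned char>::const_iterator It = rImage.begin();
  while ((It = std::search(It, rImage.end(), Marker, Marker + sizeof(Marker)))
         != rImage.end())
  {
    std::size_t const Pos = static_cast<std::size_t>(It - rImage.begin());
    if (Pos >= NameOffset) // otherwise the descriptor would start before the buffer
      Result.push_back(Pos - NameOffset);
    ++It;
  }
  return Result;
}
```

From each candidate, a tool can then look for data that references the descriptor address in order to reach the corresponding locator; how that lookup is done depends on the analysis framework at hand.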

Example with an idapython script

This idapython script is able to retrieve the names of classes along with their hierarchy and their VFT. It doesn't rename anything, so feel free to adapt it to your particular needs. You can download it here

Acknowledgments

Adrien Guinet

