Xen exploitation part 1: XSA-105, from nobody to root

This blog post describes the exploitation of Xen Security Advisory 105 (XSA-105) [1] (CVE-2014-7155). This post explains the environment setup and shows the development of a fully working exploit on Linux 4.4.5.

We are not aware of any public exploit for this vulnerability, although Andrei Lutas wrote excellent articles [2] [3] [4] describing the root cause of the vulnerability and how to trigger it. This post explains the environment setup and shows the development of a fully working exploit on Linux 4.4.5 (it probably works with many others versions).

tl;dr

Environment

Xen versions from at least 3.2.x were vulnerable. We chose to exploit the XSA-105 on the last non patched version: 4.1.6.1. The vulnerability allows privilege escalation on Hardware Virtualized Machines (HVM [5]). According to Xen terminology, HVM guests are fully virtualized guests using virtualization extensions such as Intel VT or AMD-V. The other main type of Xen guest are paravirtualized (PV). Basically, the kernel detects Xen at boot time if it runs under an hypervisor, and uses paravirtualization extensions called pvops [6] in this case to be more efficient. Because the default compilation options of the Linux kernel enable pvops, we compiled a custom kernel with these extensions disabled.

Dom0

Dom0 [7] [8] is the initial virtual machine launched by Xen on boot. This is the most privileged guest and allows Xen administration. We chose to install Dom0 as a classical Debian 7.9.0.

Xen compilation asks for some dependencies:

apt-get build-dep xen

Once the dependencies are installed, compiling and installing Xen should be straightforward:

make install -j4 && update-grub

Reboot Dom0 and choose the right entry in grub. In order to use the xl command, the linker's path must be adjusted. Also, the xencommons service needs to run:

echo "/usr/lib64" >> /etc/ld.so.conf
insserv xencommons
service xencommons start

DomU

The second step requires the creation of an HVM guest. As the pvops are enabled by default, most Linux distributions should be in fact paravirtualized. We chose to use an Archlinux base installation [9] as HVM guest. The compilation option to deactivate the pvops is simply in Processor type and features -> Linux guest support (CONFIG_HYPERVISOR_GUEST=n).

The xl configuration file used to launch the HVM DomU is straightforward. Let's take an HVM guest with 2 CPUs or more (since it's required for the exploitation of the vulnerability). In our case, the guest is loaded from a qcow image which has been directly created on a physical machine for better performances. The network interface is a bridge with the hypervisor network, which allows us to communicate with the guest through SSH.

kernel = '/usr/lib/xen/boot/hvmloader'
builder='hvm'
memory = 1024
name = 'hvm_arch'
vcpus = 2
vif = ['bridge=xenbr0']
disk = ['tap:qcow:/root/VM2.img.qcow,xvda,w']
device_model_version = 'qemu-xen-traditional'
sdl=0
serial='pty'
vnc=1
vnclisten="0.0.0.0"
vncpasswd=""

XSA-105

Description

The vulnerability is located in the emulation of hlt, lgdt, lidt and lmsw instructions [1]:

ISSUE DESCRIPTION
=================
The emulation of the instructions HLT, LGDT, LIDT, and LMSW fails to
perform supervisor mode permission checks.

However these instructions are not usually handled by the emulator.
Exceptions to this are
- - when the instruction's memory operand (if any) lives in (emulated or
  passed through) memory mapped IO space,
- - in the case of guests running in 32-bit PAE mode, when such an
  instruction is (in execution flow) within four instructions of one
  doing a page table update,
- - when an Invalid Opcode exception gets raised by a guest instruction,
  and the guest then (likely maliciously) alters the instruction to
  become one of the affected ones.

Malicious guest user mode code may be able to leverage this to install
e.g. its own Interrupt Descriptor Table (IDT).

We learn two things from the above text:

  • the code emulating hlt, lgdt, lidt and lmsw does not perform supervisor mode permission checks. Hence a possibility for a non privileged code (ring 3) to run these instructions
  • these instructions are not usually emulated, it means that we have to find a way to emulate them. The third proposition of the advisory seems to be the easiest to implement and this is the solution adopted by Andrei Lutas [2].

Lutas has already provided a remarkable job explaining the vulnerable code in his paper [2], if you need explanations about those, definitely read his paper.

Exploitation

Among the vulnerable instructions, only two of them could lead to a potential privilege escalation: lgdt and lidt. They respectively allow to change the value of the Global Descriptor Table Register and Interrupt Descriptor Table Register. Both GDTR and IDTR have the same format: the upper bits contain the base address and the lower bits define the limit [10]. These values define the Global Descriptor Table (GDT) and the Interrupt Descriptor Table (IDT) addresses.

According to Intel manuals, a non privileged code is not allowed to execute these instructions. If a user is able to load his own GDT or IDT, this can lead to an arbitrary code execution and a privilege escalation. Let's see how.

Interrupt Descriptor Table (IDT)

The IDT is the x86 interrupt vector table [10]. It is a basic table that associates an interrupt number with an interrupt handler. The entry number determines the interrupt number and each entry contains some fields such as: a type, a segment selector, an offset, a privilege level, etc. The interrupt handler address is determined by adding the segment base (determined with the segment selector) and the offset.

If a user is able to load his own IDT, he can specify a malicious entry which links an interrupt to his own handler using kernel code segment selector. In order to avoid stability issues, the interrupt must be fowarded to the original handler. This can be done because the handler runs in kernel space, and it can read entries from the previous IDT. This IDT must have been previously saved using the sidt instruction because it must be restored before returning to user space. However, we have not tested it.

We chose to use the GDT approach, despite the IDT solution adopted by Andrei Lutas [2].

Global Descriptor Table (GDT)

The GDT is used to define memory segments. Each entry contains: a base, a limit, a type, a Descriptor Privilege Level (DPL), read/write bit, and so on:

struct desc_struct {
        union {
                struct {
                        unsigned int a;
                        unsigned int b;
                };
                struct {
                        unsigned short limit0;
                        unsigned short base0;
                        unsigned int base1: 8, type: 4, s: 1, dpl: 2, p: 1;
                        unsigned int limit: 4, avl: 1, l: 1, d: 1, g: 1, base2: 8;
                };
        };
} __attribute__((packed));

Nowadays, the most used memory segmentation pattern is a flat model. Each descriptor maps the whole memory but with differents privileges and flags (all security checks are performed with paging). Most of the time there are at least six GDT entries:

  • 32-bit kernel code segment (dpl = 0)
  • 64-bit kernel code segment (dpl = 0)
  • kernel data segment (dpl = 0)
  • 32-bit user code segment (dpl = 3)
  • 64-bit user code segment (dpl = 3)
  • user data segment (dpl = 3)

The current memory segments are specified in the segment registers. There are several segment registers: code selector, stack selector, data selector, etc. Each segment selector is 16-bit long. Bits 3 through 15 are an index in the GDT, bit 2 is the LDT/GDT selector, bit 0 and 1 are the Requested Segment Privilege (RPL).

There is another kind of entry which is very interesting in our case: call gate entry. The aim of a call gate is to facilitate the transfer between different privilege levels. Such entries are twice larger than memory descriptors (in 64-bit mode) and have others fields:

  • a segment selector
  • an offset in the selected segment
  • a DPL

To access a call gate, the user has to perform a far call. The far call must specify the call selector. This selector has exactly the same format as any selector (index in the GDT, LDT/GDT selector, requested privilege). Then the CPU takes the segment selector specified in the call gate entry, takes the base of this segment, add the call gate offset and reaches procedure entry point.

Of course, there are some privilege checks and four levels of privileges are involved:

  • the current privilege level (CPL)
  • the requested privilege level in the far call selector (RPL)
  • the call gate descriptor privilege level (CDPL)
  • the segment descriptor privilege level (SDPL)

Three conditions must be satisfied:

  • CPL <= CDPL
  • RPL <= CDPL
  • SDPL <= CPL

If these conditions are satisfied the call gate procedure is executed. The idea is to create a call gate with a DPL set to 3, a segment selector pointing to the kernel code segment, and procedure giving us supervisor privileges. Then:

  • CPL = 3
  • RPL = 0
  • CDPL = 3
  • SDPL = 0
  • CPL <= CDPL == True
  • RPL <= CDPL == True
  • SDPL <= CPL == True

Putting it all together

The exploitation process is:

  1. craft a custom GDT containing a flat segmentation model and a call gate with DPL = 3
  2. save the current GDTR
  3. create 2 threads waiting for each other (just for synchronization)
  4. the first one performs an ud2 instruction while the second one patches the ud2 instruction with a lgdt [rbx] instruction (see Lutas' paper for more details [2])
  5. if we are no too slow, the emulation of lgdt [rbx] should occur
  6. far call
  7. #

The far call routine first reloads the old GDTR and then performs a simple commit_creds(prepare_kernel_cred(0));. This routine must perform a swapgs before calling any kernel function and before returning to user space. Exiting the far call routine is done with a retf instruction.

A demonstration is available here: asciinema, and the full exploit can be downloaded here: xsa105_exploit.tar.gz.

Conclusion

This vulnerability allows a privilege escalation on a HVM guest with more than one CPU. We use the call gate mechanism in order to execute arbitrary code while Andrei Lutas [2] exploited an interrupt handler. Since it's just a PoC, some requirements must be met. SMEP shouldn't be enabled in the guest because the call gate handler is in user memory space. Also, the payload calls commit_creds(prepare_kernel_cred(0)); to gain root privileges. If kptr_restrict, we will not be able to retrieve the functions' addresses through /proc/kallsyms.

It was the first time for me dealing with Xen, it was very interesting and I encourage anyone interested in it to do the same: take an advisory and write an exploit, make profit or publish it ;)

A future blog post will talk about a guest-to-host escape, with a working exploit. Stay tuned!

[1](1, 2) http://xenbits.xen.org/xsa/advisory-105.html
[2](1, 2, 3, 4, 5, 6) https://labs.bitdefender.com/wp-content/uploads/downloads/2014/10/Gaining-kernel-privileges-using-the-Xen-emulator.pdf
[3][dead] https://www.cert-ro.eu/files/doc/896_20141104131145076318500_X.pdf
[4]https://labs.bitdefender.com/2014/10/from-ring3-to-ring0-xen-emulator-flaws/
[5]http://wiki.xen.org/wiki/Xen_Project_Software_Overview#Guest_Types
[6]http://wiki.xenproject.org/wiki/XenParavirtOps
[7]http://wiki.xen.org/wiki/Dom0
[8]http://wiki.xen.org/wiki/Dom0_Kernels_for_Xen
[9]https://wiki.archlinux.org/index.php/installation_guide
[10](1, 2) http://download.intel.com/design/processor/manuals/253668.pdf

Comments