This article presents the structure of the Independent Guest Virtual Machine (IGVM) file format, a binary file designed to define and securely launch the initial state of a virtual machine. It bundles all necessary components such as the BIOS/OVMF, kernel, and initial ramdisk, into a single file. We'll focus on a concrete example to understand the main structure of the file format.


Introduction

In this article, we will dive into the Independent Guest Virtual Machine (IGVM) file format. The main objective here is not to provide an exhaustive description, but rather to focus on the main structures of IGVM files by illustrating them with a concrete example from the OpenHCL project, namely openhcl.bin. This file is the firmware image for the OpenHCL paravisor. One can either build it from source or download a pre-built binary — refer to the OpenVMM Guide for instructions. For curious readers, the complete source of the format is available on the microsoft/igvm GitHub repository.

According to Microsoft, this file format is designed to encapsulate all information required to launch a virtual machine on any given virtualization stack, with support for different isolation technologies such as AMD SEV-SNP and Intel TDX.

In addition to providing this abstraction layer, IGVM files offer a key component for Confidential Computing: measurement. More and more companies rely on the cloud, so it is essential to keep their data confidential, including from the cloud provider itself. This must be guaranteed both horizontally, that is, between 2 VMs, and vertically, between a VM and the hypervisor.

Basically, the idea behind Confidential Computing is to protect data while it is being used. In order to do this, data and algorithms must be placed in a Trusted Execution Environment (TEE). These hardware-based environments are called SEV-SNP for AMD and TDX for Intel.

IGVM relies on measurement to support these guarantees. The idea is to "measure" the state of the VM using cryptography. The measurement can then be used to verify the integrity of the VM and, when coupled with a hardware isolation technology (SEV-SNP, TDX), to protect its confidentiality.

The IGVM format can be split into 3 parts. First, the Fixed Header which contains metadata about the file itself. Second, the Variable Headers, a set of headers describing how the data must be parsed and used. Third, the rest of the file, which contains the raw data.

The rest of the article first describes the Fixed Header. It then presents the organization of the Variable Headers in order to understand how the environment is set up and how the data is processed and used. Finally, the third part focuses on playing with the data by introducing the microsoft/igvm library through a small example and some hints.

Fixed Header

The Fixed Header contains metadata about the file.

The image above shows a Fixed Header Version 1. It represents the following structure:

pub struct IGVM_FIXED_HEADER {
    pub magic: u32,
    pub format_version: u32,
    pub variable_header_offset: u32,
    pub variable_header_size: u32,
    pub total_file_size: u32,
    pub checksum: u32,
}

The Fixed Header Version 2 simply appends 2 new fields:

pub struct IGVM_FIXED_HEADER_V2 {
    // [...]
    pub checksum: u32,
    pub architecture: IgvmArchitecture, // <--- Here
    pub page_size: u32, // <--- Here
}

pub enum IgvmArchitecture {
    X64 = 0x0,
    AARCH64 = 0x1,
}

Note that IGVM_FIXED_HEADER_V2 introduces the page_size field. For V1, which has no such field, the page size defaults to 0x1000 (4096) bytes.

The variable_header_offset and variable_header_size are used in the next part to parse the Variable Headers. The size is in bytes because the various Variable Headers do not necessarily have the same size.
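To make the layout concrete, here is a minimal sketch that decodes the six V1 fields from the start of a file and then slices out the Variable Headers area. It assumes the fields are stored little-endian, back to back and without padding, which the structure above suggests but does not state. The names FixedHeaderV1, parse_fixed_header_v1 and variable_headers are hypothetical and are not part of the igvm crate.

```rust
/// Fixed Header (version 1) fields, mirroring IGVM_FIXED_HEADER.
#[derive(Debug, PartialEq)]
pub struct FixedHeaderV1 {
    pub magic: u32,
    pub format_version: u32,
    pub variable_header_offset: u32,
    pub variable_header_size: u32,
    pub total_file_size: u32,
    pub checksum: u32,
}

/// Reads the u32 stored at `offset`, assuming little-endian encoding.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let chunk = bytes.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

/// Decodes the header from the start of `file`.
/// Returns None if the buffer is shorter than the six fields.
pub fn parse_fixed_header_v1(file: &[u8]) -> Option<FixedHeaderV1> {
    Some(FixedHeaderV1 {
        magic: read_u32_le(file, 0)?,
        format_version: read_u32_le(file, 4)?,
        variable_header_offset: read_u32_le(file, 8)?,
        variable_header_size: read_u32_le(file, 12)?,
        total_file_size: read_u32_le(file, 16)?,
        checksum: read_u32_le(file, 20)?,
    })
}

/// Returns the bytes of the Variable Headers area, or None if the
/// offset and size from the header do not fit inside `file`.
pub fn variable_headers<'a>(file: &'a [u8], header: &FixedHeaderV1) -> Option<&'a [u8]> {
    let start = header.variable_header_offset as usize;
    let end = start.checked_add(header.variable_header_size as usize)?;
    file.get(start..end)
}
```

Returning Option instead of panicking keeps the sketch usable on truncated or malformed files, which is the common case when experimenting with a binary format.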

Finally, the checksum is a CRC32 of the Fixed Header and Variable Headers. During the computation the checksum field is set to 0.
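The checksum rule can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation: it assumes the common CRC-32 variant (IEEE 802.3 polynomial, as used by zlib) and that the checksum field sits at byte offset 20, after five u32 fields. The caller is expected to pass the bytes of the Fixed Header followed by the Variable Headers. The function names are hypothetical.

```rust
/// Bitwise CRC-32 (IEEE 802.3 polynomial, reflected form 0xEDB88320).
/// Assumption: this is the CRC32 variant meant by the format.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Assumed byte offset of the `checksum` field: it follows five u32 fields.
const CHECKSUM_OFFSET: usize = 20;

/// Computes the CRC over `headers` with the checksum field treated as zero.
/// `headers` should hold the Fixed Header followed by the Variable Headers.
/// Returns None if the buffer is too short to contain the checksum field.
pub fn igvm_checksum(headers: &[u8]) -> Option<u32> {
    let mut copy = headers.to_vec();
    copy.get_mut(CHECKSUM_OFFSET..CHECKSUM_OFFSET + 4)?
        .copy_from_slice(&[0; 4]);
    Some(crc32(&copy))
}
```

To validate a file, one would compare the returned value with the checksum field read from the Fixed Header. The exact byte range covered should be checked against the microsoft/igvm sources.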

Variable Headers

The Variable Headers area is just a set of headers. The term "variable" is used here because the number and type of headers may vary. They are used to prepare the environment and describe how to parse the data.

These headers consist of 2 parts:

  1. The header type, which gives information about the size and the type of the header;
  2. The actual header.

The IGVM_VHS_VARIABLE_HEADER structure is the Header Type.

pub struct IGVM_VHS_VARIABLE_HEADER {
    pub typ: IgvmVariableHeaderType,
    pub length: u32,
}
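Since every header starts with this type and length pair, the Variable Headers area can be walked without understanding each header. The sketch below does so under three assumptions that should be verified against the microsoft/igvm sources: typ is a 32-bit value, length counts only the bytes that follow the pair, and each header starts on an 8-byte boundary. RawVariableHeader and walk_variable_headers are hypothetical names.

```rust
/// One entry of the Variable Headers area: raw type value plus payload bytes.
#[derive(Debug, PartialEq)]
pub struct RawVariableHeader<'a> {
    pub typ: u32,
    pub payload: &'a [u8],
}

/// Assumption: each header starts on an 8-byte boundary.
const ALIGNMENT: usize = 8;

/// Reads the u32 stored at `offset`, assuming little-endian encoding.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let chunk = bytes.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

/// Splits `area` into (type, payload) entries.
/// Returns None as soon as a header does not fit in the buffer.
pub fn walk_variable_headers<'a>(area: &'a [u8]) -> Option<Vec<RawVariableHeader<'a>>> {
    let mut headers = Vec::new();
    let mut offset = 0usize;
    while offset < area.len() {
        let typ = read_u32_le(area, offset)?;
        // Assumption: `length` excludes the 8 bytes of the type/length pair.
        let length = read_u32_le(area, offset + 4)? as usize;
        let start = offset + 8;
        let end = start.checked_add(length)?;
        headers.push(RawVariableHeader {
            typ: typ,
            payload: area.get(start..end)?,
        });
        // Round up to the next assumed alignment boundary.
        offset = end.checked_add(ALIGNMENT - 1)? / ALIGNMENT * ALIGNMENT;
    }
    Some(headers)
}
```

The payload slices can then be decoded according to typ, for example as IGVM_VHS_SUPPORTED_PLATFORM when typ is 0x1.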

The IgvmVariableHeaderType enumeration is divided into 4 sets:

  1. Invalid;
  2. Platform headers;
  3. Initialization headers;
  4. Directive headers.

The Platform headers define on which platforms and VTL (Virtual Trust Level) the image can be loaded. The Initialization headers are in charge of preparing the platform: for instance, one header might define the policy for allowing guest debugging, while another might define the memory layout. The Directive headers are where the actual data is processed. In the case of openhcl.bin, it is possible to find various binaries such as a Linux kernel, a bootloader called openhcl_boot, a ramdisk and so on. More information can be found on the OpenHCL IGVM Image page.

Let's start analyzing some parts of the file.

The first set of headers is only used to define an invalid header and simply contains the value 0.

pub enum IgvmVariableHeaderType {
    INVALID = 0x0,
    // [...]
}

The second set, the Platform headers, also contains only one value. This value is used to describe the different supported isolation platforms such as TDX, SEV-SNP, and so on.

Although there is only 1 value in this set, the Platform headers range spans 0x1 to 0x100 inclusive.

pub enum IgvmVariableHeaderType {
    // [...]
    IGVM_VHT_SUPPORTED_PLATFORM = 0x1,
    // [...]
}

The Platform header below indicates that the IGVM file contains data for a Hyper-V VM using VSM isolation, with VTL2 as the highest VTL.

SupportedPlatform(IGVM_VHS_SUPPORTED_PLATFORM { 
    compatibility_mask: 1,
    highest_vtl: 2,
    platform_type: VSM_ISOLATION,
    platform_version: 1,
    shared_gpa_boundary: 0
})

The third set, the Initialization headers, is also a range, spanning 0x101 to 0x200 inclusive. In practice, there are only 3 types of headers.

pub enum IgvmVariableHeaderType {
    // [...]
    IGVM_VHT_GUEST_POLICY = 0x101,
    IGVM_VHT_RELOCATABLE_REGION = 0x102,
    IGVM_VHT_PAGE_TABLE_RELOCATION_REGION = 0x103,
    // [...]
}

This set is mainly used to prepare the guest partition. It defines the loading context that will be used with Directive headers when needed.

The first Initialization header in openhcl.bin is an IGVM_VHT_RELOCATABLE_REGION.

RelocatableRegion { 
    compatibility_mask: 1, 
    relocation_alignment: 2097152, 
    relocation_region_gpa: 134217728, 
    relocation_region_size: 26832896,
    minimum_relocation_gpa: 134217728,
    maximum_relocation_gpa: 281474976710656,
    is_vtl2: true,
    apply_rip_offset: true,
    apply_gdtr_offset: true,
    vp_index: 0,
    vtl: Vtl2
}

It simply allows the loader to relocate the region. For instance, each PageData located between address 0x8000000 (134217728) and 0x9997000 (134217728 + 26832896 = 161050624) will be relocated if a relocation occurs. In addition, the offsets for the RIP and GDTR registers will also be updated.
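The arithmetic above can be checked with a short sketch built from the values of this header. The relocate function is hypothetical: it assumes the region is the half-open range starting at relocation_region_gpa, that a relocation shifts every address in the region by the same delta, and that the new base must respect the alignment and the minimum and maximum bounds. How a real loader applies these constraints may differ.

```rust
// Values copied from the RelocatableRegion header shown above.
pub const REGION_GPA: u64 = 134_217_728; // 0x8000000
pub const REGION_SIZE: u64 = 26_832_896; // 0x1997000
pub const ALIGNMENT: u64 = 2_097_152; // 0x200000 (2 MiB)
pub const MIN_GPA: u64 = 134_217_728;
pub const MAX_GPA: u64 = 281_474_976_710_656; // 2^48

/// True when `gpa` lies in [REGION_GPA, REGION_GPA + REGION_SIZE).
/// Assumption: the end of the region is exclusive.
pub fn in_region(gpa: u64) -> bool {
    gpa >= REGION_GPA && gpa < REGION_GPA + REGION_SIZE
}

/// Hypothetical relocation: moves `gpa` by the same delta as the region base.
/// Returns None if `gpa` is outside the region or `new_base` is not acceptable.
pub fn relocate(gpa: u64, new_base: u64) -> Option<u64> {
    let aligned = new_base % ALIGNMENT == 0;
    // Assumption: the whole relocated region must stay within the bounds.
    let in_bounds = new_base >= MIN_GPA && new_base.checked_add(REGION_SIZE)? <= MAX_GPA;
    if !in_region(gpa) || !aligned || !in_bounds {
        return None;
    }
    Some(new_base + (gpa - REGION_GPA))
}
```

For example, a page at 0x8001000 relocated to a new base of 0x10000000 would end up at 0x10001000, while a page at 0x100000 is outside the region and is left alone.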

Finally, the fourth and last set is for the Directive headers. Its range spans 0x301 to 0x400 inclusive. These headers describe the actual state of the VM to the loader. For instance, the IGVM file can describe the default values for the registers of a virtual processor. It can also place (page) data into the VM's memory.

pub enum IgvmVariableHeaderType {
    // [...]
    IGVM_VHT_PARAMETER_AREA = 0x301,
    IGVM_VHT_PAGE_DATA = 0x302,
    IGVM_VHT_PARAMETER_INSERT = 0x303,
    IGVM_VHT_VP_CONTEXT = 0x304,
    IGVM_VHT_REQUIRED_MEMORY = 0x305,
    RESERVED_DO_NOT_USE = 0x306,
    IGVM_VHT_VP_COUNT_PARAMETER = 0x307,
    IGVM_VHT_SRAT = 0x308,
    IGVM_VHT_MADT = 0x309,
    IGVM_VHT_MMIO_RANGES = 0x30A,
    IGVM_VHT_SNP_ID_BLOCK = 0x30B,
    IGVM_VHT_MEMORY_MAP = 0x30C,
    IGVM_VHT_ERROR_RANGE = 0x30D,
    IGVM_VHT_COMMAND_LINE = 0x30E,
    IGVM_VHT_SLIT = 0x30F,
    IGVM_VHT_PPTT = 0x310,
    IGVM_VHT_VBS_MEASUREMENT = 0x311,
    IGVM_VHT_DEVICE_TREE = 0x312,
    IGVM_VHT_ENVIRONMENT_INFO_PARAMETER = 0x313,
}
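With the four ranges now known, a raw type value can be mapped to its set. The sketch below uses only the ranges given in this article. HeaderSet and classify are hypothetical names, and values outside the listed ranges (for instance 0x201 to 0x300) are reported as Unknown because the article does not describe them.

```rust
/// The set a raw IgvmVariableHeaderType value belongs to.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum HeaderSet {
    Invalid,
    Platform,
    Initialization,
    Directive,
    /// Value outside every range described in the article.
    Unknown,
}

/// Maps a raw header type value to its set, using inclusive ranges.
pub fn classify(typ: u32) -> HeaderSet {
    match typ {
        0x0 => HeaderSet::Invalid,
        0x1..=0x100 => HeaderSet::Platform,
        0x101..=0x200 => HeaderSet::Initialization,
        0x301..=0x400 => HeaderSet::Directive,
        _ => HeaderSet::Unknown,
    }
}
```

For instance, classify(0x302) yields HeaderSet::Directive, which matches IGVM_VHT_PAGE_DATA.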

As an example, let's take a header type of IGVM_VHT_PAGE_DATA. The actual header can be represented with the following structure:

pub struct IGVM_VHS_PAGE_DATA {
    pub gpa: u64,
    pub compatibility_mask: u32,
    pub file_offset: u32,
    pub flags: IgvmPageDataFlags,
    pub data_type: IgvmPageDataType,
    pub reserved: u16,
}

The gpa field tells the loader at which Guest Physical Address this page must be mapped. The data of this page is located at file_offset in the IGVM file.

PageData {
    gpa: 1048576,
    compatibility_mask: 1,
    flags: IgvmPageDataFlags { 
        is_2mb_page: false,
        unmeasured: false,
        shared: false,
        reserved: 0
    },
    data_type: NORMAL,
    data: [
        ...,
        77, 90, ..., 80, 69, ...,
        46, 116, 101, 120, 116, ...,
        46, 100, 97, 116, 97, ...,
        ...
    ]
}

The dump above is truncated for readability, but it still illustrates what page data can contain. Here, it is a PE file mapped in memory, as the following byte sequences show:

77, 90 => MZ
80, 69 => PE
46, 116, 101, 120, 116 => .text
46, 100, 97, 116, 97 => .data

This might be a UEFI driver or application.
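The signature check done by eye above can be automated. The following sketch relies on the PE layout: a DOS header starting with "MZ", whose e_lfanew field at offset 0x3C holds the offset of the "PE\0\0" signature. The function names are hypothetical, and scanning every byte offset is a simple heuristic rather than a robust detector.

```rust
/// Offset of `e_lfanew` in the DOS header: a u32 pointing to the PE signature.
const E_LFANEW_OFFSET: usize = 0x3C;

/// Returns the offset of the "PE\0\0" signature if `image` starts with "MZ"
/// and its e_lfanew field points at a PE signature inside the buffer.
pub fn pe_signature_offset(image: &[u8]) -> Option<usize> {
    if !image.starts_with(b"MZ") {
        return None;
    }
    let field = image.get(E_LFANEW_OFFSET..E_LFANEW_OFFSET + 4)?;
    let offset = u32::from_le_bytes([field[0], field[1], field[2], field[3]]) as usize;
    if image.get(offset..)?.starts_with(b"PE\0\0") {
        Some(offset)
    } else {
        None
    }
}

/// Returns every position in `data` where a PE image appears to start.
pub fn find_pe_images(data: &[u8]) -> Vec<usize> {
    (0..data.len())
        .filter(|&start| pe_signature_offset(&data[start..]).is_some())
        .collect()
}
```

Checking e_lfanew in addition to "MZ" avoids most false positives, since the two letters alone occur frequently in arbitrary binary data.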

Finally, before concluding this section, let's add one last piece of information regarding Variable Headers.

The order of the various sets of headers is really important.

There can be 0 or several IgvmPlatformHeader, IgvmInitializationHeader or IgvmDirectiveHeader. However, headers are processed hierarchically. Reading an IgvmDirectiveHeader invalidates any later IgvmInitializationHeader or IgvmPlatformHeader. Similarly, reading an IgvmInitializationHeader invalidates any later IgvmPlatformHeader.
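One way to read this rule is that the sets must appear in the order Platform, Initialization, Directive, and that a header from an earlier set is rejected once a later set has started. The sketch below checks a sequence of raw type values under that interpretation, using the ranges given earlier. Stage and check_order are hypothetical names and do not come from the igvm crate.

```rust
/// Processing stages, declared in the order in which they must appear.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
pub enum Stage {
    Platform,
    Initialization,
    Directive,
}

/// Maps a raw header type to its stage, or None for invalid or unknown values.
fn stage_of(typ: u32) -> Option<Stage> {
    match typ {
        0x1..=0x100 => Some(Stage::Platform),
        0x101..=0x200 => Some(Stage::Initialization),
        0x301..=0x400 => Some(Stage::Directive),
        _ => None,
    }
}

/// Checks that the stages never go backwards.
/// On failure, returns the index of the first offending header.
pub fn check_order(types: &[u32]) -> Result<(), usize> {
    let mut current = Stage::Platform;
    for (index, &typ) in types.iter().enumerate() {
        match stage_of(typ) {
            // Same or later stage: accept and remember where we are.
            Some(stage) if stage >= current => current = stage,
            // Earlier stage, invalid or unknown type: reject.
            _ => return Err(index),
        }
    }
    Ok(())
}
```

The derived ordering on Stage follows the declaration order of the variants, which is what makes the `stage >= current` comparison express the hierarchy.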

Playing with Data

As mentioned in the introduction, Microsoft released a Rust library to manipulate this file format. C bindings are also available. Let's play with that.

First things first, the library must be installed in a Rust project in order to be used. It is published as a crate.

$ cargo add igvm

In fact, the igvm crate is used for logic/parsing, and the igvm-defs crate contains the definitions and structures.

The main structure is IgvmFile. It exposes the associated function new_from_binary, which creates an IgvmFile object from a slice of raw bytes. Once the object has been created, it offers methods to easily access the headers and data, such as platforms(), initializations() and directives().

Below is a minimal example showing how to open a file, convert it to IgvmFile, and how to access the different headers.

use std::{env, fs};
use igvm::IgvmFile;

fn igvm_file_from_file(file_name: &str) -> Result<IgvmFile, igvm::Error> {
    // Read the whole file into memory; abort with a clear message on failure.
    let file_data = fs::read(file_name)
        .unwrap_or_else(|err| panic!("Cannot read file `{}`: {}", file_name, err));
    // Parse the raw bytes; the second argument is left as `None`.
    IgvmFile::new_from_binary(&file_data, None)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let args: Vec<String> = env::args().collect();

    if args.len() < 2 {
        return Err(format!("Usage: {} <igvm_file>", args[0]).into());
    }

    let igvm = igvm_file_from_file(&args[1])?;

    if let Some(platform) = igvm.platforms().get(0) {
        println!("Platform 0: {:?}", platform);
    }

    if let Some(initialization) = igvm.initializations().get(0) {
        println!("Initialization 0: {:?}", initialization);
    }

    if let Some(directive) = igvm.directives().get(0) {
        println!("Directive 0: {:?}", directive);
    }

    Ok(())
}

Dumping & Unmapping PE files

As mentioned earlier while covering Directive headers, PE files can be found inside some PageData. Using custom scripts or the IGVM library, we can locate them by looking for magic values such as MZ or PE. However, simply dumping a PE file from an IGVM file won't be enough, as what is extracted is not an on-disk file but an in-memory mapped executable.

Properly extracting a PE file is left as an exercise for the reader.

Conclusion

This article introduced the IGVM file format using a concrete example from the OpenHCL project.

First, the general structure of this format was divided into 3 parts: the Fixed Header, the Variable Headers and the Data section.

Then, each part was analyzed with a focus on the main structures, examining the different types of headers and commenting on some of the values.

Finally, a bit of code introduced the microsoft/igvm Rust library, showing how to manipulate this format.

As a closing remark, we would add that although the IGVM format is not particularly complex, it serves as a good introduction to various cool topics such as Virtualization and Confidential Computing.

Acknowledgments

Thanks to all our Quarkslab colleagues for their proofreading, advice, and feedback.

