Posted Thu 18 April 2024
Author Thiébaud Fuchs
Category Hardware
Tags USB, USB2, USB3, HydraUSB3, Facedancer, RISC-V, embedded, fuzzing, open-source, release, tool, 2024

In this blogpost, we present Hydradancer, a new board for Facedancer based on HydraUSB3 allowing faster USB peripherals emulation.

Hydradancer

USB (Universal Serial Bus) is the current standard for connecting peripherals to devices. USB is used to connect keyboards, mice, printers, musical instruments, storage, cameras and pretty much everything else to a device. This makes it the perfect target for security researchers with physical access to a USB port.

While exchanging data with USB peripherals can be done in Python with PyUSB¹ on any PC, creating custom USB peripherals for security assessment and testing (e.g. attack surface analysis, scanning, fuzzing) of USB hosts is more challenging, as it requires specific hardware. That's where Facedancer came in 12 years ago: Facedancer² is a Python library from Great Scott Gadgets that interacts with dedicated hardware capable of creating USB devices, allowing you to create and modify a USB2 peripheral in seconds. However, the flexibility of Facedancer comes at a cost: data has to go from the target host to the controlling PC, then back to the target host, using a much longer path than a regular USB device would. The current implementation of Facedancer is based on backends, which support different hardware: Facedancer21³/Raspdancer⁴/BeagleDancer⁵, GreatFET One⁶ and the Moondancer backend for the upcoming Cynthion board⁷. While Moondancer should bring USB2 High-speed support (480 Mb/s), Facedancer is currently stuck at USB2 Full-speed (12 Mb/s) with instability issues.

With the open-source project Hydradancer, we bring a USB2 High-speed backend to Facedancer using the USB3 capabilities of HydraUSB3, a platform based on the RISC-V WCH569 chip. While emulating USB3 peripherals is still out of the question with the current delays, Hydradancer brings improved speeds and stability for USB2 peripheral emulation. As the WCH569 lacks documentation for USB3 and a proper SDK, a lot of testing was required to get the USB3 connection working and we will present the different challenges that we encountered while making wch-ch56x-lib, a support library for WCH569 with tested USB2/USB3/HSPI (High-speed Parallel Interface)/SerDes (Serializer/Deserializer) drivers.

While we initially started with a dual HydraUSB3 setup, a new board called Hydradancer, based on HydraUSB3, was later created. It is easier to use and more reliable. We will present the differences between the two configurations and why we switched to this new version.

As we needed to measure the improvements of Hydradancer over existing backends, we will present our benchmarks that compare Hydradancer with the existing Facedancer21 and GreatFET One boards. Our results showed 607 times faster average read transfers for USB2 Full-speed transmission compared with Facedancer21 and 12 times faster compared with GreatFET One.

Hydradancer: a faster, USB2 High-Speed capable backend for Facedancer based on HydraUSB3

The current state of Facedancer

Facedancer principle


The Facedancer project was started in 2012 by Travis Goodspeed, the creator of the GoodFET⁸ multi-tool. GoodFET was already a USB interface for multiple protocols (JTAG, SPI, CAN, etc.) and Travis Goodspeed created a new board based on GoodFET that could be a USB interface for the USB MAX3421 chip: Facedancer. By connecting the board to your computer on one side and the target USB port on the other side, you can create various peripherals (a keyboard, mass storage, FTDI serial adapter, ...) by simply launching a Python script that uses a library also called Facedancer. Two other boards, Raspdancer and BeagleDancer, are also based on the USB MAX3421 chip but remove the external communication with Facedancer: Facedancer runs directly on the Raspberry Pi or BeagleBone Black.

Facedancer21 and newer boards from Great Scott Gadgets


A few years later, GreatFET One⁶, the successor of GoodFET, was created by Great Scott Gadgets, a company founded by Michael Ossmann that also makes the HackRF One Software Defined Radio peripheral. GreatFET One is based on the same principle as GoodFET: an extensible board that interfaces with a PC using USB. Great Scott Gadgets became the maintainer of the Facedancer Python library and made several improvements while adding support for the GreatFET One: a move to Python3, API changes, support for new boards in the form of backends, and integration of USBProxy directly in Facedancer.

Great Scott Gadgets is currently working on its next-generation USB tool: the Cynthion⁷ board with the Luna gateware. Cynthion is a platform based on an FPGA that aims to become a USB multi-tool: USB2 protocol sniffer, USB host/device emulation using Facedancer, and a teaching platform for the USB protocol. The current release window is June 2024, but initial support was already added to Facedancer in September 2023.

Facedancer is now at version 2.9 and supports both the creation of USB devices and hosts, along with a proxy mode that implements a Man-in-the-middle on USB communications between existing USB devices and hosts.

However, Facedancer is currently limited by the supported boards, as the following table shows.

Board	Maximum speed	Number of endpoints (not EP0)	Host mode
Facedancer21/Raspdancer	USB2 Full-speed	EP1 OUT, EP2 IN, EP3 IN	yes
GreatFET One	USB2 Full-speed	3 IN / 3 OUT	yes
Hydradancer	USB2 High-speed	5 IN / 5 OUT	no
(Cynthion/LUNA, coming 2024)	(USB2 High-speed)	(15 IN / 15 OUT)	(yes)

Facedancer backends functionalities

Facedancer is currently limited to USB2 Full-speed and a very limited number of endpoints. Cynthion will probably bring a huge improvement to those capabilities but its performance will need to be evaluated once it is released.

HydraUSB3 and Hydradancer

Before presenting Hydradancer, let's first introduce the board on which it is based: HydraUSB3.

HydraUSB3⁹ is a development board created by Benjamin Vernoux around the WCH569 MCU. The WCH569 is a RISC-V single-core MCU that integrates various high-speed peripherals: USB3 SuperSpeed (5 Gbps), Gigabit Ethernet, USB2 High-speed, HSPI (High-speed Parallel Interface), SerDes (Serializer/Deserializer). The presence of those high-speed peripherals makes it a good candidate for creating a faster Facedancer board, especially with USB3 support.

Two HydraUSB3 plugged together


While a datasheet is provided by WCH in English (translated from Chinese) along with examples on a GitHub repository, using it in practice is painful: most functionalities are only presented as examples with loads of magic numbers (and no SDK), the USB3/SerDes examples use libraries in the form of binary blobs and the datasheet does not give any information to the developers for these protocols.

For those reasons, Benjamin Vernoux had to reverse-engineer the USB3 and SerDes implementation of the WCH569 to create an open-source implementation. He presented his work at the GreHack2022 cybersecurity conference in a talk "Reverse Engineering of advanced RISC-V MCU with USB3 & High Speed peripherals"¹⁰.

This allowed him to make a complete and clean SDK called wch-ch56x-bsp¹¹ for the WCH569 that served as the basis for making the Hydradancer peripheral drivers.

Hydradancer: overall architecture

Hydradancer¹² has two ports: a USB2 port that connects to the target host (for the case where we want to emulate USB devices) and a USB3 port that connects to the controlling PC running the Python script.

The firmware¹³ implements a passthrough for the USB protocol: whenever the board receives data from the target host, it is sent to the controlling PC through the other USB port. The Python script implementing the device then crafts a reply, sends it back to the board which sends it to the target host.

Before going into more details, let's first define some of the terms that we'll use in the rest of the blogpost.

When we started Hydradancer, we used two HydraUSB3⁹ boards connected using HSPI or SerDes. The "control board" is the board connected to Facedancer using USB3. It effectively controls the second board, called the "emulation board", which uses its USB2 controller to create the USB peripheral.

Hydradancer protocol loop for the dual HydraUSB3 configuration


However, as you'll see later in this blogpost, we realized we could use a single modified HydraUSB3 by splitting the USB3 and USB2 controllers. We kept the control/emulation structure and naming, meaning control refers to the USB3 device (the one connected to Facedancer, controlling the communication) and emulation refers to the USB2 passthrough device/controller connected to the target host.

In both the dual-board and single-board setups, the overall principle is the same and works as described in the following diagram.


Hydradancer overall principle for the dual-HydraUSB3 configuration

Emulating a USB peripheral with the Hydradancer works like this:

Hydradancer connects to the side running Facedancer using a USB3 cable and to the target host using a USB2 cable.
When the USBDevice is created by Facedancer, the connect method of USBBaseDevice is called, which will initialize the backend.
The Hydradancer backend is initialized and the backend waits for the board to be ready by polling the control endpoint using the CHECK_HYDRADANCER_READY vendor request. This was implemented to let the boards reinitialize after a USB peripheral is disconnected (before connecting a new one).
Then, the connect method of the backend is called.

Each endpoint on the target USB port (managed by the emulation board) is mapped to an endpoint connected to the Facedancer host (control board endpoints). The WCH569 chip of HydraUSB3 can only handle 7 bidirectional endpoints independently at a time (not counting endpoint 0), but can handle all endpoint numbers from 1 to 15 for USB2. To avoid weird incompatibilities (like "you can use endpoint 4 but not while using endpoint 8 or endpoint 12"), we settled for using only endpoint numbers from 1 to 7 at the moment. For USB3, in the absence of more documentation from WCH, only 7 endpoints are supported (not counting endpoint 0). Since one endpoint is used for status/event polls, this leaves 6 endpoints on the control board to be used by the Facedancer peripheral, including one for the control endpoint (EP0). To allow using all endpoint numbers from 0 to 7 (and maybe more later), a mapping between control board endpoints and emulation board endpoints is set in the Facedancer backend and shared with the boards.
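The mapping idea can be sketched as a small allocator. This is only an illustration, not the code of the Hydradancer backend: the class name, the method name and the choice of control endpoint 1 as the reserved status endpoint are assumptions made for the example.

```python
class EndpointMapper:
    """Hands out free control-side endpoints to emulation-side endpoints.

    Hypothetical helper written for this example only.
    """

    STATUS_ENDPOINT = 1               # assumption: reserved for status/event polls
    CONTROL_ENDPOINTS = range(1, 8)   # the 7 usable endpoint numbers (EP0 excluded)

    def __init__(self):
        # 7 endpoints minus the status endpoint leaves 6 mappable endpoints.
        self._free = [ep for ep in self.CONTROL_ENDPOINTS
                      if ep != self.STATUS_ENDPOINT]
        self._mapping = {}

    def map(self, emulation_ep: int) -> int:
        """Return the control endpoint mapped to emulation_ep (0 to 7)."""
        if not 0 <= emulation_ep <= 7:
            raise ValueError("only emulation endpoints 0 to 7 are supported")
        if emulation_ep in self._mapping:
            return self._mapping[emulation_ep]
        if not self._free:
            raise RuntimeError("no free control endpoint left")
        control_ep = self._free.pop(0)
        self._mapping[emulation_ep] = control_ep
        return control_ep


mapper = EndpointMapper()
ep0_control = mapper.map(0)   # the control endpoint (EP0) is always mapped first
ep3_control = mapper.map(3)   # any emulation endpoint number from 1 to 7 can follow
```

With six mappable endpoints and one taken by EP0, five data endpoint numbers remain, which matches the "5 IN / 5 OUT" figure in the table above.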

connect first creates a mapping for the control endpoint, as this endpoint is required. The backend then sends a SET_SPEED vendor control request to set the USB2 speed of the Hydradancer USB2 controller (low/full/high speed).

Finally, Hydradancer sends an ENABLE_USB_CONNECTION_REQUEST_CODE vendor control request to tell the firmware to enable the USB pull-up, which starts the USB communication.
The Hydradancer backend then starts polling the status of the emulation endpoints in service_irqs. This function is called in an infinite loop in the run function from USBBaseDevice, which is an async coroutine: it uses asyncio.sleep to let other coroutines execute. The status is a bitfield. For IN endpoints, 1 means the buffer is empty which means it is available. For OUT endpoints, 1 means the endpoint is full which means data is available on the corresponding mapped control endpoint. It serves as a synchronization variable between the control and emulation boards/controllers.

Polling directly on the mapped endpoints (for status or data) would have freed the status/event endpoint and made things more efficient, but this was not feasible using libusb's synchronous API (the only one currently available in PyUSB): when no data is available, each endpoint request takes 1 ms (the smallest libusb timeout) to complete. If only one endpoint is sharing data, this adds a 6-ms delay, which would seriously limit transfer rate and reactivity.
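The 6-ms figure follows from simple arithmetic, sketched below. The function names are made up for the example, and the throughput ceiling rests on an extra assumption that is not in the measurements of this post: a single packet delivered per polling pass.

```python
def idle_poll_delay_ms(polled_endpoints: int, endpoints_with_data: int,
                       timeout_ms: float = 1.0) -> float:
    """Time lost per polling pass waiting on endpoints that have nothing to send."""
    if endpoints_with_data > polled_endpoints:
        raise ValueError("more active endpoints than polled endpoints")
    idle = polled_endpoints - endpoints_with_data
    return idle * timeout_ms


def throughput_ceiling_kb_s(packet_bytes: int, pass_ms: float) -> float:
    """Upper bound in KB/s (1 KB = 1000 bytes) if one packet moves per pass."""
    return packet_bytes / pass_ms


# 7 endpoints polled, only one has data: the 6 idle ones each cost a 1 ms timeout.
delay = idle_poll_delay_ms(polled_endpoints=7, endpoints_with_data=1)  # 6.0

# Assumption for illustration: one 512-byte High-speed bulk packet per pass.
ceiling = throughput_ceiling_kb_s(packet_bytes=512, pass_ms=delay)
```

Under these assumptions the ceiling is well below 100 KB/s, which gives an idea of why a dedicated status endpoint was preferred.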

Polling is done using control requests on EP0 before the device is configured, then using the EP1 BULK endpoint of the control board/controller. This mirrors the endpoint type used on the emulation board/controller, thus mirroring the bandwidth/timing requirements, which seemed to improve stability during the enumeration phase and improve data transfer rates after the enumeration. Ideally, we would also mirror the type of each data endpoint for the same reasons, but we only use bulk endpoints at the moment for simplicity.
After receiving a SET_CONFIGURATION request from the target host, the backend sends several SET_ENDPOINT_MAPPING vendor control requests to map the emulation board/controller endpoints to control endpoints.
At this point, both the emulation board/controller and control board/controller are configured, the target host has finished enumerating the device and will start sending IN/OUT requests. Hydradancer handles IN and OUT requests in the following way:
- Initially, all IN endpoints are available (bit set to 1 in the status bitfield). If the target host sends an IN request and the buffer is empty, the firmware sends a NAK. The Facedancer device needs to prime the IN endpoints (meaning set an initial buffer) when it is ready to send data. The corresponding bit in the status bitfield is then set to 0 (meaning the device won't be able to send more data). When the target host has finished reading, the bit is set back to 1 and a status update is prepared on the control board EP1 so that the backend emulation endpoint state is updated. So currently, Hydradancer does not react to the host sending IN requests, but rather to the IN buffer being empty.
- All OUT endpoints have their bit set to 0 in the status bitfield initially. When data is received on an emulation endpoint, the bit is set to 1 and a status update is prepared on the control EP1 IN endpoint. While the status bit is 1, all following OUT requests from the target host will be NAKed. When the backend polls the endpoints status, it will then poll the corresponding mapped endpoint which returns data. After the backend has finished reading, the corresponding bit in the bitfield is set back to 0.
One-off events like bus resets are also handled using the status bitfield, but the corresponding bit is cleared after being sent once (since it is a one-time event).
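The IN/OUT handshake described above can be modelled with a toy simulation. This is a sketch under simplifying assumptions, not the firmware logic: the class and method names are invented, the bit position is simply the endpoint number, and each transfer is treated as a single buffer.

```python
class EndpointStatus:
    """Toy model of the status bitfield shared by the control and emulation sides."""

    def __init__(self, num_endpoints: int = 8):
        # IN bits start at 1 (buffer empty, can be primed).
        self.in_status = (1 << num_endpoints) - 1
        # OUT bits start at 0 (no data waiting for the backend).
        self.out_status = 0
        self._in_buffers = {}
        self._out_buffers = {}

    # --- Facedancer (backend) side ---
    def prime_in(self, ep: int, data: bytes) -> None:
        """Set the IN buffer; the bit goes to 0 until the host has read it."""
        if not self.in_status & (1 << ep):
            raise RuntimeError("IN buffer not yet read by the host")
        self._in_buffers[ep] = data
        self.in_status &= ~(1 << ep)

    def read_out(self, ep: int):
        """Fetch pending OUT data, or None if the bit says nothing is waiting."""
        if not self.out_status & (1 << ep):
            return None
        data = self._out_buffers.pop(ep)
        self.out_status &= ~(1 << ep)
        return data

    # --- target host side ---
    def host_in(self, ep: int):
        """IN request from the host: NAK while the buffer is empty."""
        if self.in_status & (1 << ep):
            return "NAK"
        data = self._in_buffers.pop(ep)
        self.in_status |= 1 << ep      # buffer empty again, backend may re-prime
        return data

    def host_out(self, ep: int, data: bytes) -> str:
        """OUT request from the host: NAK while previous data is unread."""
        if self.out_status & (1 << ep):
            return "NAK"
        self._out_buffers[ep] = data
        self.out_status |= 1 << ep     # tells the backend that data is available
        return "ACK"


status = EndpointStatus()
first_in = status.host_in(1)           # nothing primed yet, so the host gets a NAK
status.prime_in(1, b"report")
second_in = status.host_in(1)          # the primed buffer is delivered
```

The same object also shows the OUT side: a second `host_out` on an endpoint is refused until `read_out` has cleared the bit.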

Dual-board setup

Since each HydraUSB3 can handle only one USB peripheral (it has a single USB port), two HydraUSB3 boards were connected together through HSPI for this project.

A USB3 connection is used to interface with Facedancer, HSPI is used for the communication between the two HydraUSB3 boards. Using USB3 for the communication with Facedancer proved to be a requirement when emulating USB2 High-speed peripherals during the enumeration phase. However, USB2 High-speed seems to be sufficient to handle USB2 Full-speed.

Working with two HydraUSB3 boards connected through HSPI posed quite a lot of challenges, especially to get the timings right. One of the biggest issues initially was missing interrupts, something we fixed by deferring interrupts in user mode using a queue as shown in the diagram below.

Hydradancer sequence for an OUT and an IN transfer
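The deferral technique mentioned above can be illustrated with a minimal task queue. The real firmware is written in C; this Python sketch only shows the principle, and every name in it is hypothetical.

```python
from collections import deque


class InterruptQueue:
    """Minimal deferred-work queue: handlers enqueue, the main loop processes."""

    def __init__(self):
        self._tasks = deque()

    def defer(self, func, *args) -> None:
        """Called from the (simulated) interrupt handler: record the work, return at once."""
        self._tasks.append((func, args))

    def run_pending(self) -> int:
        """Called from the main loop, outside interrupt context. Returns tasks run."""
        count = 0
        while self._tasks:
            func, args = self._tasks.popleft()
            func(*args)
            count += 1
        return count


queue = InterruptQueue()
processed = []


def on_hspi_rx(packet_id: int) -> None:
    """Simulated interrupt handler: keep it short so the next interrupt is not missed."""
    queue.defer(processed.append, packet_id)


# Three interrupts arrive back to back, then the main loop catches up in order.
for packet_id in (1, 2, 3):
    on_hspi_rx(packet_id)
handled = queue.run_pending()
```

Keeping the handler down to a single enqueue is the point of the design: the slow processing runs where it can itself be interrupted.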


But one issue remained with HSPI and the WCH569 chip: there is no way in the HSPI implementation to know when the receiving side has finished processing the previous request and is ready to process the next one. The receiving HSPI controller drives its HTACK/HTRDY line high to signal that it is ready to receive data after the transmitting side asks for permission on the HTREQ line. However, this can happen as soon as the previous buffer has been received, apparently even during interrupts. So if the interrupt handler is not fast enough, some buffers will simply be overwritten, even with double-buffering. It could be interesting to dig deeper into this: maybe this only happens in double-buffering mode, where the current HSPI buffer would keep switching even during interrupts, thus overwriting buffers. In any case, using HSPI on the WCH569 proved to be a headache when increasing the number of exchanges with the dual HydraUSB3 setup.

The only solution we found for this was to detect consecutive sends in the task queue of the sender and add an artificial delay to prevent missing communications, which is not a clean solution.
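A rough sketch of that workaround is given below, in Python rather than the C used by the firmware. The task representation, the function name and the delay value are all assumptions chosen for illustration.

```python
import time


def drain_send_queue(tasks, send, delay_s=0.001, sleep=time.sleep):
    """Run queued tasks, pausing between two HSPI sends that follow each other.

    tasks is a list of (kind, payload) tuples: ("send", data) triggers an HSPI
    transmission, any other kind carries a callable that is simply executed.
    delay_s is a made-up value, not the one used by the firmware.
    """
    previous_was_send = False
    for kind, payload in tasks:
        if kind == "send":
            if previous_was_send:
                # Give the receiver time to process the previous buffer.
                sleep(delay_s)
            send(payload)
            previous_was_send = True
        else:
            payload()
            previous_was_send = False


sent = []
pauses = []
drain_send_queue(
    [("send", b"a"), ("send", b"b"), ("other", lambda: None), ("send", b"c")],
    send=sent.append,
    sleep=pauses.append,   # record the pauses instead of really sleeping
)
```

Only the second send is delayed here, because it directly follows another send. The third one is separated from it by another task.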

Single-board setup: the way forward

About six months after the start of the Hydradancer project, a chance conversation brought up the fact that the USB2 and USB3 hardware of the WCH569 are physically separate. This prompted us to check whether we could indeed use USB2 and USB3 separately. Until then, it had not occurred to us that this could be done: USB3 is always expected to be backward-compatible with USB2, and we were focused on making HSPI/SerDes work for the dual-board setup.

Some additional work had to be done to completely separate the USB3 and USB2 parts of the library, as both WCH demo code and our library were built to support USB3 with USB2 downgrade (meaning one was deactivated while the other was working).

But in the end, we were able to make a proof-of-concept by creating one USB3 and one USB2 loopback device simultaneously on the same (modified) HydraUSB3 board and run the tests successfully!


Hydradancer prototype board, derived from HydraUSB3. The USB-C below the board is USB2-only (emulation side, connected to target host) and the USB3 connector has no USB2 lines (connected to Facedancer host).

Using a USB3 connector with no USB2 differential pair does not seem to be an issue: all USB3 hosts will start by establishing a USB3 link and will only activate their USB2 controller if the USB3 link fails. While this is not standard, we don't see any way a host would reject our USB3 peripheral.

After proving this would work properly, we implemented the firmware supporting the Hydradancer backend for the single-board setup.

Being able to use both USB3 and USB2 on the same WCH569 chip has huge advantages: we don't need to copy buffers and transmit them through an external protocol (HSPI/SerDes) with all the timing issues and delays, the buffers just stay at the same place in memory (zero copy).

Hydradancer protocol loop for the Hydradancer dongle


Moving from a dual-board setup to a single-board one vastly improved the results of our loopback/speed tests, the stability of the Facedancer backend and ease of code maintenance.

Using Hydradancer

To use Hydradancer, you need either two HydraUSB3 boards or a Hydradancer board (recommended), along with one USB3 cable and one USB2 cable.

Then, you'll need to flash the required firmware as described on GitHub¹³, depending on the setup (dual HydraUSB3 boards or single Hydradancer board).

Finally, while we hope to merge the Hydradancer backend for Facedancer into the main repository² along with some bug fixes we may have found, you can use our fork¹⁴ in the meantime.

First, clone the Facedancer fork

git clone https://github.com/HydraDancer/Facedancer

Then, reuse your virtual env or create a new one to keep your local Python installation clean

sudo apt install python3 python3-venv
python3 -m venv venv

Activate the venv

source venv/bin/activate

Install Facedancer

cd Facedancer
pip install --editable .

The --editable isn't necessary but it allows you to modify Facedancer's files.

Then, tell Facedancer to use the Hydradancer backend

export BACKEND=hydradancer

And finally, run one of the examples to check that everything works. This one should make your cursor wiggle.

python3 ./examples/crazy-mouse.py

Results: benchmark against Facedancer21 and GreatFET One

	Write average estimate	Relative write uncertainty	Write transfer size	Read average estimate	Relative read uncertainty	Read transfer size	Confidence
Hydradancer High-speed	7996.352±314.348 KB/s	4%	499.712 KB	4224.192±157.058 KB/s	4%	499.712 KB	99.9%
Hydradancer Full-speed	747.295±20.899 KB/s	3%	49.984 KB	414.188±7.368 KB/s	2%	49.984 KB	99.9%
GreatFET One Full-speed (multiple single-packet transfers)	32.422±0.844 KB/s	3%	49.959 KB	33.066±1.095 KB/s	3%	49.984 KB	99.9%
Facedancer21 Full-speed	0.697±0.0 KB/s	0%	9.984 KB	0.682±0.0 KB/s	0%	9.984 KB	99.9%

Speedtest results
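The speedup factors quoted in the introduction can be recomputed from the Full-speed rows of the table. The snippet below is only a convenience for checking the arithmetic. The dictionary and function names are made up, the values are the averages copied from the table.

```python
# Average throughput in KB/s, copied from the Full-speed rows of the table above.
READ_KB_S = {
    "Hydradancer": 414.188,
    "GreatFET One": 33.066,
    "Facedancer21": 0.682,
}
WRITE_KB_S = {
    "Hydradancer": 747.295,
    "GreatFET One": 32.422,
    "Facedancer21": 0.697,
}


def speedup(results: dict, baseline: str, candidate: str = "Hydradancer") -> float:
    """Ratio of the candidate's average throughput to the baseline's."""
    return results[candidate] / results[baseline]


for baseline in ("Facedancer21", "GreatFET One"):
    print(
        f"vs {baseline}: "
        f"read x{speedup(READ_KB_S, baseline):.1f}, "
        f"write x{speedup(WRITE_KB_S, baseline):.1f}"
    )
```

For reads, this gives a ratio of about 607 against Facedancer21 and about 12.5 against GreatFET One. The ratios use the average estimates only and ignore the uncertainty columns.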

All benchmarks were conducted using a single libusb transfer, except for GreatFET One. A single USB transfer equals a single call to libusb: libusb takes responsibility for sending the packets as fast as possible. While running our test for GreatFET One, we ran into an issue that prevented us from doing a single transfer: GreatFET One just would not accept packets of 64 bytes (the full packet size for USB2 Full-speed), so we had to settle for packets of 63 bytes sent as individual transfers. However, this should not matter much when speed-testing Facedancer: there is a lot of downtime with all the transfers from one side to the other, so libusb can't send the packets too fast either.

Note that speedtests are not everything. While GreatFET One has proven mostly reliable, Facedancer21 was a pain to get working: scripts sometimes had to be launched more than ten times before the board started working. We have found Hydradancer to be reliable during our tests, especially the single-board setup.

Field-tested drivers for the WCH569

During this project, we developed a high-level library wch-ch56x-lib¹⁵ based on wch-ch56x-bsp¹¹, with improved peripherals and testing.

This library includes:

USB2/USB3 drivers with a shared USB abstraction layer
HSPI (bidirectional half-duplex): two versions are implemented, one handles data directly in the interrupt handler, the other uses the interrupt queue to defer processing
SerDes (simplex)
memory pool: a RAMX (the memory used by the peripherals) pool that allows swapping peripheral buffers while keeping previous buffers for deferred processing using the interrupt queue. It also avoids unnecessary copies and uses reference counting
interrupt_queue: a simple task queue to defer processing in user mode, so that it can be interrupted and fewer interrupts might be missed
logging: different loggers are implemented, mainly direct logging through UART1 and logging to a ringbuffer. Logging has a noticeable impact on performance and can create new bugs when trying to debug the high-speed peripherals like USB3. Logging to a ringbuffer and flushing to UART1 later can help, but even then logging might need to be kept to a minimum. Log levels and categories have been set up to easily activate the logs of different parts of the library
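To give an idea of how a reference-counted buffer pool supports deferred processing, here is a small Python model. It is not the wch-ch56x-lib API (the library is written in C). The class, its methods and the buffer sizes are assumptions made for the example.

```python
class BufferPool:
    """Fixed set of buffers handed out by index and recycled when unreferenced."""

    def __init__(self, count: int, size: int):
        self._buffers = [bytearray(size) for _ in range(count)]
        self._refcounts = [0] * count

    def acquire(self):
        """Return the index of a free buffer, or None if the pool is exhausted."""
        for index, refs in enumerate(self._refcounts):
            if refs == 0:
                self._refcounts[index] = 1
                return index
        return None

    def view(self, index: int) -> memoryview:
        """Zero-copy access: the memoryview refers to the pooled buffer itself."""
        return memoryview(self._buffers[index])

    def retain(self, index: int) -> None:
        """Another user (e.g. a deferred task) keeps the buffer alive."""
        if self._refcounts[index] == 0:
            raise RuntimeError("buffer is not in use")
        self._refcounts[index] += 1

    def release(self, index: int) -> None:
        """Drop one reference; the buffer becomes reusable when none remain."""
        if self._refcounts[index] == 0:
            raise RuntimeError("buffer already free")
        self._refcounts[index] -= 1


pool = BufferPool(count=2, size=512)
rx = pool.acquire()          # buffer currently used by the peripheral
pool.view(rx)[:4] = b"data"  # written in place, no intermediate copy
pool.retain(rx)              # the deferred task takes its own reference
pool.release(rx)             # the peripheral swaps to another buffer
spare = pool.acquire()       # a different buffer: rx is still referenced
```

The received buffer only returns to the pool once the deferred task calls `release` as well, so the peripheral can keep receiving into the spare buffer in the meantime.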

Various tests were implemented for the wch-ch56x-lib library, mainly loopback and speed tests, with Python and C host programs to support them.

Testing was a huge part of this project, as we often reached the limitations of WCH's examples and documentation, for instance:

USB3 OUT control requests were not working and we actually had to manually inline the code to make them work (the USB3 part of the firmware is really sensitive to timings)
USB3 did not support packets smaller than the maximum packet size, and we also encountered issues with how the examples dealt with bursts
we had to test whether HSPI could work in half-duplex on both sides simultaneously
timing issues with HSPI: we could not prevent the sender from overwriting the receiving buffer while it was being processed in an interrupt (although the HSPI protocol supports such signals)

We relied on logs to reverse-engineer some of the WCH569 functionalities, for instance to find the right usage for the USB3 control registers when handling bursts. The WCH-LinkE did not work properly for us, even with the MounRiver Studio IDE.

How to get the Hydradancer board

If you are interested in this project, we recommend buying the new Hydradancer board when it is available on the Hydrabus website. Its availability will be announced on Hydrabus's Twitter/X account. In this blogpost, we presented the prototype used for development, but Benjamin Vernoux has launched the production of a first batch of HydraDancer Dongle V1 R0, which will be much smaller. This first batch will be tested before launching a second batch that will be made available.

HydraDancer Dongle V1 R0


This new Hydradancer can also be used to create USB3 peripherals, although, unlike a HydraUSB3, without USB2 downgrade.

If you encounter any bugs or missing features (like the currently unimplemented host mode), don't hesitate to create an issue on the GitHub repository of the Hydradancer firmware¹³.

Conclusion

In this blogpost, we presented Hydradancer, a new backend and board for Facedancer that supports USB2 High-speed and allows faster data-transfer rates overall using USB3.

This project would not have been possible without the support of Benjamin Vernoux, the creator of the HydraUSB3 and Hydradancer hardware. I would also like to thank Philippe Teuwen (doegox) and Mengsi Wu from Quarkslab for their help and support during this project.

Sources

1. https://github.com/pyusb/pyusb
2. https://github.com/greatscottgadgets/Facedancer
3. https://goodfet.sourceforge.net/hardware/facedancer21/
4. https://wiki.yobi.be/index.php/Raspdancer
5. https://github.com/dominicgs/BeagleDancer
6. https://greatscottgadgets.com/greatfet/one/
7. https://greatscottgadgets.com/cynthion/
8. https://goodfet.sourceforge.net/
9. https://hydrabus.com/hydrausb3-v1-0-specifications
10. https://github.com/hydrausb3/grehack22
11. https://github.com/hydrausb3/wch-ch56x-bsp
12. https://hydradancer.com
13. https://github.com/HydraDancer/hydradancer_fw
14. https://github.com/HydraDancer/Facedancer
15. https://github.com/hydrausb3/wch-ch56x-lib

