LLVM Developer Meeting Report

So we spent a few days in California at the LLVM developer meeting, our first trip to Silicon Valley! We gave two talks there: a beginner tutorial on how to build out-of-tree passes, using simple obfuscation passes as the teaching material (the sources for the tutorial are hosted on GitHub), and a humorous lightning talk by Serge listing a few of the limits of LLVM we bumped into while developing an LLVM obfuscator. The challenge was to make the audience laugh (and not only because of our French accent), and it was a success ;-)

Apart from Quarkslab's tiny contribution, there were a lot of exciting presentations. The jet lag made it hard to follow everything, but here is the battle report.

Day 1

WebAssembly: Here Be Dragons

A funny RPG-like talk; one of the speakers had a Magic: The Gathering T-shirt \o/. Read this slightly old announcement on WebAssembly to learn more, but in a nutshell it's an assembly language for the web that aims at speed, security, and portability, while being capable of doing anything native code can. The talk was rather high-level, with no big focus on security. Funny enough, their Intermediate Representation (IR) is an AST, to be able to easily generate JavaScript as a backend (!). The current front end is Clang for C/C++, but Python, Rust, and JIT compilation are envisioned too.

Loved this quote from Bjarne Stroustrup: "There are only two kinds of languages: the ones people complain about and the ones nobody uses." JavaScript will survive!

Input Space Splitting for OpenCL

OpenCL is a language embedded in C/C++ to target accelerators, exposing independent workers grouped into a grid of workgroups, using a hybrid SPMD/SIMD model. The authors of the talk focus on convolution kernels where part of the iteration space must receive different treatment from the rest (typically, the borders of a convolution require different processing). They elaborate on guarding the iteration space to isolate the regular parts, backed by the polyhedral model.

Automated Performance Tracking of LLVM Generated Code

The talk focuses on Continuous Integration, with an emphasis on performance regression tracking rather than just passing the test suite. It showcases many issues (noise, false positives, false negatives). Funny enough, they talk about ASLR as a means to randomize the runs ;-) One of the conclusions is that a single number cannot be used to measure performance: at the very least, use averages coupled with confidence intervals! Also, monitoring the compilation time on a production buildbot is too noisy. They additionally record the hash of each generated binary to get statistics on the number of test cases changed by a commit.

See http://llvm.org/docs/lnt for more info!
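The "averages coupled with confidence intervals" advice is easy to sketch. Here is a minimal C++ illustration; the helper, the sample timings, and the normal-approximation z = 1.96 are our own assumptions for the example, not something taken from the talk or from LNT:

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Mean and half-width of a ~95% confidence interval (normal approximation,
// z = 1.96) for a set of benchmark timings, instead of a single number.
struct Summary {
    double mean;
    double ci95;  // report the result as mean +/- ci95
};

Summary summarize(const std::vector<double>& runs) {
    const double n = static_cast<double>(runs.size());
    const double mean = std::accumulate(runs.begin(), runs.end(), 0.0) / n;
    double var = 0.0;
    for (double r : runs) var += (r - mean) * (r - mean);
    var /= (n - 1.0);  // sample variance
    const double std_err = std::sqrt(var / n);
    return {mean, 1.96 * std_err};
}
```

With this, two measurements only count as a regression when their intervals do not overlap, which filters out much of the noise the speakers complained about.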

Living Downstream Without Drowning

A very interesting talk by a Sony team with tips on maintaining downstream patched LLVM versions. Feel the pain of the dev who first spent three months going from LLVM 2.9 to LLVM 3.0!

Lessons learnt:

  • try to merge (automatically!) upstream commits ASAP rather than waiting for a release (there are ~50 commits per day on the LLVM+Clang projects)
  • keep bug references in downstream patches to help merge
  • never delete upstream code (use #if 0 instead)
  • never reformat upstream code

They use an interesting staged build:

  1. Release no assert
  2. Release assert
  3. Debug assert
  4. Debug assert on all arch

This helps detect bugs as soon as possible.

Beyond Sanitizers: guided fuzzing and security hardening

This talk was about multiple things:

  • Introduction to libFuzzer
  • Control Flow Integrity thanks to clang/llvm
  • Stack-based buffer overflow guards

libFuzzer seemed interesting (especially after a series of posts on llvm-dev about fuzzing PostgreSQL), but we never found the time to test it. This introductory talk showed the basics of how to use it with real-life projects and was very concise and clear! The main idea is to implement a function that takes data as input and feeds that data to the API we'd like to fuzz. Combined with existing Clang-based tools such as ASan (AddressSanitizer), MemorySanitizer, and others, it helps find (exploitable) bugs in a pretty effective manner. One interesting thing is that you can seed the fuzzer with initial data, and it will then produce new inputs in order to maximize code coverage. The author then showed how Heartbleed (http://heartbleed.com/) could be found in about 5 seconds using libFuzzer (really impressive!).
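To make the main idea concrete, here is a minimal sketch of a fuzz target. `parse_header` and its magic bytes are hypothetical stand-ins for an API under test; only the `LLVMFuzzerTestOneInput` entry point is libFuzzer's actual interface:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical API under test: a toy header check with made-up magic bytes.
static int parse_header(const uint8_t* data, size_t size) {
    if (size >= 4 && std::memcmp(data, "MAGI", 4) == 0)
        return 1;  // "valid" header
    return 0;
}

// libFuzzer's entry point: the fuzzer calls it repeatedly with
// coverage-guided inputs. Link against libFuzzer with ASan enabled
// (see the libFuzzer docs for the exact build line).
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    parse_header(data, size);
    return 0;  // the return value is reserved; always return 0
}
```

The fuzzer mutates its corpus and keeps any input that reaches new coverage, so the `memcmp` branch above is exactly the kind of path it learns to satisfy on its own.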

We will perhaps write a small post on Quarkslab's blog on how to use it on one of our libraries #teasing :)

In the second part, the author talked about an implementation of CFI (Control Flow Integrity) in Clang/LLVM. The main idea is to protect every indirect call/jump with tables of valid pointers (including C++ vtable calls). Some optimisations on how the tables are stored and accessed were presented. This obviously won't work across dynamic shared objects (like an application with a DLL-based plugin system). The author claimed less than 1% CPU overhead and a 7% code size increase on the Chromium project, which is a really nice result! Note that this feature won't prevent ROP chains from being executed, but it will indeed make it harder to bootstrap them (that is, to execute the first gadget of the chain).

Finally, a stack-based buffer overflow protection (SafeStack) was presented. It looked a lot like GCC's canary mechanism, but maybe we missed something during the presentation.

As a conclusion, this talk was one of the closest to what we are doing here at Quarkslab, and was thus really interesting for us! :)

Day 2

Swift's High-Level IR: A Case Study of Complementing LLVM IR with Language-Specific Optimization

One of my favorite talks of this meeting, as many of the principles mentioned here echoed my personal project pythran. First, a quote that made me laugh (I cannot remember the author, though): "in C++ there is a nice language trying to emerge."

The talk started with a statement: the textbook three-phase compiler architecture got messed up by reality (C++ is difficult to parse, analyses are implemented at both the LLVM and clang-analyze levels, etc.). The point is that there is a wide abstraction gap between high-level languages and LLVM IR. So they introduced the Swift Intermediate Language (SIL) to be able to perform high-level analyses and optimisations before generating the LLVM bitcode that takes care of the lower-level stuff.

A few funny things about the Swift language: by default, addition raises a runtime error when overflowing, while a dedicated operator, namely &+, has wrap-around semantics.
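We don't have Swift code in our notes, but the contrast is easy to mimic in C++: in this sketch, `checked_add` plays the role of Swift's default `+` and `wrapping_add` plays `&+`. The helper names are ours, and `__builtin_add_overflow` is a GCC/Clang builtin:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Mimics Swift's default `+`: signal overflow instead of producing a
// silently wrong value (here via an exception; Swift aborts the program).
int32_t checked_add(int32_t a, int32_t b) {
    int32_t out;
    if (__builtin_add_overflow(a, b, &out))
        throw std::overflow_error("signed overflow");
    return out;
}

// Mimics Swift's `&+`: modular wrap-around. The arithmetic is done in
// unsigned, where wrap-around is well defined in C++.
int32_t wrapping_add(int32_t a, int32_t b) {
    return static_cast<int32_t>(static_cast<uint32_t>(a) +
                                static_cast<uint32_t>(b));
}
```

Making the trapping behaviour the default and the wrapping one opt-in is the opposite of what C does, which is precisely what the speakers found noteworthy.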

A nice high-level analysis: by default, variables captured by a lambda are allocated on the heap; then successive refinements can happen depending on the variable accesses, leading to variables allocated on the stack, and eventually captured by value instead of by reference when meaningful. <3 the approach.
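For readers more at home in C++, here is a tiny illustration (entirely ours, not from the talk) of why the capture mode matters; Swift's optimizer derives the cheapest legal choice automatically, whereas C++ makes it explicit:

```cpp
#include <cassert>

// Capture by value: x is copied into the closure at creation time,
// so later writes to the local are invisible to the lambda.
int capture_by_value() {
    int x = 10;
    auto f = [x] { return x + 1; };
    x = 100;      // the closure's copy does not see this write
    return f();   // 11
}

// Capture by reference: the closure refers to the enclosing local,
// so it observes every later write (and must not outlive x).
int capture_by_reference() {
    int x = 10;
    auto f = [&x] { return x + 1; };
    x = 100;      // visible through the reference
    return f();   // 101
}
```

The two behave differently whenever the captured variable is mutated after the closure is created, which is exactly the property the Swift analysis has to prove absent before demoting a capture.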

Exception handling in LLVM, from Itanium to MSVC

The first part of this talk was great, going through the first implementation of exceptions in CFront, based on a linked list + setjmp/longjmp, followed by a more efficient one using Call Frame Information (CFI) and stack unwinding, then the Itanium style with successive landing pads and unwinding, an ABI that was then adopted by GCC and LLVM. And everything changes with Windows, but Serge got lost starting from that part :-/

An update on Clang-based C++ Tooling

To keep it short, use:

They are all wonderful Clang-based tools for the C++ programmer!

At that point Serge decided to turn his lightning-but-serious talk on obfuscation into a let-me-entertain-you-for-five-minutes talk, so he got less focused on his notes; we actually cannot make sense of the remaining ones :-/


What a great moment! In addition to meeting former teammates, discovering the community, and attending great talks, we were both really impressed by the amount of engineering done there. Well, that's what it means to be in Silicon Valley!

And believe it or not, there was some green Chartreuse at the social event, the perfect match after an American burger!