Clang Hardening Cheat Sheet

While improving the documentation (d'oh!) of our home-grown obfuscator based on LLVM, we wrote a cheat sheet on Clang's hardening features, and some of ld's. It turns out existing hardening guides generally focus on GCC, while Clang also has an interesting set of hardening features. So let's share it in this blog post!

Note0: Everything in this post is based on Clang/LLVM 3.7

Note1: Debian provides a very interesting hardening guide here: https://wiki.debian.org/Hardening

Note2: This post does not cover the use of ASan. Unlike the options presented here, it is unlikely to end up in release builds; it belongs in debug builds.

Basics

debug: Obviously, do not pass the -g flag to the compiler or, if you forget to (!), pass the -Wl,--strip-debug flag to the linker.
strip: Either call strip on the final binary, or pass the -Wl,--strip-all flag to the linker to strip all symbols.

Fortify

Pass the -D_FORTIFY_SOURCE=2 flag to the preprocessor to add extra checks that may, for instance, detect some buffer overflows. Note that glibc only enables these checks when optimization is turned on (-O1 or higher).

Consider fortify_example.c:

#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
    char buffer[8];
    strcpy(buffer, argv[0]);
    puts(buffer);
    return 0;
}

Compiled with clang -O2 fortify_example.c -o fortify_example and run, it outputs:

> ./fortify_example
./fortify_example
Segmentation fault

But when compiled with clang -O2 fortify_example.c -D_FORTIFY_SOURCE=2 -o fortify_example, we get:

> ./fortify_example
*** buffer overflow detected ***: ./fortify_example terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7320f)[0x7f630b4a120f]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7f630b5244e7]
/lib/x86_64-linux-gnu/libc.so.6(+0xf4700)[0x7f630b522700]
./fortify_example[0x4005ba]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f630b44fb45]
./fortify_example[0x4004c9]
======= Memory map: ========
00400000-00401000 r-xp 00000000 fb:01 391109                             /tmp/fortify_example
[... blah blah blah ...]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Internally, -D_FORTIFY_SOURCE=2 replaces the call to strcpy with a call to __strcpy_chk, which takes a third parameter: the size of the destination buffer.

Fun fact: the actual size of the buffer is computed through a call to llvm.objectsize.*.* (see http://llvm.org/docs/LangRef.html#llvm-objectsize-intrinsic) that is lowered to the object size value if it can be computed by the compiler.

For instance, consider the (deliberately tricky) code trick_fortify_example.c:

#include <stdio.h>
#include <string.h>
extern char* gimme_some_mem();
int main(int argc, char *argv[]) {
    char* buffer = gimme_some_mem();
    strcpy(buffer, argv[0]);
    puts(buffer);
    return 0;
}

The size of buffer cannot be computed at compile time (unless Link Time Optimization is available), so no check can be inserted and fortify does not protect this call.

The cost of this protection is the extra length checks that are made, but it's likely to be negligible in most cases.
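To make the mechanism more concrete, here is a minimal sketch of the kind of check a fortified strcpy performs. The name sketch_strcpy_chk and the UNKNOWN_SIZE macro are hypothetical and only serve the illustration; the real __strcpy_chk lives in glibc and aborts the process instead of returning an error code.

```c
#include <stddef.h>
#include <string.h>

/* Sentinel meaning "size unknown": the value the object size query
 * yields when the compiler cannot determine the size. */
#define UNKNOWN_SIZE ((size_t)-1)

/* Hypothetical stand-in for __strcpy_chk. Returns 0 on success and -1
 * when the copy would overflow dst (the real function aborts instead). */
int sketch_strcpy_chk(char *dst, const char *src, size_t dst_size)
{
    size_t needed = strlen(src) + 1;  /* include the terminating NUL */

    if (dst_size != UNKNOWN_SIZE && needed > dst_size)
        return -1;                    /* overflow detected, nothing copied */

    memcpy(dst, src, needed);
    return 0;
}
```

With an 8-byte destination, copying "./fortify_example" is rejected, which mirrors the first example. With UNKNOWN_SIZE the check is vacuous, which mirrors the gimme_some_mem case.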

Address Space Layout Randomization

All recent kernel versions support ASLR, a feature that makes the program address space unpredictable. It makes exploitation of some kinds of vulnerabilities harder, as addresses can't be hard-coded but need to be computed and/or guessed. But it can only be fully used if your code is position independent or relocatable, i.e. if it has been compiled with -fPIE -pie for binaries, and -fPIC for your shared libraries.

The following dummy code from aslr.c:

#include <stdio.h>

int main() {
  printf("%p\n", (void *)main);
  return 0;
}

when compiled with clang aslr.c -o aslr -O2 and run successively, always outputs the same value:

> ./aslr
0x400530
> ./aslr
0x400530

but when compiled with clang aslr.c -o aslr -O2 -fpie -pie, it outputs random values thanks to ASLR:

> ./aslr
0x564cde10d7a0
> ./aslr
0x55daab4d37a0

Stack Protection

-fstack-protector

Pass the -fstack-protector flag to the compiler to force the addition of a stack canary that checks for stack smashing. One can use -fstack-protector-strong to include more functions that could be subject to stack smashing, and -fstack-protector-all to include all functions (but it's more expensive in terms of code size and execution time).

For instance, running clang -O2 -fstack-protector fortify_example.c -S -masm=intel -o - outputs some unusual assembly:

    mov        rax, qword ptr fs:[40]
    mov        qword ptr [rsp + 8], rax
    mov        rsi, qword ptr [rsi]
    lea        rbx, [rsp]
    mov        rdi, rbx
    call        strcpy
    mov        rdi, rbx
    call        puts
    mov        rax, qword ptr fs:[40]
    cmp        rax, qword ptr [rsp + 8]
    jne        .LBB0_2
# BB#1:                                 # %SP_return
    xor        eax, eax
    add        rsp, 16
    pop        rbx
    ret
.LBB0_2:                                # %CallStackCheckFailBlk
    call        __stack_chk_fail

We see an interesting call to __stack_chk_fail, triggered if the stack canary check cmp rax, qword ptr [rsp + 8] fails.

The canary was originally stored by mov rax, qword ptr fs:[40] followed by mov qword ptr [rsp + 8], rax. If the stack is smashed, [rsp + 8] gets overwritten with a value that differs from the one loaded from the dedicated location fs:[40], and the check fails.

The performance cost comes from the additional check performed at the end of each protected function. It can become an issue for a frequently called function, especially if its execution time is very small.
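The prologue/epilogue logic above can be modeled in plain C. This is only a sketch under stated assumptions: the struct sketch_frame layout, the SKETCH_CANARY constant and the function name are all hypothetical, and a real canary is a random per-process value read from fs:[40] rather than a fixed constant.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative constant; a real canary is random. */
#define SKETCH_CANARY 0x5AFEC0DEUL

/* Models the interesting part of a stack frame: the canary sits right
 * after the buffer, between it and the saved return address. */
struct sketch_frame {
    char buffer[8];
    unsigned long canary;
};

/* Writes n bytes of attacker-controlled data starting at the buffer,
 * then performs the epilogue check. Returns 0 if the canary is intact
 * and -1 if it was smashed (where real code calls __stack_chk_fail).
 * The copy is clamped to the frame size so the sketch stays well defined. */
int sketch_copy_and_check(const char *src, size_t n)
{
    struct sketch_frame frame;

    frame.canary = SKETCH_CANARY;                   /* prologue: store canary */
    if (n > sizeof frame)
        n = sizeof frame;
    memcpy(&frame, src, n);                         /* the "overflowing" write */
    return frame.canary == SKETCH_CANARY ? 0 : -1;  /* epilogue: compare */
}
```

A write of at most 8 bytes leaves the canary untouched, while a longer write overwrites it and is detected when the function returns.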

-fsanitize=safe-stack

SafeStack introduces an additional stack, separated from the unsafe stack, that stores return addresses and other sensitive data that an attacker may target. As a consequence, a buffer overflow on the unsafe stack cannot smash the safe stack.

Running clang -O2 -fsanitize=safe-stack fortify_example.c -S -masm=intel -o - outputs:

main:                                   # @main
    .cfi_startproc
    push        r15
.Ltmp0:
    .cfi_def_cfa_offset 16
    push        r14
.Ltmp1:
    .cfi_def_cfa_offset 24
    push        rbx
.Ltmp2:
    .cfi_def_cfa_offset 32
.Ltmp3:
    .cfi_offset rbx, -32
.Ltmp4:
    .cfi_offset r14, -24
.Ltmp5:
    .cfi_offset r15, -16
    mov        r14, qword ptr [rip + __safestack_unsafe_stack_ptr@GOTTPOFF]
    mov        rbx, qword ptr fs:[r14]
    lea        rax, [rbx - 16]
    mov        qword ptr fs:[r14], rax
    lea        r15, [rbx - 8]
    mov        rsi, qword ptr [rsi]
    mov        rdi, r15
    call        strcpy
    mov        rdi, r15
    call        puts
    mov        qword ptr fs:[r14], rbx
    xor        eax, eax
    pop        rbx
    pop        r14
    pop        r15
    ret

This code manages two stacks: the regular stack, which is considered safe, and an extra (unsafe) stack, whose current top is held in rbx = fs:[r14]. All unsafe operations, such as the call to strcpy, are performed using the unsafe stack: rdi = r15 = rbx - 8.

It has less impact on performance than the stack canary technique, as no check is performed at the end of the protected functions. This protects against a "trivial" rewrite of the return address present on the stack (as this address stays in the safe stack), but might not prevent the buffer overflow from overwriting something else. This depends on the memory layout of the running software.

Unlike the previous technique, the security provided by this solution does not rely on an entropy source.

Control Flow Integrity (CFI)

Pass the -fsanitize=cfi flag to the compiler to add checks that the path taken by the program execution has not been subverted by an attacker when an indirect call is performed. This requires Link Time Optimization, and so the gold plugin, as activated by -fuse-ld=gold -flto. See http://clang.llvm.org/docs/ControlFlowIntegrity.html for more info!

For instance, the example cfi_example.cpp below shows how virtual function calls are protected by this method:

#include <iostream>
#include <string>
#include <memory>

struct Op
{
  virtual ~Op() { }
  virtual int f(int a, int b) const = 0;
};

struct Add: public Op
{
  virtual int f(int a, int b) const override { return a+b; }
};

struct Mul: public Op
{
  virtual int f(int a, int b) const override { return a*b; }
};

__attribute__((noinline)) int call_op(Op const& op, int a, int b)
{
  return op.f(a,b);
}

int main(int argc, char** argv)
{
  if (argc < 4) {
    std::cerr << "Usage: " << argv[0] << " op a b" << std::endl;
    return 1;
  }
  std::unique_ptr<Op> op;
  switch (std::stol(argv[1])) {
    case 0:
      op = std::make_unique<Add>();
      break;
    case 1:
      op = std::make_unique<Mul>();
      break;
    default:
      return 1;
  }
  std::cout << call_op(*op, std::stol(argv[2]), std::stol(argv[3])) << std::endl;
  return 0;
}

Compiling this code without CFI gives, for the call_op function:

; int __fastcall call_op(const Op* op, int a, int b)
op = rdi                ; const Op *
mov     rax, [op]
mov     rax, [rax+10h]
jmp     rax

The virtual call is done without any prior check. That is, if any issue in the software lets an attacker control the value stored at the address given by op, they could take control of the software execution flow.

Compiled with CFI enabled (clang++ -O2 -std=c++14 -flto -fsanitize=cfi -fuse-ld=gold cfi_example.cpp -o cfi_example), the call_op function is now this one:

; int __fastcall call_op(const Op* op, int a, int b)
op = rdi                                ; const Op *
  mov     rcx, [op]     ; rcx = pointer to the vtable of 'op'
  mov     r8d, offset vtableadd_funcs
  mov     rax, rcx
  sub     rax, r8
  rol     rax, 3Ah
  cmp     rax, 2
  jnb     short loc_400DEF
  mov     rax, [rcx+10h]
  jmp     rax
loc_400DEF:
  ud2

What's basically happening here is a check that the vtable pointer of the given op object is the one of either the Add or the Mul class. Indeed, vtableadd_funcs points to this data:

; `vtable for'Add
_ZTV3Add        dq 0
                dq offset _ZTI3Add      ; `typeinfo for'Add
vtableadd_funcs dq offset _ZN2OpD2Ev    ; Op::~Op()
                dq offset _ZN3AddD0Ev   ; Add::~Add()
                dq offset _ZNK3Add1fEii ; Add::f(int,int)
                dq 0
                dq 0
                dq 0
; `vtable for'Mul
_ZTV3Mul        dq 0
                dq offset _ZTI3Mul      ; `typeinfo for'Mul
off_4011C0      dq offset _ZN2OpD2Ev    ; Op::~Op()
                dq offset _ZN3MulD0Ev   ; Mul::~Mul()
                dq offset _ZNK3Mul1fEii ; Mul::f(int,int)

So, the address of Add's vtable is subtracted from the vtable pointer of the op object, and then we check whether the result falls in the allowed range. If this is not the case, we jump to an undefined instruction (ud2) that makes the process crash.

As a final note, beware that Clang's CFI, in the version covered here, only works with C++ objects. For instance, it won't protect indirect calls made through a more classical function pointer.
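The sub / rol / cmp sequence can be read as the following C sketch. The function names are hypothetical. Rotating left by 0x3A (58) bits is the same as rotating right by 6 bits, which matches the 64-byte spacing between the two vtables shown above. The address 0x401180 used in the note below is inferred from off_4011C0 minus that 64-byte stride; it is not printed in the listing.

```c
#include <stdint.h>

/* Rotate right; n must be in 1..63. */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}

/* Returns 1 if vptr is one of `count` vtable address points laid out
 * every 2^log2_stride bytes starting at base, and 0 otherwise. */
int sketch_vptr_is_valid(uint64_t vptr, uint64_t base,
                         unsigned log2_stride, uint64_t count)
{
    uint64_t diff = vptr - base;  /* wraps to a huge value if vptr < base */

    /* The rotation moves any misaligned low bits to the top, so one
     * unsigned comparison rejects both misaligned and out-of-range
     * pointers. */
    return rotr64(diff, log2_stride) < count;
}
```

With base 0x401180, a stride of 64 bytes (log2_stride = 6) and count = 2, only 0x401180 and 0x4011C0 are accepted. A pointer such as 0x401188 is rejected because its low bits end up in the high part of the rotated value.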

GOT protection

read-only relocations: Passing the -Wl,-z,relro flag to the linker marks some sections read-only, which prevents some GOT overwrite attacks.
immediate binding: If the -Wl,-z,now flag is passed to the linker, all symbols are resolved at load time. Combined with the previous flag, this prevents more GOT overwrite attacks (otherwise part of the GOT is updated at runtime, and that part is not marked as read-only by -Wl,-z,relro).

Let's compile the following code hello.c with clang hello.c -o hello and clang hello.c -o hello.relro -Wl,-z,now -Wl,-z,relro:

#include <stdio.h>

int main() {
  puts("degemer mat");
  return 0;
}

And observe the output of readelf -rl on both:

> ./readelf -rl hello

Elf file type is EXEC (Executable file)
Entry point 0x400430
There are 8 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001c0 0x00000000000001c0  R E    8
  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000006f4 0x00000000000006f4  R E    200000
  LOAD           0x00000000000006f8 0x00000000006006f8 0x00000000006006f8
                 0x0000000000000240 0x0000000000000248  RW     200000
  DYNAMIC        0x0000000000000710 0x0000000000600710 0x0000000000600710
                 0x00000000000001e0 0x00000000000001e0  RW     8
  NOTE           0x000000000000021c 0x000000000040021c 0x000000000040021c
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x00000000000005d0 0x00000000004005d0 0x00000000004005d0
                 0x0000000000000034 0x0000000000000034  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10

[...]

Relocation section '.rela.dyn' at offset 0x370 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000006008f0  000300000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0

Relocation section '.rela.plt' at offset 0x388 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600910  000100000007 R_X86_64_JUMP_SLO 0000000000000000 puts + 0
000000600918  000200000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
000000600920  000300000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__ + 0

The GOT slot for puts is located at address 000000600910, which falls in the second LOAD segment (0x6006f8 to 0x600940), mapped in read-write mode.

But when compiled with read-only relocations and immediate binding, we get:

> ./readelf -rl hello.relro

Elf file type is EXEC (Executable file)
Entry point 0x400470
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000734 0x0000000000000734  R E    200000
  LOAD           0x0000000000000db0 0x0000000000600db0 0x0000000000600db0
                 0x0000000000000260 0x0000000000000268  RW     200000
  DYNAMIC        0x0000000000000dc8 0x0000000000600dc8 0x0000000000600dc8
                 0x0000000000000200 0x0000000000000200  RW     8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x0000000000000610 0x0000000000400610 0x0000000000400610
                 0x0000000000000034 0x0000000000000034  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
  GNU_RELRO      0x0000000000000db0 0x0000000000600db0 0x0000000000600db0
                 0x0000000000000250 0x0000000000000250  R      1

[...]

Relocation section '.rela.dyn' at offset 0x3a8 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600ff8  000300000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0

Relocation section '.rela.plt' at offset 0x3c0 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600fe0  000100000007 R_X86_64_JUMP_SLO 0000000000000000 puts + 0
000000600fe8  000200000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
000000600ff0  000300000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__ + 0

It turns out that the GOT slot for puts (000000600fe0) now lies in the GNU_RELRO segment, which is marked as read-only once the relocations have been applied \o/.
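Checking which segment a relocation slot belongs to is simple interval arithmetic on the VirtAddr and MemSiz columns of the readelf output. Here is a small sketch of that check; the struct and function names are hypothetical and have nothing to do with readelf's own code.

```c
#include <stdint.h>

/* One program header as printed by readelf -l (only the fields used here). */
struct sketch_segment {
    uint64_t vaddr;  /* VirtAddr column */
    uint64_t memsz;  /* MemSiz column */
};

/* Returns 1 if addr lies in [vaddr, vaddr + memsz), and 0 otherwise. */
int sketch_in_segment(uint64_t addr, struct sketch_segment seg)
{
    return addr >= seg.vaddr && addr - seg.vaddr < seg.memsz;
}
```

For hello.relro, the puts slot 0x600fe0 is inside GNU_RELRO (VirtAddr 0x600db0, MemSiz 0x250), since that segment ends at 0x601000. For hello, the puts slot 0x600910 is inside the writable LOAD segment (VirtAddr 0x6006f8, MemSiz 0x248).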

This technique increases the binary loading time, as all the symbols are resolved upon loading and not on demand.

Detect Potential Formatting Attack

Pass the -Wformat -Wformat-security -Werror=format-security flags to the compiler to be warned about potential format string attacks (and, with -Werror=format-security, to turn those warnings into errors). -Wformat-security is activated by default, as showcased by the compilation of the following code format.c with clang format.c -o format:

#include <stdio.h>

int main(int argc, char* argv[]) {
  printf(argv[1]);
  return 0;
}

which outputs:

> clang format.c -o format
format.c:4:10: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
  printf(argv[1]);
          ^~~~~~~
1 warning generated.
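The usual fix is to pass untrusted input as an argument to a constant format string, never as the format itself. A minimal sketch (the function name is hypothetical, and snprintf is used instead of printf only to make the result easy to inspect):

```c
#include <stddef.h>
#include <stdio.h>

/* Formats untrusted input safely: the input is treated as data for a
 * constant "%s" format, so any conversion specifier it contains is
 * copied verbatim instead of being interpreted. */
int sketch_format_untrusted(char *out, size_t out_size, const char *input)
{
    return snprintf(out, out_size, "%s", input);
}
```

Given the input "%x %n %s", the output buffer contains exactly those 8 characters, and the compiler no longer emits the -Wformat-security warning for this call.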

Conclusion

In place of a conclusion, a warning: hardening is only one piece of the armor. Don't join the battlefield naked with only a helmet on the top of your head :-)
