While improving the documentation (d'oh!) of our home-grown obfuscator based on LLVM, we wrote a cheat sheet on Clang's hardening features, and on some of ld's. It turns out existing hardening guides generally focus on GCC, while Clang also has an interesting set of hardening features. So let's share it in this blog post!
Note0: Everything in this post is based on Clang/LLVM 3.7
Note1: Debian provides a very interesting hardening guide here: https://wiki.debian.org/Hardening
Note2: This post does not cover the use of ASan. Unlike the options presented here, it is unlikely to end up in release builds; it belongs in debug builds.
Basics
- debug: Obviously, do not pass the -g flag to the compiler, or if you forget to (!), pass the -Wl,--strip-debug flag to the linker.
- strip: Either call strip on the final binary, or pass the -Wl,--strip-all flag to the linker to strip all symbols.
Fortify
Pass the -D_FORTIFY_SOURCE=2 flag to the preprocessor to add extra checks that may, for instance, detect some buffer overflows.
Consider fortify_example.c:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
    char buffer[8];
    strcpy(buffer, argv[0]);
    puts(buffer);
    return 0;
}
Compiled with clang -O2 fortify_example.c -o fortify_example and run, it outputs:
> ./fortify_example
./fortify_example
Segmentation fault
But when compiled with clang -O2 fortify_example.c -D_FORTIFY_SOURCE=2 -o fortify_example, we get:
> ./fortify_example
*** buffer overflow detected ***: ./fortify_example terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7320f)[0x7f630b4a120f]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7f630b5244e7]
/lib/x86_64-linux-gnu/libc.so.6(+0xf4700)[0x7f630b522700]
./fortify_example[0x4005ba]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f630b44fb45]
./fortify_example[0x4004c9]
======= Memory map: ========
00400000-00401000 r-xp 00000000 fb:01 391109    /tmp/fortify_example
[... blah blah blah ...]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0    [vsyscall]
Internally, -D_FORTIFY_SOURCE=2 replaces the call to strcpy with a call to __strcpy_chk, which takes a third parameter: the size of the destination buffer.
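For illustration, here is a minimal sketch of what the fortified build conceptually boils down to; __strcpy_chk is glibc's checked variant, and the declaration is written out by hand here for the sake of the example (the real one lives in glibc's fortified headers):

#include <stdio.h>
#include <string.h>

/* Hand-written declaration of glibc's checked copy, for illustration only. */
extern char *__strcpy_chk(char *dest, const char *src, size_t destlen);

int main(int argc, char *argv[]) {
    char buffer[8];
    /* The compiler knows sizeof(buffer) and passes it as the third argument;
       glibc aborts with "*** buffer overflow detected ***" if argv[0] does not fit. */
    __strcpy_chk(buffer, argv[0], sizeof(buffer));
    puts(buffer);
    return 0;
}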
Fun fact: the actual size of the buffer is computed through a call to llvm.objectsize.*.* (see http://llvm.org/docs/LangRef.html#llvm-objectsize-intrinsic) that is lowered to the object size value if it can be computed by the compiler.
For instance, take the (purposely tricky) code trick_fortify_example.c:
#include <stdio.h>
#include <string.h>
extern char* gimme_some_mem();
int main(int argc, char *argv[]) {
    char* buffer = gimme_some_mem();
    strcpy(buffer, argv[0]);
    puts(buffer);
    return 0;
}
The size of buffer cannot be computed at compile time (at least without Link Time Optimization), so fortify has nothing to check against and cannot add its protection here.
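One can observe the mechanism from C through __builtin_object_size, the C-level counterpart of the llvm.objectsize intrinsic. A small sketch (probe() is just an illustrative name, not part of the examples above):

#include <stdio.h>

/* noinline keeps the caller's allocation invisible from here, much like gimme_some_mem() above. */
__attribute__((noinline))
static void probe(char *opaque) {
    char local[8];
    printf("%zu\n", __builtin_object_size(local, 0));  /* 8 when built with -O2: the allocation is visible */
    printf("%zu\n", __builtin_object_size(opaque, 0)); /* (size_t)-1, i.e. "unknown": nothing to check against */
}

int main(void) {
    char buffer[32];
    probe(buffer);
    return 0;
}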
The cost of this protection is the extra length checks that are made, but it's likely to be negligible in most cases.
Address Space Layout Randomization
All recent kernel versions support ASLR, a feature that makes the program's address space layout unpredictable. It makes exploiting some kinds of vulnerabilities harder, as addresses can no longer be hard-coded but need to be computed and/or guessed. But it can only be used if your code is position independent or relocatable, i.e. if your binaries are compiled with -fPIE -pie and your shared libraries with -fPIC.
The following dummy code from aslr.c:
#include <stdio.h>
int main() {
    printf("%p\n", main);
    return 0;
}
when compiled with clang aslr.c -o aslr -O2 and run successively, always outputs the same value:
> ./aslr
0x400530
> ./aslr
0x400530
but when compiled with clang aslr.c -o aslr -O2 -fpie -pie, it outputs random values thanks to ASLR:
> ./aslr
0x564cde10d7a0
> ./aslr
0x55daab4d37a0
Stack Protection
-fstack-protector
Pass the -fstack-protector flag to the compiler to force the addition of a stack canary that checks for stack smashing. One can use -fstack-protector-strong to include more functions that could be subject to stack smashing, and -fstack-protector-all to include all functions (but it's more expensive in terms of code size and execution time).
For instance, running clang -O2 -fstack-protector fortify_example.c -S -masm=intel -o - on the fortify_example.c outputs some unusual assembly:
mov rax, qword ptr fs:[40]
mov qword ptr [rsp + 8], rax
mov rsi, qword ptr [rsi]
lea rbx, [rsp]
mov rdi, rbx
call strcpy
mov rdi, rbx
call puts
mov rax, qword ptr fs:[40]
cmp rax, qword ptr [rsp + 8]
jne .LBB0_2
# BB#1: # %SP_return
xor eax, eax
add rsp, 16
pop rbx
ret
.LBB0_2: # %CallStackCheckFailBlk
call __stack_chk_fail
We see an interesting call to __stack_chk_fail, triggered if the stack canary check cmp rax, qword ptr [rsp + 8] fails.
Given that the canary was originally stored by mov rax, qword ptr fs:[40] followed by mov qword ptr [rsp + 8], rax, if the stack is smashed, [rsp + 8] may be overwritten with a value that no longer matches the reference canary kept at the dedicated location fs:[40].
The performance cost comes from the additional check performed at the end of the function. It can become an issue for frequently called functions, especially when their execution time is very small.
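For readers who prefer C to assembly, here is a hand-written, conceptual sketch of the check; it is not the compiler's actual output, and unlike the real instrumentation it cannot guarantee that the canary sits between the buffer and the return address:

#include <string.h>

extern void __stack_chk_fail(void);            /* glibc routine that reports the smash and aborts */

/* Read the per-thread reference canary from the same TLS slot as fs:[40] above. */
static unsigned long load_canary(void) {
    unsigned long value;
    __asm__("movq %%fs:40, %0" : "=r"(value));
    return value;
}

void copy_arg(char *dst, const char *src) {
    unsigned long canary = load_canary();      /* prologue: keep a local copy of the canary */
    char buffer[8];
    strcpy(buffer, src);                       /* a long 'src' may smash the stack */
    strcpy(dst, buffer);
    if (canary != load_canary())               /* epilogue: was our copy corrupted? */
        __stack_chk_fail();                    /* yes: abort instead of returning */
}

int main(void) {
    char out[8];
    copy_arg(out, "hi");                       /* harmless call, just to make the sketch runnable */
    return 0;
}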
-fsanitize=safe-stack
SafeStack splits the stack in two: the regular stack, considered safe, keeps the return addresses and other data that attacks typically target, while an additional, unsafe stack receives the buffers and anything that may be overflowed. As a consequence, a buffer overflow on the unsafe stack cannot smash the safe stack.
Running clang -O2 -fsanitize=safe-stack fortify_example.c -S -masm=intel -o - on the fortify_example.c outputs:
main: # @main
.cfi_startproc
push r15
.Ltmp0:
.cfi_def_cfa_offset 16
push r14
.Ltmp1:
.cfi_def_cfa_offset 24
push rbx
.Ltmp2:
.cfi_def_cfa_offset 32
.Ltmp3:
.cfi_offset rbx, -32
.Ltmp4:
.cfi_offset r14, -24
.Ltmp5:
.cfi_offset r15, -16
mov r14, qword ptr [rip + __safestack_unsafe_stack_ptr@GOTTPOFF]
mov rbx, qword ptr fs:[r14]
lea rax, [rbx - 16]
mov qword ptr fs:[r14], rax
lea r15, [rbx - 8]
mov rsi, qword ptr [rsi]
mov rdi, r15
call strcpy
mov rdi, r15
call puts
mov qword ptr fs:[r14], rbx
xor eax, eax
pop rbx
pop r14
pop r15
ret
This code manages two stacks: the regular stack, which is considered safe, and an extra (unsafe) stack that ends at rbx = fs:[r14]. All unsafe operations, such as the call to strcpy, are performed on the unsafe stack: rdi = r15 = rbx - 8.
It has less impact on performance than the stack canary technique, as no check is performed at the end of protected functions. This protects against a "trivial" rewrite of the return address stored on the stack (as this address stays on the safe stack), but might not prevent the buffer overflow from overwriting something else; that depends on the memory layout of the running software.
Unlike the previous technique, the security provided by this solution does not rely on an entropy source.
Control Flow Integrity (CFI)
Pass the -fsanitize=cfi flag to the compiler to add checks that the path taken by the program execution has not been subverted by an attacker when an indirect call is performed. This requires Link Time Optimization, and thus the gold plugin, activated by -fuse-ld=gold -flto. See http://clang.llvm.org/docs/ControlFlowIntegrity.html for more info!
For instance, the example cfi_example.cpp below shows how virtual function calls are protected by this method:
#include <iostream>
#include <string>
#include <memory>
struct Op
{
    virtual ~Op() { }
    virtual int f(int a, int b) const = 0;
};
struct Add: public Op
{
    virtual int f(int a, int b) const override { return a+b; }
};
struct Mul: public Op
{
    virtual int f(int a, int b) const override { return a*b; }
};
__attribute__((noinline)) int call_op(Op const& op, int a, int b)
{
    return op.f(a,b);
}
int main(int argc, char** argv)
{
    if (argc < 4) {
        std::cerr << "Usage: " << argv[0] << " op a b" << std::endl;
        return 1;
    }
    std::unique_ptr<Op> op;
    switch (std::stol(argv[1])) {
        case 0:
            op = std::make_unique<Add>();
            break;
        case 1:
            op = std::make_unique<Mul>();
            break;
        default:
            return 1;
    }
    std::cout << call_op(*op, std::stol(argv[2]), std::stol(argv[3])) << std::endl;
    return 0;
}
Compiling this code without CFI gives, for the call_op function:
; int __fastcall call_op(const Op* op, int a, int b)
op = rdi ; const Op *
mov rax, [op]
mov rax, [rax+10h]
jmp rax
The virtual call is done without any prior check. That is, if any issue in the software lets an attacker control the value stored at the address given by op, they could take control of the software's execution flow.
Compiled with CFI enabled (clang++ -O2 -std=c++14 -flto -fsanitize=cfi -fuse-ld=gold cfi_example.cpp -o cfi_example), the call_op function is now this one:
; int __fastcall call_op(const Op* op, int a, int b)
op = rdi ; const Op *
mov rcx, [op] ; rcx = pointer to the vtable of 'op'
mov r8d, offset vtableadd_funcs
mov rax, rcx
sub rax, r8
rol rax, 3Ah
cmp rax, 2
jnb short loc_400DEF
mov rax, [rcx+10h]
jmp rax
loc_400DEF:
ud2
What basically happens here is that the code checks that the vtable pointer of the given op object is that of the Add or the Mul class. Indeed, vtableadd_funcs points to these data:
; `vtable for'Add
_ZTV3Add dq 0
dq offset _ZTI3Add ; `typeinfo for'Add
vtableadd_funcs dq offset _ZN2OpD2Ev ; Op::~Op()
dq offset _ZN3AddD0Ev ; Add::~Add()
dq offset _ZNK3Add1fEii ; Add::f(int,int)
dq 0
dq 0
dq 0
; `vtable for'Mul
_ZTV3Mul dq 0
dq offset _ZTI3Mul ; `typeinfo for'Mul
off_4011C0 dq offset _ZN2OpD2Ev ; Op::~Op()
dq offset _ZN3MulD0Ev ; Mul::~Mul()
dq offset _ZNK3Mul1fEii ; Mul::f(int,int)
So, the address of Add's vtable entries (vtableadd_funcs) is subtracted from the vtable pointer of the op object, and the result is checked against the range of valid values. If it is out of range, we jump to an undefined instruction that makes the process crash.
As a final note, beware that Clang's CFI, as used here, only works with C++ objects. It won't protect, for instance, an indirect call made through a more classical function pointer.
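To make that last point concrete, here is a minimal sketch (with hypothetical names) of the kind of indirect call that stays unprotected with the flags used above:

#include <stdio.h>

typedef int (*binop_t)(int, int);

static int add(int a, int b) { return a + b; }

int main(void) {
    binop_t op = add;          /* a plain C function pointer: no vtable, no CFI check here */
    /* If an attacker manages to overwrite 'op', the jump goes wherever they chose. */
    printf("%d\n", op(2, 3));
    return 0;
}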
GOT protection
- Read-only relocations: passing the -Wl,-z,relro flag to the linker marks some sections as read-only, which prevents some GOT overwrite attacks.
- Immediate binding: if the -Wl,-z,now flag is passed to the linker, all symbols are resolved at load time. Combined with the previous flag, this prevents more GOT overwrite attacks (otherwise part of the GOT is updated lazily at runtime, and that part is not marked read-only by -Wl,-z,relro).
Let's compile the following code hello.c with clang hello.c -o hello and clang hello.c -o hello.relro -Wl,-z,now -Wl,-z,relro:
#include <stdio.h>
int main() {
    puts("degemer mat");
    return 0;
}
And observe the output of readelf -rl on both:
> ./readelf -rl hello
Elf file type is EXEC (Executable file)
Entry point 0x400430
There are 8 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001c0 0x00000000000001c0 R E 8
INTERP 0x0000000000000200 0x0000000000400200 0x0000000000400200
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000006f4 0x00000000000006f4 R E 200000
LOAD 0x00000000000006f8 0x00000000006006f8 0x00000000006006f8
0x0000000000000240 0x0000000000000248 RW 200000
DYNAMIC 0x0000000000000710 0x0000000000600710 0x0000000000600710
0x00000000000001e0 0x00000000000001e0 RW 8
NOTE 0x000000000000021c 0x000000000040021c 0x000000000040021c
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x00000000000005d0 0x00000000004005d0 0x00000000004005d0
0x0000000000000034 0x0000000000000034 R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
[...]
Relocation section '.rela.dyn' at offset 0x370 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
0000006008f0 000300000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
Relocation section '.rela.plt' at offset 0x388 contains 3 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000600910 000100000007 R_X86_64_JUMP_SLO 0000000000000000 puts + 0
000000600918 000200000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
000000600920 000300000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__ + 0
The puts entry will be patched at offset 000000600910, which falls in the second LOAD segment, mapped read-write (RW): the corresponding GOT slot therefore stays writable at runtime.
But when compiled with read-only relocations and immediate loading, we get:
> ./readelf -rl hello.relro
Elf file type is EXEC (Executable file)
Entry point 0x400470
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x0000000000000734 0x0000000000000734 R E 200000
LOAD 0x0000000000000db0 0x0000000000600db0 0x0000000000600db0
0x0000000000000260 0x0000000000000268 RW 200000
DYNAMIC 0x0000000000000dc8 0x0000000000600dc8 0x0000000000600dc8
0x0000000000000200 0x0000000000000200 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x0000000000000610 0x0000000000400610 0x0000000000400610
0x0000000000000034 0x0000000000000034 R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
GNU_RELRO 0x0000000000000db0 0x0000000000600db0 0x0000000000600db0
0x0000000000000250 0x0000000000000250 R 1
[...]
Relocation section '.rela.dyn' at offset 0x3a8 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000600ff8 000300000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
Relocation section '.rela.plt' at offset 0x3c0 contains 3 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000600fe0 000100000007 R_X86_64_JUMP_SLO 0000000000000000 puts + 0
000000600fe8 000200000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
000000600ff0 000300000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__ + 0
It turns out that the puts entry now falls into the GNU_RELRO segment, marked as read-only \o/.
This technique increases the binary's loading time, as all symbols are resolved at load time instead of on demand.
Detect Potential Formatting Attack
Pass the -Wformat -Wformat-security -Werror=format-security flags to the compiler to get warnings about potential format string attacks. -Wformat-security is activated by default, as showcased by compiling the following code, format.c, with clang format.c -o format:
#include <stdio.h>
int main(int argc, char* argv[]) {
    printf(argv[1]);
    return 0;
}
which outputs:
> clang format.c -o format
format.c:4:10: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
    printf(argv[1]);
           ^~~~~~~
1 warning generated.
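The usual fix is to pass the user-controlled string as data, never as the format string itself; a minimal sketch:

#include <stdio.h>

int main(int argc, char* argv[]) {
    if (argc > 1)
        printf("%s\n", argv[1]);   /* argv[1] is printed verbatim; %n and friends are harmless */
    return 0;
}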
Conclusion
In place of a conclusion, a warning: hardening is only one piece of the armor, don't join the battlefield naked with only a helmet on top of your head :-)