What happens if one builds upon the Spectre vulnerability to implement a convoluted version of memcpy? From an obfuscator's point of view, it unleashes a wide range of opportunities, which turn a definite bug into a fun[nk]y feature.

Context

Basically, the Spectre [0] vulnerability makes it possible to read a memory location without architecturally referencing or accessing it (the access only happens during speculative execution): some kind of shadow memory copy. Let's formalize it that way. The code is an extension of the PoC at https://github.com/crozone/SpectrePoC/blob/master/spectre.c (array1, array2 and readMemoryByte come from there). We leverage that PoC to write the following function:

void shadow_memcpy(char * dest, char const * src, size_t n) {
  /* Default to a cache hit threshold of 80 */
  int cache_hit_threshold = 80;
  void* mem = malloc(n + 1 /* + 1 to avoid an out of bound write in readMemoryByte*/);
  memcpy(mem, src, n);

  size_t malicious_x = (size_t)((char const*)mem - (char const* ) array1);

  int score[2];
  uint8_t value[2]; /* readMemoryByte writes its best two guesses here */

  for (size_t i = 0; i < sizeof(array2); i++) {
    array2[i] = 1; /* write to array2 so in RAM not copy-on-write zero pages */
  }

  /* Start the read loop to read each address */
  for(size_t i = 0; i < n; ++i) {
    readMemoryByte(cache_hit_threshold, malicious_x+i, value, score);
    dest[i] = (char)value[0]; /* keep only the best guess, no write past dest[n-1] */
  }
  free(mem);
}

There's not much to say about it, except that it first performs a copy from src to mem to make sure the address is not located on the stack. It then uses the Spectre vulnerability to perform a shadow copy from mem to dest.

Why is it a good obfuscation?

Using the same idea as Spectre, shadow_memcpy uses a global hidden state, the memory cache, to transfer data from one buffer to another. The good thing with that hidden state is that it is invisible to static and dynamic analysis tools. From their point of view, there is actually no data dependency between src and dest.

It also turns out that the original PoC makes my qemu (version 2.10.1) crash with an illegal hardware instruction signal. Some kind of nice anti-debug effect :-)

In the following, let us assume that the shadow_memcpy is opaque to a static analysis tool. What happens then?

Application 1: Opaque Predicate

The following piece of code:

bool true_predicate(int any) {
    char p = any;
    char res;
    shadow_memcpy(&res, &p, sizeof(p));
    return p == res;
}

builds an opaque predicate with dependency injection. The parameter any seems to be used but its value actually does not matter. The result should always be true, but because shadow_memcpy is opaque, so is true_predicate!
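
To illustrate how such a predicate might be used, here is a minimal sketch that guards a real computation and a decoy branch with true_predicate. To keep it self-contained, shadow_memcpy is replaced by a stand-in built on plain memcpy (an assumption made for illustration only; it has none of the opacity of the Spectre-based version), and guarded_add is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the Spectre-based shadow_memcpy, for illustration only:
   same contract, but implemented with a plain (non-opaque) memcpy. */
void shadow_memcpy(char *dest, char const *src, size_t n) {
    memcpy(dest, src, n);
}

/* Same predicate as above: always true, whatever the value of any. */
bool true_predicate(int any) {
    char p = (char)any;
    char res;
    shadow_memcpy(&res, &p, sizeof(p));
    return p == res;
}

/* Hypothetical helper: the second return is dead code, but a tool that
   cannot see through shadow_memcpy has to consider both paths. */
int guarded_add(int a, int b) {
    if (true_predicate(a))
        return a + b;   /* real computation */
    return a - b;       /* decoy, never executed */
}
```

With the real shadow_memcpy in place, the decoy branch can hold arbitrary junk code, since it is never taken at run time.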

Application 2: Opaque Function Call

The following code is just a variant of the previous one, storing a function pointer instead of a random value.

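typedef int (*function_type)(char const*); /* matches the prototype of puts */

function_type call_puts(void) {
    function_type in = puts; /* copy the pointer value, not the code bytes of puts */
    function_type out[1];
    shadow_memcpy((char *)out, (char const *)&in, sizeof(function_type));
    return out[0];
}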

So this actually returns the address of the puts symbol each time it's called, in an opaque way thanks to shadow_memcpy. Wrap every function call in a similar manner and IDA will have a hard time reconstructing the call-graph :-)
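
As a usage sketch, the returned pointer can be called directly at the call site. The example below applies the same wrapping to strlen so that the result is easy to check. Here again, shadow_memcpy is a plain-memcpy stand-in (an assumption, for illustration only), and opaque_strlen and name_length are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the Spectre-based shadow_memcpy, for illustration only. */
void shadow_memcpy(char *dest, char const *src, size_t n) {
    memcpy(dest, src, n);
}

typedef size_t (*strlen_type)(char const *);

/* Hypothetical wrapper: copies the pointer value (not the code it
   points to) through shadow_memcpy, then returns it. */
strlen_type opaque_strlen(void) {
    strlen_type in = strlen;
    strlen_type out;
    shadow_memcpy((char *)&out, (char const *)&in, sizeof(out));
    return out;
}

/* The call site only sees an indirect call through an opaque pointer. */
size_t name_length(char const *name) {
    return opaque_strlen()(name);
}
```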

Application 3: Prevent Symbolic Analysis

Tools like Triton [1] try to track the use of a register marked as a symbolic value. To achieve this goal they (try to) emulate every instruction, keep track of the operations and then solve the symbolic expression at some point to recover user inputs. With shadow_memcpy, something that cannot be captured by their symbolic model is introduced. The relationship between the dest and src is lost, and there is no more formula to solve:

int anonymize(int value) {
    int res;
    shadow_memcpy((char *)&res, (char const *)&value, sizeof(value));
    return res;
}

We tried to solve a stupid challenge like the one below with Triton, and it indeed failed to find a solution for input:

int main(int argc, const char * * argv) {
  int /*the symbolic value*/ input = atoi(argv[1]);
  if(anonymize(input) == 1)
    puts("yeah");
  return (0);
}

What about watchpoints?

Out of curiosity, and to answer a question from @pappy, I tried the following setup:

  1. Write a variant of shadow_memcpy that does not copy src into the allocated buffer mem. Let's call it nether_shadow_memcpy:

    void __attribute__((noinline)) nether_shadow_memcpy(char * dest, char const * src, size_t n); // to make breakpointing easier
    
    void nether_shadow_memcpy(char * dest, char const * src, size_t n) {
      /* Default to a cache hit threshold of 80 */
      int cache_hit_threshold = 80;
      size_t malicious_x = (size_t)(src - (char const* ) array1);
    
      int score[2];
      uint8_t value[2]; /* readMemoryByte writes its best two guesses here */
    
      for (size_t i = 0; i < sizeof(array2); i++) {
        array2[i] = 1; /* write to array2 so in RAM not copy-on-write zero pages */
      }
    
      /* Start the read loop to read each address */
      for(size_t i = 0; i < n; ++i) {
        readMemoryByte(cache_hit_threshold, malicious_x+i, value, score);
        dest[i] = (char)value[0]; /* keep only the best guess, no write past dest[n-1] */
      }
    }
    
  2. Compile a test program with gcc -O0 -g, run it under gdb, break on nether_shadow_memcpy and use a hardware watchpoint to check whether *src is accessed (using the gdb command awatch *src). Run the gdb command show can-use-hw-watchpoints beforehand to make sure hardware watchpoints are enabled.

    $> gdb ./a.out
    ...
    (gdb) b nether_shadow_memcpy
    ...
    (gdb) r
    Starting program:  .../a.out
    Breakpoint 1, nether_shadow_memcpy(dest=0x7fffffffe58c "UU", src=0x555555756580 <mem> "\001", n=4) at spectre.c:204
    (gdb) awatch *0x555555756580
    Hardware access (read/write) watchpoint 2: *0x555555756580
    (gdb) c
    Continuing.
    yeah
    [Inferior 1 (process 14810) exited normally]
    

And it's not caught! Well, something probably happens at the hardware level, but gets filtered out because it's only a speculative read. But that's only... my speculation :-)

Limitations

There are indeed limitations to the approach, which make it an unreasonable choice for a real obfuscator:

  1. It relies on a timing attack. Although it is surprisingly accurate, there is no guarantee that it will work faithfully in, say, multithreaded applications or on a CPU under heavy load.

  2. The pattern is likely to be known, or easily reversible by a human reverser. A specific pattern matching approach can then just catch shadow_memcpy and give it the semantics of a memcpy (this is going to get harder if shadow_memcpy is marked as __attribute__((always_inline)) though).

  3. The size of the copy is limited by the register size, but this could be worked around by splitting the copy into chunks.

  4. Don't ask about the performance impact.
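
For the first limitation, one conceivable mitigation (not something the code above does, and only a sketch) is to run the copy several times and keep, for every byte, the value seen most often. The helper below, with the hypothetical name majority_byte, implements just that voting step over samples that have already been collected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the byte value that occurs most often in
   samples[0..count-1]. Ties go to the smallest value, and 0 is returned
   when count is 0. */
uint8_t majority_byte(uint8_t const *samples, size_t count) {
    size_t histogram[256] = {0};
    for (size_t i = 0; i < count; ++i)
        histogram[samples[i]]++;

    size_t best = 0;
    for (size_t v = 1; v < 256; ++v)
        if (histogram[v] > histogram[best])
            best = v;
    return (uint8_t)best;
}
```

Each sample would come from one run of shadow_memcpy on the same source byte. More runs trade even more performance for reliability, which does not help with the fourth limitation.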

So, well, it's still a bug we cannot seriously build upon, but at least these were funny applications :)

Conclusion

There's actually a good idea hidden in this blogpost: if you can craft a dozen obfuscated versions of memcpy (e.g. building one based on an xor obfuscated by Mixed Boolean Arithmetic), then you have plenty of obfuscations to unleash!
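
As a toy illustration of that idea, here is a minimal sketch of an xor-based memcpy where the xor itself is rewritten with the Mixed Boolean Arithmetic identity x ^ y == (x | y) - (x & y). The names mba_xor and xor_memcpy, and the key value, are hypothetical. A real obfuscator would use far heavier expressions, since a compiler can simplify this one.

```c
#include <stddef.h>
#include <stdint.h>

/* x ^ y rewritten with a Mixed Boolean Arithmetic identity:
   x ^ y == (x | y) - (x & y). */
uint8_t mba_xor(uint8_t x, uint8_t y) {
    return (uint8_t)((x | y) - (x & y));
}

/* Hypothetical xor-based memcpy: each byte is masked with a key, then
   unmasked with the same key, so dest[i] ends up equal to src[i]. */
void xor_memcpy(char *dest, char const *src, size_t n) {
    uint8_t key = 0x5a; /* arbitrary mask, any value works */
    for (size_t i = 0; i < n; ++i) {
        uint8_t masked = mba_xor((uint8_t)src[i], key);
        dest[i] = (char)mba_xor(masked, key);
    }
}
```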

Thanks

To the happy reversers and developers at QB for reviewing and challenging this blog post! Special kudos to Adrien for the gdb shenanigans.

[0] https://spectreattack.com/spectre.pdf
[1] https://triton.quarkslab.com/
