Arybo: cleaning obfuscation by playing with mixed boolean and arithmetic operations

Obfuscation is made of many different tricks. One we meet very often is mixed instructions who make computations mixing usual arithmetic (ADD, SUB, MUL, DIV) and boolean one (XOR, AND, NOT, OR). All tools get lost when it comes to cleaning this kind of very messy blocks of instructions, and that is why we designed Arybo. With Arybo, analyzing such expressions become way more easy.

We thus introduce the first public release of Arybo, a tool to manipulate/understand/simplify these expressions. It is useful for both de-obfuscation and optimisation purposes.

Indeed, while reversing obfuscated programs, various kind of ugly things can be encountered (as the one we already presented [11] [12]). Among them are expressions whose role is generally to compute simple things with a sequence of complex (yet fast) instructions. Their goal is to have reverse engineers losing their time (and keeping their job in the same time :)).

Mixed Boolean-Arithmetics (MBA) expressions are ones of this kind. They use both the conventional computer integer arithmetic on n-bit words (addition, subtraction, multiplication, and division) and the bitwise logic operations (AND, XOR, OR, NOT). These expressions aren't used for the sole purpose of obfuscation, but also in tricks to compute some expressions faster (the famous "Bit Twiddling Hacks" [0] is a great example of such things). If you remember well, Ninon Eyrolles already wrote about these beasts on this blog [10], and how "simplifying" them is a really complex tasks (indeed, even understanding what "simplifying" means is complex).

Arybo is a Python module wrapping a binary library that has been designed to help to understand, create and reverse such expressions. In a nutshell, we can have an obfuscated function operating on 64-bit integers like this one:

def f(x):
  return ((~x | 0x7AFAFA697AFAFA69) & 0xA061440A061440) \
      + ((x & 0x10401050504) | 0x1010104)

and directly understand its behavior thanks to Arybo:

def f(x):
  return ((~x | 0x7AFAFA697AFAFA69) & 0xA061440A061440) \
      + ((x & 0x10401050504) | 0x1010104)

from arybo.lib import MBA
mba = MBA(64)
x = mba.var('x')
ret = f(x)
print(ret)

In this script, we declare x as a 64-bit symbolic constant. We then compute the symbolic expressions of f with a full 64-bit symbolic variable as input. This outputs (reformatted to fit this blog):

Vec([0,0,1,0,0,0,1,0,1,0,1,0,1,0,0,0,1,1,1,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0])

which looks like a constant. This bit vector is represented with the LSB-bit on the left. We can easily convert it to a more human-readable constant:

print(hex(ret.to_cst()))

which returns the constant 0xa061440b071544. This means that this function always return this value, on its whole 64-bit input space. What's important to notice is that we have this result directly, without any bruteforcing.

If you want to install it now, have a look at the Quick Start section!

Let's show now some interesting work that can be accomplished thanks to Arybo!

Real life examples

Complex functions with MBA

The first function Arybo has ever been used on was taken from an obfuscated DRM Francis Gabriel and Camille Mougey audited while working at Quarkslab. This function looked heavily obfuscated, with a control flow looking like this (extracted from their Recon presentation [4]):

Using Miasm, they produced an equivalent transfer function written in Python:

By using a common subexpression elimination algorithm, they reduced the function to this final snippet:

def f(x):
    v0 = x*0xe5 + 0xF7
    v0 = v0&0xFF
    v3 = (((((v0*0x26)+0x55)&0xFE)+(v0*0xED)+0xD6)&0xFF )
    v4 = ((((((- (v3*0x2))+0xFF)&0xFE)+v3)*0x03)+0x4D)
    v5 = (((((v4*0x56)+0x24)&0x46)*0x4B)+(v4*0xE7)+0x76)
    v7 = ((((v5*0x3A)+0xAF)&0xF4)+(v5*0x63)+0x2E)
    v6 = (v7&0x94)
    v8 = ((((v6+v6+(- (v7&0xFF)))*0x67)+0xD))
    res = ((v8*0x2D)+(((v8*0xAE)|0x22)*0xE5)+0xC2)&0xFF
    return (0xed*(res-0xF7))&0xff

This function takes an 8-bit integer as an input and produce an 8-bit integer in the end. It looks like it's doing complex operations, mixing both classical and boolean arithmetics. For information, the full example code is in the source tree in in examples/xor_5C.py.

Running Arybo on this function

from arybo.lib import MBA
mba = MBA(8)
x = mba.var('x')
ret = f(x)
print(ret)

gives the following output:

Vec([
X0,
X1,
(X2 + 1),
(X3 + 1),
(X4 + 1),
X5,
(X6 + 1),
X7
])

petanque uses the Algebraic Normal Form (ANF) [1]. This basically means that boolean expressions are represented using only the XOR and AND operators. Moreover, they are written modulus 2, so the addition is a XOR operation, and multiplication an AND.

So, what Arybo is basically telling us here is that \(f(x) = x \oplus (0 0 1 1 1 0 1 0)\) (with \(\oplus\) the XOR binary operator). Let's extract this constant:

app = ret.vectorial_decomp([x])
print(hex(app.cst().get_int_be()))

which outputs 0x5C. More information about the vectorial_decomp function can be found here. So, in the end, this function is basically XORing the 8-bit input with the 0x5C constant!

SSTIC2016 CRC inversion

The SSTIC 2016 challenge had one crackme written in TI-83 basic. The crackme takes an integer as input, makes some computation, and if the result is correct, uses it to compute a key.

After reversing it, we end up with this equivalent Python code:

S = [0,1996959894,[...]]

def compute(Z):
    Z = int(sys.argv[1]) # integer
    C=0xFFFFFFFF
    A = 0

    for N in range(0,4):
        C_ = (C >> (A*8)) & 0xFF
        Z_ = (Z >> (N*8)) & 0xFF
        t = Z_ ^ C_
        T_ = S[t]
        B = C >> 8
        C = B ^ T_

    C = (~C) & 0xFFFFFFFF
    return C

What we had to do is, knowing \(C\), find \(Z\) such as \(C = f(Z)\). In other words, invert \(f\) so that we can compute \(Z = f^{-1}(C)\). The trained eye would have recognized a CRC algorithm. There are plenty of ways to do this [7] [8] [9] , but if you're in a hurry and don't want to have to think about how to do it (just do it), Arybo can help:

from arybo.lib import MBA, simplify, simplify_inplace
from arybo.tools import app_inverse

mba = MBA(32)

S = [0,1996959894,[...]]

S,X = mba.permut2expr(S)
S = S.vectorial_decomp([X])

def compute(Z):
    C = mba.from_cst(0xFFFFFFFF)
    A = 0

    for N in range(0,4):
        C_ = (C >> (A*8)) & 0xFF
        Z_ = (Z >> (N*8)) & 0xFF
        t = Z_ ^ C_
        T_ = mba.from_vec(simplify(S(t.vec)))
        B = C >> 8
        C = B ^ T_

    C = (~C)
    return simplify_inplace(C)

E = compute(X)
A = E.vectorial_decomp([X])
Ainv = app_inverse(A)
sol = mba.from_vec(simplify(Ainv(mba.from_cst(3298472535).vec))).to_cst()
print(sol)

What we basically do is computing the full symbolic boolean expressions associated with a CRC, given a 32-bit symbolic input. This gives us an affine function that takes a 32-bit input variable, and produce a 32-bit one. As this function is affine (in the \(F_2^{n}\) space, see the theoretical section above for more information), we can inverse it using the app_inverse API, and then compute our final solution!

The full working code is in the examples/sstic2016_crc.py file in the source tree.

I want more!

More detailed examples can be found in the tutorials available in the documentation.

Quick Start

Want to try it by yourself? Under Windows/OSX/Linux, one can simply do:

$ pip3 install arybo

to get Arybo installed.

Arybo is a Python 3 only library. This also installs the pytanque module, which is a Python binding to the C++ library petanque, which efficiently handles the boolean expressions themselves. Binary wheels are provided for OSX and Windows 64 bits under Python 3.5.

Requirements for OSX/Linux are a recent C++11 compliant compiler and the Python3 development libraries. GCC >= 4.9 and clang >= 3.5 are known to work. Under Debian/ubuntu, these packages need to be installed:

$ sudo apt-get install build-essential libpython3-dev python3-dev

Requirements for Windows are at least having Visual Studio 2015 Update 3 and the official Python >= 3.5 package.

The project is hosted on Github at this address: https://github.com/quarkslab/arybo. A full documentation is available here: http://pythonhosted.org/arybo/.

After a successful setup, don't hesitate to read an run the scripts in the examples directory of the source code. You can also get a quick run using the iarybo shell and run the function shown in the introduction:

$ iarybo 64
In [1]: ret = ((~x | 0x7AFAFA697AFAFA69) & 0xA061440A061440) + ((x & 0x10401050504) | 0x1010104)
In [2]: print(hex(ret.to_cst()))
0xa061440b071544

A bit of theory

Under the hood, symbolic variables are represented as symbolic bit-vectors. This basically means that we represent an n-bit integer as n symbolic boolean expressions. This is a quite classical representation [...]

We are using a canonical form to represent boolean expressions. This gives us the ability to identify/compare functions. Moreover, multiple canonical forms exist [2]. Currently, we only support the Algebraic Normal Form (ANF), which allows considering functions as application with a non-linear and affine part in the affine space \((F_2)^n\). One example of such application is in the examples/hitb2015_crypto400.py file, to break a special form of Even-Mansour. Support for other canonical forms might be implemented in the future.

To handle arithmetic operations, a classical logical 1-bit adder has been implemented. Then, the multiplication is built upon this adder. The division by a constant integer is done using a technique described in the Hacker's Delight [5] (and used in libdivide [4]), which transforms a division by a given n-bit integer into a multiplication by a 2n-bit integer and a right logical shift.

For more information about the theoretical concepts used in Arybo, have a look at the "Concepts" section of the documentation!

Going further

If you are interested at the various possibilities of Arybo, have a look at the documentation here: http://pythonhosted.org/arybo/, especially the "Tutorials" section. Moreover, other articles may be written on this blog for more reversing, cryptographic and other examples :)

Feedbacks of any kind can be sent to aguinet@quarkslab.com (or @adriengnt on Twitter), and pull requests are welcome on the Github repository available here: https://github.com/quarkslab/arybo.

Last but not least, the tool will be presented at GreHack'16, so stay tuned :) !

Acknowledgment

The author would like to thank Francis Gabriel and Camille Mougey for their original work on DRM deobfuscation [4] (that contained the XOR 0x5C example) which first inspired this tool.

Moreover, special thanks to Marion Videau which greatly helped synthesizing this work, and to Ninon Eyrolles for being the first user and bug finder of Arybo ;)

Also thanks to Serge Guelton and Fred Raynal for the reviewing of this blog post, and Philippe Teuwen for his review of the documentation!

[0]https://graphics.stanford.edu/~seander/bithacks.html
[1]https://en.wikipedia.org/wiki/Algebraic_normal_form
[2]https://en.wikipedia.org/wiki/Canonical_normal_form
[3]http://libdivide.com/
[4](1, 2, 3) https://recon.cx/2014/slides/recon2014-21-mougey-camille-francis-gabriel-DRM-obfuscation-versus-auxiliary-attacks-slides.pdf
[5]http://www.hackersdelight.org/magic.htm
[6]http://communaute.sstic.org/ChallengeSSTIC2016
[7]https://sar.informatik.hu-berlin.de/research/publications/SAR-PR-2006-05/SAR-PR-2006-05_.pdf
[8]https://blog.affien.com/archives/2005/07/15/reversing-crc/
[9]By manually reverting each step of the algorithm... (can be error-prone)
[10]http://blog.quarkslab.com/what-theoretical-tools-are-needed-to-simplify-mba-expressions.html
[11]http://blog.quarkslab.com/turning-regular-code-into-atrocities-with-llvm.html
[12]http://blog.quarkslab.com/turning-regular-code-into-atrocities-with-llvm-the-return.html

Comments