Bradley, hash-and-decrypt, Gauss ... a brief history of armored malware and malicious crypto

What was really stunning about Stuxnet and its cousins was it was "open source". Once the sample was captured, it was big to analyze, but none of its embedded secrets could escape a malware analyst. And it had many secrets! From an engineering point of view, the architecture and design are clean and efficient, which makes Stuxnet a good example to learn how to design malware. From an intelligence point of view, knowing what is targeted is valuable. For a money point of view, considering how governments are racing to buy 0 days, Stuxnet was above expensive. And all this was wasted because the payload was unprotected!

This has already been very well explained by Nate Lawson in Stuxnet is embarrassing, not amazing.

What is amazing here is that several techniques have been presented since 1998 about that. And we had to wait for Gauss to use them.

Protecting the intellectual property is not the only reason to use armored malware. It is also a matter of precision as it allows to perform surgical strikes. If you want your code to target only a set of people, find the common denominator, and use it as a trigger (e.g. as key for what we talk about). You have then the insurance your code will be deciphered, thus executed, only for your targets (considering the key space is too big to be bruteforced of course).

All the crypto mysteries in Gauss are exposed in [11].

Update (17 Aug. 2012)

Very interesting article on Gauss by Matthew Green (as usual)

Environmental Key (J. Riordan, B. Schneier)

The starting point in [1] is that a mobile agent evolving in a hostile environment can not embed keys: if captured, key recovery is immediate, and so is its analysis.

Hence Riordan and Schneier presented a collection of cryptographic key constructions built from observation made from the environment where an agent is executed. These keys must be resistant to adversarial observations so that nobody but the agent itself is able to decipher a payload.

Their key idea is to build the key from the environment where the agent is running (hey, wait, doesn't it sound like Gauss?) They propose different constructions, but all of them are built upon the same idea:

  • The agent perform one or several observations (time, hash value of a remote web page, from stock value, hash of the RR record in a DNS answer, or the combination of several inputs).
  • You put all inputs in a hash function (or a combination of hash function concatenated with other hashed values).

That way, the analyst is clueless if the search space for the observations is big enough.

Among the multiple applications they discuss, the authors introduced the idea of a directed virus that would activate based on IP address of the target. This was ok in 1998 as everyone had public routable IP address, but was no more enough once internal networks where hidden behind firewalls with NAT. You can imagine nowadays a code targeting all 10.0.0.0/8 or 192.168.0.0/16 networks. Collateral damages would be unacceptable.

Key Points:

  • Use hash values as keys (seen in Gauss)
  • Targeted attack by constructing the key based on target information (seen in Gauss)
  • Can be mixed with secret sharing schemes (not seen yet)

2005: Armored virus Bradley (E. Filiol)

E. Filiol discussed about a malicious code he called Bradley [3]. It gets a step further to what Riordan and Schneier proposed by giving a complete protocol to build Bradley, mixing the environmental keys with metamorphism.

The armored virus is built with 4 stages:

  • A deciphering routine, in charge of getting the environmental information. If activation value is not good, Bradley self-erases. Otherwise, it computes key k1 and deciphers EVP1.
  • Encrypted code EVP1 (using key k1) contains all the anti-anti-viral mechanisms, and computes key k2.
  • Encrypted code EVP2 (using key k2) is in charge of the infection and polymorphism/metamorphism mechanisms, and computes key k3.
  • Encrypted code EVP3 (using key k3) contains all the optional payloads.

E. Filiol described extensively the protocol around Bradley, for both activation and ciphering/deciphering. Additionaly, the malware did not only rely on cryptography to protect itself. It also tested its running environment to detect suspicious activity, like most malwares do by now.

The deciphering routine, EVP1 and EVP2 are protections. So in the end, Bradley executes its payload only when it considers its environment to be safe.

From what it seems [11], Gauss is not getting that far. It has several encrypted parts, but they all seem to be deciphered at the same time, then executed. So, as far as it seems, if one finds the first piece of the puzzle, then everything is broken.

Key Points:

  • Mathematical proof of the exponential complexity of this problem (unfortunately validated in Gauss)
  • Multi-stage deciphering combined with environment checks (not seen yet)

2005: Secure triggers (A. Futoransky, L. Notarfrancesco, C. Sarraute, A. Waissbein)

Around same timing, CoreLabs came with secure triggers [4]. Given a set of predicates and a sample function (picking up some of the predicates), some action are performed iff the function returns a proper value.

This article goes deep into the math to formalize this, and the proper conditions to use such secure triggers. In Bradley, the activation is simple: compute a hash, check it matches. Here, a trigger needs a particular set of conditions in the environment to run the protected code.

Of course, one of the usage described by the authors is an information warfare worm attack. The trigger is activated based on target information. That way, you achieve surgical strike... again.

Key Point:

  • Multiple trigger schemes (only one seen in Gauss)

2006: hash-and-decrypt (N. Lawson)

In [8], Nate Lawson started to write about mesh design and multi-stage deciphering. His article targets software protection (game is taken as an example). But the bad guys may want to protect their pieces of code in the same way the good guys do.

A mesh is a set of verification functions (e.g. hash functions like SHA) in charge of verifying the integrity of sensitive parts of what has to be protected, but also checking on each other. So, if a software breakpoint is set at some point in the code, and then is checked with one of these functions, the output value will be wrong. Usually, in such schemes, the hash value is used to jump to the next instruction to execute.

Have a look at Skype, and Silver needle in the Skype. One of the protection scheme is close to that (slide 19), except that checksums are used instead of hash functions.

Going back to hash-and-decrypt, N. Lawson replaced the jumps with a deciphering function, e.g. AES. That way, a stage N cheks for its integrity first, then deciphers the next stage. This is exactly what Bradley does.

The difference between hash-and-decrypt and environmental keys / Bradley are simply where the bits to be hashed are taken from. Lawson's design is for code protection, so bits are taken from inside the protected program. But result is the same: you can not go to the next stage if conditions are not met.

Key Point:

  • Multi-stage deciphering combined with integrity check (not seen yet)

2008: Über-malware (E. Filiol, F. Raynal)

In [9], E. Filiol and I introduced a Über-malware that used several malicious crypto tricks:

  • Spreading is controlled by using a random generator
  • The code is armored AND has deniable encryption
  • It attempts to get invisible using statistical simulability

We wont discuss here the 2 latest as they are out of scope, but will focus on the "abnormal distribution" [10].

We designed a random generator to control the spreading. The principle is very simple, and can most certainly be refined. Consider the targets are given domains, they own IPv4 addresses. IPv4 address space is only 232. You can build a worm spreading only to some specific targets.

Let us consider some of the Gauss's targets (in order to keep it simple, we only considered a few of the real ranges), which we convert into their IPv4 range. All other IPv4 addresses have a probability of 0.

Target IPv4 range Probability Cumulative Prob.
Online money transfer (Beirut Lebanon) 91.212.1.0 - 91.212.1.255 0.35 $[0, 0.35[$
Bank of Beirut 212.36.198.0 - 212.36.198.255 0.4 $[0.35, 0.35+0.4[$
Bank HaPoalim 62.0.19.96 - 62.0.19.127 0.15 $[0.75, 0.75+0.15[$
Bank of Palestine 213.244.121.0 - 213.244.121.255 0.1 $[0.9, 0.1]$

Let $x$ be an IPv4 address. We use a uniform random generator to get a value $y=rand()$. Considering $y$ to be a probability, we look for the event $x$, so that $f^{-1}(y) = x$, where $f$ is the distribution we want. What is helpful there is that we only know $f^{-1}$ since it is the cumulative probabilities.

Consider our random generator returns 0.41, the event x is then "Bank of Beirut" because 0.41 is in the interval $[0.35, 0.75[$.

Note that we consider every IP in a given IP range to have the same probability. It means that the event 212.36.198.42 has a probability of 0.4/256 (/256 since there are 256 addresses in that block).

This behavior suits a malware having its own capability for transportation, whether it targets servers, or client side. For the servers, it is obvious as it is just an exposed IP address on the Internet, and that is just what we described. For client side, it can use email and address book. For each email in the address book, it looks at the domain, resolves it, looks up in the probability array, then gets a random number: if it is below the probability, it sends a fake email from the host embedding some 0 days in an attacked document.

I dont know if Gauss spreads using this kind of scheme, but what I know is it is not a problem to have surgical strikes with some math for this kind of malware.

Key Point:

  • Controlled spreading (Yes in Gauss even if we do not know how)

2012: Gauss (some gov agency)

So finally Gauss came to life. If you want to learn more about it, read [10, 11]. It used some of the (simple) tricks exposed earlier. We can expect from people able to forge collisions for certificate (remember Flame) to be quite good at that game too, and propose advanced tricks. For now, they seem very well protected.

Thus, there are still many unanswered questions about Gauss:

  • What does contain the encrypted Godel module?
  • What is the purpose of the “Palida Narrow” font?
  • How does the malware propagate?

Maybe some of the answers are in the encrypted module. Maybe not. No way to know. But what is sure is people behind Gauss have learned, as clearly stated Nate Lawson [12].

Bibliography

[1] Environmental Key Generation towards Clueless Agents
J. Riordan, B. Schneier
Mobile Agents and Security, G. Vigna, ed., Springer-Verlag, 1998, pp. 15-24.

[2] Malicious Cryptography: Exposing Cryptovirology
A. Young, M. Yung, , Wiley, 2004 ISBN 0764549758

[3] Strong Cryptography Armoured Computer Viruses Forbidding Code Analysis: the Bradley virus
Éric Filiol, Proceedings of the 14th EICAR Conference, 2005

[4] Advanced Software Protection Now,
A. Futoransky, L. Notarfrancesco, C. Sarraute, A. Waissbein
TISSEC, 2005

[5] Malicious crypto, F. Raynal, EuSecWest 2006,

[6] Malicious crypto -- part 1, F. Raynal, 2006

[7] Malicious crypto -- part 2, F. Raynal, 2006

[8] Mesh design pattern: hash-and-decrypt, Nate Lawson, 2007

[9] Malicious cryptography ... reloaded, E. Filiol, F. Raynal, CanSecWest, 2008

[10] Gauss: Abnormal Distribution

[11] The Mystery of the Encrypted Gauss Payload, GReAT, 2012

[12] Cyber-weapon authors catch up on blog reading, Nate Lawson, 2012

[13] On Gauss, Matthew Green, 2012

Comments