Study of an Android runtime (ART) hijacking mechanism for bytecode injection through a step-by-step analysis of the packer used to protect the DJI Pilot Android application.
Introduction
In the world of Android applications, it's not uncommon to come across applications protected by a packer. The role of a packer is to protect all or part of the application code from static analysis. There are many reasons why a developer might want to protect an application:
- Protect valuable business logic;
- Protect application monetization logic (e.g. a license management mechanism);
- Evade conventional analysis tools to hide malicious logic;
- ...
Here, we take a look at the DJI Pilot application, not to understand why developers want to protect their code - this has already been the subject of previous work (see in particular this DJI Pilot analysis) - but to highlight a runtime mechanism implemented by DJI to protect its application code. This protection is based on the use of a modified version of the SecNeo packer.
The article details the various stages in the analysis to understand how the application code is obfuscated. A Python proof-of-concept named DxFx for statically unpacking the DJI Pilot application is provided as practical support for this article. DxFx does not claim to be a SecNeo unpacker. Its sole aim is to improve the reader's understanding of the various mechanisms implemented by the packer through Python code. It will not be maintained in the future.
Targeted application
The analysis is performed on the latest version of the DJI Pilot application:
- Version: 2.5.1.17
- SHA256: 642aa123437c259eea5895fe01dc4210c4a3a430842b79612074d88745f54714
- Download link
DxFx provided in support of the article has also been tested on the following versions of the DJI Pilot application:
- Version: 2.5.1.15
- SHA256: d6f96f049bc92b01c4782e27ed94a55ab232717c7defc4c14c1059e4fa5254c8
and
- Version: 2.5.1.10
- SHA256: 860d9d75dc2b2e9426f811589b624b96000fea07cc981b15005686d3c55251d9
Bytecode, where are you?
Primary analysis
Static analysis of the APK initially reveals that the result of bytecode decompilation is, to say the least, uncluttered...
This is because, like other packers, SecNeo leaves only bootstrap code in the bytecode to launch the application's unpacking phase. Here, the packer bootstrap code loads the native libDexHelper.so library:
The first step in the analysis is therefore to find the bytecode containing the application's business logic.
The packer logic is present in the native library libDexHelper.so. However, the code of this library is itself packed. So, we have to unpack... the packer to analyze its logic.
As the aim of this article is not to understand how the packer itself is protected, this part is not dealt with in-depth, and we simply dump the library at runtime from the DJI Pilot application process memory space. There are a multitude of ways to do this, using tools such as gdb or Frida.
However, you may be in for a few surprises:
Cannot attach to process 25562: Operation not permitted (1), process 25562 is already traced by process 25598
or:
Failed to attach: process not found
The packer contains some countermeasures, as partially described in this issue, to prevent the use of dynamic tools. Fortunately, these can be easily bypassed.
Once libDexHelper.so has been dumped from memory, it can be analyzed with a disassembler.
First look at the packer binary
An initial brief analysis of the libDexHelper.so library reveals the presence of the decrypt_jar_128K symbol. Hooking the associated function with Frida reveals that a buffer is passed as input and contains the contents of a DEX file as output:
'use strict';

const dlopen_ext = Module.getExportByName(null, '__loader_android_dlopen_ext');

function main() {
    const decrypt_jar_128K_addr = Module.getExportByName(
        'libDexHelper.so', 'decrypt_jar_128K'
    );

    /**
     * decrypt_jar_128K function hook
     */
    Interceptor.attach(decrypt_jar_128K_addr, {
        onEnter: function(args) {
            // Keep the buffer pointer so it can be read once the function returns
            this.dex_buffer_ptr = args[1];
        },
        onLeave: function() {
            console.log(`\nReading dex buffer @ ${this.dex_buffer_ptr}`);
            console.log(this.dex_buffer_ptr.readByteArray(16));
        }
    });
}

/**
 * Bootstrap
 */
const boot_intercept = Interceptor.attach(dlopen_ext, {
    onEnter: function(args) {
        this.name = args[0].readUtf8String();
    },
    onLeave: function() {
        // The library name may be null, so guard before matching it
        if (this.name !== null && this.name.includes('libDexHelper.so')) {
            main();
            boot_intercept.detach();
        }
    }
});
The result of the script is:
Reading dex buffer @ 0x74d1e63140
0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
00000000 64 65 78 0a 30 33 35 00 4a 8b b5 fd 1b 58 54 1f dex.035.J....XT.
Reading dex buffer @ 0x74d268c140
0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
00000000 64 65 78 0a 30 33 35 00 6f 02 2a 0b 48 26 a5 e0 dex.035.o.*.H&..
Reading dex buffer @ 0x74d3005140
0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
00000000 64 65 78 0a 30 33 35 00 8a b4 08 1c 90 61 5a 34 dex.035......aZ4
Reading dex buffer @ 0x74d3643140
0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
00000000 64 65 78 0a 30 33 35 00 cb b9 8e 72 35 3a d8 bc dex.035....r5:..
Reading dex buffer @ 0x74d4055140
0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
00000000 64 65 78 0a 30 33 35 00 c2 8b a3 7b 64 3b c6 54 dex.035....{d;.T
Reading dex buffer @ 0x74d4a5f140
0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
00000000 64 65 78 0a 30 33 35 00 dd 47 c2 4e a1 39 cc 79 dex.035..G.N.9.y
Reading dex buffer @ 0x74d552f140
0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
00000000 64 65 78 0a 30 33 35 00 58 17 ae a9 56 21 f1 1f dex.035.X...V!..
Reading dex buffer @ 0x74d5a77140
0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
00000000 64 65 78 0a 30 33 35 00 84 62 14 0d ac 5f b7 f8 dex.035..b..._..
So, here we can see that 8 DEX files (with the dex.035 magic) are unpacked. It is possible to modify the previous hook to dump the various DEX files as they are unpacked. Another solution is to understand where the packed DEX files are stored in the APK and how we can unpack them statically.
Static unpacking of DEX files
The advantage of the dynamic extraction method lies in its rapid implementation. However, it requires the application to be run and an environment to be set up to allow instrumentation of the process. Static extraction, on the other hand, enables cold unpacking of DEX files directly from the APK. The drawback of the static approach is that it requires a slightly deeper understanding of how the packer works.
DEX files where are you?
Some versions of the SecNeo packer store the bytecode in the classes0.jar file located in the APK assets. Unfortunately, this is not the case here as the file does not exist.
However, if we take a closer look at the classes.dex file located at the root of the APK and supposed to contain only the packer bootstrap code, we can see that something is wrong with its size:
du -h classes.dex
63M classes.dex
63MB is very large for the small amount of code we observed in the first analysis. Usually, the multidex mechanism will split the bytecode into several .dex files well before reaching this size. File entropy analysis also gives us some interesting clues:
We can see 8 peaks tending towards an entropy of 8 bits per byte, which suggests that these chunks are encrypted. The previous Frida hook revealed that 8 DEX files were unpacked, which is probably no coincidence. The 8 chunks shown in the graph correspond to 128KB sections, so we can make the connection with the decrypt_jar_128K function symbol. A differential analysis with the dynamically obtained files finally confirms that the classes.dex file contains all 8 DEX files after the SecNeo bootstrap code. The first 128KB chunk of each DEX file is encrypted, probably to conceal information that could be used to detect the presence of the hidden files, such as the magic number in the header.
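The entropy profile discussed here can be reproduced with a few lines of Python. The following is a minimal sketch, not DxFx code: the function names and the choice of a 128KB window are made for illustration, and the demo runs on synthetic data rather than on the real classes.dex.

```python
import math
import os
from collections import Counter
from typing import List, Tuple

CHUNK_SIZE = 128 * 1024  # window size matching the 128KB chunks seen in the graph


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def entropy_profile(blob: bytes, window: int = CHUNK_SIZE) -> List[Tuple[int, float]]:
    """Return (offset, entropy) for each consecutive window of `blob`."""
    return [
        (off, shannon_entropy(blob[off:off + window]))
        for off in range(0, len(blob), window)
    ]


if __name__ == "__main__":
    # Synthetic stand-in for classes.dex: constant data around one random chunk.
    sample = bytes(CHUNK_SIZE) + os.urandom(CHUNK_SIZE) + b"A" * CHUNK_SIZE
    for offset, value in entropy_profile(sample):
        print(f"{offset:#010x}  {value:.3f}")
```

Windows whose entropy approaches 8 are good candidates for encrypted or compressed data; plain bytecode usually sits noticeably lower.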
Encryption analysis
To understand how the first 128KB of each DEX is decrypted, we need to analyze how the decrypt_jar_128K function works.
One of the function's basic blocks contains the encryption logic:
loc_8DC78
    ADD     W3, W3, #1          ; i++
    LDRB    W6, [X5],#1         ; x = buffer[cursor++]
    AND     W7, W3, #0xFF       ; i %= 256
    SUB     W0, W5, W1
    MOV     X3, X7
    CMP     X2, X0
    LDRB    W0, [X8,X7]         ; +--
    ADD     W4, W4, W0          ; | j = (j + S[i]) % 256
    AND     W9, W4, #0xFF       ; +--
    MOV     X4, X9
    LDRB    W10, [X8,X9]        ; +--
    STRB    W10, [X8,X7]        ; |
    STRB    W0, [X8,X9]         ; | S[i], S[j] = S[j], S[i]
    LDRB    W7, [X8,X7]         ; +--
    ADD     W0, W7, W0          ; +--
    UXTB    W0, W0              ; |
    LDRB    W0, [X8,X0]         ; | x = S[(S[i] + S[j]) % 256] ^ x
    EOR     W0, W0, W6          ; +--
    STURB   W0, [X5,#-1]        ; buffer[cursor-1] = x
    B.HI    loc_8DC78
This is RC4's pseudo-random generation algorithm (PRGA):
i := 0
j := 0
while GeneratingOutput:
    i := (i + 1) mod 256
    j := (j + S[i]) mod 256
    swap values of S[i] and S[j]
    t := (S[i] + S[j]) mod 256
    K := S[t]
    output K
endwhile
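For reference, a complete RC4 routine fits in a few lines of Python. This sketch pairs the PRGA with the standard key-scheduling algorithm (KSA); the function names are illustrative and are not taken from DxFx or from the packer.

```python
from typing import List


def rc4_ksa(key: bytes) -> List[int]:
    """Key-scheduling algorithm: derive the initial permutation S from the key."""
    if not key:
        raise ValueError("key must not be empty")
    s = list(range(256))  # start from the identity permutation
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """XOR `data` with the RC4 keystream generated from `key`."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray(len(data))
    for n, byte in enumerate(data):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out[n] = byte ^ s[(s[i] + s[j]) % 256]
    return bytes(out)


if __name__ == "__main__":
    ciphertext = rc4_crypt(b"Key", b"Plaintext")
    print(ciphertext.hex())
    print(rc4_crypt(b"Key", ciphertext))
```

Because RC4 is a symmetric stream cipher, the same call both encrypts and decrypts.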
Analysis of the decrypt_jar_128K CFG gives us information about where different parts of the RC4 algorithm are located:
Encryption key generation
The key's cross-references lead to a generation function based on a simple XOR between a 16-byte hardcoded constant and the first 16 bytes of the string com.dji.industry.pilot:
We are now able to statically unpack DEX files.
The DEX decryption is currently implemented in the DexPool class of DxFx.
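The key derivation described in this section could be sketched as follows. The 16-byte constant used here is a placeholder, since the real value must be extracted from libDexHelper.so and is not reproduced in the article; the function name is also hypothetical.

```python
PACKAGE_NAME = b"com.dji.industry.pilot"

# Placeholder only: the real 16-byte constant is hardcoded in libDexHelper.so.
HYPOTHETICAL_CONSTANT = bytes(range(16))


def derive_dex_key(constant: bytes, package_name: bytes = PACKAGE_NAME) -> bytes:
    """XOR a 16-byte constant with the first 16 bytes of the package name."""
    if len(constant) != 16:
        raise ValueError("constant must be exactly 16 bytes")
    if len(package_name) < 16:
        raise ValueError("package name must be at least 16 bytes")
    return bytes(c ^ p for c, p in zip(constant, package_name[:16]))


if __name__ == "__main__":
    key = derive_dex_key(HYPOTHETICAL_CONSTANT)
    print(key.hex())
```

The resulting 16 bytes would then be used as the RC4 key for the first 128KB of each embedded DEX file.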
However, disassembly of the unpacked DEX files reveals a problem. The code of a large number of methods seems to have been stolen and overwritten, mainly with nop instructions:
We can therefore assume that the packer has a second bytecode protection mechanism.
Bytecode where are you? Again...
Method debug info
The various methods whose code is stolen all seem to contain a debug info offset (debug_info_off) which also appears in the body of the method:
There seems to be something fishy about debug_info_off: this field could play a role in the method code unpacking mechanism, perhaps as an identifier. Moreover, a classes.dgc file located in the APK assets contains a large number of the debug info offsets used in stolen methods... The classes.dgc file therefore seems a potentially interesting candidate for further analysis.
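To inspect this field in an unpacked DEX, the fixed header of a code_item can be parsed directly. The sketch below follows the code_item layout from the Dalvik executable format (four 16-bit fields, then debug_info_off and insns_size as 32-bit values); the helper names are illustrative, and the demo uses a hand-built item rather than data from the application.

```python
import struct
from typing import NamedTuple, Tuple

# registers_size, ins_size, outs_size, tries_size (u16 each),
# then debug_info_off and insns_size (u32 each), all little-endian.
CODE_ITEM_HEADER = struct.Struct("<HHHHII")


class CodeItemHeader(NamedTuple):
    registers_size: int
    ins_size: int
    outs_size: int
    tries_size: int
    debug_info_off: int
    insns_size: int  # expressed in 16-bit code units


def parse_code_item(dex: bytes, code_off: int) -> Tuple[CodeItemHeader, bytes]:
    """Return the code_item header at `code_off` and its raw instruction bytes."""
    header = CodeItemHeader(*CODE_ITEM_HEADER.unpack_from(dex, code_off))
    start = code_off + CODE_ITEM_HEADER.size
    insns = dex[start:start + header.insns_size * 2]
    return header, insns


if __name__ == "__main__":
    # Hand-built code_item: two nop code units and an arbitrary debug_info_off.
    fake = struct.pack("<HHHHII", 2, 1, 0, 0, 0x0012ABCD, 2) + bytes(4)
    header, insns = parse_code_item(fake, 0)
    print(hex(header.debug_info_off), insns.hex())
```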
The classes.dgc file
An entropy analysis reveals that the beginning of the file (oddly enough, a 128KB chunk) probably contains encrypted data:
This is a good lead to follow in the libDexHelper.so binary.
Encryption analysis
A mechanism similar to the 128KB chunk encryption of DEX files is likely used for the classes.dgc file. Analysis of libDexHelper.so reveals a function whose structure also corresponds to an RC4 encryption algorithm:
We can confirm that this is the classes.dgc decryption function by using a simple Frida hook:
'use strict';

const dlopen_ext = Module.getExportByName(null, '__loader_android_dlopen_ext');

function main() {
    const rc4_fct_addr = Module.getExportByName(
        'libDexHelper.so',
        'p416302DA23BEF5D5A81473ACFAC4DA25'
    );

    Interceptor.attach(rc4_fct_addr, {
        onEnter: function(args) {
            // Dump the first 32 bytes of the buffer passed as first argument
            console.log(args[0].readByteArray(32));
        }
    });
}

Interceptor.attach(dlopen_ext, {
    onEnter: function(args) {
        this.name = args[0].readUtf8String();
    },
    onLeave: function(retval) {
        // Only install the hook if the library was actually loaded
        if (!retval.isNull() && this.name !== null && this.name.includes('libDexHelper.so'))
            main();
    }
});
The result is:
0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
00000000 ef bd de 50 8b bb 81 c7 80 63 35 ca 95 6e 1d 1d ...P.....c5..n..
00000010 36 d5 ef 02 df 2a 50 2b e8 88 03 c3 9b 45 da 5f 6....*P+.....E._
It matches the first bytes of the classes.dgc file:
As with the decrypt_jar_128K function, the basic block initializing S to the identity permutation reveals the presence of a cross-reference to the key.
Encryption key generation
From the cross-references, it is possible to locate the key generation function. The CFG of the function looks a bit like the one used to generate the DEX decryption key. However, a slightly more complex mechanism is used to generate the key:
First, the MD5 hash of a 4096-byte binary blob in memory is computed. MD5 is identified by looking at a sub-function called in the previous CFG. This sub-function corresponds to the MD5 routine that processes a single block (512 bits). The algorithm is flattened and contains hardcoded K constants (0xe8c7b756, 0xd76aa478, ...).
The binary blob is loaded directly from libDexHelper.so and can be found even in the packed version of the library. This chunk appears to be preceded by a kind of header containing the name mthfilekey:
Once the MD5 has been calculated, a deterministic sequence is generated by calling another sub-function. Analysis of the function reveals that it is a Fibonacci sequence:
Next, the 16 bytes of the MD5 hash are XORed with 16 bytes retrieved directly from the 4096-byte chunk (mthfilekey) following a deterministic walk based on the Fibonacci sequence previously generated.
We are now able to statically generate the RC4 key that decrypts the first 128KB of the classes.dgc file.
- The classes.dgc decryption is implemented in the CodePool._decrypt_chunk method of DxFx.
- The RC4 key generation is done by the BinHelper.code_pool_key method of DxFx.
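The key generation steps could be modelled roughly as below. This is a sketch under stated assumptions: the article does not specify the Fibonacci seed values or how sequence terms map to offsets in the blob, so starting at 1, 1 and reducing each term modulo 4096 are guesses made for illustration. The function names are hypothetical; the reference logic is the BinHelper.code_pool_key method of DxFx.

```python
import hashlib
from typing import List

BLOB_SIZE = 4096  # size of the mthfilekey blob described in the article


def fibonacci(count: int) -> List[int]:
    """Return the first `count` Fibonacci terms (ASSUMED to start at 1, 1)."""
    seq, a, b = [], 1, 1
    for _ in range(count):
        seq.append(a)
        a, b = b, a + b
    return seq


def derive_code_pool_key(blob: bytes) -> bytes:
    """XOR the MD5 of `blob` with 16 blob bytes picked by a Fibonacci walk."""
    if len(blob) != BLOB_SIZE:
        raise ValueError("expected a 4096-byte blob")
    digest = hashlib.md5(blob).digest()
    # ASSUMPTION: each Fibonacci term, modulo the blob size, is used as an offset.
    offsets = [term % BLOB_SIZE for term in fibonacci(len(digest))]
    return bytes(d ^ blob[off] for d, off in zip(digest, offsets))


if __name__ == "__main__":
    demo_blob = bytes(i % 256 for i in range(BLOB_SIZE))  # stand-in for mthfilekey
    print(derive_code_pool_key(demo_blob).hex())
```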
classes.dgc file format
Once decrypted, looking at classes.dgc reveals that the beginning of the file contains a table indexing all the application methods (code_item) whose code has been stolen:
Each table item points to the code_item of a method:
However, as it stands, the Dalvik opcodes present in the method bodies seem inconsistent and are therefore probably obfuscated... At this stage, we have all the elements needed to link the stolen bytecode (still obfuscated for the moment, which we address later) to the application's various damaged methods. First of all, it's interesting to understand when the packer repairs the methods so that the application can run normally. This mechanism is particularly interesting because it relies on an ART feature.
ART hijacking
ART in a nutshell
The Android Runtime (ART) is Dalvik's successor runtime in charge of optimizing and executing code for Android applications and other Android system components. The Android Runtime — How Dalvik and ART work? article by Paulina Sadowska is a great introduction to ART.
Class loading mechanism
When a method is to be executed, the runtime must first check that the class to which the method belongs is loaded. If this is not the case, the runtime will load and link the class. The linking process involves several phases as described in the Java Language Specification:
- Class verification;
- Class preparation;
- Resolution.
The stage we're interested in here is class verification because it's precisely this stage that is instrumented by the packer. Among other things, this step checks the bytecode of the class's various methods for inconsistencies. It is implemented in the ClassLinker::VerifyClass method of ART.
One of the interesting features of VerifyClass is that it calls the UpdateClassAfterVerification method:
static void UpdateClassAfterVerification(Handle<mirror::Class> klass,
                                         PointerSize pointer_size,
                                         verifier::FailureKind failure_kind)
    REQUIRES_SHARED(Locks::mutator_lock_) {
  // [...]

  // Now that the class has passed verification, try to set nterp entrypoints
  // to methods that currently use the switch interpreter.
  if (interpreter::CanRuntimeUseNterp()) {
    for (ArtMethod& m : klass->GetMethods(pointer_size)) {
      if (class_linker->IsQuickToInterpreterBridge(m.GetEntryPointFromQuickCompiledCode())) {
        runtime->GetInstrumentation()->InitializeMethodsCode(&m, /*aot_code=*/nullptr);
      }
    }
  }
}
UpdateClassAfterVerification updates the entry points of the various methods of the verified class. So, it has to iterate over all the methods of the class and call the Instrumentation::InitializeMethodsCode method:
Anatomy of the hook
The Instrumentation::InitializeMethodsCode method provides a crossing point on every method in the application that can be executed. It is precisely this crossing point that is exploited by the packer to repair methods whose code has been stolen. To do this, libDexHelper.so places a hook on InitializeMethodsCode:
The prologue of the Instrumentation::InitializeMethodsCode method is patched to redirect the execution flow to a function in libDexHelper.so that we call PatchMethodCode:
A few moments later... we can deduce the hook's anatomy and the different operations performed by PatchMethodCode:
Once the PatchMethodCode function is called, it first loads the obfuscated bytecode of the current method, using debug_info_off as an identifier to look it up in the method index table of the classes.dgc file. The code is passed to a function we call DecryptMethodCode to be de-obfuscated. Then the code_item (dex::CodeItem) of the method (art::ArtMethod) is patched to point to the buffer containing the de-obfuscated bytecode.
This mechanism ensures that the damaged code in each method is repaired before the method is executed. At this point, the last thing we need to understand is how bytecode is obfuscated in classes.dgc. To do this, we need to analyze the DecryptMethodCode function.
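The repair flow can be modelled in a few lines of Python. This is a conceptual sketch only: the Method class, the dictionary standing in for the classes.dgc index table, and the toy de-obfuscation callback are all hypothetical and do not reflect the real ART or packer data structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Method:
    """Hypothetical stand-in for a method and its code_item."""
    name: str
    debug_info_off: int
    insns: bytes


def patch_method_code(
    method: Method,
    code_pool: Dict[int, bytes],
    deobfuscate: Callable[[bytes, int], bytes],
) -> bool:
    """Replace the method body if its debug_info_off is indexed in the pool."""
    obfuscated = code_pool.get(method.debug_info_off)
    if obfuscated is None:
        return False  # no entry for this method: nothing to repair
    method.insns = deobfuscate(obfuscated, method.debug_info_off)
    return True


def toy_deobfuscate(code: bytes, debug_info_off: int) -> bytes:
    """Placeholder transform (plain XOR), NOT the packer's real algorithm."""
    key = debug_info_off & 0xFF
    return bytes(b ^ key for b in code)


if __name__ == "__main__":
    original = bytes.fromhex("12000f00")
    pool = {0x1234: toy_deobfuscate(original, 0x1234)}  # XOR is its own inverse
    damaged = Method("foo", 0x1234, bytes(4))  # body overwritten with nops
    print(patch_method_code(damaged, pool, toy_deobfuscate), damaged.insns.hex())
```

Passing the de-obfuscation routine as a callback keeps the lookup-and-patch step separate from the opcode transformation itself.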
Bytecode de-obfuscation
The function is rather small, and an analysis of a few basic blocks gives a good idea of how it works:
The function iterates over each opcode. The obfuscated opcodes are XORed with the low byte of the method's debug_info_off offset. The result of this operation is then used as an index into a substitution table. The obfuscated opcode is replaced by the one obtained from the substitution table:
opcode = S[obfuscated_opcode ^ (debug_info_off & 0xff)]
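As an illustration, the lookup could be written as follows. The substitution table below is an arbitrary permutation chosen so the example runs; the real table has to be extracted from libDexHelper.so. The inverse helper is hypothetical and exists only to show a round trip.

```python
# Hypothetical 256-entry table: any permutation works for the demonstration.
# (i * 7 + 3) % 256 is a permutation because 7 is coprime with 256.
DEMO_TABLE = bytes((i * 7 + 3) % 256 for i in range(256))


def deobfuscate_opcode(obfuscated: int, debug_info_off: int, table: bytes) -> int:
    """Return table[obfuscated XOR low byte of the debug info offset]."""
    if len(table) != 256:
        raise ValueError("substitution table must hold 256 entries")
    return table[(obfuscated & 0xFF) ^ (debug_info_off & 0xFF)]


def obfuscate_opcode(opcode: int, debug_info_off: int, table: bytes) -> int:
    """Inverse of deobfuscate_opcode, used here only to build demo data."""
    return table.index(opcode) ^ (debug_info_off & 0xFF)


if __name__ == "__main__":
    off = 0x0012ABCD
    hidden = obfuscate_opcode(0x0E, off, DEMO_TABLE)  # 0x0e is return-void
    print(hex(hidden), "->", hex(deobfuscate_opcode(hidden, off, DEMO_TABLE)))
```

A full implementation also has to walk the instruction stream by instruction width so that only opcode bytes, and not operands, are substituted; that part is not shown here.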
Since the substitution table is theoretically a maximum of 256 bytes, one might assume that one of the RC4 KSA previously reversed is reused to generate it, but... no.
The S substitution table is simply stored in the libDexHelper.so library and can be directly extracted from the packed binary. We have everything we need to fix all the damaged methods, and the unpacked DEX can be decompiled properly:
We are now able to perform static unpacking of the application.
- The method fixing step is implemented in the Dex class of DxFx.
- The bytecode de-obfuscation is located in the MethodCipher class of DxFx.
Conclusion
Through the unfolding of the analysis methodology used to create a static unpacker, we can see the different encryption/obfuscation algorithms used by the packer at different stages. In addition, we were able to highlight an interesting protection mechanism involving bytecode injection and exploiting Android runtime hijacking.