Android Application Diffing: Analysis of Modded Version

Posted Thu 16 May 2019
Authors Tom Czayka, Romain Thomas
Category Android
Tags Android, binary diffing, tool, 2019

This blog post is about detecting modifications between genuine and repackaged applications.

The first article of this blog series shed light on the internals of the diff engine. The second part then showed, through the analysis of CVE-2019-10875, how such analysis can be used to spot the changes introduced by a security patch. Finally, this third part presents another concrete use case in which we combine diff analysis and Redex to examine a modded version of a well-known music streaming application.

Introduction

First, let's explain what a modded application is. 'Mod' means 'modified': a modded application is a version of a genuine application that has been altered in order to add or remove features. For instance, many applications that embed advertising get modified to disable it entirely. It is also quite common for modded applications to unlock paid features for free or at a lower price. In this blog post, we stick to the former case: a modded music streaming application that removes ads from the original one.

Those alterations are done through repackaging: developers (also called modders) unpack the genuine application and inject, delete or modify pieces of code. This stage is usually performed at the smali representation level, but it can also be done at the native level [1]. After modifying the code, modders repack the application and thereby generate a brand-new APK which looks like the original application but contains their freshly altered Dalvik bytecode. This technique is also commonly used in the malware field, as an adversary can easily insert malicious code into a well-known application [2] and release it as if it were the real one, so that users have no reason to distrust it. The most popular tool for doing so is presumably apktool.

Moreover, modders sometimes add an extra protection layer on top of this process to shield their modifications from reverse engineering.
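As a rough illustration of the unpack/repack cycle described above, the following Python sketch builds the two apktool command lines involved. The file and directory names are placeholders chosen for the example, and the signing step that a rebuilt APK also needs before installation is deliberately left out.

```python
from pathlib import Path


def repackaging_commands(apk, workdir, rebuilt_apk):
    """Return the apktool invocations for one unpack/rebuild cycle."""
    apk, workdir, rebuilt_apk = Path(apk), Path(workdir), Path(rebuilt_apk)
    return [
        # Decode resources and disassemble the DEX files into smali.
        ["apktool", "d", str(apk), "-o", str(workdir)],
        # (the smali files under workdir would be edited at this point)
        # Reassemble the edited tree into a new, still unsigned, APK.
        ["apktool", "b", str(workdir), "-o", str(rebuilt_apk)],
    ]


if __name__ == "__main__":
    # Placeholder names, only printed here; nothing is executed.
    for cmd in repackaging_commands("app.apk", "app-unpacked", "app-modded.apk"):
        print(" ".join(cmd))
```

The commands are only built and printed, so the sketch can be read without apktool installed; passing each list to subprocess.run would execute them.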

Defeating obfuscation

First, we naively compared the original and modded applications with the diff engine, in the same way as in the previous part. This yields a large number of matches with only slight modifications (similarity scores greater than 95%). Looking at a randomly selected class among those matches, we notice that pointless instructions have been added to plenty of methods at the Dalvik bytecode level. This clearly indicates intentional obfuscation, which makes the actual modifications much harder to find: since our approach works on Dalvik bytecode, the output is full of false positives. The following block highlights the differences between the same method in the original and the modded applications:

--- com.XXXXXXX.app-genuine/com/XXXXXXX/app/feature/ad/model/AudioAd.smali
+++ com.XXXXXXX.app-modded/com/XXXXXXX/app/feature/ad/model/AudioAd.smali
  .method public getArtworkUrl()Ljava/lang/String;
-     .locals 3
+     .locals 4
      .annotation build Landroid/support/annotation/Nullable;
      .end annotation

-     .line 98
+     const/4 v3, 0x3
+
      iget-object v0, p0, Lcom/XXXXXXX/app/feature/ad/model/AudioAd;->mCoverUrl:Ljava/lang/String;

+     const/4 v3, 0x6
+
      if-eqz v0, :cond_0

+     const/4 v3, 0x5
+
      iget-object v0, p0, Lcom/XXXXXXX/app/feature/ad/model/AudioAd;->mCoverUrl:Ljava/lang/String;

+     const/4 v3, 0x7
+
+     const-string v1, "igf"
+
      const-string v1, "gif"

+     const/4 v3, 0x5
+
      invoke-virtual {v0, v1}, Ljava/lang/String;->endsWith(Ljava/lang/String;)Z

      move-result v0

+     const/4 v3, 0x1
+
      if-eqz v0, :cond_0

+     const/4 v3, 0x5
+
      const/4 v0, 0x1

-     .line 99
+     const/4 v3, 0x2
+
      new-array v0, v0, [Ljava/lang/Object;

      const/4 v1, 0x0

+     const/4 v3, 0x5
+
      iget-object v2, p0, Lcom/XXXXXXX/app/feature/ad/model/AudioAd;->mCoverUrl:Ljava/lang/String;

      aput-object v2, v0, v1

+     const/4 v3, 0x4
+
      const/4 v0, 0x0

      return-object v0

-     .line 102
      :cond_0
+     const/4 v3, 0x0
+
      iget-object v0, p0, Lcom/XXXXXXX/app/feature/ad/model/AudioAd;->mCoverUrl:Ljava/lang/String;

+     const/4 v3, 0x1
+
      return-object v0
  .end method

Through this example, we can observe that none of the added instructions has any effect on behaviour at execution time. They only aim to conceal the actual modifications from reverse engineers who perform static analysis on the application. Therefore, we need to remove these dead instructions beforehand. We can also notice that the obfuscator has not changed the structure or the class hierarchy, only the bytecode.
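To make the notion of a dead instruction concrete, here is a deliberately naive Python sketch that works on smali text. It treats a register as dead when it is only ever the destination of a const instruction and never appears as an operand anywhere else in the method. This is a simplification for illustration, not how an optimizer performs dead code elimination: it ignores control flow, register ranges such as {v0 .. v5} and wide register pairs, and the class name in the sample is hypothetical.

```python
import re

# Constant loads such as `const/4 v3, 0x5` or `const-string v1, "gif"`.
CONST_RE = re.compile(r"^\s*const(?:[-/]\S+)?\s+([vp]\d+),")
# Any register mention (v0, p0, ...) in a non-const instruction.
REGISTER_RE = re.compile(r"\b[vp]\d+\b")


def write_only_registers(lines):
    """Registers that are written by const instructions and never used elsewhere."""
    written, used_elsewhere = set(), set()
    for line in lines:
        match = CONST_RE.match(line)
        if match:
            written.add(match.group(1))
        else:
            used_elsewhere.update(REGISTER_RE.findall(line))
    return written - used_elsewhere


def strip_dead_consts(lines):
    """Drop const instructions whose destination register is write-only."""
    dead = write_only_registers(lines)
    kept = []
    for line in lines:
        match = CONST_RE.match(line)
        if match and match.group(1) in dead:
            continue
        kept.append(line)
    return kept


# Shortened, hypothetical method body in the style of the listing above.
SAMPLE = [
    "const/4 v3, 0x3",
    "iget-object v0, p0, Lcom/example/AudioAd;->mCoverUrl:Ljava/lang/String;",
    "const/4 v3, 0x6",
    'const-string v1, "gif"',
    "invoke-virtual {v0, v1}, Ljava/lang/String;->endsWith(Ljava/lang/String;)Z",
    "move-result v0",
    "const/4 v3, 0x1",
]

if __name__ == "__main__":
    for line in strip_dead_consts(SAMPLE):
        print(line)
```

Note that this heuristic would not catch an overwritten store such as the added const-string v1, "igf" in the listing, because v1 is read later on; detecting that case requires a proper liveness analysis.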

This is where Redex comes in. This handy open-source tool is an Android bytecode optimizer developed by Facebook [3]. It provides a framework to deal with DEX files and perform various actions on them. Redex takes Dalvik bytecode as input, applies optimization passes and produces optimized Dalvik bytecode. Figure 1 illustrates how it works.

Figure 1. Redex workflow process.

It also provides a control-flow graph representation of Dalvik methods, which is quite powerful and efficient. Furthermore, it comes with a command-line interface which takes an APK as input and generates another APK as output. Depending on the kind of optimization we want to apply, we can configure various optimization passes [4], each of which modifies the bytecode according to its purpose. For instance, RemoveUnreachablePass eliminates unreachable pieces of code and, as the name suggests, RemoveUnusedArgsPass cleans the bytecode up by removing unused arguments.

Redex also provides a pass called LocalDcePass, which stands for Local Dead Code Elimination. It is particularly interesting in our case: the pointless instructions are essentially dead code, so Redex can help us remove them and produce a sanitised version of the modded application. In other words, Redex allows us to normalise the applications prior to analysis. We used the following simple configuration file. Note that RegAllocPass is a required pass.

{
  "redex" : {
    "passes" : [
      "LocalDcePass",
      "RegAllocPass"
    ]
  }
}
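For readers who want to script this normalisation step, the following Python sketch writes the configuration above to disk and builds a Redex command line for one APK. It assumes that the redex wrapper script is available on the PATH and accepts -c for the configuration file and -o for the output APK; the file names are placeholders, and any signing or SDK-related options a given setup may require are not shown.

```python
import json
from pathlib import Path

# The two passes from the configuration file above.
NORMALISATION_PASSES = ["LocalDcePass", "RegAllocPass"]


def write_redex_config(path):
    """Write the minimal normalisation config and return it as a dict."""
    config = {"redex": {"passes": list(NORMALISATION_PASSES)}}
    Path(path).write_text(json.dumps(config, indent=2))
    return config


def redex_command(apk, config, output):
    """Build the command line for one normalisation run.

    Assumption: `redex` is on PATH and takes -c <config> and -o <output>.
    """
    return ["redex", str(apk), "-c", str(config), "-o", str(output)]


if __name__ == "__main__":
    # Placeholder file names; the commands are printed, not executed.
    for name in ("genuine", "modded"):
        cmd = redex_command(f"{name}.apk", "normalise.config", f"{name}-normalised.apk")
        print(" ".join(cmd))
```

Running the same configuration on both APKs matters here: the point is not to optimise either application but to bring both to a comparable form.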

Passing both the original and the modified applications through Redex, we get a normalised APK for each version. Looking again at the getArtworkUrl() method shown earlier, all the extra instructions have gone: both versions now look alike at the smali representation level. We have managed to defeat the obfuscation layer and can therefore re-run the diff process on the normalised APKs.

.method public getArtworkUrl()Ljava/lang/String;
    .locals 3
    .annotation build Landroid/support/annotation/Nullable;
    .end annotation

    iget-object v0, p0, Lcom/XXXXXXX/app/feature/ad/audio/model/AudioAd;->mCoverUrl:Ljava/lang/String;

    if-eqz v0, :cond_0

    iget-object v1, p0, Lcom/XXXXXXX/app/feature/ad/audio/model/AudioAd;->mCoverUrl:Ljava/lang/String;

    const-string v0, "gif"

    invoke-virtual {v1, v0}, Ljava/lang/String;->endsWith(Ljava/lang/String;)Z

    ...

Spotting actual mutations

The procedure is roughly the same as the one described in the second blog post. At first, we have to find the development package in order to make the sets of classes as small as possible. However, this step can be problematic, because alterations are quite often made in an external SDK rather than in the genuine application code itself. At this point, we do not have any information about where the modifications are located. This is why performance matters: even when comparing a large number of classes, the computation time has to remain reasonable.

As this application is not very large in terms of embedded classes, let's compare all the classes (about 20400 on each side) regardless of the package they are located in, that is, skipping the filtering stage. The diff process takes about 1 min 47 s for the similarity computation and outputs:

...
[+] com/adserver/library/mediation: ASAppLovinAdapter | com/adserver/library/mediation: ASAppLovinAdapter -> 0.9973
[+] com/adserver/library/mediation: ASVungleAdapter$4 | com/adserver/library/mediation: ASVungleAdapter$4 -> 0.9960
[+] com/adserver/library/mediation: ASMediationAdManager$1 | com/adserver/library/mediation: ASMediationAdManager$1 -> 0.9441
[+] com/adserver/library/mediation: ASAdColonyAdapter$1 | com/adserver/library/mediation: ASAdColonyAdapter$1 -> 0.9896
[+] com/adserver/library/mediation: ASAdMobAdapter | com/adserver/library/mediation: ASAdMobAdapter -> 0.9988
[+] com/adserver/library/controller/mraid: ASMRAIDVideoController | com/adserver/library/controller/mraid: ASMRAIDVideoController -> 0.9963
[+] com/adserver/library/controller: ASAdViewController$ProxyHandler | com/adserver/library/controller: ASAdViewController$ProxyHandler -> 0.9996
[+] com/adserver/library/controller: ASAdViewController | com/adserver/library/controller: ASAdViewController -> 0.9714
[+] com/adserver/library: ASInterstitialView | com/adserver/library: ASInterstitialView -> 0.9656
[+] com/google/android/gms/internal/measurement: zzkd | com/google/android/gms/internal/measurement: zzkd -> 0.9967
[+] com/google/android/gms/internal/measurement: zzfm | com/google/android/gms/internal/measurement: zzfm -> 0.9969
[+] com/google/android/gms/internal/ads: zzasv | com/google/android/gms/internal/ads: zzasv -> 0.9991
[+] com/google/android/gms/internal/ads: zzyk | com/google/android/gms/internal/ads: zzyk -> 0.9925
[+] com/google/android/gms/internal/ads: zzpn | com/google/android/gms/internal/ads: zzpn -> 0.9912
[+] com/google/android/gms/internal/ads: zzald | com/google/android/gms/internal/ads: zzald -> 0.9980
[+] com/google/android/gms/internal/ads: zzapi | com/google/android/gms/internal/ads: zzapi -> 0.9895
[+] com/google/android/gms/internal/ads: zzarh | com/google/android/gms/internal/ads: zzarh -> 0.9925
[+] com/google/android/gms/internal/ads: zzass | com/google/android/gms/internal/ads: zzass -> 0.9674
[+] com/google/android/gms/internal/ads: zzom | com/google/android/gms/internal/ads: zzom -> 0.9718
[+] com/google/android/gms/ads/internal/overlay: zzo | com/google/android/gms/ads/internal/overlay: zzo -> 0.9417
[+] com/google/android/gms/ads/internal/overlay: zzd | com/google/android/gms/ads/internal/overlay: zzd -> 0.9820
[+] com/google/android/gms/common: GooglePlayServicesUtil | com/google/android/gms/common: GooglePlayServicesUtil -> 0.9945
[+] com/google/android/gms/common: GooglePlayServicesUtilLight | com/google/android/gms/common: GooglePlayServicesUtilLight -> 0.9581
...
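Output in this form lends itself to simple post-processing. The Python sketch below parses lines of the format shown above and, for each package, reports how many matched classes differ and the lowest similarity score among them. The regular expression is inferred from the sample lines only, so it is an assumption about the format rather than a specification of the tool's output.

```python
import re
from collections import defaultdict

# "[+] <package>: <class> | <package>: <class> -> <score>"
LINE_RE = re.compile(
    r"^\[\+\] (?P<pkg_a>\S+): (?P<cls_a>\S+) \| "
    r"(?P<pkg_b>\S+): (?P<cls_b>\S+) -> (?P<score>[\d.]+)$"
)


def parse_matches(text):
    """Return (package, class, score) tuples for every recognised line."""
    matches = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            matches.append((m["pkg_a"], m["cls_a"], float(m["score"])))
    return matches


def summarise_by_package(matches):
    """Map each package to (number of altered classes, lowest score)."""
    by_pkg = defaultdict(list)
    for pkg, _cls, score in matches:
        by_pkg[pkg].append(score)
    return {pkg: (len(scores), min(scores)) for pkg, scores in by_pkg.items()}


# A few lines copied from the output above.
SAMPLE = """\
[+] com/adserver/library/mediation: ASAppLovinAdapter | com/adserver/library/mediation: ASAppLovinAdapter -> 0.9973
[+] com/adserver/library/mediation: ASMediationAdManager$1 | com/adserver/library/mediation: ASMediationAdManager$1 -> 0.9441
[+] com/google/android/gms/ads/internal/overlay: zzo | com/google/android/gms/ads/internal/overlay: zzo -> 0.9417
[+] com/google/android/gms/ads/internal/overlay: zzd | com/google/android/gms/ads/internal/overlay: zzd -> 0.9820
"""

if __name__ == "__main__":
    summary = summarise_by_package(parse_matches(SAMPLE))
    # Packages with the lowest scores first: the most heavily altered code.
    for pkg, (count, lowest) in sorted(summary.items(), key=lambda kv: kv[1][1]):
        print(f"{pkg}: {count} classes, lowest score {lowest:.4f}")
```

Sorting by the lowest score is one possible way to decide where to look first; grouping by a shorter package prefix would give a coarser overview.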

Note that the results have been deliberately truncated because a large number of classes have been altered. At first sight, this gives us a quick overview of where the modifications are: they mostly appear in the com.adserver.library and com.google.android.gms packages. In this article, we only focus on inspecting one specific piece of code, as a full analysis is not the purpose. Let's look into the private final b(Z)V method of the zzd class.

--- com.XXXXXXX.app-genuine/com/google/android/gms/ads/internal/overlay/zzd.smali
+++ com.XXXXXXX.app-modded/com/google/android/gms/ads/internal/overlay/zzd.smali
     iget-object v2, v1, Lcom/google/android/gms/ads/internal/overlay/zzd;->b:Lcom/google/android/gms/ads/internal/overlay/AdOverlayInfoParcel;

     iget-object v2, v2, Lcom/google/android/gms/ads/internal/overlay/AdOverlayInfoParcel;->l:Ljava/lang/String;

     if-eqz v2, :cond_11

-    iget-object v3, v1, Lcom/google/android/gms/ads/internal/overlay/zzd;->c:Lcom/google/android/gms/internal/ads/zzaqw;
-
-    iget-object v2, v1, Lcom/google/android/gms/ads/internal/overlay/zzd;->b:Lcom/google/android/gms/ads/internal/overlay/AdOverlayInfoParcel;
-
-    iget-object v2, v2, Lcom/google/android/gms/ads/internal/overlay/AdOverlayInfoParcel;->l:Ljava/lang/String;
-
-    invoke-interface {v3, v2}, Lcom/google/android/gms/internal/ads/zzaqw;->loadUrl(Ljava/lang/String;)V
+    invoke-static {}, Lcom/PinkiePie;->DianePie()V

     :goto_b
     iget-object v2, v1, Lcom/google/android/gms/ads/internal/overlay/zzd;->b:Lcom/google/android/gms/ads/internal/overlay/AdOverlayInfoParcel;

     iget-object v2, v2, Lcom/google/android/gms/ads/internal/overlay/AdOverlayInfoParcel;->d:Lcom/google/android/gms/internal/ads/zzaqw;

This modification replaces the initial call to the loadUrl() method with a call to a static method named DianePie(). The PinkiePie class which carries this method is not present in the original version, so it has been added by the modders. Its implementation is empty: it does nothing. As a result, the change effectively removes the loadUrl() call. As that method's name indicates, this means that remote advertising resources will not be loaded in the modded version.
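Classes that exist only in the modded version, such as PinkiePie, can be listed with a simple set difference over the two disassembled trees. The Python sketch below assumes that each application has been disassembled into a directory of .smali files laid out by package, as in the diff headers above; the directory names passed in are placeholders.

```python
from pathlib import Path


def smali_classes(root):
    """Return class names such as 'com/PinkiePie' found under a smali tree."""
    root = Path(root)
    return {
        path.relative_to(root).with_suffix("").as_posix()
        for path in root.rglob("*.smali")
    }


def added_classes(genuine_root, modded_root):
    """Classes present in the modded tree but absent from the genuine one."""
    return sorted(smali_classes(modded_root) - smali_classes(genuine_root))


if __name__ == "__main__":
    # Placeholder directory names for the two disassembled applications.
    for name in added_classes("app-genuine", "app-modded"):
        print(name)
```

Swapping the two arguments lists the classes that were removed instead, which can be just as informative when a mod strips a feature.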

Conclusion

This blog post series aimed to give an overview of the theory behind Android application diffing as well as real use cases [5]. Nonetheless, a few downsides remain: our tool generates false positives and false negatives in some specific configurations. It typically runs into such issues when dealing with many small classes which all look alike at the structural level and do not contain much code.

Going further, it would also be great to take inheritance relationships into account. Some recent research projects, such as LibPecker [6], already do so. It would allow us to keep context information about each class and avoid several false results. Nevertheless, as complexity is likely to increase quickly, we would initially only consider first-level relatives, that is, without recursively browsing the whole inheritance tree. Otherwise, performance would suffer as well.
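As a minimal sketch of what "first-level relatives" could mean in practice, the Python function below returns the direct superclass and the direct subclasses of a class, given a mapping from each class to its superclass. The mapping and the class names are hypothetical, and this is only one possible reading of the idea, not a description of an existing feature of the tool.

```python
def first_level_relatives(cls, superclass_of):
    """Direct parent and direct children of `cls`, without any recursion."""
    relatives = set()
    parent = superclass_of.get(cls)
    if parent is not None:
        relatives.add(parent)
    # A single linear pass over the mapping finds the direct subclasses.
    relatives.update(child for child, p in superclass_of.items() if p == cls)
    return relatives


# Hypothetical hierarchy: BaseAd <- AudioAd <- SponsoredAudioAd, BaseAd <- VideoAd
HIERARCHY = {
    "AudioAd": "BaseAd",
    "VideoAd": "BaseAd",
    "SponsoredAudioAd": "AudioAd",
}

if __name__ == "__main__":
    print(sorted(first_level_relatives("AudioAd", HIERARCHY)))
```

Because the lookup never follows more than one edge, its cost per class stays linear in the number of classes, which matches the concern about keeping the computation time reasonable.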

Acknowledgments

Thanks to Quarkslab colleagues for proofreading these articles and all the people who have supported this project.

References

[1]	Though, it is a bit more tricky.

[2]	https://www.lookout.com/info/ds-dark-caracal-ty

[3]	https://github.com/facebook/redex

[4]	https://github.com/facebook/redex/tree/master/opt

[5]	There is no plan for tool release.

[6]	https://yuanxzhang.github.io/paper/libpecker-saner2018.pdf
