In security testing, reverse engineering is the process of analyzing a software application to determine its functional characteristics, its internal architecture and, eventually, its inner workings: modules, functions and algorithms. Reverse engineering is used for different purposes:
- Improving the functionality of the application when the software company that developed the app no longer exists, or there is no way to contact the developer.
- Analysis of worms, Trojans and viruses to highlight their signatures and create remedies (anti-virus software).
- Deciphering file formats for better compatibility (for example, the file formats of popular paid Windows applications that have no Linux versions, so that tools such as Open Office or Gimp can work with them).
- Education and much more.
Both mobile and PC applications can be targets for attackers. In the context of reverse engineering, it does not matter whether the application is installed on a smartphone or a personal computer, because hacking techniques depend largely on the programming language and the protection mechanisms implemented.
After all, on closer inspection, a mobile application is essentially an archive that consists of configuration files, libraries and compiled code. Therefore, the general approaches to “breaking into” mobile and desktop applications are identical.
However, reverse engineering is often used for other purposes as well. Once someone has studied the architecture of the application or recovered its source code, he/she can change it and use it for his/her own purposes – not always with good intent. For example:
- Endless use of trial versions. Imagine a software product that is free to use for a month or so. When the application starts, it checks the installation date. By removing this check or replacing it with a function that always returns the desired result, the attacker keeps the application in trial mode forever.
- Information or code theft. The attacker’s goal may not be the app as a whole, but one of its modules or some other part. This tactic is relevant for competing companies engaged in software development.
- Circumventing copy protection. The hacker’s purpose here is to remove the copy protection of audio and video files, computer games or e-books for later free distribution.
The process of obtaining the source code depends on the programming language and platform, as does the process of decompilation. For example, applications developed for the .NET framework are first compiled into an intermediate language, Common Intermediate Language (CIL), and then converted into machine code by the Common Language Runtime (CLR) during execution.
Java and Python applications are compiled in a similar way: high-level code is first compiled into an intermediate low-level representation (byte code), which a virtual machine then either interprets or converts to machine code with a “just-in-time” compiler.
This organization provides cross-platform portability and allows different parts of the application to be written in different languages within a single framework. From the point of view of reverse engineering, however, an intermediate language (such as CIL or byte code) retains information about classes, structures, interfaces, etc., which makes it possible to restore the original architecture. For this reason, there are ready-to-use utilities such as .NET Reflector, MSIL Disassembler, ILSpy and dotPeek for .NET applications; javap, JAD and DJ for restoring Java byte code; and pyREtic, pycdc and Uncompyle2 for Python applications.
If an attacker is sufficiently familiar with CIL or byte code, then sooner or later he/she will be able to make changes, recompile and force the application to work for his/her own purposes.
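To illustrate how much structure byte code retains, here is a minimal sketch that uses Python's standard `dis` module. The function `check_license` and its parameter names are hypothetical and exist only for this example; the point is that argument names, local variable names and the instruction listing can all be read back from the compiled code object without access to the source file.

```python
import dis


def check_license(user_key, expected_key):
    # Hypothetical license check, used only for this illustration.
    is_valid = user_key == expected_key
    return is_valid


# The compiled code object is what ends up in a .pyc file.
code = check_license.__code__

# Names survive compilation and can be read directly from the code object.
print("function name:", code.co_name)
print("arguments and locals:", code.co_varnames)

# Human-readable listing of the byte code instructions.
dis.dis(check_license)
```

The exact instruction listing differs between Python versions, but the recovered names are the same ones the developer wrote.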
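The following toy sketch shows why byte code alone offers little protection. It assumes Python 3.8 or later (for `CodeType.replace`); the function `license_mode` and the strings `"trial"` and `"full"` are invented for the illustration. A constant stored in the compiled code object is swapped, and a new function is built from the patched code without touching the source.

```python
import types


def license_mode():
    # Hypothetical function: the shipped application reports trial mode.
    return "trial"


original_code = license_mode.__code__

# Build a new constants tuple in which the string "trial" is replaced.
patched_consts = tuple(
    "full" if const == "trial" else const
    for const in original_code.co_consts
)

# Create a modified copy of the code object (Python 3.8+).
patched_code = original_code.replace(co_consts=patched_consts)

# Wrap the patched code object in a new callable function.
patched_function = types.FunctionType(patched_code, license_mode.__globals__)

print(license_mode())      # prints "trial"
print(patched_function())  # prints "full"
```

This is the kind of change that the protection measures discussed later in the article are meant to detect or make harder.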
Reverse engineering of applications written in traditional compiled languages (such as C, C++ or Objective-C) is more challenging. Applications written in those languages are compiled directly into executable machine code and typically keep far less information about the structure of the original application: class names, function names, variables, etc.
An additional barrier is that the machine code of such applications does not contain high-level branching constructs (if, for, etc.), only jump instructions, so restoring them requires re-creating the control flow graph (i.e. the structure of the application’s control constructs).
This requires considerable time; still, it alone cannot guarantee the safety of the application’s source code. For someone with deep knowledge of assembly language and solid programming skills, rebuilding the source code (or a functionally identical equivalent) is only a matter of time.
Knowing all of this, how can the application be protected? At the very least, how can one discourage the attacker from completing his/her task? Below are some suggestions:
- Code obfuscation – the process of making the code hard to analyze while preserving its functionality. Obfuscation significantly complicates the process of reverse engineering, so even if the attacker obtains the source code, it will be extremely difficult to determine what a particular piece of code does.
Mutation can be considered one of the most effective types of obfuscation. Here the application constantly changes its own code at runtime, which makes the task of reverse engineering extremely difficult.
However, this method has its own problems. Obfuscated code becomes “unreadable” not only for the attacker, but also for the developer. Also, adding extra code branches can reduce performance and even introduce defects. Perhaps the biggest issue, however, is that obfuscation does not guarantee a high level of safety once the attacker has the code, even if it is difficult to understand. After all, the target in such cases is usually one particular area of the code rather than the whole application.
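As a rough illustration of the simplest form of obfuscation, identifier renaming, the sketch below uses Python's standard `ast` module (`ast.unparse` requires Python 3.9 or later). The class `RenameLocals`, the sample function `order_total` and the `_v0`, `_v1`, ... naming scheme are assumptions made for this example; real obfuscators also transform control flow, strings and more.

```python
import ast


class RenameLocals(ast.NodeTransformer):
    """Replace argument and local variable names with meaningless ones."""

    def __init__(self):
        self.mapping = {}

    def _alias(self, name):
        # Assign the next free alias (_v0, _v1, ...) on first sight.
        if name not in self.mapping:
            self.mapping[name] = f"_v{len(self.mapping)}"
        return self.mapping[name]

    def visit_arg(self, node):
        # Function parameters are always renamed.
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        # Rename assigned names and names already known as locals;
        # builtins and globals that were never assigned stay untouched.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._alias(node.id)
        return node


SOURCE = '''
def order_total(price, quantity, discount):
    subtotal = price * quantity
    total = subtotal - discount
    return total
'''

tree = RenameLocals().visit(ast.parse(SOURCE))
obfuscated_source = ast.unparse(ast.fix_missing_locations(tree))
print(obfuscated_source)

# Both versions must behave identically.
plain_ns, obfuscated_ns = {}, {}
exec(SOURCE, plain_ns)
exec(obfuscated_source, obfuscated_ns)
print(plain_ns["order_total"](10, 3, 5), obfuscated_ns["order_total"](10, 3, 5))
```

The renamed function computes the same result, but names such as `price` and `subtotal` no longer hint at what the code does. As noted above, this raises the cost of analysis without preventing it.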
- Integrity check – confirmation that the code has not been modified. For this, checksums of different code segments are calculated, and if one of them does not match its preset value, the application ceases to operate. However, if an attacker gains access to the application source code, he/she can remove the integrity check or replace it with a function that always returns the desired result.
- Code encryption – ensuring that only “legal” consumers are able to use the application. Without the encryption key, the app becomes inoperative or functions only in its trial version. Still, nothing can guarantee the safety of the code, since the attacker may be able to uncover the key generation mechanism.
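The integrity check described in the list above can be sketched as follows. This is a minimal example that assumes SHA-256 as the checksum and uses an in-memory byte string as a stand-in for a code segment; the names `sha256_of`, `verify_integrity` and `code_segment` are hypothetical. A real application would hash its own executable sections or files and decide how to store the expected value.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of a code segment."""
    return hashlib.sha256(data).hexdigest()


def verify_integrity(data: bytes, expected_digest: str) -> bool:
    """Compare the current digest with the preset value."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sha256_of(data), expected_digest)


# Stand-in for a code segment; the digest is computed at build time.
code_segment = b"if days_used > 30: mode = 'trial_expired'"
expected = sha256_of(code_segment)

# A single changed byte is enough to fail the check.
tampered_segment = code_segment.replace(b"30", b"99")

print(verify_integrity(code_segment, expected))      # prints True
print(verify_integrity(tampered_segment, expected))  # prints False
```

Note that the check itself is ordinary code: as the article points out, an attacker who can modify the application can also patch `verify_integrity` to always return `True`, so the check is usually combined with other measures.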
There are other methods of protection (watermarks, moving critical sections of code into separate modules, secure execution environments, etc.). However, none of these options can provide complete safety. The benefits of each protection approach must be weighed for each individual case. For example, code obfuscation is not only a means of protection; in certain cases it may also improve performance.
Therefore, when choosing methods of code protection, first consider the threat model: what in the application needs to be protected, and how an attacker can most effectively try to get it. If an attacker strives to change the code and thus gain control over the application, then the best response is an integrity check. If, however, the object of the attack is a fragment of the application, then obfuscation or encryption is worth considering.