
Software Reverse Engineering Introduction

Definition

Reverse engineering, also called back engineering, is the process by which a man-made object is deconstructed to reveal its designs, architecture, or to extract knowledge from the object. (from Wikipedia)

Software reverse engineering mainly refers to disassembling software and analyzing its structure, control flow, algorithms, and code.

Application area

It is mainly used in software maintenance, software cracking, vulnerability discovery, and malware analysis.

Reverse engineering in CTF competitions

> Covers a range of programming techniques on the Windows, Linux, and Android platforms. Contestants must use common tools to reverse engineer source code and binary files, be able to reverse engineer Android APK files, and be proficient in encryption and decryption, kernel programming, algorithms, anti-debugging, and code obfuscation techniques.
> (from the "National College Student Information Security Competition Entry Guide")

Requirements

  • Familiarity with operating systems, assembly language, and encryption and decryption
  • Programming experience in several high-level languages
  • Familiarity with the compilation principles of several compilers
  • Strong program comprehension and reverse analysis skills

Typical reversing workflow

  1. Collect information with static analysis tools such as strings, file, binwalk, and IDA, then search Google and GitHub based on what you find.
  2. Study the program's protections, such as code obfuscation, packers, and anti-debugging techniques, and find a way to break or bypass them.
  3. Disassemble the target and quickly locate the key code for analysis.
  4. Use dynamic debugging to verify your initial guesses and clarify what the program does as the analysis proceeds.
  5. Based on the program's functionality, write a script to solve for the flag.
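
As an illustration of the information-gathering step, the following is a minimal sketch of what the strings tool does: it pulls runs of printable ASCII out of a binary blob. The function name `extract_strings` and the sample bytes are made up for this example; in practice the data would come from reading the target file in binary mode.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of printable ASCII that is at least min_len bytes long."""
    # 0x20-0x7e is the printable ASCII range
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical blob standing in for open(path, "rb").read()
sample = (
    b"\x7fELF\x02\x01\x00\x00Enter the flag:\x00"
    b"\x01\x02Wrong!\x00ab\x00flag{%s}\x00"
)
found = extract_strings(sample)
print(found)
```

Prompt strings recovered this way are good search terms and good starting points for finding the code that uses them.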

Tips for locating key code

  1. Analyze control flow

The control flow can be read from the control flow graph (CFG) generated by IDA. Read the disassembly block by block, following branches, loops, and function calls.

  2. Use data and code cross-references

For example, starting from a prompt string that the program prints, a data cross-reference leads to the place where the string is used, and from there to the key code. Code cross-references work similarly: a GUI program typically obtains user input through Windows API functions, so the call sites of those API functions lead to the key code.

Reversing tips

  1. Coding style

Every programmer has a different coding style. Readers who are familiar with software design patterns can work out what each function and module does more quickly.

  2. Principle of locality

Programmers tend to place functionally related code and data close together, and this shows up in the disassembly, so look at the functions and data near the key code during analysis.

  3. Code reuse

Code reuse is very common, and GitHub, the largest source code hosting site, is a primary source. During analysis, search GitHub for distinctive features such as strings or code style; you may find similar code and use it to recover the missing symbol information.

  4. Seven parts reversing, three parts guessing

A reasonable guess often gets twice the result with half the effort. When you meet a suspicious function whose logic you cannot make out, guess its purpose from the available clues and continue the analysis on that assumption. Repeated guessing and checking gradually brings you closer to what the code really does.

  5. Distinguishing code

When reading disassembly, you must be able to tell which code was written by hand and which was added automatically by the compiler. Within the hand-written code, distinguish library functions from the code written by the challenge author, and recognize how the compiler has optimized the author's code. The point is to avoid spending time on anything other than the author's code. Spending half a day inside a library function is frustrating and gets you nowhere.

  6. Patience

Given enough time, any program can be analyzed thoroughly, so do not give up too early. Trust that peeling the problem apart layer by layer will eventually crack it.

Dynamic Analysis

The purpose of dynamic analysis is to locate the key code, and to verify your inferences or understand what the program does, by observing information produced while the program runs (registers, memory changes, program output).

The main methods are debugging, symbolic execution, and taint analysis.

Algorithm and data structure identification

  • Common algorithm identification

Examples include cryptographic algorithms such as TEA/XTEA/XXTEA/IDEA/RC4/RC5/RC6/AES/DES/MD5/SHA256/SHA1, big-number addition, subtraction, multiplication, and division, and traditional algorithms such as shortest path.

  • Common data structure identification

Recognizing higher-level data structures such as graphs, trees, and hash tables in assembly code.
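
One common way to spot a known algorithm is to look for its characteristic constants. The sketch below scans a byte blob for a few well-known values, assuming 32-bit words are stored little-endian. The table `SIGNATURES`, the function `scan_constants`, and the sample bytes are invented for this example, and the table is far from complete.

```python
import struct

# A few well-known constants. 32-bit words are packed little-endian,
# which is an assumption about the target (true for x86 and most ARM builds).
SIGNATURES = {
    "TEA/XTEA/XXTEA delta": struct.pack("<I", 0x9E3779B9),
    "MD5/SHA-1 initial state": struct.pack("<I", 0x67452301),
    "MD5 first round constant": struct.pack("<I", 0xD76AA478),
    "SHA-256 initial state": struct.pack("<I", 0x6A09E667),
    "AES S-box": bytes([0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5]),
}


def scan_constants(data: bytes) -> dict[str, int]:
    """Map each signature name to the offset of its first occurrence in data."""
    hits = {}
    for name, needle in SIGNATURES.items():
        offset = data.find(needle)
        if offset != -1:
            hits[name] = offset
    return hits


# Hypothetical blob: padding, the TEA delta, padding, start of the AES S-box
sample = (
    b"\x90" * 8
    + struct.pack("<I", 0x9E3779B9)
    + b"\x00" * 4
    + bytes([0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5])
)
hits = scan_constants(sample)
print(hits)
```

A hit is only a hint. Challenge authors often change the constants, and a compiler may emit a transformed value (for example the negated TEA delta 0x61C88647), so the match still has to be confirmed by reading the code.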

Code obfuscation

For example, tools and techniques such as OLLVM, movfuscator, junk instructions, virtualization, and self-modifying code (SMC) are used to obfuscate code, which makes program analysis very difficult.

Correspondingly, there are deobfuscation techniques, whose main purpose is to restore the control flow, such as emulated execution and symbolic execution.
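
Simple junk ("flower") instructions can often be removed with a byte-level patch. The sketch below assumes one specific x86 pattern: `jz +3; jnz +1` followed by a stray 0xE8 byte. Both branches land just past the stray byte, so the five bytes have no effect at run time and exist only to make a linear disassembler decode 0xE8 as a call. The pattern, the function name `patch_junk`, and the sample bytes are assumptions for illustration; real challenges use many variants.

```python
# Assumed junk pattern: 74 03 (jz +3), 75 01 (jnz +1), E8 (never executed)
JUNK = bytes([0x74, 0x03, 0x75, 0x01, 0xE8])
NOP = 0x90


def patch_junk(code: bytes) -> tuple[bytes, int]:
    """Overwrite every occurrence of JUNK with NOPs; return (patched code, count)."""
    patched = bytearray(code)
    count = 0
    start = patched.find(JUNK)
    while start != -1:
        patched[start:start + len(JUNK)] = bytes([NOP]) * len(JUNK)
        count += 1
        start = patched.find(JUNK, start + len(JUNK))
    return bytes(patched), count


# Hypothetical function body: push ebp; junk; xor eax, eax; junk; ret
sample = b"\x55" + JUNK + b"\x31\xc0" + JUNK + b"\xc3"
cleaned, replaced = patch_junk(sample)
print(replaced, cleaned.hex())
```

Matching on raw bytes can also hit data or the middle of a longer instruction, so each patch location should be checked in the disassembler before the result is trusted.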

Packers

There are many kinds of packers (protective shells). Simple compressing packers fall into the following types.

  • unpack -> execute

The program code is extracted directly into memory, and execution then continues with it.

  • unpack -> execute -> unpack -> execute ...

Part of the code is unpacked and executed, and the remaining parts are unpacked as execution proceeds.

  • unpack -> [decoder | encoded code] -> decode -> execute

The program code has been encoded; after decompression, a decoder function runs to recover the real program code.

There are also established techniques for unpacking, such as single-step tracing and the ESP law.
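
The third layout (unpack, then decode, then execute) can be modeled with a toy protective shell (packer). This is only a sketch: the single-byte XOR encoding, the use of zlib, and the names `pack` and `unpack` are assumptions chosen to keep the example short. A real packer does the same work in a native stub and then jumps to the original entry point.

```python
import zlib


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (its own inverse)."""
    return bytes(b ^ key for b in data)


def pack(payload: bytes, key: int) -> bytes:
    """Toy packer: encode the payload, then compress the encoded bytes."""
    return zlib.compress(xor_bytes(payload, key))


def unpack(packed: bytes, key: int) -> bytes:
    """Mirror of the runtime stub: decompress first, then decode."""
    encoded = zlib.decompress(packed)  # the "unpack" step
    return xor_bytes(encoded, key)     # the "decode" step


# Hypothetical code bytes and key
original = b"\x55\x48\x89\xe5\xc3"
packed = pack(original, 0x5A)
recovered = unpack(packed, 0x5A)
print(recovered == original)
```

When the decoding routine is this simple, reimplementing it in a script is often faster than unpacking the program dynamically.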

Anti-debugging

Anti-debugging aims to prevent a program from being debugged and analyzed, for example by detecting the presence of a debugger. Typical methods include calling API functions such as IsDebuggerPresent, using SEH exception handling, and timing checks. A program can also protect itself by zeroing the debug port, debugging itself, and so on.
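
The time-difference check is the easiest of these to illustrate. The sketch below is a Python analogue, not code from any real protector: native programs usually read a counter such as rdtsc or GetTickCount before and after a short block and compare the difference with a threshold. The function name `looks_debugged`, the threshold, and the injectable `clock` parameter are assumptions made for this example.

```python
import time


def looks_debugged(work, threshold_s: float = 0.5, clock=time.perf_counter) -> bool:
    """Run work() and report whether it took longer than threshold_s seconds.

    Single-stepping or hitting a breakpoint inside work() inflates the
    elapsed time, which is what the check relies on.
    """
    start = clock()
    work()
    elapsed = clock() - start
    return elapsed > threshold_s


# Normal run: a trivial block finishes far below the threshold
print(looks_debugged(lambda: None))

# Simulated debugger pause: a fake clock reports 2 seconds between the reads
fake_ticks = iter([0.0, 2.0])
print(looks_debugged(lambda: None, clock=lambda: next(fake_ticks)))
```

From the analyst's side, such a check is usually bypassed by patching the comparison or by not stopping between the two clock reads.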

Approaching unconventional reversing challenges

Unconventional reversing challenges cover a wide range: the target can be a file of any format for any architecture.

  • lua/python/java/lua-jit/haskell/applescript/js/solidity/webassembly/etc.

  • firmware/raw bin/etc.

  • chip8/avr/clemency/risc-v/etc.

However, reverse engineering methodology still applies to unfamiliar platforms and formats. For such unconventional challenges there is a basic process that can be used universally.

Preparation

  • Read the documentation. The quickest way to learn a platform or language is to read its official documentation.
  • Official tools. The tools that the vendor provides or recommends are generally the most suitable ones.
  • Tutorials. Experienced reversers may already have written reversing tutorials for the specific platform or language, and reading them lets you absorb the knowledge quickly.

Finding tools

The main things to look for are file parsing tools, a disassembler, a debugger, and a decompiler. A disassembler is essential, and a debugger usually includes disassembly functionality as well. As for a decompiler, you can only hope for the best: count yourself lucky if one exists, and accept it if none does.

In short, finding tools comes down to searching Google well. Using Google's search syntax and well-chosen keywords helps you find the right tool faster.

