This post is an introduction to software reverse-engineering (RE).
Software RE Goals
The general goal of software RE is to understand how closed software works and to be able to modify it. The deliverables of this process may be:
- Documentation explaining what it does and how it works
- Design diagrams
- Source code reimplementation
- Assembly code
It is important to understand that software RE only applies when we have little or no insight into how the software works. For example, if we already have access to the source code, there is no RE involved because we already have full insight into its mechanisms. Do not confuse studying or documenting software with software RE.
RE can arise when dealing with closed source software or when the original source code has been lost.
A source code reimplementation needs to set some sub-goals in advance:
- Fidelity
- Code openness
- Portability
If the reimplementation is going to be open source, it may make sense to replace third-party proprietary libraries with equivalent open source libraries.
If portability is one of the priorities, it makes sense to substitute platform-dependent libraries or resources with equivalent cross-platform libraries.
Software RE Techniques
Techniques to reverse-engineer software:
- Disassembling/decompiling. Obtaining the assembly language or high-level programming language source code from the original binary code.
- Debugging. Use a debugging tool to analyze the binary code.
- Behavior analysis. By examining the behavior and output of the binary, you might be able to infer parts of the original logic. This process involves analyzing the program’s functionality and trying to reconstruct the logic behind it.
- Data Recovery. In some cases, old development files or remnants of the source code might be present in the binary as strings or symbols. Scouring the binary for such information could lead to partial source code recovery.
- Black box analysis
Disassembling/decompiling Software Binaries
Software binaries are one of the biggest assets for software RE. Unfortunately, they are not always available. When they aren’t, we should look for a different software RE technique.
A disassembler is a computer program that converts machine code into assembly language.
A decompiler is a computer program that converts machine code into a high-level programming language.
An example where software binaries are not available is when we want to reverse-engineer a website whose code runs on the server side. Another example is when the software runs locally but we do not have control over the local system.
On the other hand, we will have access when the software runs locally and we have some administrative control over the machine, because the system must hold the binaries containing the machine code in order to run it.
In most cases, there is a 1:1 correspondence between machine code and assembly code, so we can say that if we have the machine code we also have the assembly code.
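The 1:1 correspondence can be illustrated with a minimal table-driven disassembler. This is only a sketch under stated assumptions: it covers a handful of one-byte x86 opcodes in 32-bit mode, and the names disassemble, FIXED_OPCODES and REGISTERS_32 are made up for this example. A real disassembler also has to handle prefixes, operands and multi-byte instructions.

```python
# Minimal table-driven disassembler for a few one-byte x86 opcodes (32-bit mode).
# Illustrative only: real x86 decoding needs prefixes, ModR/M bytes and operands.

REGISTERS_32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

FIXED_OPCODES = {
    0x90: "nop",
    0xC3: "ret",
    0xCC: "int3",
}


def disassemble(code: bytes) -> list[str]:
    """Translate each supported opcode byte into its assembly mnemonic."""
    lines = []
    for offset, byte in enumerate(code):
        if byte in FIXED_OPCODES:
            text = FIXED_OPCODES[byte]
        elif 0x50 <= byte <= 0x57:
            # push r32: the register is encoded in the low three bits
            text = f"push {REGISTERS_32[byte - 0x50]}"
        elif 0x58 <= byte <= 0x5F:
            # pop r32: same register encoding as push
            text = f"pop {REGISTERS_32[byte - 0x58]}"
        else:
            # Unsupported byte: emit it as raw data instead of guessing
            text = f"db 0x{byte:02x}"
        lines.append(f"{offset:04x}: {text}")
    return lines


if __name__ == "__main__":
    # Bytes for: push ebp / nop / pop ebp / ret
    for line in disassemble(bytes([0x55, 0x90, 0x5D, 0xC3])):
        print(line)
```

Each byte maps to exactly one mnemonic and back, which is why having the machine code is, in practice, equivalent to having the assembly code.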
Machine code and assembly code are dependent on the processor for which the original code has been coded, assembled or compiled. So it is valuable to have a good understanding of the specific assembly language used on the software binaries. You can read this post about assembly code.
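Before picking a disassembler, it helps to know which processor a binary targets. The following sketch reads the target architecture from the header of an ELF or PE file. The function name identify_binary and the small lookup tables are illustrative choices, and only a few common machine codes are listed; anything else is reported by its raw number.

```python
import struct
import sys

# A few common machine identifiers; real files may use many others.
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}
PE_MACHINES = {0x014C: "x86", 0x01C0: "ARM", 0x8664: "x86-64", 0xAA64: "AArch64"}


def identify_binary(data: bytes) -> str:
    """Return a short description of the container format and target CPU."""
    if data[:4] == b"\x7fELF" and len(data) >= 20:
        # Byte 5 of the ELF identification gives the byte order:
        # 1 = little-endian, 2 = big-endian
        endian = "<" if data[5] == 1 else ">"
        (machine,) = struct.unpack_from(endian + "H", data, 18)  # e_machine
        return f"ELF, {ELF_MACHINES.get(machine, f'machine {machine}')}"
    if data[:2] == b"MZ":
        if len(data) >= 0x40:
            # Offset 0x3C of the MZ header points to the PE signature, if any
            (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
            has_pe = data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
            if has_pe and len(data) >= pe_offset + 6:
                (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
                return f"PE, {PE_MACHINES.get(machine, f'machine 0x{machine:04x}')}"
        return "MZ without PE header (for example an MS-DOS executable)"
    return "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python identify.py path/to/binary
    with open(sys.argv[1], "rb") as handle:
        print(identify_binary(handle.read(4096)))
```

Only the first few kilobytes are read in the usage example, on the assumption that the headers sit at the start of the file, which is the usual layout.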
On the other hand, there is no 1:1 correspondence between assembly code and source code, so it is virtually impossible to retrieve the original source code. The closest we can get is a reimplementation of the original software using new source code.
Different compilers produce different machine code from the same source code. So it is valuable to know, when possible and applicable, which compiler was used.
A decompiler is a program that converts machine code or assembly language into source code.
You can read this post about disassemblers and decompilers.
Debugging
You can get a list of popular debugging tools on this post.
Data Recovery
You can use the strings command to find sequences of printable characters within a binary file.
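The idea behind this kind of search can be sketched in a few lines of Python. This is an approximation and not the actual implementation of the strings tool: the function name extract_strings is made up for this example, and it only looks for runs of printable ASCII characters with a minimum length of 4.

```python
import re
import sys


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of printable ASCII characters at least min_length long."""
    # 0x20 to 0x7e covers the printable ASCII range, from space to tilde
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_strings.py path/to/binary
    with open(sys.argv[1], "rb") as handle:
        for text in extract_strings(handle.read()):
            print(text)
```

File names, error messages and function names found this way often hint at the original source layout or at the libraries that were linked in.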
Black Box Analysis
Black box analysis consists of observing the inputs and corresponding outputs of a system whose internals we have no insight into.
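As a rough sketch of this process, the code below probes a system with chosen inputs, records the outputs, and discards every hypothesis that fails to reproduce them. Everything here is hypothetical: opaque_system stands in for a closed program, and the probes and candidate hypotheses are invented for the example.

```python
from functools import reduce
from operator import xor


def opaque_system(data: bytes) -> int:
    """Hypothetical stand-in for a closed program whose internals we cannot see."""
    return sum(data) % 256


def observe(system, inputs):
    """Feed each input to the system and record (input, output) pairs."""
    return [(item, system(item)) for item in inputs]


def is_consistent(hypothesis, observations) -> bool:
    """A hypothesis survives only if it reproduces every observed output."""
    return all(hypothesis(item) == output for item, output in observations)


# Candidate explanations of the observed behavior (all of them are guesses).
HYPOTHESES = {
    "length of input": len,
    "xor of bytes": lambda data: reduce(xor, data, 0),
    "sum of bytes mod 256": lambda data: sum(data) % 256,
}

# Inputs chosen so that the candidate hypotheses give different answers.
PROBES = [b"", b"\x01", b"\x01\x02", b"\xff\x02", b"abc"]


def surviving_hypotheses(system) -> list[str]:
    """Return the names of the hypotheses that match all observations."""
    observations = observe(system, PROBES)
    return [name for name, guess in HYPOTHESES.items()
            if is_consistent(guess, observations)]


if __name__ == "__main__":
    print(surviving_hypotheses(opaque_system))
```

A hypothesis that survives is not proven correct; it has simply not been refuted yet, so more probes that separate the remaining candidates are always worth adding.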
Tools for Software Reverse-Engineering
Tools to obtain source code from software binaries
- Disassemblers and decompilers
- Debugging tools
Software Reimplementation on Different Platforms
There are some specifics that depend on the original platform for which the binary code was developed.
- MS-DOS
- Windows
MS-DOS
You can read this post about MS-DOS software reimplementation.
Windows
Resource Hacker, also known as ResHack, is a tool for viewing and modifying the resources embedded in Windows executables, and it is used in video game modding.
OpenMW is a reimplementation of “The Elder Scrolls III: Morrowind”.
The Lego Island Decompilation project (2024) succeeded in recreating the source code for Lego Island (1997).
Lego Island Decompilation code repository
DevilutionX is a reimplementation of Diablo (1996), a.k.a. Diablo 1.
DevilutionX uses a source-viewable license, more specifically the Sustainable Use License.