Loader

John Källén

Loader design

The Loader is the first part of the decompilation pipeline. It is responsible for fetching the bytes of the file that is to be decompiled, determining which executable format the file is in, and delegating the rest of the work to the appropriate [ImageLoader] for that format.

Magic cookies

To determine what kind of executable format a file is in, the Loader uses a pattern-matching scheme. Strings of hexadecimal digits are used to represent the magic cookies used in the different executable file formats (e.g. 4D 5A for "MZ" executables, 7F 45 4C 46 for ELF executables). By default, the Loader looks for a magic cookie at the beginning of the image (offset 0), but if the magic cookie is located at a different offset, this can be specified too.
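The cookie check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Decompiler's actual code: the function names parse_cookie and has_magic_cookie are hypothetical, and the sample images are hand-made byte strings rather than real executables.

```python
def parse_cookie(hex_string: str) -> bytes:
    """Convert a cookie written as hex digits, e.g. "4D 5A", to raw bytes."""
    # bytes.fromhex skips the spaces between digit pairs.
    return bytes.fromhex(hex_string)


def has_magic_cookie(image: bytes, cookie_hex: str, offset: int = 0) -> bool:
    """Return True if the cookie occurs in the image at the given offset.

    Offset 0 (the start of the image) is the default; a different offset
    can be passed for formats whose cookie sits elsewhere.
    """
    cookie = parse_cookie(cookie_hex)
    # Slicing past the end yields a shorter slice, so short images
    # simply fail to match instead of raising an error.
    return image[offset:offset + len(cookie)] == cookie


# Hand-made sample images (assumptions for illustration only).
mz_image = b"MZ\x90\x00" + bytes(60)
elf_image = b"\x7fELF" + bytes(12)

print(has_magic_cookie(mz_image, "4D 5A"))
print(has_magic_cookie(elf_image, "7F 45 4C 46"))
print(has_magic_cookie(b"\x00\x00MZ", "4D 5A", offset=2))
```

Keeping the cookie as a string of hex digits means it can be stored as plain text alongside its offset and parsed only when a file is examined.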

Associated with each magic cookie is the .NET type name of an [ImageLoader]. These associations are stored in the Decompiler.config file, which makes it easy to add new ImageLoaders as the need arises. At load time, the Loader creates an instance of the appropriate ImageLoader based on its magic cookie and passes control to it. The end result is a loaded [Program], which is consumed by the next stage in the pipeline.
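To make the cookie-to-loader association concrete, here is a minimal Python sketch of the dispatch step. Everything in it is an assumption made for illustration: the classes Program, ImageLoader, MzLoader and ElfLoader are simplified stand-ins, and the ENTRIES table plays the role of the associations that the real Decompiler reads from its configuration file, whose actual layout is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Program:
    """Stand-in for the loaded program handed to later pipeline stages."""
    format_name: str
    image: bytes


class ImageLoader:
    """Hypothetical base class; real loaders would parse headers, sections, etc."""
    format_name = "unknown"

    def __init__(self, image: bytes):
        self.image = image

    def load(self) -> Program:
        return Program(self.format_name, self.image)


class MzLoader(ImageLoader):
    format_name = "MZ"


class ElfLoader(ImageLoader):
    format_name = "ELF"


@dataclass
class LoaderEntry:
    """One association: a cookie, where to look for it, and how to build a loader."""
    cookie: bytes
    offset: int
    create_loader: Callable[[bytes], ImageLoader]


# Stand-in for the associations stored in the configuration file.
ENTRIES: List[LoaderEntry] = [
    LoaderEntry(bytes.fromhex("4D 5A"), 0, MzLoader),
    LoaderEntry(bytes.fromhex("7F 45 4C 46"), 0, ElfLoader),
]


def load(image: bytes) -> Optional[Program]:
    """Pick the first loader whose cookie matches, then let it load the image."""
    for entry in ENTRIES:
        end = entry.offset + len(entry.cookie)
        if image[entry.offset:end] == entry.cookie:
            return entry.create_loader(image).load()
    return None  # no known format matched


program = load(b"\x7fELF" + bytes(12))
print(program.format_name if program else "unrecognized format")
```

Because the table is data rather than code, supporting a new format in this sketch only requires a new entry and a new loader class, which mirrors the motivation given above for keeping the associations in a configuration file.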

Unpacking packed images

Sometimes, executable images are post-processed by their developers with so-called packers. This is done for a variety of reasons. For instance, early MS-DOS packers such as EXEPACK were designed to make the disk footprint of an executable image smaller. This was done to conserve space on distribution media (remember 5.25" floppy disks?) and to speed up loading (floppy disk I/O was much slower than even an i8086 processor running at 4.77 MHz). More recently, packers have been used by malware authors to hide their eldritch creations from virus scanners.

To detect whether an image has been packed, the Loader again uses a pattern-matching scheme. Each packer is represented by a signature, a string of hexadecimal digits that identifies that particular packer. These signatures are collected in large signature files, which are distributed with the Decompiler. When loading an executable image, the Loader attempts to match the image bytes against the known signatures. If a match is found, the presence of a packer has been ascertained.
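The signature lookup can be sketched as follows. This is a hedged illustration, not the Decompiler's implementation: the "??" wildcard notation, the offset parameter, the packer name "DemoPack" and its signature bytes are all assumptions invented for this example, and the layout of the real signature files is not shown here.

```python
from typing import List, Optional, Tuple

# A parsed signature: one entry per byte, None meaning "any byte matches".
Signature = List[Optional[int]]


def parse_signature(text: str) -> Signature:
    """Parse hex digit pairs; "??" is treated as a wildcard (an assumption)."""
    return [None if token == "??" else int(token, 16) for token in text.split()]


def matches(image: bytes, signature: Signature, offset: int = 0) -> bool:
    """Check the signature against the image bytes starting at offset."""
    window = image[offset:offset + len(signature)]
    if len(window) < len(signature):
        return False  # image too short to contain the signature here
    return all(want is None or want == got
               for want, got in zip(signature, window))


def detect_packer(image: bytes,
                  signatures: List[Tuple[str, str]],
                  offset: int = 0) -> Optional[str]:
    """Return the name of the first packer whose signature matches, if any."""
    for name, text in signatures:
        if matches(image, parse_signature(text), offset):
            return name
    return None


# Invented signature table; "DemoPack" is not a real packer.
SIGNATURES = [("DemoPack", "60 BE ?? ?? 8D")]

packed = bytes.fromhex("60 BE 00 10 8D 90")
print(detect_packer(packed, SIGNATURES))
```

Wildcards are useful in a scheme like this because packer stubs often embed addresses or sizes that differ from one packed file to the next, while the surrounding instruction bytes stay the same.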

If a packer has been detected, the Loader needs to unpack the image before it can proceed. Otherwise, all it will do is decompile the packing code, which is often of little interest to users. The Loader consults the Decompiler.config file to see if an unpacker is available for the detected packer. From the perspective of the Loader, such unpackers are just another form of ImageLoader; at the end of the unpacking operation, the Loader again returns a Program to subsequent decompiler stages.

Two strategies for unpacking are currently implemented. Some unpackers have been created by painstakingly reverse engineering the unpacking code found in packed executables. However, a fruitful alternative is to emulate the unpacking machine code present in the packed executable; this lets the packed executable perform the unpacking itself. To do this, the Loader needs a way to control the emulation so that it is stopped once the unpacker is done with its work, but before the execution of the real program actually starts.

To support this emulation of unpacking code, the Loader uses a special ImageLoader that understands a scripting language that controls the emulation by setting breakpoints at appropriate places, searching for binary patterns, and so on. A popular scripting language is OdbgScript, and support for this language is included in the Decompiler.
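The idea of running the unpacking code under emulation and stopping at a breakpoint can be illustrated with a toy model. The sketch below is not OdbgScript and not the Decompiler's emulator; ToyEmulator, its "instructions" (plain Python functions), the XOR encoding, and all addresses are invented purely to show the control flow: run the stub, stop at the original entry point, and keep the unpacked memory.

```python
from typing import Callable, Dict, Set

# A toy "instruction" mutates memory and returns the next program counter.
Instruction = Callable[[bytearray], int]


class ToyEmulator:
    """A deliberately tiny stand-in for a real CPU emulator."""

    def __init__(self, memory: bytearray, code: Dict[int, Instruction], entry: int):
        self.memory = memory
        self.code = code
        self.pc = entry
        self.breakpoints: Set[int] = set()

    def run(self, max_steps: int = 10_000) -> bool:
        """Step until a breakpoint is reached; return True if one was hit."""
        for _ in range(max_steps):
            if self.pc in self.breakpoints:
                return True  # stop before executing the instruction at pc
            self.pc = self.code[self.pc](self.memory)
        return False  # gave up: the breakpoint was never reached


def make_decode_step(index: int, key: int, next_pc: int) -> Instruction:
    """Build a stub instruction that decodes one byte of the payload."""
    def step(memory: bytearray) -> int:
        memory[index] ^= key
        return next_pc
    return step


KEY = 0x5A
STUB = 0x100   # invented address of the unpacking stub
OEP = 0x200    # invented "original entry point" of the real program

# The "packed" image: the payload b"REAL" XOR-encoded with KEY.
memory = bytearray(b ^ KEY for b in b"REAL")

# The stub decodes four bytes, one per instruction, then jumps to the OEP.
code: Dict[int, Instruction] = {
    STUB + i: make_decode_step(i, KEY, STUB + i + 1) for i in range(4)
}
code[STUB + 4] = lambda mem: OEP

emulator = ToyEmulator(memory, code, entry=STUB)
emulator.breakpoints.add(OEP)   # what an unpacking script would request
hit = emulator.run()
print(hit, bytes(memory))
```

The breakpoint check happens before an instruction is executed, so the emulation halts after the stub has finished but before any code of the real program runs, which is the stopping condition described above.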

Archives

The Loader has support for archives, that is, files that contain other files. For instance, Commodore C64 disk images are commonly distributed as .d64 files. These image files contain a directory of files and the file images themselves. Because an archive may contain more than one executable, the Loader needs "oracular" help from users to decide which of the files in the archive is to be decompiled. Users are presented with a picker that allows them to select the particular file of interest.
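The interaction between an archive and the picker can be sketched like this. The names ArchiveEntry and choose_entry are hypothetical, the directory contents are made up, and the picker is modeled as a plain callback so that a user interface (or a test) can supply the choice; parsing of an actual .d64 image is not attempted here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ArchiveEntry:
    """One file listed in the archive's directory."""
    name: str
    data: bytes


# The picker receives the file names and returns the chosen index,
# or None if the user cancels.
Picker = Callable[[List[str]], Optional[int]]


def choose_entry(entries: List[ArchiveEntry], picker: Picker) -> Optional[ArchiveEntry]:
    """Ask the picker which file to decompile and return that entry."""
    if not entries:
        return None
    index = picker([entry.name for entry in entries])
    if index is None or not 0 <= index < len(entries):
        return None  # cancelled, or an out-of-range choice
    return entries[index]


# Made-up directory contents for illustration.
directory = [
    ArchiveEntry("INTRO", b"\x01\x08"),
    ArchiveEntry("GAME", b"\x01\x08\x0b\x08"),
]

# A scripted "user" that always selects the file named GAME.
selected = choose_entry(directory, lambda names: names.index("GAME"))
print(selected.name if selected else "nothing selected")
```

Once an entry has been chosen, its bytes can be treated like any other image and passed through the same format detection as a stand-alone file.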