Register Transfer Language

John Källén

Internally, the Decompiler uses its own Register Transfer Language (RTL) in all the analyses it performs. It has no knowledge of processor-specific machine instructions. Each processor architecture implementation is responsible for providing a suitable [Rewriter] that translates machine instructions to RTL.
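
To make the division of labor concrete, here is a minimal Python sketch of what a rewriter does. It is not the Decompiler's actual API: the MachineInstruction tuple, the rewrite function, and the two-mnemonic toy instruction set are all assumptions invented for this illustration, and RTL is rendered as plain strings rather than as instruction objects.

```python
from typing import List, NamedTuple, Tuple


class MachineInstruction(NamedTuple):
    """Hypothetical decoded machine instruction (not a Decompiler type)."""
    mnemonic: str
    operands: Tuple[str, ...]


def rewrite(instr: MachineInstruction) -> List[str]:
    """Translate one toy machine instruction into a list of RTL strings."""
    if instr.mnemonic == "inc":
        (reg,) = instr.operands
        # An increment becomes an RTL assignment.
        return [f"{reg} = {reg} + 1"]
    if instr.mnemonic == "jmp":
        (target,) = instr.operands
        # An unconditional jump becomes an RTL goto.
        return [f"goto {target}"]
    raise NotImplementedError(f"no rewrite rule for {instr.mnemonic}")


for line in rewrite(MachineInstruction("inc", ("eax",))):
    print(line)
```

The analyses only ever see the returned RTL, which is why they need no knowledge of the original instruction set.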

RTL consists of distinct instructions and expressions.

RTL Instructions

The instructions are the following:

RtlAssignment: models an assignment. E.g.
eax = eax + 1
Mem0[r1:byte] = 0x20

RtlBranch: models a conditional branch to an address. E.g.
branch Test(NZ, eax) 00402344

RtlCall: models either a direct subroutine call (to a constant address) or an indirect call (to a computed expression):
call 00401580
call Mem0[eax + 00000018:word32]

RtlGoto: models an unconditional direct or indirect branch (like the kind produced by switch statements):
goto 00401890
goto Mem0[r1 + r2 * 4]

RtlIf: models a conditionally executed statement, present in some architectures:
if (r1 > 0) r1 = 0

RtlReturn: models a return to the caller, including how many bytes are removed from the return stack (if applicable):
return (4)

RtlSideEffect: models an instruction that has no observable effect on registers, e.g. the out instruction of the x86 architecture:
__outb(edx,al)
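
The instruction kinds listed above can be sketched as small Python classes that print themselves in the notation used on this page. Only the class names come from the list; the field names are assumptions, and operands are kept as plain strings here, whereas the real instructions hold expression trees.

```python
from dataclasses import dataclass


@dataclass
class RtlAssignment:
    dst: str  # assumed field name
    src: str  # assumed field name

    def __str__(self) -> str:
        return f"{self.dst} = {self.src}"


@dataclass
class RtlBranch:
    condition: str
    target: str

    def __str__(self) -> str:
        return f"branch {self.condition} {self.target}"


@dataclass
class RtlCall:
    target: str  # constant address or computed expression

    def __str__(self) -> str:
        return f"call {self.target}"


@dataclass
class RtlGoto:
    target: str

    def __str__(self) -> str:
        return f"goto {self.target}"


@dataclass
class RtlIf:
    condition: str
    instruction: object  # the conditionally executed RTL instruction

    def __str__(self) -> str:
        return f"if ({self.condition}) {self.instruction}"


@dataclass
class RtlReturn:
    bytes_popped: int  # bytes removed from the return stack

    def __str__(self) -> str:
        return f"return ({self.bytes_popped})"


@dataclass
class RtlSideEffect:
    expression: str

    def __str__(self) -> str:
        return self.expression


print(RtlIf("r1 > 0", RtlAssignment("r1", "0")))
```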

RTL Expressions

All expressions modeled by the Decompiler have a data type. At the very least, the data type will be one of the neutral byte or word<XX> types, whose only attribute is their size in bits.
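
As a small illustration of a type whose only attribute is its size, the following Python sketch maps a bit width to a neutral type name. The function name neutral_type_name is hypothetical, and the naming rule is inferred only from the byte, word16 and word32 names that appear in the examples on this page.

```python
def neutral_type_name(bit_size: int) -> str:
    """Return the size-only type name for a value that is bit_size bits wide."""
    if bit_size <= 0:
        raise ValueError("bit size must be positive")
    # 8 bits has its own name; every other width is spelled word<XX>.
    return "byte" if bit_size == 8 else f"word{bit_size}"


print(neutral_type_name(16))
```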

Base expressions constitute the leaf nodes of expression trees. There are three kinds of base expressions:

Constants: these model constant values, such as booleans, integers, characters or real numbers. Integer constants may be signed or unsigned:
false
-1234
3e-3
'c'
Later stages of the decompilation process may produce string constants, which are also modeled as constants.

Addresses are special constants that are known to be pointers to locations. They are especially useful to the Decompiler because they allow it to determine the locations the program refers to. Addresses must also model byzantine addressing schemes such as the infamous x86 segmented addresses, which consist of a segment selector and an offset:
004079A0
0C00:1253

Identifiers model locations accessed by the program. The names of identifiers are derived from register names, or synthesized from other values such as stack offsets:
r1
dwLoc04
global_00403120
fn04001670
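
The two address forms shown above can be sketched in Python as a single class with an optional segment selector. The Address class here and its field and method names are assumptions for illustration, not the Decompiler's own type. The to_linear method uses the x86 real-mode rule (selector times 16 plus offset); in protected mode the selector indexes a descriptor table instead, which this sketch does not attempt to model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    offset: int
    selector: Optional[int] = None  # None means a flat (unsegmented) address

    def __str__(self) -> str:
        if self.selector is None:
            return f"{self.offset:08X}"
        return f"{self.selector:04X}:{self.offset:04X}"

    def to_linear(self) -> int:
        """Linear address, assuming x86 real-mode segmentation."""
        if self.selector is None:
            return self.offset
        return (self.selector << 4) + self.offset


flat = Address(0x004079A0)
segmented = Address(0x1253, selector=0x0C00)
print(flat, segmented, hex(segmented.to_linear()))
```

Keeping the selector and offset separate, rather than storing only the linear value, preserves the distinction between different selector:offset pairs that happen to alias the same linear location.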

More complex expressions are composed from base expressions using the following:

Unary operators model negation, bit-wise complement, and other single-operand expressions:
!cx
&dwLoc04

Binary operators model arithmetic operations, logical operations, shift operations, and comparisons:
dwLoc04 + 0x0004
r1 << 0x02
al >= '0'

Memory accesses model loads from and stores to memory. A special form of memory access models Intel x86 segmented memory accesses:
Mem0[fp - 0x12] = r10
ax = Mem1[es:bx + 0x04:word16]

Sequences model expressions that occupy consecutive locations in memory. They are commonly used when a register pair represents a value that is too wide to fit in one register:
dx:ax
es:bx

Casts are used to coerce the data type of an expression to another. This construct is used to model type conversion, sign extension, and truncation:
(word16) eax
(int32) 'a'

Applications model calls to functions:
fn0124_0123(ecx)
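
The way composite expressions nest over base expressions can be sketched as a small Python expression tree that prints itself in the notation used on this page. All class and field names below are assumptions made for the example. Leaf stands in for any base expression, and the printer adds no parentheses, so nested binary operators with differing precedence would render ambiguously.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Leaf:
    text: str  # a constant, address or identifier, already rendered

    def __str__(self) -> str:
        return self.text


@dataclass(frozen=True)
class Unary:
    op: str
    operand: object

    def __str__(self) -> str:
        return f"{self.op}{self.operand}"


@dataclass(frozen=True)
class Binary:
    op: str
    left: object
    right: object

    def __str__(self) -> str:
        return f"{self.left} {self.op} {self.right}"


@dataclass(frozen=True)
class MemoryAccess:
    memory: str  # e.g. "Mem0"
    address: object
    data_type: str

    def __str__(self) -> str:
        return f"{self.memory}[{self.address}:{self.data_type}]"


@dataclass(frozen=True)
class Sequence:
    head: object
    tail: object

    def __str__(self) -> str:
        return f"{self.head}:{self.tail}"


@dataclass(frozen=True)
class Cast:
    data_type: str
    operand: object

    def __str__(self) -> str:
        return f"({self.data_type}) {self.operand}"


@dataclass(frozen=True)
class Application:
    callee: str
    args: Tuple[object, ...]

    def __str__(self) -> str:
        return f"{self.callee}({', '.join(str(a) for a in self.args)})"


load = MemoryAccess("Mem0", Binary("+", Leaf("eax"), Leaf("00000018")), "word32")
print(load)
```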


Related

Wiki: Rewriter