This document shows how we wrote the constraints for the OneMax/assembly example
Contents
Since we need to generate assembly programs, it is useful (although not required) to define a few data types. First, the registers:
<item type="constant" name="register">
<value>%eax</value>
<value>%ebx</value>
<value>%ecx</value>
<value>%edx</value>
</item>
then, we can group all instructions in three groups: two operands, one operand, and branches:
<item type="constant" name="instruction">
<value>addl</value>
<value>subl</value>
<value>movl</value>
<value>andl</value>
<value>orl</value>
<value>xorl</value>
<value>cmpl</value>
</item>
<item type="constant" name="instruction_1_op">
<value>incl</value>
<value>decl</value>
<value>notl</value>
</item>
<item type="constant" name="branch">
<value>ja</value>
<value>jz</value>
<value>jnz</value>
<value>je</value>
<value>jne</value>
<value>jc</value>
<value>jnc</value>
<value>jo</value>
<value>jno</value>
</item>
Internal identifiers use all BASE32 symbols, but the assembly language asks that labels start with letter. Thus, we add an "n" in the beginning of all identifiers (eg. "3GE" is trnsformed to "n3GE")
<identifierFormat>n<value /></identifierFormat>
<labelFormat><value/>: </labelFormat>
Since the problem is quite easy, we only need one section containing only one subsection. For the section, the parameter prologueEpilogueCompulsory
makes no real sense: we can be sure that the section won't be empty.
<section id="bitString" prologueEpilogueCompulsory="false">
The section will contain exactly one subsection (min=max=1). maxReferences
makes no sense here, because the subsection cannot be referenced from another subsections (there are no other subsections).
<subSection id="main" maxOccurs="1" minOccurs="1" maxReferences="0">
We want to start with programs containing about 70 macros. The absolute maximum is 1500 and the minimum number of macros is one. Sigma controls the statistical distribution of the lenghts of the programs in the first generation.
<macros maxOccurs="1500" minOccurs="1" averageOccurs="70" sigma="60">
To describe instruction we need few macros. Remember: the opcode can be a variable, as the arguments. Let's start with the classical two-operands instructions. We need one macro to describe the register/register instructions, and one for the instructions/immediate. Here we represent constants as hexadecimal 32-bit values:
<macro id="instrDirectDirect">
<expression>
<param ref="ins"/> <param ref="sreg"/>, <param ref="dreg"/>
</expression>
<parameters>
<item type="definedType" ref="instruction" name="ins" />
<item type="definedType" ref="register" name="sreg" />
<item type="definedType" ref="register" name="dreg" />
</parameters>
</macro>
Instances of the above macro instrDirectDirect
might be:
addl %eax, %ebx
xorl %ecx, %ecx
movl %edx, %eax
...
The macro for instructions containing immediate values will be:
<macro id="instrConstDirect">
<expression>
<param ref="ins"/> $0x<param ref="scon"/>, <param ref="dreg"/>
</expression>
<parameters>
<item type="definedType" ref="instruction" name="ins" />
<item type="bitArray" length="32" base="hex" name="scon" />
<item type="definedType" ref="register" name="dreg" />
</parameters>
</macro>
Some example for its instances:
addl $0xFFF0, %eax
subl $0xA334, %edx
andl $0x0001, %ecx
...
Now, let's write a macros for instructions with one operand:
<macro id="Instr_1_op">
<expression>
<param ref="ins" /> <param ref="reg" />
</expression>
<parameters>
<item type="definedType" ref="instruction_1_op" name="ins" />
<item type="definedType" ref="register" name="reg" />
</parameters>
</macro>
Instances of the macro Instr_1_op
will be:
incl %eax
decl %ebx
notl %edx
...
The macro for branches is pretty straightforward. The references is innerGenericLabel
, that is: any macro in the same subsection. prologue and epilogue included. Please note that this may lead to deadlocks, but such problems are handled by the watchdog in main.
<macro id="branchCond">
<expression>
<param ref="ins"/> <param ref="target"/>
</expression>
<parameters>
<item type="definedType" ref="branch" name="ins" />
<item type="innerGenericLabel" name="target" itself="true" prologue="true" epilogue="true"/>
</parameters>
</macro>
Instances of the macro branchCond
might be:
jnz nO
jne nGF
n32AB: jno n32AB
...
where all the n<code>
are labels that will appear in the same subsection of the individual. In the last case, for example, the jump goes to the same line of the jump instruction, creating a possible loop.
Writing prologue and epilogue is the trickiest part of the operation. Since the evolved function calculates a value, the best way to write your own constraints for an evolved function is to start with a C file, like this skel.c:
int foo(void)
{
return 42;
}
Then, to assemble it using
$ gcc -S skel.c
And, finally, examine the generated assembly file skel.s
.file "skel.c"
.text
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
movl $42, %eax
popl %ebp
.cfi_def_cfa 4, 4
.cfi_restore 5
ret
.cfi_endproc
.LFE0:
.size foo, .-foo
.ident "GCC: (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1"
.section .note.GNU-stack,"",@progbits
Perhaps you will need to use slightly more complex functions. But, after a few tries, you will be able to recognize a prologue, an epilogue and a variable part. Then, you'll just need to put them in your own constraints.
<prologue id="sectionPrologue">
<expression>
.text
.globl evolved_function
.type evolved_function, @function
evolved_function:
.LFB0:
.cfi_startproc
pushq %rbx
pushq %rcx
pushq %rdx
movl $0x00000505, %eax
movl $0x00050500, %ebx
movl $0x00050500, %ecx
movl $0x05050000, %edx
</expression>
</prologue>
<epilogue id="sectionEpilogue">
<expression>
popq %rdx
popq %rcx
popq %rbx
ret
.cfi_endproc
.LFE0:
.size evolved_function, .-evolved_function
.ident "MicroGP"
.section .note.GNU-stack,"",@progbits
</expression>
</epilogue>
Please note that the prologue also saves all the registers that will be used on the stack, and then initializes them to known values. The former is a safety measure that can avoid some crashes, the latter is (usually) required to have reproducible results.
pushl %ebx
pushl %ecx
pushl %edx
movl $0x00000505, %eax
movl $0x00050500, %ebx
movl $0x00050500, %ecx
movl $0x05050000, %edx
And, accordingly, the epilogue pops back the original values.
popl %edx
popl %ecx
popl %ebx
The ret
opcode returns to the caller. Following the C convention, the returned value is found in AX.
Here, single section and single subsection, there is no need to differentiate between global prologue and prologue, nor between global epilogue and epilogue. Please, remember that in more complex scenarios the distinction may be useful.
This below is a graphical comparison between a slightly altered version of the constraints written above, and a individual generated by the evolutionary algorithm. Each part of the individual (on the right) is highlighted in the same color of the corresponding part in the constraints file (on the left).
Note that each macro can be instanced multiple times inside the same individual. The macro macroBranchCond
(in violet) contains a parameter of type innerLabel
, so it also creates the corresponding labels to jump to inside the individual.