MicroGP Wiki

A multi-purpose extensible self-adaptive evolutionary algorithm

Brought to you by: albertotonda, ernesto_sanchez, squillero

This project can now be found here.

How to Write Constraints

Authors:

An example of writing constraints: assembly programs

This document shows how we wrote the constraints for the OneMax/assembly example
Contents

Type definitions

Since we need to generate assembly programs, it is useful (although not required) to define a few data types. First, the registers:

<item type="constant" name="register">
     <value>%eax</value>
     <value>%ebx</value>
     <value>%ecx</value>
     <value>%edx</value>
</item>

then, we can group all instructions in three groups: two operands, one operand, and branches:

<item type="constant" name="instruction">
     <value>addl</value>
     <value>subl</value>
     <value>movl</value>
     <value>andl</value>
     <value>orl</value>
     <value>xorl</value>
     <value>cmpl</value>
</item>
<item type="constant" name="instruction_1_op">
     <value>incl</value>
     <value>decl</value>
     <value>notl</value>
</item>
<item type="constant" name="branch">
     <value>ja</value>
     <value>jz</value>
     <value>jnz</value>
     <value>je</value>
     <value>jne</value>
     <value>jc</value>
     <value>jnc</value>
     <value>jo</value>
     <value>jno</value>
</item>

Formats

Internal identifiers use all BASE32 symbols, but the assembly language asks that labels start with letter. Thus, we add an "n" in the beginning of all identifiers (eg. "3GE" is trnsformed to "n3GE")

<identifierFormat>n<value /></identifierFormat>
<labelFormat><value/>: </labelFormat>

Sections and Subsections

Since the problem is quite easy, we only need one section containing only one subsection. For the section, the parameter prologueEpilogueCompulsory makes no real sense: we can be sure that the section won't be empty.

<section id="bitString" prologueEpilogueCompulsory="false">

The section will contain exactly one subsection (min=max=1). maxReferences makes no sense here, because the subsection cannot be referenced from another subsections (there are no other subsections).

<subSection id="main" maxOccurs="1" minOccurs="1" maxReferences="0">

Macros

We want to start with programs containing about 70 macros. The absolute maximum is 1500 and the minimum number of macros is one. Sigma controls the statistical distribution of the lenghts of the programs in the first generation.

<macros maxOccurs="1500" minOccurs="1" averageOccurs="70" sigma="60">

To describe instruction we need few macros. Remember: the opcode can be a variable, as the arguments. Let's start with the classical two-operands instructions. We need one macro to describe the register/register instructions, and one for the instructions/immediate. Here we represent constants as hexadecimal 32-bit values:

<macro id="instrDirectDirect">
    <expression> 
        <param ref="ins"/>    <param ref="sreg"/>,  <param ref="dreg"/>
    </expression>
    <parameters>
        <item type="definedType" ref="instruction" name="ins" />
        <item type="definedType" ref="register" name="sreg" />
        <item type="definedType" ref="register" name="dreg" />
    </parameters>
</macro>

Instances of the above macro instrDirectDirect might be:

addl    %eax,  %ebx
xorl    %ecx,  %ecx
movl    %edx,  %eax
...

The macro for instructions containing immediate values will be:

<macro id="instrConstDirect">
    <expression> 
        <param ref="ins"/>     $0x<param ref="scon"/>,   <param ref="dreg"/> 
    </expression>
    <parameters>
        <item type="definedType" ref="instruction" name="ins" />
        <item type="bitArray" length="32" base="hex" name="scon" />
        <item type="definedType" ref="register" name="dreg" />
    </parameters>
</macro>

Some example for its instances:

addl    $0xFFF0,   %eax
subl    $0xA334,   %edx
andl    $0x0001,   %ecx
...

Now, let's write a macros for instructions with one operand:

<macro id="Instr_1_op">
    <expression>
        <param ref="ins" />   <param ref="reg" />
    </expression>
    <parameters>
        <item type="definedType" ref="instruction_1_op" name="ins" />
        <item type="definedType" ref="register" name="reg" />
    </parameters>
</macro>

Instances of the macro Instr_1_op will be:

incl   %eax
decl   %ebx
notl   %edx
...

The macro for branches is pretty straightforward. The references is innerGenericLabel, that is: any macro in the same subsection. prologue and epilogue included. Please note that this may lead to deadlocks, but such problems are handled by the watchdog in main.

<macro id="branchCond">
    <expression> 
        <param ref="ins"/>   <param ref="target"/>
    </expression>
    <parameters>
        <item type="definedType" ref="branch" name="ins" />
        <item type="innerGenericLabel" name="target" itself="true" prologue="true" epilogue="true"/>
    </parameters>
</macro>

Instances of the macro branchCond might be:

jnz   nO
jne   nGF
n32AB: jno   n32AB
...

where all the n<code> are labels that will appear in the same subsection of the individual. In the last case, for example, the jump goes to the same line of the jump instruction, creating a possible loop.

Prologue/Epilogue

Writing prologue and epilogue is the trickiest part of the operation. Since the evolved function calculates a value, the best way to write your own constraints for an evolved function is to start with a C file, like this skel.c:

int foo(void)
{
    return 42;
}

Then, to assemble it using

$ gcc -S skel.c

And, finally, examine the generated assembly file skel.s

        .file "skel.c"
        .text
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        pushl   %ebp
        .cfi_def_cfa_offset 8
        .cfi_offset 5, -8
        movl    %esp, %ebp
        .cfi_def_cfa_register 5
        movl    $42, %eax
        popl    %ebp
        .cfi_def_cfa 4, 4
        .cfi_restore 5
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .ident  "GCC: (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1"
        .section    .note.GNU-stack,"",@progbits

Perhaps you will need to use slightly more complex functions. But, after a few tries, you will be able to recognize a prologue, an epilogue and a variable part. Then, you'll just need to put them in your own constraints.

<prologue id="sectionPrologue">
    <expression>
        .text
        .globl evolved_function
        .type evolved_function, @function
evolved_function:
.LFB0:
        .cfi_startproc
        pushq %rbx
        pushq %rcx
        pushq %rdx
        movl $0x00000505, %eax
        movl $0x00050500, %ebx
        movl $0x00050500, %ecx
        movl $0x05050000, %edx
    </expression>
</prologue>
<epilogue id="sectionEpilogue">
    <expression>
        popq %rdx
        popq %rcx
        popq %rbx
        ret
        .cfi_endproc
.LFE0:
        .size evolved_function, .-evolved_function
        .ident "MicroGP"
        .section .note.GNU-stack,"",@progbits
     </expression>
</epilogue>

Please note that the prologue also saves all the registers that will be used on the stack, and then initializes them to known values. The former is a safety measure that can avoid some crashes, the latter is (usually) required to have reproducible results.

pushl %ebx
pushl %ecx
pushl %edx
movl $0x00000505, %eax
movl $0x00050500, %ebx
movl $0x00050500, %ecx
movl $0x05050000, %edx
And, accordingly, the epilogue pops back the original values.
popl %edx
popl %ecx
popl %ebx

The ret opcode returns to the caller. Following the C convention, the returned value is found in AX.

Here, single section and single subsection, there is no need to differentiate between global prologue and prologue, nor between global epilogue and epilogue. Please, remember that in more complex scenarios the distinction may be useful.

Graphical comparison constraints / individual

This below is a graphical comparison between a slightly altered version of the constraints written above, and a individual generated by the evolutionary algorithm. Each part of the individual (on the right) is highlighted in the same color of the corresponding part in the constraints file (on the left).

Note that each macro can be instanced multiple times inside the same individual. The macro macroBranchCond (in violet) contains a parameter of type innerLabel, so it also creates the corresponding labels to jump to inside the individual.

Wiki: Home
Wiki: OneMax in Assembly