4tH compiler Wiki

A Forth compiler with a little difference

Brought to you by: thebeez

Optimization

The best way to optimize a program is to look out for certain patterns and obliterate them where possible, e.g.:

swap 2drop

Is a useless sequence of words that could easily be replaced by a single ”2DROP”. You don't need an optimizer to see that one. 4tH does very little optimization itself. One thing 4tH does for you is tail call optimization. That means that if the last instruction before the semicolon is a call to another word, it will change the 'CALL' instruction to a 'BRANCH' instruction. The advantage is twofold:

It will save you an expensive 'CALL' - 'EXIT' sequence;
It will save you Return Stack space.

Consequently, your programs will run a little bit faster. If you want any further improvement, you will have to do that yourself. E.g. this will optimize a tail call optimization even more. Simply change:

: myfirstword ;
: mysecondword ;
: mythirdword if myfirstword else mysecondword then ;

Into this:

: myfirstword ;
: mysecondword ;
: mythirdword if myfirstword ;then mysecondword ;

If we decompile these programs we see the difference right away. The first compiles to:

[   0] branch    (1)
[   1] exit      (0)
[   2] branch    (3)
[   3] exit      (0)
[   4] branch    (9)
[   5] 0branch   (7)
[   6] call      (0)
[   7] branch    (8)
[   8] branch    (2)
[   9] exit      (0)

You can clearly see the tail call optimization at word 8. However, there is a 'BRANCH' at word 7 and an 'EXIT' at word 9 we can do without. The second version fixes this:

[   0] branch    (1)
[   1] exit      (0)
[   2] branch    (3)
[   3] exit      (0)
[   4] branch    (7)
[   5] 0branch   (6)
[   6] branch    (0)
[   7] branch    (2)

After the call at word 6 we immediately branch, which is correct because there is no code to execute after that. Note the tail optimizer was able to kick in twice here. Note that the ;THEN instruction provides the required ”expressiveness” of this subroutine, so its completely clear why this construction was chosen.

In some rare circumstances, e.g. when you're calling user-defined words which manipulate the return stack, you don't want the tail optimizer to kick in. Suppressing it is very simple, just add the '[FORCE]' directive to the 'EXIT' or
change the order of words slightly so that a reserved word is compiled at the end:

: return r> drop ;
: test dup if 1+ else return then [force] ;

0 test . cr

Another area of optimization is the calculation of constants. If we do it at compile time, we do not have to do it at runtime. Especially when it is in the middle of a loop or a word we frequently use. 4tH features a small peephole optimizer that tries to make the best of it. The rules are simple:

If you compile the word '+' or '-' and the previously compiled word was a LITERAL, a +LITERAL will be compiled unless rule 6 can be applied;
If you compile the word '*' and the previously compiled word was a LITERAL, a *LITERAL will be compiled unless rule 7 can be applied;
If you compile the word '/' and the previously compiled word was a LITERAL, a /LITERAL will be compiled unless rule 8 can be applied;
If you compile the word 'NEGATE' and the previously compiled word was a LITERAL, *LITERAL or /LITERAL, it will be negated;
If you compile the word '@' or '!' and the previously compiled word was a VARIABLE, a VALUE or TO will be compiled;
If you compile a +LITERAL and the previously compiled word was a LITERAL, +LITERAL or VARIABLE, the +LITERAL will added to it;
If you compile a *LITERAL and the previously compiled word was a LITERAL or *LITERAL, it will be multiplied;
If you compile a /LITERAL and the previously compiled word was a LITERAL it will be divided. If it was a /LITERAL, it will be multiplied;
If you compile a +LITERAL with the value 0 or a /LITERAL or *LITERAL with value 1, nothing is compiled at all;
If you compile a *LITERAL and the previously compiled word was a NEGATE, it will be replaced by a *LITERAL with a negative argument;
If you compile a /LITERAL and the previously compiled word was a NEGATE, it will be replaced by a /LITERAL with a negative argument;
If you compile a +LOOP with the value 1, LOOP is compiled;
If you compile a >R, followed by R> both instructions are optimized away. The same goes for the sequence R> >R;
If you compile a R>, followed by DROP an RDROP will be compiled;
If you compile a >R, followed by RDROP a DROP will be compiled;
If a 'THEN' or 'BEGIN' is compiled, the peephole optimizer is disabled and previously compiled code will not be further optimized.

Let's see how that works in practice and examine this simple program:

-10 +constant 10-
100 begin 10- dup while 5 + dup . repeat

First we create a '+CONSTANT', which subtracts 10 from any number on the stack. Second, we set up a loop which starts at 100 and is subsequently decremented until it hits zero. 4tH will compile this code for you:

[   0] literal   (100)
[   1] +literal  (-10)
[   2] dup       (0)
[   3] 0branch   (7)
[   4] +literal  (5)
[   5] dup       (0)
[   6] .         (0)
[   7] branch    (0)

Note that although rule 2 seems to apply to the first two instructions, it is overruled by rule 16. With reason, because
otherwise the LITERAL ”90” would have been compiled which is certainly not what we meant. You can clearly see that the expression ”5 +” has been condensed to a single +LITERAL, which saves an instruction. Now let's change it slightly and see what the peephole optimizer does:

-10 +constant 10- 
100 begin 10- dup while 5 + 10- dup . repeat

We've added the '+CONSTANT' to the already optimized expression. This is what 4tH compiles:

[   0] literal   (100)
[   1] +literal  (-10)
[   2] dup       (0)
[   3] 0branch   (7)
[   4] +literal  (-5)
[   5] dup       (0)
[   6] .         (0)
[   7] branch    (0)

At first it seems like it's identical, but it's not. The '+CONSTANT' has simply been subtracted from the expression! 4tH's peephole optimizer is not there to clean up your messy code, but to give you a helping hand where you really need it, e.g. consider this code:

struct
  1 +field operator
  2 +field operand
end-struct /instruction       \ define a simple structure
                              \ allocate some space
/instruction string instruction
                              \ now initialize it
1 instruction -> operator c!
5 instruction -> operand  c!

Which 4tH compiles to:

[   0] literal   (1)
[   1] literal   (768)
[   2] c!        (0)
[   3] literal   (5)
[   4] literal   (769)
[   5] c!        (0)

Without the peephole optimizer the addresses of the members of the structure would not have been calculated at compile time. Even more, optimizing this code by hand would not have been easy without writing some pretty murky source code. It is in these situations that the peephole optimizer excels. But the optimizer knows even more tricks:

Dead code elimination: It removes sequences like R> >R. You might not write those yourself, but they may be introduced by inline macros;
Algebraic simplification: Operations like ”adding zero” or ”multiplying by one” are removed;
Strength reduction: Replacing more ”expensive” operations like R> DROP with ”cheaper” ones like RDROP.

Note that all these optimizations are considered harmless - with the possible exception of tail call optimization - so the need to override the optimizer should be negligible.

You can also use the preprocessor to perform another trick: inlining - which means that small functions can be inserted into the code, instead of being called. Preventing function call overhead can really add up in a tight loop.

But always remember: the one who's most responsible to write good and fast code is always you. The optimizer is just a little helper - and not snake oil to polish up bad code.