
Code is contiguous

When we write assembly code it is quite common to call a subroutine and, since the next instruction is sometimes a return, we can optimize the code by hand and replace the "call xxx then return" with a simple "jump xxx". When that subroutine immediately follows the jump, there isn't even any need to jump: the code simply falls through to the subroutine it would normally have called before returning. The same applies to Tachyon, so let's look at some coding examples to show what is possible and quite "legal", at least in Tachyon county.

Instead of:

pub DUMP ( addr cnt -- ) --- dump memory
    < code for dump >
    ;
pub QD ( addr -- ) --- quick dump of 32 bytes from addr
    32 DUMP
    ;

We can instead write it this way:

pub QD ( addr -- ) --- quick dump of 32 bytes from addr
    32
pub DUMP ( addr cnt -- ) --- dump memory
    < code for dump >
    ;

Any seasoned Forth coder knows that you can't normally create a new definition while you are still inside one, and that code definitions are normally separated by dictionary headers. But Tachyon separates the dictionary from the code (and data) sections, so all code is contiguous. There is also no SMUDGE or smudge bit in the header: once a header is created, it doesn't need to be unsmudged to be visible or active. The only mechanism to prevent inadvertent recursion is that, while a definition is still open, dictionary searches start after the latest name.
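
The idea of a separate dictionary pointing into contiguous code can be modelled outside of Tachyon. The following is a minimal Python sketch, not Tachyon's actual implementation: the `code` list, the `dictionary` mapping, and the `define`, `emit` and `run` helpers are all hypothetical names invented for illustration. It shows QD as nothing more than an entry point one cell ahead of DUMP.

```python
# Toy model only: names and layout here are assumptions, not Tachyon internals.
EXIT = ("exit",)

code = []        # one flat, contiguous code area
dictionary = {}  # kept separately: name -> offset into code


def define(name):
    """Record a new entry point at the current end of the code area."""
    dictionary[name] = len(code)


def emit(*ops):
    """Append operations to the contiguous code area."""
    code.extend(ops)


# pub QD ( addr -- )  32        <- no ';' so no EXIT is compiled here
define("QD")
emit(("lit", 32))
# pub DUMP ( addr cnt -- )  < code for dump > ;
define("DUMP")
emit(("dump",), EXIT)


def run(name, stack):
    """Execute from a word's entry point until an EXIT is reached."""
    dumped = []
    ip = dictionary[name]
    while code[ip] != EXIT:
        op = code[ip]
        if op[0] == "lit":
            stack.append(op[1])
        elif op[0] == "dump":
            cnt = stack.pop()
            addr = stack.pop()
            dumped.append((addr, cnt))  # stand-in for the real dump
        ip += 1
    return dumped
```

Calling `run("QD", [0x1000])` pushes 32 and then runs straight into the DUMP code, while `run("DUMP", [0x1000, 8])` enters one cell later and uses the count supplied by the caller.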

This is how it is used to good effect:

pub MHZ ( MHz -- )      1000 *
pub KHZ ( kHz -- )      1000 *
--- setup smartpin to output a square wave frequency
pub HZ ( Hz -- )        NCOCNT
--- select NCO frequency mode
pub NCO ( count -- )        %1_00110
pub SETNCO ( count mode --) WRFNC 1 WXPIN WYPIN ;

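Here SETNCO is the base word and NCO was built on top of that, as were HZ, KHZ, and MHZ: 22 code bytes in all. Only SETNCO ends with a ; so each of the other words falls through into the next: MHZ multiplies by 1000 and falls into KHZ, which multiplies by 1000 again and falls into HZ.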
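
To make the scaling arithmetic of the chain explicit, here is a small Python sketch of just the MHZ, KHZ and HZ entry points. It is an illustration under assumptions: `CHAIN` and `to_hz` are hypothetical names, and the smartpin setup done by NCOCNT, NCO and SETNCO is left out entirely.

```python
# Illustration only: models the fall-through scaling, not the smartpin setup.
# Each entry is (word, multiplier applied before falling into the next word).
CHAIN = [("MHZ", 1000), ("KHZ", 1000), ("HZ", 1)]


def to_hz(word, value):
    """Enter the chain at `word` and apply every step from there to the end."""
    names = [name for name, _ in CHAIN]
    start = names.index(word)
    for _, multiplier in CHAIN[start:]:
        value *= multiplier
    return value
```

Entering at MHZ applies both multiplications, entering at KHZ applies one, and entering at HZ applies none, which is the same effect the contiguous definitions achieve without any calls between them.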

EXIT OPTIMIZATION - JUMP instead of CALL + EXIT
When ; completes a Forth word it normally checks to see if there are any incomplete structures and then compiles an EXIT. Tachyon does it a little differently: before it compiles an EXIT, it checks to see if the previous wordcode was a call to another Forth word. If it was, it simply sets bit 0 of that wordcode rather than compiling an EXIT. Since wordcode that calls other threaded wordcode is always on a 16-bit boundary, its address is always even and bit 0 would otherwise be zero and unused. The threaded address is checked at runtime and, if bit 0 is set, the bit is stripped off and signals that the return address does not need to be saved, effectively turning the implicit call into a jump and saving both time and code memory.

Take this simple bit of code that is handy for dumping a small section of memory as 32-bit longs:

: QL ( addr -- ) 32 DUMPL ; --- quick "long dump"

When we decompile that we can see how it was coded:

TAQOZ# SEE QL
1D4A1: pub QL
0885E: 1820     32    $0020  
08860: 2847     DUMPL ; 
      4  bytes 

There are only two wordcode instructions: the compactly encoded literal $20, and an odd address, $2847, where we would expect the address of DUMPL, which is:

TAQOZ# ' DUMPL .L --- $0000_2846 ok

So instead of compiling $2846 plus an EXIT, Tachyon optimized this into a jump to DUMPL instead. Very compact, and faster as well.
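
The compile-time and runtime halves of this trick can be sketched in a few lines of Python. This is a model under stated assumptions rather than Tachyon's actual code: `EXIT` is given a made-up opcode value, `CALL_TARGETS` is a stand-in for however the compiler really recognises a call wordcode, and `end_definition` and `decode_call` are hypothetical names. Only the values $1820, $2846 and $2847 come from the listing above.

```python
# Sketch only: EXIT's value and CALL_TARGETS are assumptions for illustration.
EXIT = 0x0000            # hypothetical opcode value
CALL_TARGETS = {0x2846}  # stand-in for "this wordcode is a call to a Forth word"
BYTES_PER_WORDCODE = 2   # wordcodes are 16 bits wide


def end_definition(wordcodes):
    """Model of ';': tag a trailing call as a jump, otherwise append EXIT."""
    if wordcodes and wordcodes[-1] in CALL_TARGETS:
        wordcodes[-1] |= 1      # set bit 0: jump instead of call + EXIT
    else:
        wordcodes.append(EXIT)
    return wordcodes


def decode_call(wordcode):
    """Runtime side: return (target address, whether to save a return address)."""
    target = wordcode & ~1
    save_return = (wordcode & 1) == 0
    return target, save_return


ql = end_definition([0x1820, 0x2846])   # 32 DUMPL ;
ql_bytes = len(ql) * BYTES_PER_WORDCODE
```

With these definitions `ql` ends up as `[0x1820, 0x2847]`, four bytes in total, and `decode_call(0x2847)` yields the even address `0x2846` with no return address saved.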

BTW, without an encoded wordcode range for literals and the like, and without exit optimization, the QL word would have required twice the code memory.

