
Tachyon Forth Model

Peter Jakacki

(edit: click here for the more complete and up-to-date document)

Primarily designed for the Parallax multicore MCU, the P1, and now also for the P2, which has a version called TAQOZ embedded in "silicon" (not mask).
Versions are being developed for RISC-V and RP2040 (ARM M0+).
Previous "standard" Forths were not fast enough (for starters).

No text interpreter and no text buffer - input is compiled word by word, then executed at "full speed" (normal).
Traditional text interpretation is not how compiled Forth code runs; it is slow and limited (compile-only words etc).
Solution - compile the text input as if it were a normal definition, append an exit, then automatically execute it and release it using a shadow pointer.
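
The compile-then-run approach can be pictured with a small Python model. This is only a sketch under assumptions: the `Machine` class, its `here` pointer, and `enter_line` are hypothetical names, a Python list stands in for code space, and `None` stands in for the appended exit. It is not Tachyon's actual implementation.

```python
# Toy model of "compile the input line, append an exit, run it, release it".
# All names (Machine, here, enter_line) are illustrative, not Tachyon's.

class Machine:
    def __init__(self):
        self.stack = []                 # data stack
        self.code = []                  # code space: a flat list of callables
        self.words = {                  # tiny dictionary of primitives
            "+":   lambda m: m.stack.append(m.stack.pop() + m.stack.pop()),
            "*":   lambda m: m.stack.append(m.stack.pop() * m.stack.pop()),
            "DUP": lambda m: m.stack.append(m.stack[-1]),
        }

    @property
    def here(self):
        """Next free position in code space."""
        return len(self.code)

    def compile_word(self, token):
        """Compile one token as soon as it is read (no text buffer is kept)."""
        if token in self.words:
            self.code.append(self.words[token])
        else:
            value = int(token)
            self.code.append(lambda m, v=value: m.stack.append(v))

    def enter_line(self, tokens):
        shadow = self.here              # remember where the temporary code starts
        for token in tokens:
            self.compile_word(token)
        self.code.append(None)          # None plays the role of the exit
        ip = shadow
        while self.code[ip] is not None:
            self.code[ip](self)         # run the compiled line at full speed
            ip += 1
        del self.code[shadow:]          # release the temporary code


m = Machine()
m.enter_line("2 3 + DUP *".split())
print(m.stack, m.here)
```

Because the line is compiled like a definition, words that would be compile-only in a text interpreter need no special casing in this model; the only extra step is resetting the code pointer afterwards.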

M2M (machine-to-machine) scripts can execute at full speed and respond within microseconds of the "enter". Slow text interpreters, however, take a while to parse the text buffer word by word, scanning the dictionary for every word and every number, so there is a big delay between each word. With such delays it is not possible to "pulse" hardware at machine speeds or to elicit a timely and deterministic response.

3x Stacks (+machine stack)
DATA
RETURN
LOOP & BRANCH

Top 4 data elements are held in fixed registers, so assembly instructions can operate directly on parameters. There are no pick or roll words, to encourage simple, clean, and efficient stack use (3rd, 4th etc are provided).
P2 listing of some primitives that operate directly on the stack registers

                  ' + ( n1 n2 -- n3 ) Add top two stack items together and replace with result
0097c 0c2 f1104624 PLUS             add     a,b wc
00980 0c3 fd800090                  jmp     #\NIP
                  ' C@  ( caddr -- byte ) Fetch a byte from hub memory : 100ns @320MHz
00ac0 113 0ac04623 CFETCH _ret_    rdbyte  a,a

The Return stack is dedicated to return addresses, although >R R> etc are "tolerated". IMO traditional Forth return stacks are a weak point since they also hold parameters other than return addresses. If, for instance, the stack was not properly popped before returning, then Forth could pop data as a code address to return to. Why paint Forth into a corner when it is just as easy to leave the return stack for return addresses and have a loop stack?

Loop stack holds index and limit plus the branch address. Like the data stack these top elements are in fixed registers for fast operations.
There is no DO and (DO) - only DO. Same for LOOP etc.
DO pushes index, limit, and the IP onto the LOOP stack.
LOOP uses the branch address stored in a fixed register on the loop stack so there is no need to calculate the branch address at compile time.
P2 listing of LOOP

00bb4 150 f1044e01 LOOP         add     index,#1           ' increment index
00bb8 151 f2585027              cmps    limit,index wcz
00bbc 152 1603f029      if_a    mov     PTRA,loopip          ' Branch to DO
00bc0 153 1d64002d      if_a    ret
00bc4 154 f1842e01 UNLOOP       sub     lpptr,#1         ' pop loop index

Access index and limit or leave from an external routine called from within the loop.
Example of accessing index from outside the loop word (also compile on-the-go)
TAQOZ# : .index I . SPACE ; --- ok
TAQOZ# 10 FOR .index NEXT --- 0 1 2 3 4 5 6 7 8 9 ok
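
The loop-stack behaviour can be modelled in a few lines of Python. This is a sketch under assumptions: `LoopVM`, its tuple-based program format, and the `CALL` opcode are hypothetical, and DO takes its limit and start as an argument here rather than from the data stack. It mirrors the listing only in outline: increment the index, compare against the limit, and branch to the stored address.

```python
# Toy model of a separate loop stack; names and program format are illustrative.

class LoopVM:
    def __init__(self):
        self.loop_stack = []   # entries: [index, limit, branch_ip]
        self.out = []

    def I(self):
        """Read the current loop index from the top of the loop stack."""
        return self.loop_stack[-1][0]

    def run(self, program):
        ip = 0
        while ip < len(program):
            op, arg = program[ip]
            ip += 1
            if op == "DO":                      # arg = (limit, start)
                limit, start = arg
                # ip already points at the first word of the loop body
                self.loop_stack.append([start, limit, ip])
            elif op == "LOOP":
                top = self.loop_stack[-1]
                top[0] += 1                     # increment index
                if top[1] > top[0]:             # limit still above index
                    ip = top[2]                 # branch using the stored address
                else:
                    self.loop_stack.pop()       # fall through: pop the loop entry
            elif op == "CALL":
                arg(self)                       # the callee may use vm.I()


def dot_index(vm):
    """An 'external' word that reads the index of the loop that called it."""
    vm.out.append(vm.I())


vm = LoopVM()
vm.run([("DO", (10, 0)), ("CALL", dot_index), ("LOOP", None)])
print(vm.out)
```

Since the index lives on its own stack rather than on the return stack, a called word can read it without having to skip over its own return address.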

3x Memory spaces
CODE - assembly and threaded
NAME - Dictionary
DATA - Variables and buffers

Code can fall-through to the next code word as there are no headers mixed in with the code.
: MEGA 1000 *
: KILO 1000 * ;

Examine the wordcode produced by a fall-through.

TAQOZ# SEE MEGA
1B5B4: pub MEGA
0BFC4: 23E8     1000
0BFC6: 1164     *
1B5AD: pub KILO
0BFC8: 23E8     1000
0BFCA: 1164     *
0BFCC: 0065     ;
     ( 10  bytes )

The dictionary doesn't need any link fields since one header follows another. The dictionary builds down towards code building up.
Each header consists of count+flags(1), name(1..31), address(2), followed by the previous entry.
TAQOZ# @WORDS $10 DUMP ---
1B5AD: 04 4B 49 4C 4F C8 BF 04 4D 45 47 41 C4 BF 01 4B '.KILO...MEGA...K'
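
The header layout can be checked against the dump above with a short Python walk. This is a sketch, not Tachyon code: `walk_headers` is a hypothetical helper, and the split of the first byte into a 5-bit count and 3 flag bits is an assumption based on the 1..31 name length. The little-endian address order is read off the dump itself (C8 BF for KILO at 0BFC8).

```python
# Walk dictionary headers laid out as: count+flags (1), name (count), address (2).
# Assumption: the low 5 bits of the first byte are the name length (names are
# 1..31 characters) and the upper 3 bits are flags; the exact bit split is a guess.

DUMP = bytes.fromhex("04 4B 49 4C 4F C8 BF 04 4D 45 47 41 C4 BF 01 4B")

def walk_headers(mem, base):
    """Yield (header_address, name, code_address) for each complete header."""
    pos = 0
    while pos < len(mem):
        count = mem[pos] & 0x1F
        end = pos + 1 + count + 2
        if count == 0 or end > len(mem):
            break                       # control field or truncated entry
        name = mem[pos + 1:pos + 1 + count].decode("ascii")
        code = int.from_bytes(mem[end - 2:end], "little")
        yield base + pos, name, code
        pos = end                       # next header follows directly, no link field

headers = list(walk_headers(DUMP, 0x1B5AD))
for addr, name, code in headers:
    print(f"{addr:05X} {name} -> {code:04X}")
```

The walk stops at the third entry because the 16-byte dump cuts it off; the two complete headers line up with the addresses shown by SEE for KILO and MEGA.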

If the count+flag field is zero then this indicates a special control field along with the next byte.
The dictionary can be moved and entries deleted to reclaim memory from "headerless" code. As well as the colon definition there are pub, pri, and pre, where pri sets an attribute in the header that reclaim words can use to determine which headers to remove.
No smudge attribute - the latest name is simply ignored while within its own definition.

Variables are specified by size, as they would be in any language, using common expressions such as byte/word/long/double and the plurals of these. Groups of data variables can be treated as an array since data is contiguous.
Example of specifying "variables" in dataspace

       *** FAT32 BOOT RECORD ***

3   bytes   fat32           --- jump code +nop
8   bytes   oemname         --- MSWIN4.1
    word    b/s             --- 0200 = 512B (bytes/sector)
    byte    s/c             --- 40 = 32kB clusters (sectors/cluster)
    word    rsvd            --- 32 reserved sectors from boot record until first fat table
    byte    fats            --- 02
2   res                     --- Maximum Root Directory Entries (non-FAT32)
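
Because data space is allocated contiguously, each name simply marks the next free offset. The Python sketch below models that for the declarations above. The `SIZES` table and `allocate` helper are hypothetical, and treating `res` as an unnamed reservation of bytes is an assumption drawn from the listing.

```python
# Model of allocating named variables contiguously in data space.
# SIZES and allocate() are illustrative, not Tachyon's own definitions.

SIZES = {"byte": 1, "bytes": 1, "word": 2, "words": 2, "long": 4, "longs": 4, "res": 1}

def allocate(decls, base=0):
    """decls: (count, kind, name) tuples. Returns ({name: offset}, next_free_offset)."""
    offsets, here = {}, base
    for count, kind, name in decls:
        if name is not None:
            offsets[name] = here        # the name marks the current data pointer
        here += count * SIZES[kind]     # then the pointer advances by the size
    return offsets, here

BOOT_RECORD = [
    (3, "bytes", "fat32"),      # jump code + nop
    (8, "bytes", "oemname"),
    (1, "word",  "b/s"),        # bytes/sector
    (1, "byte",  "s/c"),        # sectors/cluster
    (1, "word",  "rsvd"),
    (1, "byte",  "fats"),
    (2, "res",   None),         # reserved, unnamed
]

offsets, end = allocate(BOOT_RECORD)
for name, off in offsets.items():
    print(f"{off:3d}  {name}")
```

The resulting offsets (11 for b/s, 13 for s/c, 14 for rsvd, 16 for fats) match the on-disk positions of those fields in a FAT boot sector, which is why a sector read into this area can be accessed by name.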

16-bit hybrid threaded wordcode
Lower address range holds direct calls to assembly code primitives
Block of 2kB addresses reserved for encoded functions:
Short literals (10-bit including down to -8)
Conditional relative branches ( IF ELSE WHILE UNTIL REPEAT)
Memory reused for data space
Implicit threaded code above this block (no CFA field - just wordcodes)
Easy to patch code.
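
The short-literal encoding can be sketched in Python from the wordcodes that appear in this document's listings (23E8 for 1000, 2071 for 113, 2001 for 1, 2008 for 8). Everything here is an inference from those four values, not a documented format: `LIT_BASE`, `LIT_LIMIT`, and both helpers are hypothetical. How the small negative values are encoded is not shown, so the sketch assumes the top eight codes of the 10-bit range are reserved for them and does not model them.

```python
# Inferred only from listings in this document: non-negative short literals
# appear as $2000 + value. Negative literals (-8..-1) are NOT modelled; this
# sketch assumes they occupy the top 8 codes of the 10-bit range.

LIT_BASE = 0x2000            # assumed start of the short-literal range
LIT_LIMIT = 0x0400 - 8       # assumed count of non-negative short literals

def encode_short_literal(n):
    """Return the wordcode for n, or None if n needs a long literal."""
    if 0 <= n < LIT_LIMIT:
        return LIT_BASE + n
    return None

def decode_short_literal(code):
    """Return the literal value, or None if code is outside the modelled range."""
    if LIT_BASE <= code < LIT_BASE + LIT_LIMIT:
        return code - LIT_BASE
    return None

for value in (1, 8, 113, 1000, 355_000_000):
    code = encode_short_literal(value)
    print(value, "->", "long literal" if code is None else f"{code:04X}")
```

A value such as 355,000,000 falls outside the range, which agrees with the separate long-literal wordcode followed by the 32-bit value in the .CIRC listing.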

Threaded addresses are always on a 16-bit boundary so the lsb is not used for addressing. Instead, the lsb indicates a jump rather than a call, which can also save an EXIT. This is handled automatically during compilation.
Example of wordcodes where the jump lsb on a threaded address replaces the EXIT

TAQOZ# : .CIRC  ( radius -- ) DUP IF 2* 355,000,000 113 */ THEN PRINT ; ---  ok
TAQOZ# 25 .CIRC --- 157079646 ok
TAQOZ# SEE .CIRC
1B599: pub .CIRC
0BFE6: 009C     DUP
0BFE8: 2406     IF $BFF6
0BFEA: 00F0       2*
0BFEC: 007C       := 355000000    $1528_DEC0
0BFF2: 2071       113
0BFF4: 119C       */
              THEN
0BFF6: 357D     PRINT ;
     ( 18  bytes )
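
The jump flag in the last wordcode of the listing (357D for PRINT ;) can be unpacked as follows. This is a Python sketch with hypothetical helper names. It applies only to wordcodes already known to be threaded addresses; primitive codes in the lower range can be odd in their own right (for example 0065 for ; in the MEGA/KILO listing), so the flag must not be read from them.

```python
# The lsb of a THREADED wordcode selects jump (1) or call (0).
# Helper names are illustrative; primitives in the low range are not covered.

JUMP_BIT = 0x0001

def decode_thread(code):
    """Split a threaded wordcode into (target_address, is_jump)."""
    return code & ~JUMP_BIT & 0xFFFF, bool(code & JUMP_BIT)

def compile_tail(call_code):
    """Fold 'call word, then EXIT' into a single jump wordcode."""
    return call_code | JUMP_BIT

target, is_jump = decode_thread(0x357D)
print(f"{target:04X}", "jump" if is_jump else "call")
```

Folding the final call and the EXIT into one wordcode saves two bytes per definition that ends in a threaded call, and the jump avoids pushing and popping a return address.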

Page codes are used for any reference to code outside of the first 64k bank but otherwise primitives etc are still 16-bit. So far I haven't really needed any page code stuff as wordcode is so very compact and code has its own codespace that has only ever exceeded 64k by manually changing the code-pointer for testing purposes.

Why not bytecode? Isn't that more compact?
My first versions of Tachyon were written as a bytecode model, which looks really compact initially. However, threaded calls needed vector table overhead, so 2 bytecodes were always required anyway, just as they are even for an 8-bit literal. Once the code grows it gets even more messy and slower, whereas wordcode is far more efficient "overall" and actually faster, more compact, and cleaner.

Basic user features:
Assignable CLI control key shortcuts - not just CR and BS
Number prefix modes #10 $DEAD_BEEF %1101 &192.168.0.1 ^A 'A'
Mix any symbols in with numbers (1,000 12:45:59 etc)
Numbers are preprocessed before dictionary is searched.
DUMP can handle various memories, files, and devices.
TRACE execution
CAP wordcode usage counters
LAP times execution in cycles/us etc
Single word comments end with an underscore_. Ready?_
--- comment separator between user input and response.
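
The number preprocessing step can be sketched in Python. The semantics below are assumptions rather than TAQOZ's documented behaviour: `#`, `$`, and `%` are taken as decimal, hex, and binary prefixes, `&` as a dotted IP address packed into 32 bits, `^A` as a control code, `'A'` as a character code, and embedded symbols such as `,` `:` `_` `.` are simply dropped. `parse_number` is a hypothetical name.

```python
# Try to read a token as a number BEFORE any dictionary search.
# All prefix meanings and the separator set are assumptions for illustration.

def parse_number(token):
    """Return an int for a numeric token, or None so the caller can search the dictionary."""
    if len(token) == 3 and token[0] == "'" and token[2] == "'":
        return ord(token[1])                       # 'A' -> character code
    if len(token) == 2 and token[0] == "^":
        return ord(token[1].upper()) & 0x1F        # ^A -> control code
    if token[:1] == "&":                           # &a.b.c.d -> packed bytes
        parts = token[1:].split(".")
        if len(parts) == 4 and all(p.isdecimal() and int(p) < 256 for p in parts):
            return int.from_bytes(bytes(int(p) for p in parts), "big")
        return None
    base = {"#": 10, "$": 16, "%": 2}.get(token[:1])
    digits = token[1:] if base else token
    cleaned = "".join(ch for ch in digits if ch not in "_,:.")
    try:
        return int(cleaned, base or 10) if cleaned else None
    except ValueError:
        return None                                # not a number: treat as a word

for tok in ["#10", "$DEAD_BEEF", "%1101", "&192.168.0.1", "^A", "'A'", "1,000", "12:45:59", "DUP"]:
    print(tok, "->", parse_number(tok))
```

Checking for a number first means a numeric token never costs a full dictionary scan, and a token that fails to parse falls through to the normal word lookup.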

Assignable control key vector list

TAQOZ# .CTRLS ---
$02 ^B ~MBR
$03 ^C 03B42
$04 ^D ~DEBUG
$06 ^F ~FLASH
$07 ^G 02A54
$08 ^H 03B24
$09 ^I 03B18
$0B ^K 03B40
$0C ^L CLS
$0D ^M 03B3C
$0E ^N COLD
$10 ^P 03B46
$11 ^Q 03B48
$12 ^R ~RXCAP
$13 ^S 03B08
$14 ^T 03B52
$15 ^U ~USAGE
$16 ^V .VER
$17 ^W ~QWORDS
$18 ^X 03B3C
$19 ^Y ~WORDS
$1A ^Z REBOOT
$1B ^[ 03B14
$1C ^\ CRLF
$1D ^] ~SAFE
$1F ^_ DEBUG ok

LAP timing

TAQOZ# LAP 1,000,000 FOR NEXT LAP .LAP --- 32,000,139 cycles= 100,000,434ns @320MHz ok
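
The figures reported by .LAP can be cross-checked with a little arithmetic. The Python sketch below is only a check of the numbers in the line above; `cycles_to_ns` is a hypothetical helper and the truncating division is an assumption about how the display rounds.

```python
# Convert the reported cycle count to time at the stated 320MHz clock.

CLOCK_HZ = 320_000_000

def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to whole nanoseconds (truncating)."""
    return cycles * 1_000_000_000 // clock_hz

total_cycles = 32_000_139
iterations = 1_000_000

total_ns = cycles_to_ns(total_cycles)
per_loop_cycles = total_cycles // iterations
per_loop_ns = cycles_to_ns(per_loop_cycles)

print(f"{total_ns:,} ns total, {per_loop_cycles} cycles = {per_loop_ns} ns per empty FOR NEXT")
```

So an empty FOR NEXT iteration costs about 32 cycles, or roughly 100ns at 320MHz, with the remaining 139 cycles being the fixed overhead of the measurement.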

Trace function

TAQOZ# TRACE 1 8 FOR 2* NEXT UNTRACE ---
0BD8C : 2001  $001
0BD8E : 2008  $008              1(00000001 )
0BD90 : 1124  FOR               2(00000008 00000001 )
0BD92 : 00F0  2*                1(00000001 )
0BD94 : 015D  NEXT              1(00000002 )
0BD92 : 00F0  2*                1(00000002 )
0BD94 : 015D  NEXT              1(00000004 )
0BD92 : 00F0  2*                1(00000004 )
0BD94 : 015D  NEXT              1(00000008 )
0BD92 : 00F0  2*                1(00000008 )
0BD94 : 015D  NEXT              1(00000010 )
0BD92 : 00F0  2*                1(00000010 )
0BD94 : 015D  NEXT              1(00000020 )
0BD92 : 00F0  2*                1(00000020 )
0BD94 : 015D  NEXT              1(00000040 )
0BD92 : 00F0  2*                1(00000040 )
0BD94 : 015D  NEXT              1(00000080 )
0BD92 : 00F0  2*                1(00000080 )
0BD94 : 015D  NEXT              1(00000100 )
0BD96 : 0430  UNTRACE           1(00000100 ) ok

Redirectable DUMP

TAQOZ# 0 $20 DUMP ---
00000: D4 15 80 FD  50 32 20 20  20 20 20 20  03 64 00 00     '....P2      .d..'
00010: 00 2D 31 01  00 D0 12 13  FB 3F 4D 01  00 10 0E 00     '.-1......?M.....' ok
TAQOZ# FOPEN TAQOZ.WAV Opened @ $00C0_3B96 ---  ok
TAQOZ# 0 $20 SD DUMP ---
00000: 52 49 46 46  24 24 79 00  57 41 56 45  66 6D 74 20     'RIFF$$y.WAVEfmt '
00010: 10 00 00 00  01 00 01 00  44 AC 00 00  88 58 01 00     '........D....X..' ok
