TACHYON Forth Model
Primarily designed for the Parallax multicore MCU, the P1, and now also for the P2, which has a version called TAQOZ embedded in "silicon" (not mask).
Versions are being developed for RISC-V and RP2040 (ARM M0+).
Previous "standard" Forths were not fast enough (for starters).
No text interpreter - no text buffer - compiles word by word then executes at "full speed" (normal)
Traditional text interpretation is not how compiled Forth code runs, and it is slow and limited (compile-only words etc).
Solution - compile the text input as if it were a normal definition, append an EXIT, then automatically execute it and release it using a shadow pointer.
M2M (machine-to-machine) scripts can execute at full speed and respond within microseconds of the "enter". Slow text interpreters, however, take a while to parse through the text buffer word by word, scanning the dictionary for every word and every number, so there is a big delay between each word. In that case it is not possible to "pulse" hardware at machine speeds or to elicit a timely and deterministic response.
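The compile-then-execute idea can be sketched as a toy model in Python. This is only an illustration of the mechanism described above, not Tachyon's implementation: the class name ToyForth, the three sample words and the use of a Python list as the code area are all assumptions made for the sketch.

```python
EXIT = object()  # marker appended after the compiled input line


class ToyForth:
    """Toy model (hypothetical): input is compiled, run once, then released."""

    def __init__(self):
        self.stack = []
        self.code = []  # stands in for the code area
        self.words = {
            "+": lambda s: s.append(s.pop() + s.pop()),
            "*": lambda s: s.append(s.pop() * s.pop()),
            "DUP": lambda s: s.append(s[-1]),
        }

    def compile_token(self, token):
        # Compile one word as soon as it has been read.
        if token in self.words:
            self.code.append(self.words[token])
        else:
            value = int(token)
            self.code.append(lambda s, v=value: s.append(v))

    def enter(self, line):
        shadow = len(self.code)  # shadow pointer: where this fragment starts
        for token in line.split():
            self.compile_token(token)
        self.code.append(EXIT)
        ip = shadow
        while self.code[ip] is not EXIT:  # run the fragment like a definition
            self.code[ip](self.stack)
            ip += 1
        del self.code[shadow:]  # release the temporary code
        return list(self.stack)


forth = ToyForth()
print(forth.enter("3 4 + DUP *"))  # [49]
```

Because the line is already compiled by the time "enter" arrives, nothing is parsed or looked up while it runs.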
3x Stacks (+machine stack)
DATA
RETURN
LOOP & BRANCH
The top 4 data elements are held in fixed registers, so assembly instructions can operate directly on parameters. There are no pick or roll words, which encourages simple, clean and efficient stack use (3rd and 4th etc are provided).
P2 listing of some primitives which can operate directly on the stack registers
' + ( n1 n2 -- n3 ) Add top two stack items together and replace with result
0097c 0c2 f1104624 PLUS    add a,b wc
00980 0c3 fd800090         jmp #\NIP
' C@ ( caddr -- byte ) Fetch a byte from hub memory : 100ns @320MHz
00ac0 113 0ac04623 CFETCH  _ret_ rdbyte a,a
The return stack is dedicated to return addresses, although >R R> etc are "tolerated". IMO traditional Forth return stacks are a weak point since they also hold parameters other than return addresses. If, for instance, the stack was not properly popped before returning, then Forth could pop data as a code address to return to. Why paint Forth into a corner when it is just as easy to leave the return stack for return addresses and have a loop stack?
Loop stack holds index and limit plus the branch address. Like the data stack these top elements are in fixed registers for fast operations.
There is no DO and (DO) - only DO. Same for LOOP etc.
DO pushes index, limit, and the IP onto the LOOP stack.
LOOP uses the branch address stored in a fixed register on the loop stack so there is no need to calculate the branch address at compile time.
P2 listing of LOOP
00bb4 150 f1044e01 LOOP    add index,#1          ' increment index
00bb8 151 f2585027         cmps limit,index wcz
00bbc 152 1603f029  if_a   mov PTRA,loopip       ' Branch to DO
00bc0 153 1d64002d  if_a   ret
00bc4 154 f1842e01 UNLOOP  sub lpptr,#1          ' pop loop index
Access index and limit or leave from an external routine called from within the loop.
Example of accessing index from outside the loop word (also compile on-the-go)
TAQOZ# : .index I . SPACE ; --- ok
TAQOZ# 10 FOR .index NEXT --- 0 1 2 3 4 5 6 7 8 9 ok
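A minimal Python model of the loop stack, following the DO and LOOP behaviour described above: DO pushes index, limit and the branch address, LOOP increments and branches while limit > index, otherwise it unloops. The class LoopStackVM, the tuple-based program format and the helper dot_index are hypothetical names for this sketch only. Because loop parameters never share a stack with return addresses, a word called from inside the loop can still read the index.

```python
class LoopStackVM:
    """Toy VM (hypothetical) with a loop stack kept apart from return addresses."""

    def __init__(self, program):
        self.program = program  # list of (op, arg) tuples
        self.data = []          # data stack
        self.loops = []         # loop stack frames: [index, limit, branch_ip]
        self.out = []           # collected "printed" values

    def run(self):
        ip = 0
        while ip < len(self.program):
            op, arg = self.program[ip]
            ip += 1
            if op == "LIT":
                self.data.append(arg)
            elif op == "DO":
                index = self.data.pop()
                limit = self.data.pop()
                # ip already points just past DO: that is the branch address
                self.loops.append([index, limit, ip])
            elif op == "LOOP":
                frame = self.loops[-1]
                frame[0] += 1                 # add index,#1
                if frame[1] > frame[0]:       # cmps limit,index / if_a
                    ip = frame[2]             # branch back to just after DO
                else:
                    self.loops.pop()          # UNLOOP
            elif op == "CALL":
                arg(self)                     # external word called in the loop
        return self.out


def dot_index(vm):
    # Plays the role of ".index": reads I from the loop stack inside a called word.
    vm.out.append(vm.loops[-1][0])


program = [("LIT", 10), ("LIT", 0), ("DO", None),
           ("CALL", dot_index), ("LOOP", None)]
vm = LoopStackVM(program)
print(vm.run())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note that LOOP needs no compiled branch operand, since the branch address travels with the loop frame.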
3x Memory spaces
CODE - assembly and threaded
NAME - Dictionary
DATA - Variables and buffers
Examine the wordcode produced by a fall-through.
TAQOZ# SEE MEGA
1B5B4: pub MEGA
0BFC4: 23E8 1000
0BFC6: 1164 *
1B5AD: pub KILO
0BFC8: 23E8 1000
0BFCA: 1164 *
0BFCC: 0065 ; ( 10 bytes )
The dictionary doesn't need any link fields since one header follows another. The dictionary builds down towards code building up.
Each header comprises count+flags (1 byte), name (1..31 bytes) and address (2 bytes), followed immediately by the previous entry.
TAQOZ# @WORDS $10 DUMP ---
1B5AD: 04 4B 49 4C 4F C8 BF 04 4D 45 47 41 C4 BF 01 4B '.KILO...MEGA...K'
If the count+flag field is zero then this indicates a special control field along with the next byte.
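The header layout can be checked against the dump above with a short Python sketch. Two things are assumptions here: that the name length sits in the low 5 bits of the first byte (suggested by the 1..31 name range) with flags in the upper bits, and that parsing simply stops at a zero count byte. The little-endian address order is what the dump and the SEE listing agree on. The function name parse_headers is hypothetical.

```python
def parse_headers(dump, base):
    """Walk link-less headers: count+flags(1), name(count), address(2)."""
    entries = []
    pos = 0
    while pos < len(dump):
        first = dump[pos]
        count = first & 0x1F  # assumption: low 5 bits hold the name length
        if first == 0:
            break             # special control field, not modelled here
        end = pos + 1 + count + 2
        if end > len(dump):
            break             # entry runs past the bytes we were given
        name = dump[pos + 1:pos + 1 + count].decode("ascii")
        address = int.from_bytes(dump[pos + 1 + count:end], "little")
        entries.append((base + pos, name, address))
        pos = end             # next header follows directly, no link field
    return entries


dump = bytes.fromhex("04 4B 49 4C 4F C8 BF 04 4D 45 47 41 C4 BF 01 4B")
for header, name, address in parse_headers(dump, 0x1B5AD):
    print(f"{header:05X}: {name} -> {address:05X}")
# 1B5AD: KILO -> 0BFC8
# 1B5B4: MEGA -> 0BFC4
```

The header and code addresses recovered this way match the SEE listing for MEGA and KILO.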
The dictionary can be moved and entries deleted to reclaim memory from "headerless" code. As well as the colon definition there are pub, pri and pre, where pri sets an attribute in the header which reclaim words can use to determine which headers to remove.
* No smudge attribute - only a simple ignore latest name when within a definition.
Variables are specified as they would be in any language as to size etc using common expressions such as byte/word/long/double and the plural of these etc. Groups of data variables can be treated as an array since data is contiguous.
Example of specifying "variables" in dataspace
*** FAT32 BOOT RECORD ***
3 bytes fat32 --- jump code +nop
8 bytes oemname --- MSWIN4.1
word b/s --- 0200 = 512B (bytes/sector)
byte s/c --- 40 = 32kB clusters (sectors/cluster)
word rsvd --- 32 reserved sectors from boot record until first fat table
byte fats --- 02
2 res --- Maximum Root Directory Entries (non-FAT32)
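Since data is contiguous, each declaration simply takes the next free address. A small Python sketch of that bookkeeping, using the boot record fields above; the DataSpace class and its method names are hypothetical, and it assumes byte = 1, word = 2 and long = 4 bytes with no alignment padding.

```python
SIZES = {"byte": 1, "word": 2, "long": 4}  # assumed unit sizes in bytes


class DataSpace:
    """Toy allocator (hypothetical): names are bound to consecutive offsets."""

    def __init__(self, origin=0):
        self.here = origin
        self.symbols = {}

    def define(self, name, unit, count=1):
        # "word b/s" -> define("b/s", "word"); "8 bytes oemname" -> count=8
        self.symbols[name] = self.here
        self.here += SIZES[unit] * count

    def reserve(self, nbytes):
        # "2 res": skip space without naming it
        self.here += nbytes


boot = DataSpace()
boot.define("fat32", "byte", 3)
boot.define("oemname", "byte", 8)
boot.define("b/s", "word")
boot.define("s/c", "byte")
boot.define("rsvd", "word")
boot.define("fats", "byte")
boot.reserve(2)
print(boot.symbols)
# {'fat32': 0, 'oemname': 3, 'b/s': 11, 's/c': 13, 'rsvd': 14, 'fats': 16}
```

The resulting offsets line up with the standard FAT boot sector layout (bytes per sector at offset 11, sectors per cluster at 13, and so on), which is why a sector buffer can be overlaid directly with these names.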
16-bit hybrid threaded wordcode
The lower address range holds direct calls to assembly code primitives
Block of 2kB addresses reserved for encoded functions:
Short literals (10-bit including down to -8)
Conditional relative branches ( IF ELSE WHILE UNTIL REPEAT)
Memory reused for data space
Implicit threaded code above this block (no CFA field - just wordcodes)
Easy to patch code.
Threaded addresses are always on a 16-bit boundary so the lsb is not used for addressing. Instead, the lsb indicates a jump rather than a call, which also saves an EXIT. This is handled automatically during compilation.
Example of wordcodes, including the jump lsb on a threaded address, which replaces the EXIT by jumping
TAQOZ# : .CIRC ( radius -- ) DUP IF 2* 355,000,000 113 */ THEN PRINT ; --- ok
TAQOZ# 25 .CIRC --- 157079646 ok
TAQOZ# SEE .CIRC
1B599: pub .CIRC
0BFE6: 009C DUP
0BFE8: 2406 IF $BFF6
0BFEA: 00F0 2*
0BFEC: 007C := 355000000 $1528_DEC0
0BFF2: 2071 113
0BFF4: 119C */ THEN
0BFF6: 357D PRINT ; ( 18 bytes )
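Two of the encodings can be read straight off the listings in this document, sketched here in Python. The $2000 base for short literals is inferred only from the listed wordcodes (2001, 2008, 2071, 23E8) and is not a documented constant; how the values down to -8 are encoded is not shown, so the top of that range may really be negative. The function names are hypothetical.

```python
SHORT_LITERAL_BASE = 0x2000  # inferred from the listings, not documented


def decode_short_literal(wordcode):
    """Return the literal for a wordcode in the inferred 10-bit literal range."""
    if SHORT_LITERAL_BASE <= wordcode <= SHORT_LITERAL_BASE + 0x3FF:
        # Negative literals are not modelled: their encoding is not shown.
        return wordcode - SHORT_LITERAL_BASE
    return None


def split_threaded(wordcode):
    """Split a threaded wordcode into (word-aligned address, 'call' or 'jump')."""
    kind = "jump" if wordcode & 1 else "call"
    return wordcode & 0xFFFE, kind


print(decode_short_literal(0x23E8))      # 1000  (KILO and MEGA)
print(decode_short_literal(0x2071))      # 113   (.CIRC)
address, kind = split_threaded(0x357D)   # final wordcode of .CIRC
print(f"{kind} ${address:04X}")          # jump $357C
```

The final wordcode of .CIRC is odd, so PRINT is entered by a jump and its own return ends .CIRC, which is why no separate EXIT wordcode follows it in the listing.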
Page codes are used for any reference to code outside of the first 64k bank but otherwise primitives etc are still 16-bit. So far I haven't really needed any page code stuff as wordcode is so very compact and code has its own codespace that has only ever exceeded 64k by manually changing the code-pointer for testing purposes.
Why not bytecode? Isn't that more compact?
My first versions of Tachyon were written as a bytecode model which looks really compact initially. Vector table overhead was needed for threaded calls so 2 bytecodes were always required anyway as they also are even for an 8-bit literal. Once the code grows it gets even more messy and slower whereas wordcode is far more efficient "overall" and actually faster and more compact and clean.
Basic user features:
Assignable CLI control key shortcuts - not just CR and BS
Number prefix modes #10 $DEAD_BEEF %1101 &192.168.0.1 ^A 'A'
Mix any symbols in with numbers (1,000 12:45:59 etc)
Numbers are preprocessed before dictionary is searched.
DUMP can handle various memories, files, and devices.
TRACE execution
CAP wordcode usage counters
LAP times execution in cycles/us etc
Single word comments end with an underscore_. Ready?_
--- comment separator between user input and response.
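How the number preprocessing might behave can be sketched in Python. This is a guess at the rules from the examples listed above, not the actual Tachyon code: it assumes separators such as , _ and : are simply dropped, that &a.b.c.d packs four bytes into one 32-bit value with the first byte highest, and that ^A yields the control code. The name parse_number is hypothetical. Returning None stands for "not a number, go and search the dictionary".

```python
import re

PREFIX_BASES = {"#": 10, "$": 16, "%": 2}


def parse_number(token):
    """Try to read a token as a number before any dictionary search."""
    if not token:
        return None
    if len(token) == 3 and token[0] == "'" and token[2] == "'":
        return ord(token[1])                    # 'A' -> character code
    if len(token) == 2 and token[0] == "^":
        return ord(token[1].upper()) & 0x1F     # ^A -> control code
    if token[0] == "&":
        parts = token[1:].split(".")
        if len(parts) == 4 and all(p.isdigit() and int(p) < 256 for p in parts):
            value = 0
            for part in parts:                  # assumption: first byte is highest
                value = (value << 8) | int(part)
            return value
        return None
    base = PREFIX_BASES.get(token[0])
    digits = token[1:] if base else token
    cleaned = re.sub(r"[,_:]", "", digits)      # assumption: separators ignored
    try:
        return int(cleaned, base or 10)
    except ValueError:
        return None                             # not a number


for text in ["#10", "$DEAD_BEEF", "%1101", "&192.168.0.1", "^A", "'A'",
             "1,000", "12:45:59", "DUP"]:
    print(text, parse_number(text))
```

Trying the number rules first means input like 355,000,000 never costs a dictionary search.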
Assignable control key vector list
TAQOZ# .CTRLS ---
$02 ^B ~MBR
$03 ^C 03B42
$04 ^D ~DEBUG
$06 ^F ~FLASH
$07 ^G 02A54
$08 ^H 03B24
$09 ^I 03B18
$0B ^K 03B40
$0C ^L CLS
$0D ^M 03B3C
$0E ^N COLD
$10 ^P 03B46
$11 ^Q 03B48
$12 ^R ~RXCAP
$13 ^S 03B08
$14 ^T 03B52
$15 ^U ~USAGE
$16 ^V .VER
$17 ^W ~QWORDS
$18 ^X 03B3C
$19 ^Y ~WORDS
$1A ^Z REBOOT
$1B ^[ 03B14
$1C ^\ CRLF
$1D ^] ~SAFE
$1F ^_ DEBUG ok
LAP timing
TAQOZ# LAP 1,000,000 FOR NEXT LAP .LAP --- 32,000,139 cycles= 100,000,434ns @320MHz ok
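The figures in that report can be cross-checked with a little arithmetic, shown here in Python. The helper name cycles_to_ns is hypothetical; the numbers are the ones printed above.

```python
def cycles_to_ns(cycles, clock_hz):
    """Convert a cycle count to whole nanoseconds at the given clock."""
    return cycles * 1_000_000_000 // clock_hz


CLOCK_HZ = 320_000_000
total_cycles = 32_000_139
iterations = 1_000_000

print(cycles_to_ns(total_cycles, CLOCK_HZ))   # 100000434
print(total_cycles // iterations)             # 32 cycles per empty FOR NEXT pass
print(total_cycles % iterations)              # 139 cycles left over
print(cycles_to_ns(32, CLOCK_HZ))             # 100 ns per pass
```

So an empty FOR NEXT pass costs about 32 cycles, or 100ns at 320MHz, with roughly 139 cycles left over outside the loop body.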
Trace function
TAQOZ# TRACE 1 8 FOR 2* NEXT UNTRACE ---
0BD8C : 2001 $001
0BD8E : 2008 $008 1(00000001 )
0BD90 : 1124 FOR 2(00000008 00000001 )
0BD92 : 00F0 2* 1(00000001 )
0BD94 : 015D NEXT 1(00000002 )
0BD92 : 00F0 2* 1(00000002 )
0BD94 : 015D NEXT 1(00000004 )
0BD92 : 00F0 2* 1(00000004 )
0BD94 : 015D NEXT 1(00000008 )
0BD92 : 00F0 2* 1(00000008 )
0BD94 : 015D NEXT 1(00000010 )
0BD92 : 00F0 2* 1(00000010 )
0BD94 : 015D NEXT 1(00000020 )
0BD92 : 00F0 2* 1(00000020 )
0BD94 : 015D NEXT 1(00000040 )
0BD92 : 00F0 2* 1(00000040 )
0BD94 : 015D NEXT 1(00000080 )
0BD92 : 00F0 2* 1(00000080 )
0BD94 : 015D NEXT 1(00000100 )
0BD96 : 0430 UNTRACE 1(00000100 ) ok
Redirectable DUMP
TAQOZ# 0 $20 DUMP ---
00000: D4 15 80 FD 50 32 20 20 20 20 20 20 03 64 00 00 '....P2      .d..'
00010: 00 2D 31 01 00 D0 12 13 FB 3F 4D 01 00 10 0E 00 '.-1......?M.....' ok
TAQOZ# FOPEN TAQOZ.WAV Opened @ $00C0_3B96 --- ok
TAQOZ# 0 $20 SD DUMP ---
00000: 52 49 46 46 24 24 79 00 57 41 56 45 66 6D 74 20 'RIFF$$y.WAVEfmt '
00010: 10 00 00 00 01 00 01 00 44 AC 00 00 88 58 01 00 '........D....X..' ok