|
From: EricT <tw...@gm...> - 2025-11-04 00:13:26
|
Subject: Your bytecode expression evaluator - impressive results with
caching!
Hey Colin:
I took a look at your bytecode-based expression evaluator and was intrigued
by the approach. I made a small modification to add caching and the results
are really impressive. Here's what I changed:
proc = args {
set exp [join $args]
if {[info exist ::cache($exp)]} {
return [uplevel [list ::tcl::unsupported::assemble $::cache($exp)]]
}
set tokens [tokenise $exp]
deb1 "TOKENS = '$tokens'"
set code [compile $tokens]
deb1 "GENERATED CODE:\n$code\n"
set ::cache($exp) $code
uplevel [list ::tcl::unsupported::assemble $code]
}
The cache is just a simple array lookup - one line to store, one line to
retrieve. But the performance impact is huge:
Performance Tests
Without caching
% time {= 1 + 2} 1000
24.937 microseconds per iteration
With caching
% time {= 1 + 2} 1000
1.8 microseconds per iteration
That's a 13x speedup! The tokenize and parse steps were eating about 92% of
the execution time.
The Real Magic: Bare Variables + Caching
What really impressed me is how well your bare variable feature synergizes
with caching:
% set a 5
5
% set b 6
6
% = a + b
11
% time {= a + b} 1000
2.079 microseconds per iteration
Now change the variable values
% set a 10
10
% = a + b
16
% time {= a + b} 1000
2.188 microseconds per iteration
The cache entry stays valid even when the variable values change! Why?
Because the bytecode stores variable names, not values:
push a; loadStk; push b; loadStk; add;
The loadStk instruction does runtime lookup, so:
- Cache key is stable: "a + b"
- Works for any values of a and b
- One cache entry handles all value combinations
Compare this to if we used $-substitution:
= $a + $b # With a=5, b=6 becomes "5 + 6"
= $a + $b # With a=10, b=6 becomes "10 + 6" - different cache key!
Every value change would create a new cache entry or worse, a cache miss.
Comparison to Other Approaches
Tcl's expr: about 0.40 microseconds
Direct C evaluator: about 0.53 microseconds
Your cached approach: about 1.80 microseconds
Your uncached approach: about 24.9 microseconds
With caching, you're only 3-4x slower than a direct C evaluator.
My Assessment
Your design is excellent. The bare variable feature isn't just syntax sugar
- it's essential for good cache performance. The synergy between:
1. Bare variables leading to stable cache keys
2. Runtime lookup keeping cache hot
3. Simple caching providing dramatic speedup
makes this really elegant.
My recommendation: Keep it in Tcl! The implementation is clean, performance
is excellent (1.8 microseconds is plenty fast), and converting to C would
add significant complexity for minimal gain (maybe getting to about 1.0
microseconds).
The Tcl prototype with caching is actually the right solution here.
Sometimes the prototype IS the product!
Excellent work on this. The bytecode approach really shines with caching
enabled.
On Sun, Nov 2, 2025 at 10:14 AM Colin Macleod via Tcl-Core <
tcl...@li...> wrote:
> Hi again,
>
> I've now made a slightly more serious prototype, see
> https://cmacleod.me.uk/tcl/expr_ng
>
> This is a modified version of the prototype I wrote for tip 676. It's
> still in Tcl, but doesn't use `expr`. It tokenises and parses the input,
> then generates TAL bytecode and uses ::tcl::unsupported::assemble to run
> that. A few examples:
>
> (bin) 100 % set a [= 3.0/4]
> 0.75
> (bin) 101 % set b [= sin(a*10)]
> 0.9379999767747389
> (bin) 102 % set c [= (b-a)*100]
> 18.79999767747389
> (bin) 103 % namespace eval nn {set d [= 10**3]}
> 1000
> (bin) 104 % set e [= a?nn::d:b]
> 1000
> (bin) 105 % = {3 + [pwd]}
> Calc: expected start of expression but found '[pwd]'
> (bin) 106 % = {3 + $q}
> Calc: expected start of expression but found '$q'
> (bin) 107 % = sin (12)
> -0.5365729180004349
>
> (bin) 108 % array set rr {one 1 two 2 three 3}
> (bin) 110 % = a * rr(two)
> Calc: expected operator but found '('
> (bin) 111 % = a * $rr(two)
> 1.5
>
> - You can use $ to get an array value substituted before the `=` code sees
> the expression.
>
> (bin) 112 % string repeat ! [= nn::d / 15]
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
> Colin.
> On 02/11/2025 09:04, Donal Fellows wrote:
>
> Doing the job properly would definitely involve changing the expression
> parser, with my suggested fix being to turn all bare words not otherwise
> recognised as constants or in positions that look like function calls (it's
> a parser with some lookahead) into simple variable reads (NB: C resolves
> such ambiguities within itself differently, but that's one of the nastiest
> parts of the language). We would need to retain $ support for resolving
> ambiguity (e.g., array reads vs function calls; you can't safely inspect
> the interpreter to resolve it at the time of compiling the expression due
> to traces and unknown handlers) as well as compatibility, but that's doable
> as it is a change only in cases that are currently errors.
>
> Adding assignment is quite a bit trickier, as that needs a new major
> syntax class to describe the left side of the assignment. I suggest
> omitting that from consideration at this stage.
>
> Donal.
>
> -------- Original message --------
> From: Colin Macleod via Tcl-Core <tcl...@li...>
> <tcl...@li...>
> Date: 02/11/2025 08:13 (GMT+00:00)
> To: Pietro Cerutti <ga...@ga...> <ga...@ga...>
> Cc: tcl...@li..., av...@lo...
> Subject: Re: [TCLCORE] Fwd: TIP 672 Implementation Complete - Ready for
> Sponsorship
>
> Indeed, this toy implementation doesn't handle that:
>
> % = sin (12)
> can't read "sin": no such variable
>
> I'm not sure that's serious, but it could be fixed in a C implementation.
>
> _______________________________________________
> Tcl-Core mailing list
> Tcl...@li...
> https://lists.sourceforge.net/lists/listinfo/tcl-core
>
|