From: EricT <tw...@gm...> - 2025-11-06 18:19:22
Colin,
Thanks for the reply. I understand your consistency concerns;
predictability is important in language design.
I think it's important to recognize that this is a new "little language"
with its own grammar and semantics, separate from Tcl's expression syntax.
Because you're designing it fresh, you have the freedom to make deliberate
choices. For instance, reserving true and false as keywords makes perfect
sense in this language. Similarly, you can define how the language handles
the name(arg) pattern.
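Here is a small Tcl sketch that makes the keyword point concrete. It compares [string is boolean], which accepts any unique abbreviation of true/false/yes/no/on/off, with a strict check that reserves only the two words true and false. The helper name isBoolKeyword is hypothetical and is not part of the prototype.

```tcl
# Sketch: Tcl's permissive boolean parsing versus a strict keyword test.
# isBoolKeyword is a hypothetical helper, not part of the = prototype.
proc isBoolKeyword {token} {
    # Only the two reserved words count as boolean literals
    return [expr {$token in {true false}}]
}

foreach tok {t y n f true false yes} {
    # [string is boolean] also accepts abbreviations like t, y, n, f
    puts [format "%-6s string-is-boolean=%d keyword=%d" \
        $tok [string is boolean $tok] [isBoolKeyword $tok]]
}
```

With the strict check, single-letter names such as t and y stay ordinary variable names.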
Regarding Perl: interestingly, both Perl and Tcl inherited the $ notation
from shell scripting, so dropping it in this new language actually moves
away from Perl-like syntax, not towards it. More importantly, dropping it
is not just a convenience: it is the key to caching.
For example,
array set data {1 100 2 200}
set idx 1
= $data($idx) * 2 ;# Becomes: = 100 * 2, cache key: "100 * 2"
set idx 2
= $data($idx) * 2 ;# Becomes: = 200 * 2, cache key: "200 * 2" - new entry!
Every value change creates a new cache entry and defeats reuse of the
compiled code, much like unbraced expr. My factorial test case (an
iterative loop to 50) showed 49 cache hits on a single entry with bare
variables; with $-syntax, those would all become cache misses.
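Here is a rough Tcl model of that cache-key effect. It assumes the bare array syntax data(idx) from my modified version. countMiss is a hypothetical stand-in for the cache check in =: it keys on [join $args] just as = does, but it compiles nothing and only reports whether the key was new.

```tcl
# Sketch: count cache misses for the bare form versus the $ form.
# countMiss is hypothetical; it records whether the joined expression
# text was seen before and returns 1 on a miss, 0 on a hit.
proc countMiss {cacheVar args} {
    upvar 1 $cacheVar cache
    set key [join $args]          ;# = also keys its cache on [join $args]
    if {[info exists cache($key)]} {
        return 0                  ;# hit: text seen before
    }
    set cache($key) compiled      ;# placeholder for the generated bytecode text
    return 1                      ;# miss: new text, would need a full parse
}

array set data {1 100 2 200 3 300 4 400}
array set bareCache {}
array set dollarCache {}
set bareMisses 0
set dollarMisses 0
for {set idx 1} {$idx <= 4} {incr idx} {
    incr bareMisses   [countMiss bareCache data(idx) * 2]     ;# text never changes
    incr dollarMisses [countMiss dollarCache $data($idx) * 2] ;# text tracks the value
}
puts "bare: $bareMisses miss(es), dollar: $dollarMisses miss(es)"
# prints: bare: 1 miss(es), dollar: 4 miss(es)
```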
When I approached the array vs. function choice, I was actually quite
surprised by your choice of testing for functions from the tcl::mathfunc::
namespace, which is much better than my first thought of using uplevel and
checking for array exists, though that is still an option. Either can work,
but by checking for mathfunc, you're allowing for custom functions. In
fact, here's how I added a fibonacci function to = with almost no effort:
package require math
# Any proc in tcl::mathfunc is callable from expr, and so from = as well
proc tcl::mathfunc::fibonacci {n} {
    return [math::fibonacci $n]
}
set x 10
set y 20
= fibonacci(x) + fibonacci(y)
The namespace lookup is elegant and extensible. The ambiguity could be
documented (functions checked first, then arrays), or a name that is both
could be detected and reported as an error. It's all up to the new
language's designer.
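The lookup order could be sketched in Tcl roughly as follows. classifyCall and its strict flag are hypothetical names for illustration only. The function test mirrors the namespace which check in the prototype, and the array test uses the uplevel/array exists alternative.

```tcl
# Sketch: decide whether name(...) is a math function or an array read.
# classifyCall and the strict flag are hypothetical, for illustration only.
proc classifyCall {name {strict 0}} {
    # Functions: anything resolvable in ::tcl::mathfunc, including user procs
    set isFunc [expr {[namespace which ::tcl::mathfunc::$name] ne ""}]
    # Arrays: looked up in the caller's frame
    set isArray [uplevel 1 [list array exists $name]]
    if {$strict && $isFunc && $isArray} {
        error "ambiguous name \"$name\": both a function and an array"
    }
    if {$isFunc} {
        return function       ;# documented rule: functions are checked first
    }
    if {$isArray} {
        return array
    }
    error "\"$name\" is neither a math function nor an array"
}

array set rr {one 1 two 2}
puts [classifyCall sqrt]      ;# function
puts [classifyCall rr]        ;# array
```

Passing strict as 1 turns the documented precedence into an error whenever the same name is both a function and an array.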
Regards,
Eric
Here's where you can find it with the mods and tests:
https://github.com/rocketship88/colin-parser/tree/main
On Thu, Nov 6, 2025 at 1:37 AM Colin Macleod via Tcl-Core <
tcl...@li...> wrote:
> Hi Eric, thanks for your support.
>
> For the boolean check I think it's more consistent to disallow alphabetic
> forms entirely and only accept numeric zero or non-zero, which is what
> boolean operations and functions return anyway. An alphabetic string should
> always be treated as a variable reference, no exceptions to worry about.
>
> Similarly, I don't like the idea of foo(bar) sometimes being treated as
> a function call and sometimes as an array reference, depending on what
> definitions have been made elsewhere. One should be able to tell what kind
> of thing it is just from the expression code, without searching for other
> definitions. It is still possible to include an array reference by writing
> $foo(bar) so I would treat foo(bar) as a function call always, and fail
> if the function has not been defined. Sacrificing consistency for minor
> convenience is the slippery slope that leads to Perl. :-)
>
> You are welcome to post your modified version anywhere you like.
>
> Personally I still want to try a C implementation, but that will take me a
> few weeks.
>
> Best regards,
> Colin.
> On 05/11/2025 22:22, EricT wrote:
>
> Hi Colin,
>
> I've successfully modified your amazing code to handle arrays. In doing so, I also found two other issues: one with your boolean check and the other with your function-name check, both caused by how [string is] works.
>
> - Boolean check: `$token eq "false" || $token eq "true"` (was `[string is boolean $token]`, which treated variables named 'f', 'n', 't', 'y', etc. as the booleans false, no, true, yes, ...)
>
> - Function check: `[regexp {^[[:alpha:]]} $token]` (was `[string is alpha $token]`, which rejected names containing digits, breaking log10 and atan2)
>
>
> Here's the code for arrays:
>
> # Function call or array reference?
> set nexttok [lindex $::tokens $::tokpos]
> if {$nexttok eq "(" && [regexp {^[[:alpha:]]} $token]} {
>     set fun [namespace which tcl::mathfunc::$token]
>     if {$fun ne {}} {
>         # It's a function
>         incr ::tokpos
>         set opcodes "push $fun; "
>         append opcodes [parseFuncArgs]
>         return $opcodes
>     } else {
>         # Not a function, assume array reference
>         incr ::tokpos
>         set opcodes "push $token; "
>         # Parse the index expression - leaves VALUE on stack
>         append opcodes [parse 0]
>         # Expect closing paren
>         set closing [lindex $::tokens $::tokpos]
>         if {$closing ne ")"} {
>             error "Calc: expected ')' but found '$closing'"
>         }
>         # Stack now has: [arrayname, indexvalue]
>         incr ::tokpos
>         append opcodes "loadArrayStk; "
>         return $opcodes
>     }
> }
>
>
> In addition, there have indeed been some changes in the bytecode: land and lor are no longer supported in 9.0, although they work in 8.6.
>
> I had an AI generate some 117 test cases. All of them pass on 8.6, and 111 pass on 9.x (the land/lor tests are not run on 9.x).
>
> Colin, with your permission, I can post the code as a new file, with all the test cases, say in a GitHub repository.
>
> I think a new TIP is worth considering: one that promotes assemble to a supported command, with a compile-and-handle approach that avoids the time spent parsing the ASCII bytecode text. I think this would be great for your = command, and also quite useful for others who might want to create their own little languages.
>
> Done this way, it remains pure Tcl and avoids all the problems with different systems and hardware that a binary extension would create. In the end, I believe your code can achieve performance parity with expr. Not only does it remove half the [expr {...}] baggage, but all the $'s too! So much easier on these old eyes.
>
> Regards,
>
> Eric
>
>
>
> On Tue, Nov 4, 2025 at 1:06 PM EricT <tw...@gm...> wrote:
>
>> Hi Colin,
>>
>> Hmmm, why can't you support $a(b) as the bareword a(b)? You'd just need
>> an uplevel to check whether a is a variable; if not, it would have to be
>> a function. True?
>>
>> % tcl::unsupported::disassemble script {set a [expr {$b($c)}] }
>> snip
>> Command 2: "expr {$b($c)}..."
>> (2) push1 1 # "b"
>> (4) push1 2 # "c"
>> (6) loadStk
>> (7) loadArrayStk
>> (8) tryCvtToNumeric
>> (9) storeStk
>> (10) done
>>
>> This doesn't look too much different from what you are producing.
>>
>> I think what's really needed here is a TIP that would open up the
>> bytecode a bit so you don't need to use an unsupported command, and
>> perhaps even a new command that takes the textual bytecode you are now
>> producing and returns a handle to a cached compiled version, presumably
>> equivalent to ordinary bytecode. Then your cache array would be
>>
>> set cache($exp) $handle
>>
>> Without having to parse the text on every call, it could be as fast as
>> bytecode. You'd likely be just as fast as expr, and safe as well, since you
>> can't pass a command substitution in where a bareword is required:
>>
>> % set x {[pwd]}
>> [pwd]
>> % = sqrt(x)
>> exp= |sqrt(x)| code= |push ::tcl::mathfunc::sqrt; push x; loadStk; invokeStk 2; | ifexist: 0
>> expected floating-point number but got "[pwd]"
>>
>> I think you really have something here, perhaps this is the best answer
>> yet to slay the expr dragon!
>>
>> Regards,
>>
>> Eric
>>
>>
>> On Tue, Nov 4, 2025 at 6:52 AM Colin Macleod via Tcl-Core <
>> tcl...@li...> wrote:
>>
>>> Hi Eric,
>>>
>>> That's very neat!
>>>
>>> Yes, a pure Tcl version could go into Tcllib. I still think it may be
>>> worth trying a C implementation though. The work-around that's needed for
>>> array references [= 2* $a(b)] would defeat the caching, so it would be good
>>> to speed up the parsing if possible. Also I think your caching may be
>>> equivalent to doing byte-compilation, in which case it may make sense to
>>> use the framework which already exists for that.
>>>
>>> Colin.
>>> On 04/11/2025 01:18, EricT wrote:
>>>
>>> that is:
>>>
>>> if {[info exist ::cache($exp)]} {
>>> tailcall ::tcl::unsupported::assemble $::cache($exp)
>>> }
>>>
>>> (hate gmail!)
>>>
>>>
>>> On Mon, Nov 3, 2025 at 5:17 PM EricT <tw...@gm...> wrote:
>>>
>>>> and silly of me, it should be:
>>>> if {[info exist ::cache($exp)]} {
>>>> tailcall ::tcl::unsupported::assemble $::cache($exp)
>>>> }
>>>>
>>>>
>>>> On Mon, Nov 3, 2025 at 4:50 PM EricT <tw...@gm...> wrote:
>>>>
>>>>> With a debug line back in plus the tailcall:
>>>>>
>>>>> proc = args {
>>>>>     set exp [join $args]
>>>>>     if { [info exist ::cache($exp)] } {
>>>>>         return [tailcall ::tcl::unsupported::assemble $::cache($exp)]
>>>>>     }
>>>>>     set tokens [tokenise $exp]
>>>>>     deb1 "TOKENS = '$tokens'"
>>>>>     set code [compile $tokens]
>>>>>     deb1 "GENERATED CODE:\n$code\n"
>>>>>     puts "exp= |$exp| code= |$code| ifexist: [info exist ::cache($exp)]"
>>>>>     set ::cache($exp) $code
>>>>>     uplevel [list ::tcl::unsupported::assemble $code]
>>>>> }
>>>>>
>>>>> % set a 5
>>>>> 5
>>>>> % set b 10
>>>>> 10
>>>>> % = a + b
>>>>> exp= |a + b| code= |push a; loadStk; push b; loadStk; add; | ifexist: 0
>>>>> 15
>>>>> % = a + b
>>>>> 15
>>>>>
>>>>> % time {= a + b} 1000
>>>>> 1.73 microseconds per iteration
>>>>>
>>>>>
>>>>> Faster still!
>>>>>
>>>>> I thought the uplevel was needed to get at the local variables, but it
>>>>> seems not: tailcall evaluates the command in the caller's frame, much like
>>>>> uplevel 1.
>>>>>
>>>>> % proc foo arg {set a 5; set b 10; set c [= a+b+arg]}
>>>>> % foo 5
>>>>> exp= |a+b+arg| code= |push a; loadStk; push b; loadStk; add; push arg; loadStk; add; | ifexist: 0
>>>>> 20
>>>>> % foo 5
>>>>> 20
>>>>>
>>>>> % proc foo arg {global xxx; set a 5; set b 10; set c [= a+b+arg+xxx]}
>>>>>
>>>>> % set xxx 100
>>>>> 100
>>>>> % foo 200
>>>>> 315
>>>>> % time {foo 200} 10000
>>>>> 2.1775 microseconds per iteration
>>>>>
>>>>> % parray cache
>>>>> cache(a + b) = push a; loadStk; push b; loadStk; add;
>>>>> cache(a+b+arg) = push a; loadStk; push b; loadStk; add; push arg; loadStk; add;
>>>>> cache(a+b+arg+xxx) = push a; loadStk; push b; loadStk; add; push arg; loadStk; add; push xxx; loadStk; add;
>>>>>
>>>>>
>>>>> Very impressive, great job Colin! Great catch, Don!
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Nov 3, 2025 at 4:22 PM Donald Porter via Tcl-Core <
>>>>> tcl...@li...> wrote:
>>>>>
>>>>>> Check what effect replacing [uplevel] with [tailcall] has.
>>>>>>
>>>>>> On Nov 3, 2025, at 7:13 PM, EricT <tw...@gm...> wrote:
>>>>>>
>>>>>> Subject: Your bytecode expression evaluator - impressive results with
>>>>>> caching!
>>>>>>
>>>>>> Hey Colin:
>>>>>>
>>>>>> I took a look at your bytecode-based expression evaluator and was
>>>>>> intrigued by the approach. I made a small modification to add caching and
>>>>>> the results are really impressive. Here's what I changed:
>>>>>>
>>>>>> proc = args {
>>>>>>     set exp [join $args]
>>>>>>     if {[info exist ::cache($exp)]} {
>>>>>>         return [uplevel [list ::tcl::unsupported::assemble $::cache($exp)]]
>>>>>>     }
>>>>>>     set tokens [tokenise $exp]
>>>>>>     deb1 "TOKENS = '$tokens'"
>>>>>>     set code [compile $tokens]
>>>>>>     deb1 "GENERATED CODE:\n$code\n"
>>>>>>     set ::cache($exp) $code
>>>>>>     uplevel [list ::tcl::unsupported::assemble $code]
>>>>>> }
>>>>>>
>>>>>> The cache is just a simple array lookup - one line to store, one line
>>>>>> to retrieve. But the performance impact is huge:
>>>>>>
>>>>>> Performance Tests
>>>>>>
>>>>>> Without caching
>>>>>> % time {= 1 + 2} 1000
>>>>>> 24.937 microseconds per iteration
>>>>>>
>>>>>> With caching
>>>>>> % time {= 1 + 2} 1000
>>>>>> 1.8 microseconds per iteration
>>>>>>
>>>>>> That's nearly a 14x speedup! The tokenize and parse steps were eating about
>>>>>> 93% of the execution time.
>>>>>>
>>>>>> The Real Magic: Bare Variables + Caching
>>>>>>
>>>>>> What really impressed me is how well your bare variable feature
>>>>>> synergizes with caching:
>>>>>>
>>>>>> % set a 5
>>>>>> 5
>>>>>> % set b 6
>>>>>> 6
>>>>>> % = a + b
>>>>>> 11
>>>>>> % time {= a + b} 1000
>>>>>> 2.079 microseconds per iteration
>>>>>>
>>>>>> Now change the variable values
>>>>>> % set a 10
>>>>>> 10
>>>>>> % = a + b
>>>>>> 16
>>>>>> % time {= a + b} 1000
>>>>>> 2.188 microseconds per iteration
>>>>>>
>>>>>> The cache entry stays valid even when the variable values change!
>>>>>> Why? Because the bytecode stores variable names, not values:
>>>>>>
>>>>>> push a; loadStk; push b; loadStk; add;
>>>>>>
>>>>>> The loadStk instruction does runtime lookup, so:
>>>>>> - Cache key is stable: "a + b"
>>>>>> - Works for any values of a and b
>>>>>> - One cache entry handles all value combinations
>>>>>>
>>>>>> Compare this to if we used $-substitution:
>>>>>>
>>>>>> = $a + $b ;# With a=5, b=6 becomes "5 + 6"
>>>>>> = $a + $b ;# With a=10, b=6 becomes "10 + 6" - different cache key!
>>>>>>
>>>>>> Every value change would create a new cache entry, which also means a
>>>>>> cache miss and a full re-parse.
>>>>>>
>>>>>> Comparison to Other Approaches
>>>>>>
>>>>>> Tcl's expr: about 0.40 microseconds
>>>>>> Direct C evaluator: about 0.53 microseconds
>>>>>> Your cached approach: about 1.80 microseconds
>>>>>> Your uncached approach: about 24.9 microseconds
>>>>>>
>>>>>> With caching, you're only 3-4x slower than a direct C evaluator.
>>>>>>
>>>>>>
>>>>>> My Assessment
>>>>>>
>>>>>> Your design is excellent. The bare variable feature isn't just syntax
>>>>>> sugar - it's essential for good cache performance. The synergy between:
>>>>>>
>>>>>> 1. Bare variables leading to stable cache keys
>>>>>> 2. Runtime lookup keeping cache hot
>>>>>> 3. Simple caching providing dramatic speedup
>>>>>>
>>>>>> makes this really elegant.
>>>>>>
>>>>>> My recommendation: Keep it in Tcl! The implementation is clean,
>>>>>> performance is excellent (1.8 microseconds is plenty fast), and converting
>>>>>> to C would add significant complexity for minimal gain (maybe getting to
>>>>>> about 1.0 microseconds).
>>>>>>
>>>>>> The Tcl prototype with caching is actually the right solution here.
>>>>>> Sometimes the prototype IS the product!
>>>>>>
>>>>>> Excellent work on this. The bytecode approach really shines with
>>>>>> caching enabled.
>>>>>>
>>>>>> On Sun, Nov 2, 2025 at 10:14 AM Colin Macleod via Tcl-Core <
>>>>>> tcl...@li...> wrote:
>>>>>>
>>>>>>> Hi again,
>>>>>>>
>>>>>>> I've now made a slightly more serious prototype, see
>>>>>>> https://cmacleod.me.uk/tcl/expr_ng
>>>>>>>
>>>>>>> This is a modified version of the prototype I wrote for tip 676.
>>>>>>> It's still in Tcl, but doesn't use `expr`. It tokenises and parses the
>>>>>>> input, then generates TAL bytecode and uses ::tcl::unsupported::assemble to
>>>>>>> run that. A few examples:
>>>>>>>
>>>>>>> (bin) 100 % set a [= 3.0/4]
>>>>>>> 0.75
>>>>>>> (bin) 101 % set b [= sin(a*10)]
>>>>>>> 0.9379999767747389
>>>>>>> (bin) 102 % set c [= (b-a)*100]
>>>>>>> 18.79999767747389
>>>>>>> (bin) 103 % namespace eval nn {set d [= 10**3]}
>>>>>>> 1000
>>>>>>> (bin) 104 % set e [= a?nn::d:b]
>>>>>>> 1000
>>>>>>> (bin) 105 % = {3 + [pwd]}
>>>>>>> Calc: expected start of expression but found '[pwd]'
>>>>>>> (bin) 106 % = {3 + $q}
>>>>>>> Calc: expected start of expression but found '$q'
>>>>>>> (bin) 107 % = sin (12)
>>>>>>> -0.5365729180004349
>>>>>>>
>>>>>>> (bin) 108 % array set rr {one 1 two 2 three 3}
>>>>>>> (bin) 110 % = a * rr(two)
>>>>>>> Calc: expected operator but found '('
>>>>>>> (bin) 111 % = a * $rr(two)
>>>>>>> 1.5
>>>>>>>
>>>>>>> - You can use $ to get an array value substituted before the `=`
>>>>>>> code sees the expression.
>>>>>>>
>>>>>>> (bin) 112 % string repeat ! [= nn::d / 15]
>>>>>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>>>>>>
>>>>>>> Colin.
>>>>>>> On 02/11/2025 09:04, Donal Fellows wrote:
>>>>>>>
>>>>>>> Doing the job properly would definitely involve changing the
>>>>>>> expression parser, with my suggested fix being to turn all bare words not
>>>>>>> otherwise recognised as constants or in positions that look like function
>>>>>>> calls (it's a parser with some lookahead) into simple variable reads (NB: C
>>>>>>> resolves such ambiguities within itself differently, but that's one of the
>>>>>>> nastiest parts of the language). We would need to retain $ support for
>>>>>>> resolving ambiguity (e.g., array reads vs function calls; you can't safely
>>>>>>> inspect the interpreter to resolve it at the time of compiling the
>>>>>>> expression due to traces and unknown handlers) as well as compatibility,
>>>>>>> but that's doable as it is a change only in cases that are currently errors.
>>>>>>>
>>>>>>> Adding assignment is quite a bit trickier, as that needs a new major
>>>>>>> syntax class to describe the left side of the assignment. I suggest
>>>>>>> omitting that from consideration at this stage.
>>>>>>>
>>>>>>> Donal.
>>>>>>>
>>>>>>> -------- Original message --------
>>>>>>> From: Colin Macleod via Tcl-Core <tcl...@li...>
>>>>>>> Date: 02/11/2025 08:13 (GMT+00:00)
>>>>>>> To: Pietro Cerutti <ga...@ga...>
>>>>>>> Cc: tcl...@li..., av...@lo...
>>>>>>> Subject: Re: [TCLCORE] Fwd: TIP 672 Implementation Complete - Ready
>>>>>>> for Sponsorship
>>>>>>>
>>>>>>> Indeed, this toy implementation doesn't handle that:
>>>>>>>
>>>>>>> % = sin (12)
>>>>>>> can't read "sin": no such variable
>>>>>>>
>>>>>>> I'm not sure that's serious, but it could be fixed in a C
>>>>>>> implementation.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Tcl-Core mailing list
>>>>>>> Tcl...@li...
>>>>>>> https://lists.sourceforge.net/lists/listinfo/tcl-core
>>>>>>>