From: Florent M. <flo...@gm...> - 2025-10-29 21:04:16
Hi Eric.
Thanks for your comment.
I can see that some of the checks in my code are indeed weak. I don't have
enough time to polish it right now. My goal was at least to produce a
prototype to check whether the idea is feasible. I plan to write a cleaner
GitHub implementation later, at the beginning of 2026. The current code is
full of printf calls.
Your comments on the subject are really interesting. I will keep them in mind.
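For reference, the guard Eric suggests can be sketched outside of Tcl. This
is only a minimal, stand-alone illustration: starts_expr_subst and advance
are hypothetical names, not functions from tclParse.c or from the prototype,
and numBytes is modeled as a plain long rather than Tcl_Size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical predicate (not part of Tcl): returns 1 only when at least
 * two bytes remain in the buffer and those two bytes are "[(".
 * The length test comes first, so src[1] is never read past the buffer.
 */
int starts_expr_subst(const char *src, long numBytes)
{
    assert(src != NULL);
    return numBytes > 1 && src[0] == '[' && src[1] == '(';
}

/*
 * Hypothetical helper: move a (src, numBytes) cursor forward by n bytes.
 * Returns 0 and leaves the cursor untouched if that would make numBytes
 * negative, which is the situation that leads to runaway loops.
 */
int advance(const char **srcPtr, long *numBytesPtr, long n)
{
    assert(srcPtr != NULL && numBytesPtr != NULL);
    if (n < 0 || n > *numBytesPtr) {
        return 0;
    }
    *srcPtr += n;
    *numBytesPtr -= n;
    return 1;
}
```

The point of the second helper is simply that every place which does
src += k; numBytes -= k; is a place where numBytes can silently go negative.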
I also noticed that, in my prototype, it doesn't work inside an array index:
% set x([(1+1)]) 1
won't set x(2). I don't understand why. It should have gone through
ParseTokens, and it did, but apparently not only through it.
I have only tested with tclsh. Sadly, I will be too busy next month to
investigate.
About your parsing routine: I supposed it would have problems with a quote
inside braces or a brace inside quotes, but this guess was not founded on
an actual test. I may be wrong.
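One way to turn this guess into a factual test is to model a basic length
scan in isolation. The sketch below is an assumption for illustration only:
naive_paren_span is a hypothetical function, not the TIP 672 code, and it
deliberately treats quotes and braces as ordinary characters while counting
parentheses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from TIP 672 or from the prototype.
 * Given text that starts with '(', return the number of bytes up to and
 * including the matching ')', or -1 if the parentheses never balance.
 * Quotes and braces are NOT interpreted; only '(' and ')' are counted.
 */
long naive_paren_span(const char *src, long numBytes)
{
    long depth = 0;
    long i;

    assert(src != NULL);
    for (i = 0; i < numBytes; i++) {
        if (src[i] == '(') {
            depth++;
        } else if (src[i] == ')') {
            depth--;
            if (depth == 0) {
                return i + 1;   /* span includes the closing ')' */
            }
        }
    }
    return -1;                  /* ran out of bytes before balancing */
}
```

With such a scan, ([string length {text with "}]) is measured correctly,
because the stray quote is just another character, while a parenthesis
inside a quoted string, as in ("a)b"), ends the match early (4 bytes
instead of 7). Whether the real TIP 672 scanner behaves like this would
need to be checked against its source.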
Nevertheless, using the expr parser seems more natural to me.
My first interest was in the {=} prefix, until I understood it was limited
to a single word. That's why I shifted to trying an [( ... )] syntax.
Of course, [( ... )] will break any code that has defined a proc ( args
{...}.
Personally, I always prefer to give a command a name that corresponds to
its semantics, so I have never named a proc "(". Who would recommend that
as a good name? But it's a matter of taste...
Nesting subexpressions:
Writing them deliberately is of course totally insane. Why would someone
write [( 3 * [(2 + 1)] )] instead of [(3*(2+1))]?
It can only be accidental: either a programmer modified existing code in a
very long expression, or a machine generated it.
Supporting those accidents seems important to me.
In any case, adding this possibility took less than 10 lines of code in
this prototype.
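The end-of-expression rule used by the prototype (a ')' that brings the
parenthesis count to -1 and is immediately followed by ']') can be sketched
at character level. This is a simplified stand-in under stated assumptions:
expr_subst_length is a hypothetical name, the real ParseExpr works on
lexemes rather than raw characters, and quotes, braces and ordinary
[command] substitutions are ignored here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, character-level model of the prototype's termination rule.
 * src points just past the opening "[(" and numBytes is what remains.
 * Returns the number of expression bytes before the closing ")]",
 * or -1 if no such terminator is found.
 */
long expr_subst_length(const char *src, long numBytes)
{
    long depth = 0;     /* plays the role of nb_paren */
    long i;

    assert(src != NULL);
    for (i = 0; i < numBytes; i++) {
        if (src[i] == '(') {
            depth++;
        } else if (src[i] == ')') {
            depth--;
            if (depth == -1) {
                /* Check the bound before peeking at the next byte. */
                if (i + 1 < numBytes && src[i + 1] == ']') {
                    return i;
                }
                return -1;  /* unbalanced ')' not followed by ']' */
            }
        }
    }
    return -1;
}
```

For the nested input 1 + [(2+3)] )] the function returns 12, the offset of
the outer closing parenthesis, because the inner pair only moves the count
from 0 to 1 and back.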
Best regards,
Florent
On Wed, Oct 29, 2025, 20:18, EricT <tw...@gm...> wrote:
> Hi Florent,
>
> Thank you for exploring alternative approaches to expression shorthand syntax! It's valuable to have different perspectives, and I appreciate the work you've put into your [(..)] prototype.
>
> A Safety Note on Buffer Access:
>
> I noticed a potential issue in your ParseTokens code:
>
> } else if (src[0] == '[' && src[1] == '(') {
>
> This should check numBytes before accessing src[1]:
>
> } else if (numBytes > 1 && src[0] == '[' && src[1] == '(') {
>
> ParseTokens is very sensitive to numBytes management. If numBytes goes negative or you read past the buffer, you can encounter:
>
> - Infinite loops (the while (numBytes && ...) continues with negative values)
> - Reading garbage memory past null terminators (especially in proc bodies)
> - Undefined behavior or crashes
>
> I spent several hours debugging exactly this issue in mode 3 $((...)) when the second closing paren was missing at the end of a proc body. The symptoms were subtle until numBytes reached -4000 in a tight loop allocating tokens each pass.
>
> Compatibility Consideration:
>
> As Peter noted, [( has the same class of compatibility issue that $() faces - both break rare but valid existing syntax. Worth considering in the design.
>
> A Question:
>
> I'm curious - in earlier discussions, I thought you were exploring [{...}] syntax. What led you to shift to [(..)]? The brace version seemed interesting as an alternative approach.
>
> Quote-in-Braces Issue:
>
> You mentioned "a quote inside braces would create an error" as a flaw in the basic parsing. Could you provide a specific test case that demonstrates this? In my testing with $=:
>
> % set foo $=([string length {text with "}])
> 11
>
> This works correctly, since the paren-matching algorithm treats braces as literal characters within expressions. Which of the three TIP 672 modes $( vs $= vs $(()) did you test? I'd like to understand what edge case you've identified so I can verify and address it if needed.
>
>
> Nested Expression Substitutions:
>
> I noticed your nested example. I'm curious whether nesting is a requirement of your implementation, or if you're demonstrating it as a feature? With expression shorthand, nesting typically isn't necessary since regular parentheses already provide grouping within expressions.
>
> One of the design goals for TIP 672 was to keep the syntax simple by not requiring nestability - the existing expression syntax handles grouping naturally. Is there a use case where nestable expression substitutions provide an advantage I'm missing?
>
> Thanks again for contributing to this discussion!
>
> Best regards,
> Eric
>
>
>
> On Wed, Oct 29, 2025 at 7:45 AM Florent Merlet <flo...@gm...>
> wrote:
>
>> Hi dear Tcl community,
>>
>> An expr shorthand syntax has been a long-standing request among us.
>>
>> Those discussions always focus on the syntax aspect of the subject:
>>
>> - Like in bash $(...) or $((...))
>> - Through an alias [= ...]
>> - A new command (vexpr or let)
>> - A word prefix {=}
>> - ...
>>
>> A lot of TIPs exist on this matter. Numerous discussions have taken place,
>> but none ever reached a consensus.
>>
>> That's because the look of this shorthand is a matter of taste. Everybody
>> has their own taste. Some people like fish when it's cooked in water, some
>> people like it when it's fried. Some people don't even like fish at all!
>> It takes all sorts.
>>
>> Everybody can agree that Tcl is a big and complex machinery that must be
>> handled with care. So maybe the problem should be approached the other way
>> round:
>>
>> - Shall we derive the Tcl C source code machinery from a new syntax
>> decided beforehand (one that does not reach consensus),
>> - or shall we derive the new syntax from the Tcl C source code
>> machinery as it exists?
>>
>> My opinion is that it's better to derive the syntax from the Tcl C source
>> code than to derive the Tcl C source code from the syntax.
>>
>> TIP 672 hacks the variable substitution. To do this, it has to do a very
>> basic parse of the expression to estimate its length. It has to transmute
>> a TCL_TOKEN_VARIABLE token into a TCL_TOKEN_COMMAND token. It then uses a
>> call to Tcl_ParseCommand on a synthetic string to check for errors.
>>
>> This very basic parsing will make it buggy. For instance, a shorthand
>> expression can't be nested in another one. A quote inside braces would
>> create an error. To make this parsing robust, we would have to reinvent
>> all the expression parsing from scratch.
>>
>> But should we create a new expression parsing routine for this shorthand?
>> No: an expression parsing machinery already exists that can handle words
>> between quotes or braces and can handle nested commands, exactly as the
>> expr command does.
>>
>> "Deriving the shorthand syntax from the Tcl C source code" implies finding
>> a syntax that allows us to use the existing machinery.
>>
>> That's what I'm trying now:
>>
>> As expr is a command in Tcl, it seems logical to me to implement the
>> shorthand syntax in the command branch "[" of the ParseTokens procedure.
>> That's what I chose.
>>
>> The second step is to parse the expression, that is, to go through the
>> Tcl_ParseExpr routine and then the ParseExpr routine. The difficulty here
>> is to find the end of the substitution script inside ParseExpr. If I
>> don't want to disturb ParseExpr too much, it's better to end the
>> expression script with a character that is already significant for this
>> parser, so that the main work of detecting it is already done and only
>> needs to be adapted gently.
>>
>> Maybe I could have used any of these operators: '+', '=', '-', '*', '(',
>> ')', '|', etc. But I chose ')': an infix language needs parentheses.
>>
>> That is how I defined the end of the expression substitution script to be
>> ")]". By symmetry, I defined the beginning of the substitution script to be
>> "[(".
>>
>> This is the genesis of my proposal of "[( ...)]" as a shorthand.
>>
>> To make it work, I had to use the same clever hack as Eric Taylor:
>> create a synthetic string and parse it as a command.
>>
>> In the end, [(...)] works as expected (as far as I have tested). Here
>> are the main changes I made to accomplish it:
>>
>> In file tclParse.c, in function ParseTokens, I added a new branch in the
>> test:
>>
>> ----------------------------------------
>>
>> ... } else if (src[0] == '[' && src[1] == '(') {
>>
>> ///////////////////////////////////////////////////////////////////////
>> /* Expression substitution context */
>> // to do : noSubstExpr
>> Tcl_Parse *exprParsePtr;
>> exprParsePtr =(Tcl_Parse *)TclStackAlloc(parsePtr->interp,
>> sizeof(Tcl_Parse));
>>
>> src++; // skip '[' ; src now points at '('
>> numBytes --;
>> // Use it only to know the length of the expression, and store it
>> // into exprParsePtr->commandSize
>> Tcl_ParseExpr(parsePtr->interp, src, numBytes, exprParsePtr);
>>
>> src++; // skip '(' ; src now points at the expression
>> numBytes --;
>>
>> // Here is the famous hack of Eric Taylor
>> // "[expr {" + expr + "}]"
>> Tcl_Size syntheticLen = exprParsePtr->commandSize + 9;
>>
>> char *synthetic = (char *)Tcl_Alloc(syntheticLen + 1);
>>
>> memcpy(synthetic, "[expr {", 7);
>> memcpy(synthetic + 7, src, exprParsePtr->commandSize);
>>
>> memcpy(synthetic + 7 + exprParsePtr->commandSize, "}]", 3);
>> synthetic[syntheticLen] = '\0';
>> // Maybe a Tcl_Obj could be of use for memory management ?
>>
>>
>> Tcl_Obj *exprObjCommand =
>> Tcl_NewStringObj(synthetic,syntheticLen);
>>
>> src+=exprParsePtr->commandSize+2;
>> numBytes-=exprParsePtr->commandSize+2;
>>
>> TclStackFree(parsePtr->interp, exprParsePtr);
>>
>> tokenPtr->type = TCL_TOKEN_COMMAND;
>> tokenPtr->start = Tcl_GetStringFromObj(exprObjCommand, NULL);
>> tokenPtr->size = syntheticLen;
>> parsePtr->numTokens++;
>>
>> continue;
>>
>> } else if (*src == '[') {...
>>
>> ---------------------------------------
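The construction of the synthetic command can be checked in isolation. The
following is a minimal sketch that uses plain malloc instead of Tcl_Alloc;
make_expr_command is a hypothetical helper, not part of Tcl or of the
prototype. It derives the lengths from the literals instead of hard-coding
7 and 9, and the caller owns (and must free) the returned buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build the NUL-terminated string
 *     "[expr {" + expr[0..exprLen) + "}]"
 * in freshly allocated memory. Returns NULL if allocation fails.
 */
char *make_expr_command(const char *expr, size_t exprLen)
{
    static const char prefix[] = "[expr {";
    static const char suffix[] = "}]";
    const size_t prefixLen = sizeof(prefix) - 1;   /* 7 */
    const size_t suffixLen = sizeof(suffix) - 1;   /* 2 */
    char *out;

    assert(expr != NULL);
    out = malloc(prefixLen + exprLen + suffixLen + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, prefix, prefixLen);
    memcpy(out + prefixLen, expr, exprLen);
    /* Copy the suffix together with its terminating NUL byte. */
    memcpy(out + prefixLen + exprLen, suffix, suffixLen + 1);
    return out;
}
```

Called with the text that follows "[(" and the measured expression length,
for example make_expr_command("1+1)]", 3), it yields "[expr {1+1}]".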
>>
>> To detect the end and transfer the size of the parsed expression I had to
>> modify :
>>
>> 1° the Tcl_ParseExpr function :
>>
>> ... if (code == TCL_OK) {
>> if(start[-1] == '[' && start[0] == '(' ) {
>> // Expression Substitution Context : just transfer the size
>> // information to the caller
>> parsePtr->commandSize =exprParsePtr->commandSize;
>> } else {
>> TclParseInit(interp, start, numBytes, parsePtr);
>> ConvertTreeToTokens(start, numBytes,
>> opTree, exprParsePtr->tokenPtr, parsePtr);
>> } ...
>>
>> 2° the ParseExpr function
>>
>> int nb_paren=0;
>> int substExpressionContext=0;
>>
>> if(start[-1] == '[' && start[0] == '(' ) {
>> substExpressionContext=1;
>>
>> // Expression substitution
>> start++; // skip the open parenthesis '(' : it's part of the
>> // expression substitution syntax
>> numBytes--;
>> }
>>
>> ...
>>
>> case UNARY:
>>
>> //////////////////////////////////
>>
>> if (substExpressionContext == 1) {
>>
>> // Beyond binary operators, there is Open paren, count it
>>
>> if (start[0]== '(') {
>>
>> // Count the open parenthesis in this context
>>
>> nb_paren++;
>> }
>> }
>>
>> case BINARY: {
>> ...
>> if (substExpressionContext == 1) {
>>
>> // Beyond binary operators, there is closed Paren, count it.
>>
>> if (start[0] == ')') {
>> nb_paren--;
>> if (nb_paren == -1 && start[1] ==']') {
>> //// End of expression
>> parsePtr->commandSize = originalLength - numBytes - 1;
>> numBytes=0;
>> continue; // and exit the loop, since numbytes == 0 ;)
>> }
>> }
>> }
>>
>> ----------------------------------------
>>
>> I also made it nestable, i.e.: set x [(1 + [(2+3)] )]
>>
>> in the function ParseExpr:
>>
>> case SCRIPT : {
>>
>> ...
>>
>> if (start[1] == '(') {
>>
>> // an open bracket followed by an open paren denotes the
>> // expression shorthand
>>
>> tokenPtr->type = TCL_TOKEN_SUB_EXPR;
>> } else {
>> tokenPtr->type = TCL_TOKEN_COMMAND;
>> }
>>
>> ...
>>
>> In the function TclCompileTokens (file tclCompile.c), I added:
>>
>> case TCL_TOKEN_SUB_EXPR :
>> envPtr->line += adjust;
>> TclCompileExpr(interp, tokenPtr->start+1, tokenPtr->size-2,
>> envPtr, 0);
>> envPtr->line -= adjust;
>> numObjsToConcat++;
>>
>> break;
>>
>> ---------------------
>>
>> Then, I can write :
>>
>> % set x [(1+1)]
>>
>> 2
>>
>> % set y [($x + [(1 + 1)] )]
>>
>> 4
>>
>> % set z [($y + [($x * [(1+1)] )] )]
>>
>> 8
>>
>> -----------------------------
>>
>> Surely there are corner cases that this prototype doesn't handle. More
>> investigation is needed and it should be extensively tested, but this
>> proves that the [(...)] expression shorthand is possible at little cost.
>> Maybe the TCL_TOKEN_SUB_EXPR token could even be used instead of creating
>> a synthetic string. I may investigate this last option later...
>>
>> Florent
>>
>>
>> _______________________________________________
>> Tcl-Core mailing list
>> Tcl...@li...
>> https://lists.sourceforge.net/lists/listinfo/tcl-core
>>
>