From: EricT <tw...@gm...> - 2025-10-29 19:18:14
Hi Florent,
Thank you for exploring alternative approaches to expression shorthand
syntax! It's valuable to have different perspectives, and I appreciate
the work you've put into your [(..)] prototype.
A Safety Note on Buffer Access:
I noticed a potential issue in your ParseTokens code:
    } else if (src[0] == '[' && src[1] == '(') {
This should check numBytes before accessing src[1]:
    } else if (numBytes > 1 && src[0] == '[' && src[1] == '(') {
ParseTokens is very sensitive to numBytes management. If numBytes goes
negative or you read past the buffer, you can encounter:
- Infinite loops (the while (numBytes && ...) continues with negative values)
- Reading garbage memory past null terminators (especially in proc bodies)
- Undefined behavior or crashes
I spent several hours debugging exactly this issue in mode 3 $((...))
when the second closing paren was missing at the end of a proc body.
The symptoms were subtle until numBytes reached -4000 in a tight loop
allocating tokens each pass.
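To make the pattern concrete, here is a minimal standalone sketch of a
bounds-checked two-byte lookahead. It is not the actual ParseTokens code:
starts_expr_subst and count_openers are hypothetical helpers, and
ptrdiff_t stands in for Tcl's signed size type. The loop condition is
numBytes > 0 rather than plain numBytes, so a negative count ends the
scan instead of continuing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Tcl): true only when at least two
 * bytes remain and they are "[(". The length test comes first, so
 * src[1] is never read when only one byte is left. */
static int starts_expr_subst(const char *src, ptrdiff_t numBytes)
{
    assert(src != NULL);
    return numBytes > 1 && src[0] == '[' && src[1] == '(';
}

/* Hypothetical scan loop: counts "[(" openers in src[0..numBytes). */
static int count_openers(const char *src, ptrdiff_t numBytes)
{
    int count = 0;

    assert(src != NULL);
    while (numBytes > 0) {          /* a negative count stops the loop */
        ptrdiff_t step = 1;

        if (starts_expr_subst(src, numBytes)) {
            count++;
            step = 2;               /* safe: at least two bytes remain */
        }
        src += step;
        numBytes -= step;           /* step never exceeds numBytes */
    }
    return count;
}
```

With this shape, a trailing '[' as the last byte of the buffer is simply
treated as an ordinary character, and numBytes never drops below zero.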
Compatibility Consideration:
As Peter noted, [( has the same class of compatibility issue that $()
faces - both break rare but valid existing syntax. Worth considering
in the design.
A Question:
I'm curious - in earlier discussions, I thought you were exploring
[{...}] syntax. What led you to shift to [(..)]? The brace version
seemed interesting as an alternative approach.
Quote-in-Braces Issue:
You mentioned "a quote inside braces would create an error" as a flaw
in the basic parsing. Could you provide a specific test case that
demonstrates this? In my testing with $=:
% set foo $=([string length {text with "}])
11
This works correctly, since the paren-matching algorithm treats braces
as literal characters within expressions. Which of the three TIP 672
modes did you test: $(...), $=(...), or $((...))? I'd like to understand
what edge
case you've identified so I can verify and address it if needed.
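For reference, the idea of counting only parentheses can be sketched as
follows. This is a simplified illustration, not the TIP 672 code itself:
match_paren is a hypothetical helper, and it treats braces, quotes and
brackets as ordinary bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the TIP 672 implementation). Expects s[0] to
 * be '(' and returns the index of the matching ')' within the first n
 * bytes, or -1 if there is none. Only '(' and ')' change the depth;
 * every other byte, including '{', '}', '"', '[' and ']', is skipped. */
static ptrdiff_t match_paren(const char *s, ptrdiff_t n)
{
    ptrdiff_t depth = 0;
    ptrdiff_t i;

    assert(s != NULL);
    if (n < 1 || s[0] != '(') {
        return -1;
    }
    for (i = 0; i < n; i++) {
        if (s[i] == '(') {
            depth++;
        } else if (s[i] == ')' && --depth == 0) {
            return i;               /* matching close paren found */
        }
    }
    return -1;                      /* ran out of bytes: unbalanced */
}
```

On the 31-byte text ([string length {text with "}]) this returns 30, the
final paren, because the lone quote inside the braces never affects the
depth. The trade-off of such a simple scan is that a paren inside a
string literal would be counted as well.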
Nested Expression Substitutions:
I noticed your nested example. I'm curious whether nesting is a
requirement of your implementation, or if you're demonstrating it as a
feature? With expression shorthand, nesting typically isn't necessary
since regular parentheses already provide grouping within expressions.
One of the design goals for TIP 672 was to keep the syntax simple by
not requiring nestability - the existing expression syntax handles
grouping naturally. Is there a use case where nestable expression
substitutions provide an advantage I'm missing?
Thanks again for contributing to this discussion!
Best regards,
Eric
On Wed, Oct 29, 2025 at 7:45 AM Florent Merlet <flo...@gm...>
wrote:
> Hi dear Tcl community,
>
> An expr shorthand syntax has been a long-standing request among us.
>
> These discussions always focus on the syntactic aspect of the subject:
>
> - Like in bash $(...) or $((...))
> - Through an alias [= ...]
> - A new command (vexpr or let)
> - A word prefix {=}
> - ...
>
> A lot of TIPs exist on that matter. Numerous discussions have taken place,
> but none has ever reached a consensus.
>
> That's because the look of this shorthand is a matter of taste, and
> everybody has their own. Some people like fish when it's cooked in water,
> some people like it when it's fried. Some people don't even like fish at
> all! There is no accounting for taste.
>
> Everybody can agree that Tcl is a big and complex piece of machinery that
> must be handled with care. So maybe the problem should be approached the
> other way round:
>
> - Shall we deduce the Tcl C source code machinery from a new syntax
> decided on beforehand (one that has not reached consensus)?
> - Or shall we deduce the new syntax from the Tcl C source code
> machinery as it exists?
>
> My opinion is that it's better to deduce the syntax from the Tcl C source
> code than to deduce the Tcl C source code from the syntax.
>
> TIP 672 hacks variable substitution. To do this, it has to do a very
> basic parse of the expression to estimate its length. It has to
> transmute a TCL_TOKEN_VARIABLE token into a TCL_TOKEN_COMMAND token. It
> then uses a call to Tcl_ParseCommand on a synthetic string to check for
> errors.
>
> This very basic parsing will make it buggy. For instance, a shorthand
> expression can't be nested in another one. A quote inside braces would
> create an error. To make this parsing robust, we would have to reinvent
> all of the expression parsing from scratch.
>
> But should we create a new expression parsing routine for this
> shorthand? No: an expression parsing machinery already exists that can
> handle words between quotes or braces and can handle nested commands,
> exactly as the expr command does.
>
> "Deduce the shorthand syntax from the Tcl C source code" implies finding
> a syntax that allows us to use the existing machinery.
>
> That's what I'm trying now:
>
> As expr is a command in Tcl, it seems logical to me to implement the
> shorthand syntax in the command branch "[" of the ParseTokens procedure.
> That's what I chose.
>
> The second step is to parse the expression, that is, to go through the
> Tcl_ParseExpr routine and then the ParseExpr routine. The difficulty here
> is to find the end of the substitution script inside ParseExpr. If I
> don't want to disturb ParseExpr too much, it's better to choose, as the
> character that ends the expression script, one that is already
> significant for this parser, so that the main task of detecting it is
> already done and only needs gentle adaptation.
>
> Maybe I could have used any of those operators: '+', '=', '-', '*', '(',
> ')', '|', etc. But I chose to use ')': an infix language needs
> parentheses.
>
> That is how I defined the end of the expression substitution script to be
> ")]". By symmetry, I defined the beginning of the substitution script to
> be "[(".
>
> That is the genesis of my proposal of "[( ...)]" as a shorthand.
>
> To make it work, I had to use the same clever hack as Eric Taylor:
> create a synthetic string and parse it as a command.
>
> In the end, [(...)] works as expected (as far as I have tested). Here
> are the main changes I made to accomplish it:
>
> In file tclParse.c, in function ParseTokens, I added a new branch in the
> test:
>
> ----------------------------------------
>
> ... } else if (src[0] == '[' && src[1] == '(') {
>     /* Expression substitution context */
>     // TODO: noSubstExpr
>     Tcl_Parse *exprParsePtr = (Tcl_Parse *)TclStackAlloc(
>             parsePtr->interp, sizeof(Tcl_Parse));
>
>     src++;      // skip '['; src now points at '('
>     numBytes--;
>     // Used only to learn the length of the expression, which is
>     // stored in exprParsePtr->commandSize
>     Tcl_ParseExpr(parsePtr->interp, src, numBytes, exprParsePtr);
>
>     src++;      // skip '('; src now points at the expression itself
>     numBytes--;
>
>     // Here is the famous hack of Eric Taylor:
>     // "[expr {" + expr + "}]" adds 7 + 2 = 9 bytes
>     Tcl_Size syntheticLen = exprParsePtr->commandSize + 9;
>     char *synthetic = (char *)Tcl_Alloc(syntheticLen + 1);
>
>     memcpy(synthetic, "[expr {", 7);
>     memcpy(synthetic + 7, src, exprParsePtr->commandSize);
>     memcpy(synthetic + 7 + exprParsePtr->commandSize, "}]", 3);
>     synthetic[syntheticLen] = '\0';
>     // Maybe a Tcl_Obj could be of use for memory management?
>
>     Tcl_Obj *exprObjCommand = Tcl_NewStringObj(synthetic, syntheticLen);
>     Tcl_Free(synthetic);    // Tcl_NewStringObj copied the bytes
>
>     // Skip the expression and the closing ")]"
>     src += exprParsePtr->commandSize + 2;
>     numBytes -= exprParsePtr->commandSize + 2;
>
>     TclStackFree(parsePtr->interp, exprParsePtr);
>
>     tokenPtr->type = TCL_TOKEN_COMMAND;
>     tokenPtr->start = Tcl_GetStringFromObj(exprObjCommand, NULL);
>     tokenPtr->size = syntheticLen;
>     parsePtr->numTokens++;
>
>     continue;
>
> } else if (*src == '[') {...
>
> ---------------------------------------
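The synthetic-string step quoted above can be looked at in isolation. The
sketch below is a standalone illustration under stated assumptions, not
code from either prototype: wrap_expr is a hypothetical helper, and it
uses malloc instead of Tcl_Alloc so that it compiles without the Tcl
headers. It derives the prefix and suffix lengths from the literals
rather than hard-coding 7, 2 and 9.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Tcl): wraps expr[0..exprLen) as
 * "[expr {" + expr + "}]". Returns a malloc'ed, NUL-terminated string
 * that the caller must free, or NULL if allocation fails. If outLen is
 * not NULL it receives the length without the terminating NUL. */
static char *wrap_expr(const char *expr, size_t exprLen, size_t *outLen)
{
    static const char prefix[] = "[expr {";
    static const char suffix[] = "}]";
    const size_t preLen = sizeof(prefix) - 1;   /* 7 bytes */
    const size_t sufLen = sizeof(suffix) - 1;   /* 2 bytes */
    const size_t total = preLen + exprLen + sufLen;
    char *out;

    assert(expr != NULL);
    out = (char *)malloc(total + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, prefix, preLen);
    memcpy(out + preLen, expr, exprLen);
    /* sufLen + 1 also copies the suffix's terminating NUL */
    memcpy(out + preLen + exprLen, suffix, sufLen + 1);
    if (outLen != NULL) {
        *outLen = total;
    }
    return out;
}
```

Passing the text after "[(" together with the expression length (for
example "1+1)]" with length 3) yields "[expr {1+1}]"; the caller owns the
buffer and frees it once the bytes have been copied elsewhere.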
>
> To detect the end and transfer the size of the parsed expression, I had
> to modify:
>
> 1. the Tcl_ParseExpr function:
>
> ... if (code == TCL_OK) {
>     if (start[-1] == '[' && start[0] == '(') {
>         // Expression substitution context: just transfer the size
>         // information to the caller
>         parsePtr->commandSize = exprParsePtr->commandSize;
>     } else {
>         TclParseInit(interp, start, numBytes, parsePtr);
>         ConvertTreeToTokens(start, numBytes,
>                 opTree, exprParsePtr->tokenPtr, parsePtr);
>     } ...
>
> 2. the ParseExpr function:
>
> int nb_paren = 0;
> int substExpressionContext = 0;
>
> if (start[-1] == '[' && start[0] == '(') {
>     // Expression substitution
>     substExpressionContext = 1;
>
>     // Skip the open parenthesis '(': it is part of the expression
>     // substitution syntax
>     start++;
>     numBytes--;
> }
>
> ...
>
> case UNARY:
>     if (substExpressionContext == 1) {
>         // An open paren also reaches this case; count it in this
>         // context
>         if (start[0] == '(') {
>             nb_paren++;
>         }
>     }
>
> case BINARY: {
>     ...
>     if (substExpressionContext == 1) {
>         // A close paren also reaches this case; count it
>         if (start[0] == ')') {
>             nb_paren--;
>             if (nb_paren == -1 && start[1] == ']') {
>                 // End of expression
>                 parsePtr->commandSize = originalLength - numBytes - 1;
>                 numBytes = 0;
>                 continue; // exits the loop, since numBytes == 0
>             }
>         }
>     }
>
> ----------------------------------------
>
> I also made it nestable, i.e.: set x [(1 + [(2+3)] )]
>
> In the function ParseExpr:
>
> case SCRIPT: {
>     ...
>     if (start[1] == '(') {
>         // An open bracket followed by an open paren denotes the
>         // expression shorthand
>         tokenPtr->type = TCL_TOKEN_SUB_EXPR;
>     } else {
>         tokenPtr->type = TCL_TOKEN_COMMAND;
>     }
>     ...
>
> In the function TclCompileTokens (file tclCompile.c), I added:
>
> case TCL_TOKEN_SUB_EXPR:
>     envPtr->line += adjust;
>     TclCompileExpr(interp, tokenPtr->start+1, tokenPtr->size-2,
>             envPtr, 0);
>     envPtr->line -= adjust;
>     numObjsToConcat++;
>     break;
>
> ---------------------
>
> Then, I can write :
>
> % set x [(1+1)]
>
> 2
>
> % set y [($x + [(1 + 1)] )]
>
> 4
>
> % set z [($y + [($x * [(1+1)] )] )]
>
> 8
>
> -----------------------------
>
> Surely there are corner cases that this prototype doesn't resolve. More
> investigation is needed and it should be extensively tested, but this
> proves that the [(...)] expression shorthand is possible at little cost.
> Maybe the TCL_TOKEN_SUB_EXPR token could even be used instead of creating
> a synthetic string. I may investigate this last option later...
>
> Florent