From: da S. P. J <pet...@fl...> - 2025-10-29 14:56:46
That has the same problem of breaking existing syntax.
% proc ( {args} { puts $args }
% ( hello
hello
From: Florent Merlet <flo...@gm...>
Date: Wednesday, October 29, 2025 at 09:45
To: tcl...@li... <tcl...@li...>
Subject: [TCLCORE] Variation on the Expr Shorthand
Hi dear Tcl community,
A shorthand syntax for expr has long been in demand among us.
Those discussions always focus on the syntactic aspect of the subject:
* Like in bash $(...) or $((...))
* Through an alias [= ...]
* A new command (vexpr or let)
* A word prefix {=}
* ...
Many TIPs exist on the matter. Numerous discussions have taken place, and none has ever reached a consensus.
That's because the look of this shorthand is a matter of taste, and everybody has their own. Some people like fish poached, some like it fried, and some don't like fish at all! All tastes exist in nature.
Everybody can agree that Tcl is a big and complex piece of machinery that must be handled with care. So maybe the problem should be approached the other way round:
* Shall we derive the Tcl C source code machinery from a new syntax decided beforehand (one that has no consensus)?
* Or shall we derive the new syntax from the Tcl C source code machinery as it exists?
My opinion is that it is better to derive the syntax from the Tcl C source code than to derive the Tcl C source code from the syntax.
TIP 672 hacks the variable substitution. To do this, it has to do a very basic parse of the expression to estimate its length. It has to turn a TCL_TOKEN_VARIABLE token into a TCL_TOKEN_COMMAND token. It then calls Tcl_ParseCommand on a synthetic string to check for errors.
This very basic parsing will make it buggy. For instance, a shorthand expression can't be nested in another one, and a quote inside braces would raise an error. To make this parsing robust, we would have to reinvent all of the expression parsing from scratch.
But should we create a new expression parsing routine for this shorthand? No: there already exists expression parsing machinery that can handle words between quotes or braces and can handle nested commands, exactly as the expr command does.
« Deriving the shorthand syntax from the Tcl C source code » implies finding a syntax that lets us use the existing machinery.
That's what I'm trying now :
As expr is a command in Tcl, it seems logical to me to implement the shorthand syntax in the command branch "[" of the ParseTokens procedure. That's what I chose.
The second step is to parse the expression, that is, to go through the Tcl_ParseExpr routine and then the ParseExpr routine. The difficulty here is to find the end of the substitution script inside ParseExpr. If I don't want to disturb ParseExpr too much, it is better to end the expression script with a character that is already significant to this parser, so that the main task of detecting it is already done and only needs gentle adaptation.
Maybe I could have used any of these operators: '+', '=', '-', '*', '(', ')', '|', etc. But I chose ')': an infix language needs parentheses.
That is how I defined the end of the expression substitution script to be ")]". By symmetry, I defined the beginning of the substitution script to be "[(".
This is the genesis of my proposal of "[( ...)]" as a shorthand.
To make it work, I had to use the same clever hack as Eric Taylor: create a synthetic string and parse it as a command.
In the end, [(...)] works as expected (as far as I've tested). Here are the main changes I made to accomplish it:
In file tclParse.c, in function ParseTokens, I added a new branch in the test:
----------------------------------------
... } else if (src[0] == '[' && src[1] == '(') {
    /* Expression substitution context */
    // TODO: noSubstExpr
    Tcl_Parse *exprParsePtr;
    exprParsePtr = (Tcl_Parse *)TclStackAlloc(parsePtr->interp, sizeof(Tcl_Parse));
    src++; // skip '['; src now points at '('
    numBytes--;
    // Used only to learn the length of the expression, which is stored in exprParsePtr->commandSize
    Tcl_ParseExpr(parsePtr->interp, src, numBytes, exprParsePtr);
    src++; // skip '('; src now points at the expression itself
    numBytes--;
    // Here is the famous hack of Eric Taylor
    Tcl_Size syntheticLen = exprParsePtr->commandSize + 9; // "[expr {" (7) + expr + "}]" (2)
    char *synthetic = (char *)Tcl_Alloc(syntheticLen + 1);
    memcpy(synthetic, "[expr {", 7);
    memcpy(synthetic + 7, src, exprParsePtr->commandSize);
    memcpy(synthetic + 7 + exprParsePtr->commandSize, "}]", 3); // 3 bytes: also copies the trailing '\0'
    synthetic[syntheticLen] = '\0';
    // Maybe a Tcl_Obj could be of use for memory management?
    Tcl_Obj *exprObjCommand = Tcl_NewStringObj(synthetic, syntheticLen);
    Tcl_Free(synthetic); // Tcl_NewStringObj copied the bytes, so the buffer can go
    src += exprParsePtr->commandSize + 2; // skip the expression and the closing ")]"
    numBytes -= exprParsePtr->commandSize + 2;
    TclStackFree(parsePtr->interp, exprParsePtr);
    tokenPtr->type = TCL_TOKEN_COMMAND;
    tokenPtr->start = Tcl_GetStringFromObj(exprObjCommand, NULL);
    tokenPtr->size = syntheticLen;
    parsePtr->numTokens++;
    continue;
} else if (*src == '[') {...
---------------------------------------
To detect the end and transfer the size of the parsed expression I had to modify :
1° the Tcl_ParseExpr function :
... if (code == TCL_OK) {
    if (start[-1] == '[' && start[0] == '(') {
        // Expression substitution context: just transfer the size information to the caller
        parsePtr->commandSize = exprParsePtr->commandSize;
    } else {
        TclParseInit(interp, start, numBytes, parsePtr);
        ConvertTreeToTokens(start, numBytes,
                opTree, exprParsePtr->tokenPtr, parsePtr);
    } ...
2° the ParseExpr function:
int nb_paren = 0;
int substExpressionContext = 0;
if (start[-1] == '[' && start[0] == '(') {
    // Expression substitution
    substExpressionContext = 1;
    start++; // skip the open parenthesis '(': it is part of the expression substitution syntax
    numBytes--;
}
...
case UNARY:
    if (substExpressionContext == 1) {
        // The open parenthesis is one of the UNARY lexemes: count it in this context
        if (start[0] == '(') {
            nb_paren++;
        }
    }
case BINARY: {
    ...
    if (substExpressionContext == 1) {
        // The close parenthesis is one of the BINARY lexemes: count it
        if (start[0] == ')') {
            nb_paren--;
            if (nb_paren == -1 && start[1] == ']') {
                // End of expression
                parsePtr->commandSize = originalLength - numBytes - 1;
                numBytes = 0;
                continue; // and exit the loop, since numBytes == 0 ;)
            }
        }
    }
----------------------------------------
I also made it nestable, i.e.: set x [(1 + [(2+3)] )]
in the function ParseExpr:
case SCRIPT: {
    ...
    if (start[1] == '(') {
        // an open bracket followed by an open paren denotes the expression shorthand
        tokenPtr->type = TCL_TOKEN_SUB_EXPR;
    } else {
        tokenPtr->type = TCL_TOKEN_COMMAND;
    }
    ...
In the function TclCompileTokens (file tclCompile.c), I added:
case TCL_TOKEN_SUB_EXPR:
    envPtr->line += adjust;
    TclCompileExpr(interp, tokenPtr->start+1, tokenPtr->size-2, envPtr, 0);
    envPtr->line -= adjust;
    numObjsToConcat++;
    break;
---------------------
Then, I can write :
% set x [(1+1)]
2
% set y [($x + [(1 + 1)] )]
4
% set z [($y + [($x * [(1+1)] )] )]
8
-----------------------------
Surely there are corner cases that this prototype doesn't handle. More investigation is needed and it should be tested extensively, but this proves that the [(...)] expression shorthand is possible at little cost. Maybe the TCL_TOKEN_SUB_EXPR token could even be used instead of creating a synthetic string. I may investigate this last option later...
Florent