From: Florent M. <flo...@gm...> - 2025-10-29 14:44:46
Dear Tcl community,
A shorthand syntax for expr has been a long-standing request among us.
The discussions always focus on the syntactic side of the subject:
* Like in bash $(...) or $((...))
* Through an alias [= ...]
* A new command (vexpr or let)
* A word prefix {=}
* ...
Many TIPs exist on this matter, and numerous discussions have taken
place without ever reaching a consensus.
That's because the look of this shorthand is a matter of taste.
Everybody has their own taste. Some people like fish when it's poached,
some like it fried, and some don't like fish at all! There is no
accounting for taste.
Everybody can agree that Tcl is a big and complex piece of machinery
that must be handled with care. So maybe the problem should be
approached the other way round:
* Shall we derive the Tcl C source code machinery from a new syntax
decided beforehand (one that has no consensus),
* or shall we derive the new syntax from the Tcl C source code
machinery as it exists?
My opinion is that it's better to derive the syntax from the Tcl C
source code than to derive the Tcl C source code from the syntax.
TIP 672 hooks into variable substitution. To do so, it has to do a very
basic parse of the expression to estimate its length, and it has to
turn a TCL_TOKEN_VARIABLE token into a TCL_TOKEN_COMMAND token. It then
calls Tcl_ParseCommand on a synthetic string to check for errors.
This very basic parsing makes it fragile. For instance, one shorthand
expression can't be nested in another, and a quote inside braces causes
an error. To make this parsing robust, we would have to reinvent the
whole expression parser from scratch.
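To make that fragility concrete, here is a minimal sketch of a scanner that only tracks parentheses and double quotes. It is not taken from TIP 672: the function name naive_expr_end and the choice of ')' as the terminator are assumptions made purely for illustration. It finds the end of an ordinary expression, but a quote inside a braced word derails it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical naive scanner (illustration only, not TIP 672's code).
 * 's' points just past the opening delimiter. Returns the index of the
 * first unmatched ')', or -1 if none is found. It tracks parentheses
 * and double quotes, but knows nothing about braces.
 */
long naive_expr_end(const char *s)
{
    int depth = 0;      /* number of currently open '(' */
    int inQuote = 0;    /* non-zero while inside "..." */
    long i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        char c = s[i];

        if (c == '"') {
            inQuote = !inQuote;     /* toggles even inside {...} */
        } else if (inQuote) {
            continue;               /* ignore quoted characters */
        } else if (c == '(') {
            depth++;
        } else if (c == ')') {
            if (depth == 0) {
                return i;           /* unmatched ')': end found */
            }
            depth--;
        }
    }
    return -1;                      /* ran off the end of the input */
}
```

Given 1+(2*3)) the scanner stops at the last parenthesis as intended. Given {"} eq $s) it fails, because the quote inside the braced word is taken for the start of a quoted string that never ends.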
But should we write a new expression-parsing routine for this
shorthand? No: an expression parser already exists that can handle
quoted or braced words and nested commands, exactly as the expr command
does.
"Deriving the shorthand syntax from the Tcl C source code" means finding
a syntax that lets us reuse the existing machinery.
That's what I'm trying now:
Since expr is a command in Tcl, it seems logical to me to implement the
shorthand syntax in the command branch "[" of the ParseTokens procedure.
That's what I chose.
The second step is to parse the expression, that is, to go through the
Tcl_ParseExpr routine and then the ParseExpr routine. The difficulty
here is finding the end of the substitution script inside ParseExpr.
To disturb ParseExpr as little as possible, it is best to end the
expression script with a character that is already significant to this
parser: the main work of detecting it is then already done and only
needs a gentle adaptation.
I could probably have used any of these operator characters: '+', '=',
'-', '*', '(', ')', '|', etc. But I chose ')': an infix language needs
parentheses anyway.
That is how I came to define the end of the expression substitution
script as ")]". By symmetry, I defined its beginning as "[(".
That is the genesis of my proposal of "[( ...)]" as a shorthand.
To make it work, I had to use the same clever hack as Eric Taylor:
create a synthetic string and parse it as a command.
In the end, [(...)] works as expected (as far as I have tested).
Here are the main changes I made to accomplish it:
In file tclParse.c, in function ParseTokens, I added a new branch to
the test:
----------------------------------------
... } else if (src[0] == '[' && src[1] == '(') {
///////////////////////////////////////////////////////////////////////
/* Expression substitution context */
// to do : noSubstExpr
Tcl_Parse *exprParsePtr;
exprParsePtr =(Tcl_Parse *)TclStackAlloc(parsePtr->interp,
sizeof(Tcl_Parse));
src++; // skip '[' : src now points at '('
numBytes --;
// Use it only to know the length of the expression, and store
// it into exprParsePtr->commandSize
Tcl_ParseExpr(parsePtr->interp, src, numBytes, exprParsePtr);
src++; // skip '(' : src now points at the expression itself
numBytes --;
// Here is the famous hack of Eric Taylor
// "[expr {" + expr + "}]" : 7 + commandSize + 2 bytes
Tcl_Size syntheticLen = exprParsePtr->commandSize + 9;
char *synthetic = (char *)Tcl_Alloc(syntheticLen + 1);
memcpy(synthetic, "[expr {", 7);
memcpy(synthetic + 7, src, exprParsePtr->commandSize);
memcpy(synthetic + 7 + exprParsePtr->commandSize, "}]", 3);
synthetic[syntheticLen] = '\0';
// Maybe a Tcl_Obj could be of use for memory management ?
Tcl_Obj *exprObjCommand = Tcl_NewStringObj(synthetic,syntheticLen);
Tcl_Free(synthetic); // Tcl_NewStringObj made its own copy of the bytes
src+=exprParsePtr->commandSize+2;
numBytes-=exprParsePtr->commandSize+2;
TclStackFree(parsePtr->interp, exprParsePtr);
tokenPtr->type = TCL_TOKEN_COMMAND;
tokenPtr->start = Tcl_GetStringFromObj(exprObjCommand, NULL);
tokenPtr->size = syntheticLen;
parsePtr->numTokens++;
continue;
} else if (*src == '[') {...
---------------------------------------
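The length arithmetic of the synthetic string is easy to get wrong, so here is a minimal standalone sketch of just that step. It is not the prototype itself: make_synthetic_expr is a hypothetical helper, and it uses malloc where the code above uses Tcl_Alloc, so that it can be compiled without Tcl.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "[expr {" + expr[0..exprLen) + "}]" in a freshly allocated,
 * NUL-terminated buffer. Hypothetical helper for illustration only.
 * Stores the string length (without the NUL) in *lenPtr when lenPtr is
 * not NULL. Returns NULL on allocation failure; the caller frees the
 * result with free().
 */
char *make_synthetic_expr(const char *expr, size_t exprLen, size_t *lenPtr)
{
    static const char prefix[] = "[expr {";        /* 7 bytes */
    static const char suffix[] = "}]";             /* 2 bytes */
    const size_t prefixLen = sizeof(prefix) - 1;
    const size_t suffixLen = sizeof(suffix) - 1;
    const size_t total = prefixLen + exprLen + suffixLen;
    char *buf;

    assert(expr != NULL);
    buf = (char *)malloc(total + 1);               /* +1 for the NUL */
    if (buf == NULL) {
        return NULL;
    }
    memcpy(buf, prefix, prefixLen);
    memcpy(buf + prefixLen, expr, exprLen);
    /* Copy the suffix together with its terminating NUL. */
    memcpy(buf + prefixLen + exprLen, suffix, suffixLen + 1);
    if (lenPtr != NULL) {
        *lenPtr = total;
    }
    return buf;
}
```

Called with a pointer to the text after "[(" and the length reported by the parser, it copies only the expression, leaving the closing ")]" and whatever follows it untouched.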
To detect the end and pass back the size of the parsed expression, I
had to modify:
1° the Tcl_ParseExpr function:
... if (code == TCL_OK) {
// Caution : start[-1] is only valid if start is not the first byte of
// the caller's buffer
if(start[-1] == '[' && start[0] == '(' ) {
// Expression Substitution Context : just transfer the size
// information to the caller
parsePtr->commandSize =exprParsePtr->commandSize;
} else {
TclParseInit(interp, start, numBytes, parsePtr);
ConvertTreeToTokens(start, numBytes,
opTree, exprParsePtr->tokenPtr, parsePtr);
} ...
2° the ParseExpr function:
int nb_paren=0;
int substExpressionContext=0;
if(start[-1] == '[' && start[0] == '(' ) {
substExpressionContext=1;
// Expression substitution
// skip the open parenthesis '(' : it's part of the expression
// substitution syntax
start++;
numBytes--;
}
...
case UNARY:
//////////////////////////////////
if (substExpressionContext == 1) {
// Besides the unary operators, this case also sees the open paren
if (start[0]== '(') {
// Count the open parenthesis in this context
nb_paren++;
}
}
case BINARY: {
...
if (substExpressionContext == 1) {
// Besides the binary operators, this case also sees the close paren : count it.
if (start[0] == ')') {
nb_paren--;
if (nb_paren == -1 && start[1] ==']') {
//// End of expression
parsePtr->commandSize = originalLength - numBytes - 1;
numBytes=0;
continue; // and exit the loop, since numbytes == 0 ;)
}
}
}
----------------------------------------
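The parenthesis counting above can be modelled outside Tcl. The following is a minimal standalone sketch, not the prototype itself: the names shorthand_expr_len, skip_quoted and skip_group are hypothetical, and the skipping of quoted, braced and bracketed words is only a crude stand-in for what the ParseExpr lexer does in the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Skip a double-quoted word; s[i] is the opening quote. Returns the
 * index of the closing quote, or -1 if the input ends first. */
static long skip_quoted(const char *s, long i)
{
    for (i++; s[i] != '\0'; i++) {
        if (s[i] == '\\' && s[i + 1] != '\0') {
            i++;                        /* skip the escaped character */
        } else if (s[i] == '"') {
            return i;
        }
    }
    return -1;
}

/* Skip a nested group such as {...} or [...]; s[i] is the opener.
 * Returns the index of the matching closer, or -1 if there is none. */
static long skip_group(const char *s, long i, char open, char close)
{
    int depth = 0;

    for (; s[i] != '\0'; i++) {
        if (s[i] == '\\' && s[i + 1] != '\0') {
            i++;
        } else if (s[i] == open) {
            depth++;
        } else if (s[i] == close && --depth == 0) {
            return i;
        }
    }
    return -1;
}

/*
 * 's' must start with "[(". Returns the number of bytes between "[("
 * and the matching ")]", or -1 if the shorthand is malformed.
 */
long shorthand_expr_len(const char *s)
{
    int nbParen = 0;    /* '(' opened inside the expression itself */
    long i;

    assert(s != NULL);
    if (s[0] != '[' || s[1] != '(') {
        return -1;
    }
    for (i = 2; s[i] != '\0'; i++) {
        switch (s[i]) {
        case '\\':
            if (s[i + 1] != '\0') i++;
            break;
        case '"':
            if ((i = skip_quoted(s, i)) < 0) return -1;
            break;
        case '{':
            if ((i = skip_group(s, i, '{', '}')) < 0) return -1;
            break;
        case '[':   /* nested command or nested [( ... )] shorthand */
            if ((i = skip_group(s, i, '[', ']')) < 0) return -1;
            break;
        case '(':
            nbParen++;
            break;
        case ')':
            if (nbParen == 0) {
                /* One ')' too many: it must be the closing ")]". */
                return (s[i + 1] == ']') ? i - 2 : -1;
            }
            nbParen--;
            break;
        default:
            break;
        }
    }
    return -1;          /* input ended before ")]" */
}
```

The closing ")]" is recognised only when every parenthesis opened inside the expression has been closed, which is the same condition as nb_paren reaching -1 in the prototype.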
I also made it nestable, i.e. : set x [(1 + [(2+3)] )]
in the function ParseExpr :
case SCRIPT : {
...
if (start[1] == '(') {
// an open bracket followed by an open paren denotes the
// expression shorthand
tokenPtr->type = TCL_TOKEN_SUB_EXPR;
} else {
tokenPtr->type = TCL_TOKEN_COMMAND;
}
...
In the function TclCompileTokens (file tclCompile.c), I added:
case TCL_TOKEN_SUB_EXPR :
envPtr->line += adjust;
TclCompileExpr(interp, tokenPtr->start+1, tokenPtr->size-2,
envPtr, 0);
envPtr->line -= adjust;
numObjsToConcat++;
break;
---------------------
Then I can write:
% set x [(1+1)]
2
% set y [($x + [(1 + 1)] )]
4
% set z [($y + [($x * [(1+1)] )] )]
8
-----------------------------
There are surely corner cases that this prototype doesn't handle. More
investigation is needed and it should be tested extensively, but this
shows that the [(...)] expression shorthand is possible at little cost.
Maybe the TCL_TOKEN_SUB_EXPR token could even be used instead of
creating a synthetic string. I may investigate this last option later...
Florent