[TCLCORE] Variation on the Expr Shorthand


Hi dear Tcl community,

A shorthand syntax for expr has long been requested among us.

Those discussions always focus on the syntactic side of the subject:

  * Like in bash $(...) or $((...))
  * Through an alias [= ...]
  * A new command (vexpr or let)
  * A word prefix {=}
  * ...

Many TIPs exist on the matter, and numerous discussions have taken 
place, but none ever reached a consensus.

That's because the look of this shorthand is a matter of taste, and 
everybody has their own. Some people like fish when it's cooked in 
water, some like it fried, and some don't like fish at all! All tastes 
are found in nature.

Everybody can agree that Tcl is a big and complex piece of machinery 
that must be handled with care. So maybe the problem should be 
approached the other way round:

  * Shall we derive the Tcl C source code machinery from a new syntax
    decided beforehand (the very one that finds no consensus)?
  * Or shall we derive the new syntax from the Tcl C source code
    machinery as it exists?

My opinion is that it's better to derive the syntax from the Tcl C 
source code than to derive the Tcl C source code from the syntax.

TIP 672 hacks the variable substitution. To do this, it has to make a 
very basic parse of the expression to estimate its length. It has to 
transmute a TCL_TOKEN_VARIABLE token into a TCL_TOKEN_COMMAND token. It 
then calls Tcl_ParseCommand on a synthetic string to check for errors.

This very basic parsing makes it fragile. For instance, a shorthand 
expression can't be nested in another one, and a quote inside braces 
would create an error. To make this parsing robust, we would have to 
reinvent all the expression parsing from scratch.

But shall we create a new expression parsing routine for this 
shorthand? No: an expression parsing machinery already exists that 
handles words in quotes or braces and handles nested commands, exactly 
as the expr command does.

« Deriving the shorthand syntax from the Tcl C source code » implies 
finding a syntax that allows us to use the existing machinery.

That's what I'm trying now:

As expr is a command in Tcl, it seems logical to me to implement the 
shorthand syntax in the command branch "[" of the ParseTokens 
procedure. That's what I chose.

The second step is to parse the expression, that is, to go through the 
Tcl_ParseExpr routine and then the ParseExpr routine. The difficulty 
here is to detect the end of the substitution script inside ParseExpr. 
To avoid disturbing ParseExpr too much, it's better to end the 
expression script with a character that is already significant for this 
parser: the main work of detecting it is already done and only needs 
gentle adaptation.

Maybe I could have used any of those operators: '+', '=', '-', '*', 
'(', ')', '|', etc. But I chose ')': an infix language needs 
parentheses anyway.

That is how I came to define the end of the expression substitution 
script as ")]". By symmetry, I defined its beginning as "[(".

This is the genesis of my proposal of "[( ...)]" as a shorthand.

To make it work, I had to use the same clever hack as Eric Taylor: 
create a synthetic string and parse it as a command.

In the end, [(...)] works as expected (as far as I've tested). Here are 
the main changes I made to accomplish it:

In file tclParse.c, in function ParseTokens, I added a new branch in 
the test:

----------------------------------------

  ... } else if (src[0] == '[' && src[1] == '(') {
///////////////////////////////////////////////////////////////////////
         /* Expression substitution context */
         // to do : noSubstExpr
         Tcl_Parse *exprParsePtr;
         exprParsePtr = (Tcl_Parse *)TclStackAlloc(parsePtr->interp,
                 sizeof(Tcl_Parse));

         src++;     // skip '['; src now points at '('
         numBytes--;
         // Used only to learn the length of the expression, which is
         // stored into exprParsePtr->commandSize
         Tcl_ParseExpr(parsePtr->interp, src, numBytes, exprParsePtr);

         src++;     // skip '('; src now points at the expression text
         numBytes--;

         // Here is the famous hack of Eric Taylor
         // "[expr {" (7 bytes) + expr + "}]" (2 bytes)
         Tcl_Size syntheticLen = exprParsePtr->commandSize + 9;

         char *synthetic = (char *)Tcl_Alloc(syntheticLen + 1);

         memcpy(synthetic, "[expr {", 7);
         memcpy(synthetic + 7, src, exprParsePtr->commandSize);
         // copies "}]" together with its terminating NUL
         memcpy(synthetic + 7 + exprParsePtr->commandSize, "}]", 3);
         synthetic[syntheticLen] = '\0';
         // Maybe a Tcl_Obj could be of use for memory management ?

         Tcl_Obj *exprObjCommand = Tcl_NewStringObj(synthetic, syntheticLen);
         Tcl_Free(synthetic);   // Tcl_NewStringObj made its own copy

         // skip the expression text and the closing ")]"
         src += exprParsePtr->commandSize + 2;
         numBytes -= exprParsePtr->commandSize + 2;

         TclStackFree(parsePtr->interp, exprParsePtr);

         tokenPtr->type = TCL_TOKEN_COMMAND;
         tokenPtr->start = Tcl_GetStringFromObj(exprObjCommand, NULL);
         tokenPtr->size = syntheticLen;
         parsePtr->numTokens++;

         continue;

} else if (*src == '[') {...

---------------------------------------
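
The synthetic-string step can be tried in isolation, outside Tcl. The following is only a minimal sketch in plain C: make_synthetic_expr is a hypothetical helper name, and malloc/free stand in for Tcl_Alloc/Tcl_Free. It shows where the 7 and 2 in "commandSize + 9" come from.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Tcl): wrap the expression text
 * expr[0..len-1] as "[expr {" + expr + "}]". Returns a NUL-terminated
 * heap string that the caller must free(), or NULL if allocation fails.
 */
char *make_synthetic_expr(const char *expr, size_t len)
{
    static const char prefix[] = "[expr {";
    static const char suffix[] = "}]";
    const size_t prefixLen = sizeof(prefix) - 1;    /* 7 bytes */
    const size_t suffixLen = sizeof(suffix) - 1;    /* 2 bytes */
    char *out = (char *)malloc(prefixLen + len + suffixLen + 1);

    if (out == NULL) {
        return NULL;
    }
    memcpy(out, prefix, prefixLen);
    memcpy(out + prefixLen, expr, len);
    /* Copy the suffix together with its terminating NUL. */
    memcpy(out + prefixLen + len, suffix, suffixLen + 1);
    return out;
}
```

Called with the text "1+1" and length 3, the helper returns the 12-byte string "[expr {1+1}]", which is the kind of command text the prototype stores in the token.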

To detect the end and transfer the size of the parsed expression, I had 
to modify:

1° the Tcl_ParseExpr function:

... if (code == TCL_OK) {
     if (start[-1] == '[' && start[0] == '(') {
         // Expression substitution context: just transfer the size
         // information to the caller
         parsePtr->commandSize = exprParsePtr->commandSize;
     } else {
         TclParseInit(interp, start, numBytes, parsePtr);
         ConvertTreeToTokens(start, numBytes,
                 opTree, exprParsePtr->tokenPtr, parsePtr);
     } ...

2° the ParseExpr function:

int nb_paren = 0;
int substExpressionContext = 0;

if (start[-1] == '[' && start[0] == '(') {
     substExpressionContext = 1;

     // Expression substitution: skip the open parenthesis '(', which
     // is part of the expression substitution syntax
     start++;
     numBytes--;
}

...

case UNARY:

         if (substExpressionContext == 1) {
             // Besides unary operators, this class also contains the
             // open paren: count it in this context
             if (start[0] == '(') {
                 nb_paren++;
             }
         }

case BINARY: {
         ...
         if (substExpressionContext == 1) {
             // Besides binary operators, this class also contains the
             // close paren: count it.
             if (start[0] == ')') {
                 nb_paren--;
                 if (nb_paren == -1 && start[1] == ']') {
                     // End of expression
                     parsePtr->commandSize = originalLength - numBytes - 1;
                     numBytes = 0;
                     continue; // and exit the loop, since numBytes == 0 ;)
                 }
             }
         }

----------------------------------------
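
The parenthesis counting can be illustrated outside Tcl with a small byte-level scanner. This is only a sketch under stated assumptions: find_subexpr_len is a hypothetical name, and unlike the prototype, which counts lexemes produced by ParseExpr, this version looks at raw bytes and ignores quotes, braces, and backslashes.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of Tcl). src must point at the "[(" that
 * opens a shorthand; len is the number of bytes available. Returns the
 * length of the text between "[(" and the matching ")]", or -1 if no
 * terminator is found.
 * Simplification: quotes, braces, and backslashes are NOT handled.
 */
long find_subexpr_len(const char *src, size_t len)
{
    int depth = 0;      /* unmatched '(' seen inside the expression */
    size_t i;

    if (len < 2 || src[0] != '[' || src[1] != '(') {
        return -1;
    }
    for (i = 2; i < len; i++) {
        if (src[i] == '(') {
            depth++;
        } else if (src[i] == ')') {
            /* A ')' at depth 0 followed by ']' closes the shorthand. */
            if (depth == 0 && i + 1 < len && src[i + 1] == ']') {
                return (long)(i - 2);
            }
            depth--;
        }
    }
    return -1;
}
```

On "[(1 + [(2+3)] )]" the scan returns 12, the length of "1 + [(2+3)] ", because the inner ")" is met at depth 1. On an input such as [("a)]" eq "b")] it stops inside the quoted string, which is the kind of fragility that counting at the lexeme level inside ParseExpr is meant to avoid.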

I also made it nestable, i.e.: set x [(1 + [(2+3)] )]

in the function ParseExpr:

case SCRIPT : {

...

     if (start[1] == '(') {
         // an open bracket followed by an open paren denotes the
         // expression shorthand
         tokenPtr->type = TCL_TOKEN_SUB_EXPR;
     } else {
         tokenPtr->type = TCL_TOKEN_COMMAND;
     }

...

In the function TclCompileTokens (file tclCompile.c), I added:

case TCL_TOKEN_SUB_EXPR :
         envPtr->line += adjust;
         TclCompileExpr(interp, tokenPtr->start + 1, tokenPtr->size - 2,
                 envPtr, 0);
         envPtr->line -= adjust;
         numObjsToConcat++;

         break;

---------------------
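
The "start+1, size-2" arithmetic passed to TclCompileExpr can be checked with a tiny stand-alone sketch. It assumes, as for an ordinary command-substitution token, that the token spans the text from the opening "[" through the closing "]". The Span type and strip_brackets are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the start/size pair of a Tcl_Token. */
typedef struct {
    const char *start;
    size_t size;
} Span;

/*
 * Drop the enclosing '[' and ']' of a token such as "[(2+3)]",
 * leaving "(2+3)". Returns an empty span if the token is not
 * bracketed.
 */
Span strip_brackets(Span tok)
{
    Span inner = { tok.start, 0 };

    if (tok.size >= 2 && tok.start[0] == '['
            && tok.start[tok.size - 1] == ']') {
        inner.start = tok.start + 1;    /* same as tokenPtr->start + 1 */
        inner.size = tok.size - 2;      /* same as tokenPtr->size - 2  */
    }
    return inner;
}
```

Under that assumption, the text handed to the expression compiler for the nested "[(2+3)]" is "(2+3)": the shorthand's own parentheses stay in place and simply act as ordinary grouping.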

Then I can write:

% set x [(1+1)]

2

% set y [($x + [(1 + 1)] )]

4

% set z [($y + [($x * [(1+1)] )] )]

8

-----------------------------

Surely there are corner cases that this prototype doesn't resolve. More 
investigation is needed and it should be extensively tested, but this 
proves that the [(...)] expression shorthand is possible at little cost. 
Maybe the TCL_TOKEN_SUB_EXPR token could even be used instead of 
creating a synthetic string. I may investigate this last option later...

Florent
