[Tcl-bugs] [ tcl-Feature Requests-1517602 ] long expressions and Tcl_Tokens

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Feature Requests item #1517602, was opened at 2006-07-05 11:52
Message generated for change (Comment added) made by dgp
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=360894&aid=1517602&group_id=10894

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: 44. Parsing and Eval
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 2
Private: No
Submitted By: Don Porter (dgp)
Assigned to: Don Porter (dgp)
Summary: long expressions and Tcl_Tokens

Initial Comment:

The array of Tcl_Tokens that
Tcl's parser generates has
several design flaws when used
to parse long expressions.

consider the simple expression
"1+1".  It parses into the
Tcl_Token stream:

subexpr 1+1 5
operator + 0
subexpr 1 1
text 1 0
subexpr 1 1
text 1 0

That's six Tcl_Tokens to
represent a 3 character
expression.  At 16 bytes
each (more on 64-bit system),
that's a considerable bloat
of memory.  Apply that to

expr [string repeat +1 10000]

and you need 640Kbytes just
to hold the final Tcl_Token array
for a 10Kbyte expression.

Worse, you need that 640KBytes
as one contiguous block of memory,
so even if you might have the RAM
available, you can panic if its
not available in a large enough block.

A fairly straightforward simplification
to an RPN style token stream would give:

operator + 2
text 1 0
text 1 0

The expr parser already adds TCL_TOKEN_WORD
tokens to flag multi-token operands.  There's
no apparent need for the TCL_TOKEN_SUB_EXPR
to mark the start of operands.  The other
value TCL_TOKEN_SUB_EXPR offers is to keep 
the substring "1+1" distinct from the 
substring "+" in the TCL_TOKEN_OPERATOR
token or the substring "{foo}" distinct
from the substring "foo".  The main utility
of that, though, is to have the string rep
of the whole subexpression handy for error
messages or debug trace messages.  Makes
more sense to me to have such messages take
longer to generate, with the tradeoff that
larger legal expressions are usable.

One other issue where Tcl_Tokens don't
serve expressions well.  In order to
parse the boundaries of literal string
representations of numbers, TclParseNumber()
determines the numeric value.  This is
simply discarded because Tcl_Tokens only
store strings.  Later, when the expression
is actually evaluated, the string form of
the number has to be parsed all over again
to get the numeric value.  A revision
that permitted passing through a Tcl_Obj
would be nice.  An additional Tcl_Token
type might permit that.

The problem with most of these ideas
is that the current design for the
Tcl_Token array representation of
expressions is documented as part
of the Tcl_ParseExpr() routine.  If
we make any of these changes, we'll
at least have to build a converter
to the old layout into T_PE.

----------------------------------------------------------------------

>Comment By: Don Porter (dgp)
Date: 2006-12-12 16:45

Message:
Logged In: YES 
user_id=80530
Originator: YES

strategies now in place
on the HEAD.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2006-08-31 23:23

Message:
Logged In: YES 
user_id=80530

Oh, another bit of data lost
in the reduction of a parsed
Tcl expression to a sequence
of Tcl_Tokens...

The expr parser determines the
difference between unary and
binary + and - operators.
However, the Tcl_Token carries
only the string rep of the
operator, so the expr compiler
is forced to decide all over again
which + and - operators are binary
and which are unary in order to
compile to the right bytecodes.
A revised storage strategy could
avoid this loss and recomputation
of information.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2006-08-31 23:20

Message:
Logged In: YES 
user_id=80530

This is much less important
than I believed when I posted it.
I was misinterpreting the
timing results of

time {testexprparser [string repeat +1 $N] -1}

which was getting really badly bogged
down for not-too-outrageously-big values
of $N, like 10000.

Turns out that [testexprparser] produces
a list as its output the string rep of
which has length O(N^2), so I was
measuring the test harness and not the
thing I wanted to test.

Happy ending is that testing with

time {expr [string repeat +1 $N]}

confirms that the revised parser
truly is O(N).  Hooray.

The memory bloat of Tcl_Tokens issue
is real, but only appears to matter
for getting-pretty-outrageously-huge
values of $N.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=360894&aid=1517602&group_id=10894

[Tcl-bugs] [ tcl-Feature Requests-1517602 ] long expressions and Tcl_Tokens

The Tool Command Language implementation

[Tcl-bugs] [ tcl-Feature Requests-1517602 ] long expressions and Tcl_Tokens