From: SourceForge.net <no...@so...> - 2006-12-12 21:45:39
|
Feature Requests item #1517602, was opened at 2006-07-05 11:52 Message generated for change (Comment added) made by dgp You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=360894&aid=1517602&group_id=10894 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: 44. Parsing and Eval Group: None >Status: Closed >Resolution: Accepted Priority: 2 Private: No Submitted By: Don Porter (dgp) Assigned to: Don Porter (dgp) Summary: long expressions and Tcl_Tokens Initial Comment: The array of Tcl_Tokens that Tcl's parser generates has several design flaws when used to parse long expressions. consider the simple expression "1+1". It parses into the Tcl_Token stream: subexpr 1+1 5 operator + 0 subexpr 1 1 text 1 0 subexpr 1 1 text 1 0 That's six Tcl_Tokens to represent a 3 character expression. At 16 bytes each (more on 64-bit system), that's a considerable bloat of memory. Apply that to expr [string repeat +1 10000] and you need 640Kbytes just to hold the final Tcl_Token array for a 10Kbyte expression. Worse, you need that 640KBytes as one contiguous block of memory, so even if you might have the RAM available, you can panic if its not available in a large enough block. A fairly straightforward simplification to an RPN style token stream would give: operator + 2 text 1 0 text 1 0 The expr parser already adds TCL_TOKEN_WORD tokens to flag multi-token operands. There's no apparent need for the TCL_TOKEN_SUB_EXPR to mark the start of operands. The other value TCL_TOKEN_SUB_EXPR offers is to keep the substring "1+1" distinct from the substring "+" in the TCL_TOKEN_OPERATOR token or the substring "{foo}" distinct from the substring "foo". The main utility of that, though, is to have the string rep of the whole subexpression handy for error messages or debug trace messages. Makes more sense to me to have such messages take longer to generate, with the tradeoff that larger legal expressions are usable. One other issue where Tcl_Tokens don't serve expressions well. In order to parse the boundaries of literal string representations of numbers, TclParseNumber() determines the numeric value. This is simply discarded because Tcl_Tokens only store strings. Later, when the expression is actually evaluated, the string form of the number has to be parsed all over again to get the numeric value. A revision that permitted passing through a Tcl_Obj would be nice. An additional Tcl_Token type might permit that. The problem with most of these ideas is that the current design for the Tcl_Token array representation of expressions is documented as part of the Tcl_ParseExpr() routine. If we make any of these changes, we'll at least have to build a converter to the old layout into T_PE. ---------------------------------------------------------------------- >Comment By: Don Porter (dgp) Date: 2006-12-12 16:45 Message: Logged In: YES user_id=80530 Originator: YES strategies now in place on the HEAD. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2006-08-31 23:23 Message: Logged In: YES user_id=80530 Oh, another bit of data lost in the reduction of a parsed Tcl expression to a sequence of Tcl_Tokens... The expr parser determines the difference between unary and binary + and - operators. However, the Tcl_Token carries only the string rep of the operator, so the expr compiler is forced to decide all over again which + and - operators are binary and which are unary in order to compile to the right bytecodes. A revised storage strategy could avoid this loss and recomputation of information. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2006-08-31 23:20 Message: Logged In: YES user_id=80530 This is much less important than I believed when I posted it. I was misinterpreting the timing results of time {testexprparser [string repeat +1 $N] -1} which was getting really badly bogged down for not-too-outrageously-big values of $N, like 10000. Turns out that [testexprparser] produces a list as its output the string rep of which has length O(N^2), so I was measuring the test harness and not the thing I wanted to test. Happy ending is that testing with time {expr [string repeat +1 $N]} confirms that the revised parser truly is O(N). Hooray. The memory bloat of Tcl_Tokens issue is real, but only appears to matter for getting-pretty-outrageously-huge values of $N. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=360894&aid=1517602&group_id=10894 |