From: EricT <tw...@gm...> - 2025-11-03 06:44:19
Hi Tcl Core team,
Following up on my earlier email about the array subscript limitation:
I've found and fixed the compiler issue. The prototype is now
functional again for evaluation purposes, though the line number
tracking issue I mentioned earlier remains.
The Fix (this is not posted anywhere else - just an old/new below):
The problem was in TclPushVarName (tclCompile.c), which uses pointer
arithmetic between adjacent tokens to calculate array index lengths.
For x($((expr))), the tokens are:
- Token[1]: TEXT "x(" in original source
- Token[2]: COMMAND "[expr {...}]" in synthetic buffer
- Token[3]: TEXT ")" in original source
When the code calculated varTokenPtr[2].start - p, it was subtracting
pointers from different memory allocations, producing garbage values.
The fix avoids cross-token pointer arithmetic: it stays within a single
token and sums token sizes instead.
Here is the surrounding context, followed by the OLD lines and the NEW
lines that replace them:
    /*
     * Check the last token: if it is just ')', do not count it.
     * Otherwise, remove the ')' and flag so that it is restored at
     * the end.
     */
    if (varTokenPtr[n].size == 1) {
        n--;
    } else {
        varTokenPtr[n].size--;
        removedParen = n;
    }
    name = varTokenPtr[1].start;
    nameLen = p - varTokenPtr[1].start;
    elName = p + 1;

    // OLD (broken):
    remainingLen = (varTokenPtr[2].start - p) - 1;
    elNameLen = (varTokenPtr[n].start - p) + varTokenPtr[n].size - 1;

    // NEW (fixed):
    remainingLen = (varTokenPtr[1].start + varTokenPtr[1].size) - (p + 1);
    elNameLen = remainingLen;
    for (Tcl_Size i = 2; i <= n; i++) {
        elNameLen += varTokenPtr[i].size;
    }
What these calculate:
When parsing x(abc$var), the code needs to extract the array index portion:
- remainingLen: Length of literal text after ( in the first token
("abc" = 3 bytes). Used to set the size of a newly created TEXT token
for that fragment inserted before any following tokens. The fix now
correctly computes this by staying within varTokenPtr[1]'s buffer.
- elNameLen: Total array index length. Currently only used to check if
the index is empty (zero vs non-zero):
    if (elNameLen) {
        TclCompileTokens(interp, elemTokenPtr, elemTokenCount, envPtr);
    } else {
        PUSH(""); // Empty array index
    }
Note: The fix sums token sizes, which for expression substitution
includes the synthetic buffer size (12 bytes for "[expr {1+2}]")
rather than the original source size (8 bytes for "$((1+2))"). This
doesn't affect the current zero/non-zero check, but future maintainers
would have to be aware that elNameLen represents token sizes, not
original source length.
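To make the two lengths concrete, here is a minimal standalone sketch of the NEW calculation applied to x(abc$((1+2))). It is not the Tcl source: MockToken, IndexLengths and DemoLengths are hypothetical names that only mimic the start/size fields of Tcl_Token, and the token layout is assumed to match the one described above (TEXT in the original source, COMMAND in a separate synthetic buffer, trailing ")" not counted).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the start/size fields of Tcl_Token. */
typedef struct {
    const char *start;  /* first byte of the token's text */
    size_t size;        /* number of bytes in the token */
} MockToken;

/*
 * Compute both lengths without subtracting pointers that belong to
 * different tokens. tok[1..n] mirror varTokenPtr[1..n] after the
 * trailing ')' has been dropped; p points at '(' inside tok[1].
 */
static void
IndexLengths(const MockToken *tok, size_t n, const char *p,
        size_t *remainingLen, size_t *elNameLen)
{
    const char *end1 = tok[1].start + tok[1].size;
    size_t i;

    assert(p >= tok[1].start && p < end1);  /* p must lie in token 1 */
    *remainingLen = (size_t)(end1 - (p + 1));
    *elNameLen = *remainingLen;
    for (i = 2; i <= n; i++) {
        *elNameLen += tok[i].size;
    }
}

/* Build the x(abc$((1+2))) example and report the two lengths. */
static void
DemoLengths(size_t *remainingLen, size_t *elNameLen)
{
    static const char source[] = "x(abc$((1+2)))";
    static const char synthetic[] = "[expr {1+2}]";  /* separate buffer */
    MockToken tok[3];

    tok[0].start = source;              /* slot 0: the whole word, unused here */
    tok[0].size = strlen(source);
    tok[1].start = source;              /* TEXT "x(abc" in the original source */
    tok[1].size = 5;
    tok[2].start = synthetic;           /* COMMAND in the synthetic buffer */
    tok[2].size = strlen(synthetic);
    /* The final ")" token has size 1, so it is not counted: n = 2. */
    IndexLengths(tok, 2, strchr(source, '('), remainingLen, elNameLen);
}
```

For this input remainingLen comes out as 3 ("abc") and elNameLen as 15 (3 + 12), whereas the index text in the original source, "abc$((1+2))", is 11 bytes. Both values are non-zero, so the empty-index check behaves the same either way.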
However, as mentioned, production code would be better served by
engineering this into the parser, with a new token type intended
specifically for synthetic strings. Such a token could serve other
purposes too, so it need not be single-use. The synthetic strings
should also be tracked so they can be freed once they are no longer in
use, possibly when a new compiler epoch begins.
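One possible shape for that tracking, purely as an illustration: the sketch below keeps synthetic strings on a linked list tagged with the epoch in which they were created, and frees everything older than a given epoch. None of this exists in Tcl; SynthBuf, SynthAdd and SynthReleaseBefore are hypothetical names, and a real implementation would presumably use Tcl's own allocator and hang the list off the interpreter or the compile environment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one synthetic string; not part of Tcl. */
typedef struct SynthBuf {
    struct SynthBuf *next;
    unsigned long epoch;   /* compiler epoch in which it was created */
    char *text;            /* heap copy of the synthetic source */
} SynthBuf;

/* Add a copy of text to the list; returns the copy, or NULL on failure. */
static const char *
SynthAdd(SynthBuf **head, const char *text, unsigned long epoch)
{
    SynthBuf *b;
    size_t len;

    assert(head != NULL && text != NULL);
    len = strlen(text) + 1;
    b = malloc(sizeof *b);
    if (b == NULL) {
        return NULL;
    }
    b->text = malloc(len);
    if (b->text == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->text, text, len);
    b->epoch = epoch;
    b->next = *head;
    *head = b;
    return b->text;
}

/* Free every buffer created before the given epoch; returns the count freed. */
static size_t
SynthReleaseBefore(SynthBuf **head, unsigned long epoch)
{
    size_t freed = 0;
    SynthBuf **link = head;

    assert(head != NULL);
    while (*link != NULL) {
        SynthBuf *b = *link;
        if (b->epoch < epoch) {
            *link = b->next;   /* unlink, then free the record and its text */
            free(b->text);
            free(b);
            freed++;
        } else {
            link = &b->next;
        }
    }
    return freed;
}
```

Tokens that point into a synthetic string would then stay valid for as long as its epoch is current, which is the lifetime question a dedicated token type would have to answer.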
Status:
Array subscripts will now work correctly with the synthetic string
approach. The prototype can be used for community evaluation of
whether $((...)) syntax is useful enough to include in Tcl 9.1.
The line number tracking issue remains, but that's a separate problem
affecting error reporting rather than functionality.
Best regards,
Eric