Hello,
I'm facing a serious problem on our SUNfire V20z
machine (Operon) running Solaris 10/x86:
After a successful compilation of Tcl/Tk 8.4.11 I run
the regression tests (make test). It fails on
choosedir.tcl with the following Tcl-stack trace:
can't modify -cursor option after widget is created
while executing
"$w config -cursor watch"
(procedure "::tk::dialog::file::Update" line 49)
invoked from within
"::tk::dialog::file::Update .__tk_choosedir"
("after" script)
Looking into the Tk sources I found serious 64-bit
portability bugs at several places. For example in
generic/tkFrame.c:787 [function FrameWidgetObjCmd()]
the wrong lines of code are:
size_t length;
char *arg = Tcl_GetStringFromObj(objv[i], (int *) &length);
From now on the content of length is completely
platform dependent (e.g. bigEndian vs. littleEndian).
As a result the following calls to strncmp(arg,
"-class", length) is random, because if length is e.g.
negative, it always gives me 0 (match!) even if arg is
"-cursor" ...
An easy fix for this could look like this:
size_t length;
int len;
char *arg = Tcl_GetStringFromObj(objv[i], &len);
length = len;
...or (better) change the signature of
Tcl_GetStringFromObj to accept a size_t* as a last
argument instead of a int*.
At least these files contain the same bug:
# egrep -l "Tcl_GetStringFromObj\(.*int"
tk8.4.11/generic/*.c
tkCanvLine.c
tkCanvPoly.c
tkCanvText.c
tkCanvas.c
tkConfig.c
tkFocus.c
tkFrame.c
tkGrid.c
tkImgPhoto.c
tkPack.c
BTW: the latest development branch 8.5.a3 still
contains the same bugs.
Thanks for looking into this,
Jochen
Logged In: YES
user_id=79902
Tcl_GetStringFromObj() really has to take a signed second
arg, even though this sucks. I've got a long-standing thing
assigned to me to fix up the mess and make it a long*, but
there's no practical way to do it and maintain even a
semblance of binary or source compatibility so I've been
waiting for the next major release to clean out this
particular swamp. (I had a go on a branch, but it's like
pulling at a thread and seeing a whole tapestry unravel.)
OTOH, you're quite right about it being bad to mix int* and
size_t*
Logged In: YES
user_id=1314621
Thanks Donal for your quick response!
When I think about it again, the real problem is that the
local size_t variable doesn't get fully initialized on an
LP64 machine (only the first 4 bytes). Every memory debugger
should prove that easily. So, a binary and source code
compatible fix for this would be to initialize the size_t
variable to 0:
size_t length = 0;
char *arg = Tcl_GetStringFromObj(objv[i], (int *) &length);
Then it should work as expected (even with that nasty cast).
Would it be possible if you applied those kind of fixes
(I guess you'd have to check all calls to
Tcl_GetStringFromObj...)?
Logged In: NO
For fixing this, please don't change the signature of
Tcl_GetStringFromObj because that would break the binary
compatibility of the stub-interface. Please stay with "int*".
I think, limiting the string length to 2GB is ok for the
next 10 years ;-)
Logged In: YES
user_id=79902
The nasty cast is an endian trap.
The full fix, which will *not* happen in the 8.* series, is
to stop using int to refer to string lengths; long is better
but a different type may have to be used instead. It needs a
small study of rather a lot of different architectures and
compilers to resolve. :-( It's a bullet that will have to be
bitten sometime though; there are people out there who work
with really enormous strings and structures...
Mean-time, rewriting to the code to get rid of the casts of
the second argument to T_GSFO will do the right thing. I'm
part way through doing this, but it's in a sandbox I can't
reach from work.
Fix for partly (un)initialized variables on LP64 platforms
Logged In: YES
user_id=1314621
Donal, just in case you need this:
I have created a small patch (minimum diffs) to resolve the
issue by initializing the non-int local variables before
creating a int* referrence on them. I looked thru all calls
to T_GSFO in Tcl 8.4.11 and Tk8.4.11. Tcl seems to be OK (no
fix needed).
This is how I located the files that possibly needed to be
fixed:
find . \( -name \*.c -or -name \*.h \) -exec \ egrep 'Tcl_GetStringFromObj(.*(.*int.*\*.*).*)' '{}' \;
The patch below was created with
(I have duplicated the directory before applying my changes):
diff -Nrc tk8.4.11 tk8.4.11fixed
After the patch, the tk regression tests run fine. However,
I'm not sure if there are other T_* functions that have a
similar problem (not covered by regression tests)?!
Logged In: YES
user_id=79902
You should know that I found at least one spot with the
fault that your search wouldn't discover because the
argument was spilled onto the next source line. :-/
And the Tk experiments^W tests aren't complete. :-(
Logged In: YES
user_id=79902
Fixed in HEAD; 8.4 branch still undone
Logged In: YES
user_id=79902
Fixed in 8.4 branch
Logged In: YES
user_id=1314621
------------------------------
When will 8.4.12 be released?
------------------------------
This is a *critical* issue for us, because we're not able to
use Tcl 8.4.x in our production code for now.