Menu

#1949 regression tests fail on Solaris 10/x86_64

obsolete: 8.4.11
closed-fixed
8
2005-08-11
2005-08-05
Jo
No

Hello,

I'm facing a serious problem on our SUNfire V20z
machine (Operon) running Solaris 10/x86:
After a successful compilation of Tcl/Tk 8.4.11 I run
the regression tests (make test). It fails on
choosedir.tcl with the following Tcl-stack trace:

can't modify -cursor option after widget is created
while executing
"$w config -cursor watch"
(procedure "::tk::dialog::file::Update" line 49)
invoked from within
"::tk::dialog::file::Update .__tk_choosedir"
("after" script)

Looking into the Tk sources I found serious 64-bit
portability bugs at several places. For example in
generic/tkFrame.c:787 [function FrameWidgetObjCmd()]
the wrong lines of code are:

size_t length;
char *arg = Tcl_GetStringFromObj(objv[i], (int *) &length);

From now on the content of length is completely
platform dependent (e.g. bigEndian vs. littleEndian).
As a result the following calls to strncmp(arg,
"-class", length) is random, because if length is e.g.
negative, it always gives me 0 (match!) even if arg is
"-cursor" ...

An easy fix for this could look like this:

size_t length;
int len;
char *arg = Tcl_GetStringFromObj(objv[i], &len);
length = len;

...or (better) change the signature of
Tcl_GetStringFromObj to accept a size_t* as a last
argument instead of a int*.

At least these files contain the same bug:
# egrep -l "Tcl_GetStringFromObj\(.*int"
tk8.4.11/generic/*.c
tkCanvLine.c
tkCanvPoly.c
tkCanvText.c
tkCanvas.c
tkConfig.c
tkFocus.c
tkFrame.c
tkGrid.c
tkImgPhoto.c
tkPack.c

BTW: the latest development branch 8.5.a3 still
contains the same bugs.

Thanks for looking into this,
Jochen

Discussion

  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    Tcl_GetStringFromObj() really has to take a signed second
    arg, even though this sucks. I've got a long-standing thing
    assigned to me to fix up the mess and make it a long*, but
    there's no practical way to do it and maintain even a
    semblance of binary or source compatibility so I've been
    waiting for the next major release to clean out this
    particular swamp. (I had a go on a branch, but it's like
    pulling at a thread and seeing a whole tapestry unravel.)

    OTOH, you're quite right about it being bad to mix int* and
    size_t*

     
  • Donal K. Fellows

    • priority: 5 --> 8
    • assigned_to: hobbs --> dkf
     
  • Jo

    Jo - 2005-08-08

    Logged In: YES
    user_id=1314621

    Thanks Donal for your quick response!
    When I think about it again, the real problem is that the
    local size_t variable doesn't get fully initialized on an
    LP64 machine (only the first 4 bytes). Every memory debugger
    should prove that easily. So, a binary and source code
    compatible fix for this would be to initialize the size_t
    variable to 0:

    size_t length = 0;
    char *arg = Tcl_GetStringFromObj(objv[i], (int *) &length);

    Then it should work as expected (even with that nasty cast).
    Would it be possible if you applied those kind of fixes
    (I guess you'd have to check all calls to
    Tcl_GetStringFromObj...)?

     
  • Nobody/Anonymous

    Logged In: NO

    For fixing this, please don't change the signature of
    Tcl_GetStringFromObj because that would break the binary
    compatibility of the stub-interface. Please stay with "int*".
    I think, limiting the string length to 2GB is ok for the
    next 10 years ;-)

     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    The nasty cast is an endian trap.

    The full fix, which will *not* happen in the 8.* series, is
    to stop using int to refer to string lengths; long is better
    but a different type may have to be used instead. It needs a
    small study of rather a lot of different architectures and
    compilers to resolve. :-( It's a bullet that will have to be
    bitten sometime though; there are people out there who work
    with really enormous strings and structures...

    Mean-time, rewriting to the code to get rid of the casts of
    the second argument to T_GSFO will do the right thing. I'm
    part way through doing this, but it's in a sandbox I can't
    reach from work.

     
  • Jo

    Jo - 2005-08-08

    Fix for partly (un)initialized variables on LP64 platforms

     
  • Jo

    Jo - 2005-08-08

    Logged In: YES
    user_id=1314621

    Donal, just in case you need this:

    I have created a small patch (minimum diffs) to resolve the
    issue by initializing the non-int local variables before
    creating a int* referrence on them. I looked thru all calls
    to T_GSFO in Tcl 8.4.11 and Tk8.4.11. Tcl seems to be OK (no
    fix needed).
    This is how I located the files that possibly needed to be
    fixed:

    find . \( -name \*.c -or -name \*.h \) -exec \ egrep 'Tcl_GetStringFromObj(.*(.*int.*\*.*).*)' '{}' \;

    The patch below was created with
    (I have duplicated the directory before applying my changes):

    diff -Nrc tk8.4.11 tk8.4.11fixed

    After the patch, the tk regression tests run fine. However,
    I'm not sure if there are other T_* functions that have a
    similar problem (not covered by regression tests)?!

     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    You should know that I found at least one spot with the
    fault that your search wouldn't discover because the
    argument was spilled onto the next source line. :-/

    And the Tk experiments^W tests aren't complete. :-(

     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    Fixed in HEAD; 8.4 branch still undone

     
  • Donal K. Fellows

    • status: open --> closed-fixed
     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    Fixed in 8.4 branch

     
  • Jo

    Jo - 2005-11-10

    Logged In: YES
    user_id=1314621

    ------------------------------
    When will 8.4.12 be released?
    ------------------------------
    This is a *critical* issue for us, because we're not able to
    use Tcl 8.4.x in our production code for now.