#3705 joinable threads die on Solaris 64-bit

obsolete: 8.5a7
closed-fixed
9
2007-05-29
2007-05-04
No

Platform: sparc solaris 9
Compiler: Forte Developer 7 C 5.4 2002/03/09
Tcl: cvs head located on tcl.activestate.com ftp site - titled tcl-20070503.tar.gz
Thread: cvs head located on tcl.activestate.com ftp site - titled thread-20070503.tar.gz
Configuration parms (for both tcl and thread): '--prefix=/projects/sprs_lwv/tcl85' '--enable-shared' '--enable-symbols' '--enable-stubs' '--enable-64bit' '--enable-threads'

After successfully running a fresh configure/build/test/install of the May 3 tcl, I proceeded to build a fresh copy of the May 3 thread. The configure and build worked fine, but the test run then failed:
Tests began at Fri May 04 08:52:33 EDT 2007
Thread 2.6.5
Mainthread id is tid1
thread.test
gmake: *** [test] Bus Error (core dumped)
$ dbx /vol/tclsrcsol/tcl85/tcl/unix/tclsh core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.0' in your .dbxrc
Reading tclsh
core file header read successfully
Reading ld.so.1
Reading libtcl8.5.so
Reading libdl.so.1
Reading libsocket.so.1
Reading libnsl.so.1
Reading libpthread.so.1
Reading libm.so.1
Reading libc.so.1
Reading libmp.so.2
Reading libc_psr.so.1
Reading libthread.so.1
Reading libthread2.6.5.so
detected a multithreaded program
t@1 (l@1) terminated by signal BUS (invalid address alignment)
0xffffffff7dd0b7f4: _thrp_join+0x01ac: stx %l2, [%i2]
Current function is Tcl_JoinThread
166 result = pthread_join((pthread_t) threadId, (void**) state);
(dbx 1) print threadId
threadId = 0x7
(dbx 2) print state
state = 0xffffffff7fff19a4
(dbx 3) where
current thread: t@1
[1] _thrp_join(0x0, 0x0, 0xffffffff7fff19a4, 0x1, 0xffffffff7fff1808, 0xffffffff7fff1810), at 0xffffffff7dd0b7f4
=>[2] Tcl_JoinThread(threadId = 0x7, state = 0xffffffff7fff19a4), line 166 in "tclUnixThrd.c"
[3] ThreadJoin(interp = 0x1000428c0, thrId = 0x7), line 2062 in "threadCmd.c"
[4] ThreadJoinObjCmd(dummy = (nil), interp = 0x1000428c0, objc = 2, objv = 0xffffffff7fff1f48), line 1172 in "threadCmd.c"
[5] TclEvalObjvInternal(interp = 0x1000428c0, objc = 2, objv = 0xffffffff7fff1f48, command = 0x1003bc6bc "thread::join $tid]\n ThreadReap\n set c\n", length = 17, flags = 0), line 3617 in "tclBasic.c"
[6] TclEvalEx(interp = 0x1000428c0, script = 0x1003bc6bc "thread::join $tid]\n ThreadReap\n set c\n", numBytes = 17, flags = 0, line = 4), line 4222 in "tclBasic.c"
[7] TclSubstTokens(interp = 0x1000428c0, tokenPtr = 0xffffffff7fff2820, count = 1, tokensLeftPtr = (nil), line = 4), line 2099 in "tclParse.c"
[8] TclEvalEx(interp = 0x1000428c0, script = 0x1003bc670 "\n ThreadReap\n set tid [thread::create -joinable {set x 5}]\n set c [thread::join $tid]\n ThreadReap\n set c\n", numBytes = 120, flags = 262144, line = 4), line 4087 in "tclBasic.c"
[9] Tcl_EvalEx(interp = 0x1000428c0, script = 0x1003bc670 "\n ThreadReap\n set tid [thread::create -joinable {set x 5}]\n set c [thread::join $tid]\n ThreadReap\n set c\n", numBytes = 120, flags = 262144), line 3892 in "tclBasic.c"
[10] TclEvalObjEx(interp = 0x1000428c0, objPtr = 0x1002075b0, flags = 262144, invoker = (nil), word = 0), line 4592 in "tclBasic.c"
[11] Tcl_EvalObjEx(interp = 0x1000428c0, objPtr = 0x1002075b0, flags = 262144), line 4475 in "tclBasic.c"
[12] Tcl_UplevelObjCmd(dummy = (nil), interp = 0x1000428c0, objc = 1, objv = 0x100047008), line 900 in "tclProc.c"
[13] TclExecuteByteCode(interp = 0x1000428c0, codePtr = 0x100418760), line 1834 in "tclExecute.c"
[14] TclCompEvalObj(interp = 0x1000428c0, objPtr = 0x1000c8520, invoker = (nil), word = 0), line 996 in "tclExecute.c"
[15] TclObjInterpProcCore(interp = 0x1000428c0, framePtr = 0x100046e78, procNameObj = 0x1003d3030, isLambda = 0, skip = 1, errorProc = 0xffffffff7f18d0e0 = &`libtcl8.5.so`tclProc.c`MakeProcError(Tcl_Interp *interp, Tcl_Obj *procNameObj)), line 1535 in "tclProc.c"
[16] ObjInterpProcEx(clientData = 0x100095690, interp = 0x1000428c0, objc = 3, objv = 0x10040d420, isLambda = 0, errorProc = 0xffffffff7f18d0e0 = &`libtcl8.5.so`tclProc.c`MakeProcError(Tcl_Interp *interp, Tcl_Obj *procNameObj)), line 1283 in "tclProc.c"
[17] TclObjInterpProc(clientData = 0x100095690, interp = 0x1000428c0, objc = 3, objv = 0x10040d420), line 1213 in "tclProc.c"
[18] TclEvalObjvInternal(interp = 0x1000428c0, objc = 3, objv = 0x10040d420, command = 0xffffffff7f1df360 "", length = 0, flags = 262144), line 3617 in "tclBasic.c"
[19] Tcl_EvalObjv(interp = 0x1000428c0, objc = 3, objv = 0x10040d420, flags = 262144), line 3737 in "tclBasic.c"
[20] TclEvalObjEx(interp = 0x1000428c0, objPtr = 0x1003d1e90, flags = 262144, invoker = (nil), word = 0), line 4563 in "tclBasic.c"
[21] Tcl_EvalObjEx(interp = 0x1000428c0, objPtr = 0x1003d1e90, flags = 262144), line 4475 in "tclBasic.c"
[22] Tcl_UplevelObjCmd(dummy = (nil), interp = 0x1000428c0, objc = 1, objv = 0x100046e70), line 900 in "tclProc.c"
[23] TclExecuteByteCode(interp = 0x1000428c0, codePtr = 0x1002b9e00), line 1834 in "tclExecute.c"
[24] TclCompEvalObj(interp = 0x1000428c0, objPtr = 0x1000c83a0, invoker = (nil), word = 0), line 996 in "tclExecute.c"
[25] TclObjInterpProcCore(interp = 0x1000428c0, framePtr = 0x100046ca8, procNameObj = 0x1003d27c0, isLambda = 0, skip = 1, errorProc = 0xffffffff7f18d0e0 = &`libtcl8.5.so`tclProc.c`MakeProcError(Tcl_Interp *interp, Tcl_Obj *procNameObj)), line 1535 in "tclProc.c"
[26] ObjInterpProcEx(clientData = 0x10010d160, interp = 0x1000428c0, objc = 3, objv = 0x10040d860, isLambda = 0, errorProc = 0xffffffff7f18d0e0 = &`libtcl8.5.so`tclProc.c`MakeProcError(Tcl_Interp *interp, Tcl_Obj *procNameObj)), line 1283 in "tclProc.c"
[27] TclObjInterpProc(clientData = 0x10010d160, interp = 0x1000428c0, objc = 3, objv = 0x10040d860), line 1213 in "tclProc.c"
[28] TclEvalObjvInternal(interp = 0x1000428c0, objc = 3, objv = 0x10040d860, command = 0xffffffff7f1df360 "", length = 0, flags = 262144), line 3617 in "tclBasic.c"
[29] Tcl_EvalObjv(interp = 0x1000428c0, objc = 3, objv = 0x10040d860, flags = 262144), line 3737 in "tclBasic.c"
[30] TclEvalObjEx(interp = 0x1000428c0, objPtr = 0x1003d2d60, flags = 262144, invoker = (nil), word = 0), line 4563 in "tclBasic.c"
[31] Tcl_EvalObjEx(interp = 0x1000428c0, objPtr = 0x1003d2d60, flags = 262144), line 4475 in "tclBasic.c"
[32] Tcl_UplevelObjCmd(dummy = (nil), interp = 0x1000428c0, objc = 1, objv = 0x100046ca0), line 900 in "tclProc.c"
[33] TclExecuteByteCode(interp = 0x1000428c0, codePtr = 0x100407ea0), line 1834 in "tclExecute.c"
[34] TclCompEvalObj(interp = 0x1000428c0, objPtr = 0x1000ca950, invoker = (nil), word = 0), line 996 in "tclExecute.c"
[35] TclObjInterpProcCore(interp = 0x1000428c0, framePtr = 0x100045990, procNameObj = 0x1003cfdf0, isLambda = 0, skip = 1, errorProc = 0xffffffff7f18d0e0 = &`libtcl8.5.so`tclProc.c`MakeProcError(Tcl_Interp *interp, Tcl_Obj *procNameObj)), line 1535 in "tclProc.c"
[36] ObjInterpProcEx(clientData = 0x100094460, interp = 0x1000428c0, objc = 5, objv = 0xffffffff7fff8bc8, isLambda = 0, errorProc = 0xffffffff7f18d0e0 = &`libtcl8.5.so`tclProc.c`MakeProcError(Tcl_Interp *interp, Tcl_Obj *procNameObj)), line 1283 in "tclProc.c"
[37] TclObjInterpProc(clientData = 0x100094460, interp = 0x1000428c0, objc = 5, objv = 0xffffffff7fff8bc8), line 1213 in "tclProc.c"
[38] InvokeImportedCmd(clientData = 0x1001bb190, interp = 0x1000428c0, objc = 5, objv = 0xffffffff7fff8bc8), line 1904 in "tclNamesp.c"
[39] TclEvalObjvInternal(interp = 0x1000428c0, objc = 5, objv = 0xffffffff7fff8bc8, command = 0x1003e8af3 "test thread-4.4 {thread::create - create joinable thread} {\n ThreadReap\n set tid [thread::create -joinable {set x 5}]\n set c [thread::join $tid]\n ThreadReap\n set c\n} {0}\n\ntest thread-4.5 {thread::create - join detached thread} {\n ThreadReap\n set tid [thread::create]\n thread::send -async $tid {after 1000 ; thread::release}\n catch {set res [thread::join $tid]} msg\n ThreadReap\n lrange $msg 0 2\n} {cannot join thread}\n\ntest thread-5.0 {thread::release} {\n ThreadReap\n set ti" ..., length = 185, flags = 0), line 3617 in "tclBasic.c"
[40] TclEvalEx(interp = 0x1000428c0, script = 0x1003e7fb0 "# Commands covered: thread\n#\n# This file contains a collection of tests for one or more of the Tcl\n# built-in commands. Sourcing this file into Tcl runs the tests and\n# generates output for errors. No output means no errors were found.\n#\n# Copyright (c) 1996 Sun Microsystems, Inc.\n# Copyright (c) 1998-2000 Scriptics Corporation.\n# Copyright (c) 2002 ActiveState Corporation.\n#\n# See the file "license.terms" for information on usage and redistribution\n# of this file, and for a DISCLAIMER OF ALL WARRANTIES." ..., numBytes = 33216, flags = 0, line = 97), line 4222 in "tclBasic.c"
[41] Tcl_EvalEx(interp = 0x1000428c0, script = 0x1003e7fb0 "# Commands covered: thread\n#\n# This file contains a collection of tests for one or more of the Tcl\n# built-in commands. Sourcing this file into Tcl runs the tests and\n# generates output for errors. No output means no errors were found.\n#\n# Copyright (c) 1996 Sun Microsystems, Inc.\n# Copyright (c) 1998-2000 Scriptics Corporation.\n# Copyright (c) 2002 ActiveState Corporation.\n#\n# See the file "license.terms" for information on usage and redistribution\n# of this file, and for a DISCLAIMER OF ALL WARRANTIES." ..., numBytes = 33216, flags = 0), line 3892 in "tclBasic.c"
[42] Tcl_FSEvalFileEx(interp = 0x1000428c0, pathPtr = 0x1003cf190, encodingName = (nil)), line 1818 in "tclIOUtil.c"
[43] Tcl_SourceObjCmd(dummy = (nil), interp = 0x1000428c0, objc = 2, objv = 0x100045980), line 953 in "tclCmdMZ.c"
[44] TclExecuteByteCode(interp = 0x1000428c0, codePtr = 0x1003bbe70), line 1834 in "tclExecute.c"
[45] TclCompEvalObj(interp = 0x1000428c0, objPtr = 0x1000c9f30, invoker = 0xffffffff7fffbd60, word = 1), line 996 in "tclExecute.c"
[46] TclEvalObjEx(interp = 0x1000428c0, objPtr = 0x1000c9f30, flags = 0, invoker = 0xffffffff7fffbd60, word = 1), line 4677 in "tclBasic.c"
[47] Tcl_CatchObjCmd(dummy = (nil), interp = 0x1000428c0, objc = 3, objv = 0x100045968), line 253 in "tclCmdAH.c"
[48] TclExecuteByteCode(interp = 0x1000428c0, codePtr = 0x1003c1420), line 1834 in "tclExecute.c"
[49] TclCompEvalObj(interp = 0x1000428c0, objPtr = 0x1003cf7f0, invoker = 0xffffffff7fffc5a0, word = 3), line 996 in "tclExecute.c"
[50] TclEvalObjEx(interp = 0x1000428c0, objPtr = 0x1003cf7f0, flags = 0, invoker = 0xffffffff7fffc5a0, word = 3), line 4677 in "tclBasic.c"
[51] Tcl_ForeachObjCmd(dummy = (nil), interp = 0x1000428c0, objc = 4, objv = 0xffffffff7fffc6e8), line 1811 in "tclCmdAH.c"
[52] TclEvalObjvInternal(interp = 0x1000428c0, objc = 4, objv = 0xffffffff7fffc6e8, command = 0x100063553 "foreach file [lsort [::tcltest::getMatchingFiles]] {\n set tail [file tail $file]\n puts stdout $tail\n if {[catch {source $file} msg]} {\n puts stdout $msg\n }\n}\n\n# Cleanup\nputs stdout "\nTests ended at [eval $timeCmd]"\n::tcltest::cleanupTests 1\n\nreturn\n\n", length = 177, flags = 0), line 3617 in "tclBasic.c"
[53] TclEvalEx(interp = 0x1000428c0, script = 0x100062ed0 "# all.tcl --\n#\n# This file contains a top-level script to run all of the Tcl\n# tests. Execute it by invoking "source all.test" when running tcltest\n# in this directory.\n#\n# Copyright (c) 1998-1999 by Scriptics Corporation.\n# All rights reserved.\n# \n# RCS: @(#) $Id: all.tcl,v 1.5 2004/12/18 13:26:03 vasiljevic Exp $\n\npackage require tcltest\nnamespace import -force ::tcltest::*\n\nset ::tcltest::testSingleFile false\nset ::tcltest::testsDirectory [file dir [info script]]\n\n# We need to ensure that the testsDirec" ..., numBytes = 1937, flags = 0, line = 53), line 4222 in "tclBasic.c"
[54] Tcl_EvalEx(interp = 0x1000428c0, script = 0x100062ed0 "# all.tcl --\n#\n# This file contains a top-level script to run all of the Tcl\n# tests. Execute it by invoking "source all.test" when running tcltest\n# in this directory.\n#\n# Copyright (c) 1998-1999 by Scriptics Corporation.\n# All rights reserved.\n# \n# RCS: @(#) $Id: all.tcl,v 1.5 2004/12/18 13:26:03 vasiljevic Exp $\n\npackage require tcltest\nnamespace import -force ::tcltest::*\n\nset ::tcltest::testSingleFile false\nset ::tcltest::testsDirectory [file dir [info script]]\n\n# We need to ensure that the testsDirec" ..., numBytes = 1937, flags = 0), line 3892 in "tclBasic.c"
[55] Tcl_FSEvalFileEx(interp = 0x1000428c0, pathPtr = 0x100032f80, encodingName = (nil)), line 1818 in "tclIOUtil.c"
[56] Tcl_Main(argc = -1, argv = 0xffffffff7fffd1a8, appInitProc = 0x100001e58 = &Tcl_AppInit(Tcl_Interp *interp)), line 441 in "tclMain.c"
[57] main(argc = 2, argv = 0xffffffff7fffd198), line 87 in "tclAppInit.c"
(dbx 4)

Discussion

  • Larry W. Virden

    Larry W. Virden - 2007-05-14
    • priority: 5 --> 9
     
  • Larry W. Virden

    Larry W. Virden - 2007-05-14

    Logged In: YES
    user_id=15949
    Originator: YES

    This problem continues to exist, even after the recent thread extension code update.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2007-05-28
    • milestone: 652906 --> obsolete: 8.5a7
    • summary: thread.test bus error on sparc solaris 9 --> joinable threads die on Solaris 64-bit
     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2007-05-28

    Logged In: YES
    user_id=72656
    Originator: NO

    I have narrowed this down to (at least first crash) being thread-4.4, and in particular the joining of a thread:

    set tid [thread::create -joinable {set x 5}]
    set c [thread::join $tid]

    This does not happen if you build the thread module with --enable-symbols for me, and it only happens on Solaris-x64 (not sparc-64).

     
  • Larry W. Virden

    Larry W. Virden - 2007-05-29

    Logged In: YES
    user_id=15949
    Originator: YES

    Note that I am using SPARC Solaris 64 bit, so the crash is happening there for me. Also, note that I am using --enable-symbols.

     
  • Larry W. Virden

    Larry W. Virden - 2007-05-29

    Logged In: YES
    user_id=15949
    Originator: YES

    Note that I am using SPARC Solaris 64 bit, so the crash is happening there for me. Also, note that I am using --enable-symbols.

     
  • Larry W. Virden

    Larry W. Virden - 2007-05-29

    Logged In: YES
    user_id=15949
    Originator: YES

    Sigh - sorry about that duplicate posting... sf.net was not telling me that the posting was accepted...

    Anyways, today I built the latest tcl and thread tar files from the ftp.activestate.com site. I continue to see the problem even with the latest thread change.

     
  • Zoran Vasiljevic

    Logged In: YES
    user_id=95086
    Originator: NO

    I believe the problem must be somewhere in the

    t@1 (l@1) terminated by signal BUS (invalid address alignment)
    0xffffffff7dd0b7f4: _thrp_join+0x01ac: stx %l2, [%i2]
    Current function is Tcl_JoinThread
    166 result = pthread_join((pthread_t) threadId, (void**) state);

    as the debugger says, because either "threadId" or "state" is a
    different size than expected.
    I do not know whether Tcl or the threading extension is 64-bit aware
    (the former probably yes, the latter most definitely not), so all
    kinds of weird things can happen here.

    If I could get access to such a system, the probability of finding a
    solution would be higher. Unfortunately, I do not have any such system
    at hand.
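
    The "invalid address alignment" message can be checked against the numbers in the dbx output. The sketch below is illustrative only: the helper name is_aligned and the constant STATE_ADDR are made up for this example, and it assumes an LP64 build in which int is 4 bytes and a pointer is 8 bytes. It tests the address that dbx printed for "state" against 4-byte and 8-byte alignment; the faulting instruction, stx, is a 64-bit store, which on SPARC V9 is expected to need an 8-byte-aligned target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value dbx printed for `state` in the stack trace of this report. */
static const uint64_t STATE_ADDR = 0xffffffff7fff19a4ULL;

/*
 * Hypothetical helper: returns nonzero if `addr` is a valid target for a
 * naturally aligned store of `size` bytes. `size` must be a power of two.
 */
int is_aligned(uint64_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (uint64_t)(size - 1)) == 0;
}
```

    With these definitions, is_aligned(STATE_ADDR, 4) is true and is_aligned(STATE_ADDR, 8) is false: the address is fine for the int that the caller declared, but not for the pointer-sized value that pthread_join stores through its void** argument. That would be consistent with the cast of an int* to void** being the culprit.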

     
  • Larry W. Virden

    Larry W. Virden - 2007-05-29

    Logged In: YES
    user_id=15949
    Originator: YES

    If I had a personal system that I could share, I would. Alas, the security here is high enough that I would not be able to grant you access. Perhaps someone at ActiveState can help here.

     
  • Zoran Vasiljevic

    Logged In: YES
    user_id=95086
    Originator: NO

    Try rewriting the Tcl_JoinThread in Tcl to look like

    int
    Tcl_JoinThread(threadId, state)
        Tcl_ThreadId threadId;  /* Id of the thread to wait upon */
        int* state;             /* Reference to the storage the result
                                 * of the thread we wait upon will be
                                 * written into. */
    {
    #ifdef TCL_THREADS
        int result;
        unsigned long long dummy;

        result = pthread_join ((pthread_t) threadId, (VOID**) &dummy);
        *state = 0;
        return (result == 0) ? TCL_OK : TCL_ERROR;
    #else
        return TCL_ERROR;
    #endif
    }

    so, effectively, ditch the error state of the thread, just pass
    the pthread_join a dummy state and see if this helps.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2007-05-29

    Logged In: YES
    user_id=72656
    Originator: NO

    If I change that section to actually use:

    #ifdef TCL_THREADS
        int result;
        unsigned long retcode;

        result = pthread_join((pthread_t) threadId, (void**) &retcode);
        *state = (int) retcode;
        return (result == 0) ? TCL_OK : TCL_ERROR;
    #else

    then the test cases work. However, then I get the following failure:

    ==== thread-17.9 thread::transfer - pipe - closable? FAILED
    ....
    ---- Test generated error; Return code was: 1
    ---- Return code should have been one of: 0 2
    ---- errorInfo: child killed: segmentation violation
    while executing
    "close $pipe"
    invoked from within
    "lappend res [close $pipe]"
    ("uplevel" body line 17)
    invoked from within
    "uplevel 1 $script"
    ---- errorCode: CHILDKILLED 14675 SIGSEGV {segmentation violation}
    ==== thread-17.9 FAILED

    and this I get repeatedly. Any thoughts??

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2007-05-29

    Logged In: YES
    user_id=72656
    Originator: NO

    Never mind - it turns out that the state* may be NULL, so it needs the check ...

    #ifdef TCL_THREADS
        int result;
        unsigned long retcode;

        result = pthread_join((pthread_t) threadId, (void**) &retcode);
        if (state) {
            *state = (int) retcode;
        }
        return (result == 0) ? TCL_OK : TCL_ERROR;
    #else

    and now things work. I am committing that for 8.4 and head.
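
    The same pattern can be tried outside Tcl. The following is a minimal standalone sketch, not the committed Tcl code: worker, join_thread and spawn_and_join are hypothetical names, and it receives the result in a void* local rather than the unsigned long used in the fix above. The idea is the same: let pthread_join write into pointer-sized, properly aligned storage, then narrow the value into the caller's int only if the caller supplied one. Unlike the fix above, this version leaves *state untouched when the join fails.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical worker: echoes its argument back as the thread result. */
void *worker(void *arg)
{
    return arg;
}

/*
 * Join `tid`. If `state` is non-NULL, store the thread's exit code there.
 * Returns 0 on success and -1 on failure.
 */
int join_thread(pthread_t tid, int *state)
{
    void *retval = NULL;   /* pointer-sized, so a full-width store is safe */
    int result;

    /* The narrowing below assumes a pointer can hold any int value. */
    assert(sizeof(void *) >= sizeof(int));

    result = pthread_join(tid, &retval);
    if (result == 0 && state != NULL) {
        *state = (int) (intptr_t) retval;
    }
    return (result == 0) ? 0 : -1;
}

/* Start a worker that exits with `code`, then join it. */
int spawn_and_join(int code, int *state)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, (void *) (intptr_t) code) != 0) {
        return -1;
    }
    return join_thread(tid, state);
}
```

    Calling spawn_and_join(5, &s) should leave 5 in s, and spawn_and_join(7, NULL) exercises the NULL case that caused the thread-17.9 failure described above. Depending on the platform, the program may need to be linked with the threads library (for example -lpthread).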

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2007-05-29
    • status: open --> closed-fixed