#2007 Channel Transfer crashes

Milestone: obsolete: 8.4b1
Status: closed-fixed
Priority: 9
Updated: 2003-07-18
Created: 2002-08-11
Creator: Anonymous
Private: No

Hello,

I am writing a concurrent Tcl server using threads. I have a
thread-enabled Tcl core and Thread extension 2.4. I am
doing the following:

1. Open a socket and create a thread.

2. Transfer the socket to the created thread.

3. Send the script "puts sockid abc; flush sockid" to the
created thread. It crashes.

I do not know why. Could you please look into the
problem and write back to me?

Here is my code

% socket localhost 35000

sock460

% pwd

% load thread.dll

% ::thread::create

1304

% ::thread::transfer 1304 sock460

% ::thread::send 1304 "puts sock460 line; flush sock460"

Thanks

yasar

Discussion

  • Andreas Kupries

    Andreas Kupries - 2002-08-14
    • priority: 5 --> 8
     
  • Don Porter

    Don Porter - 2002-08-19

    Logged In: YES
    user_id=80530

    Just tried to reproduce this on Linux/Alpha
    using Tcl 8.4b2 and Thread 2.4.

    Right away I see there's a difficulty.
    What service do you have running
    on port 35000?

     
  • Andreas Kupries

    Andreas Kupries - 2002-08-20

    Logged In: YES
    user_id=75003

    Tried to replicate on Linux/x86. Used the smtp port instead
    of 35000 to ensure that the socket truly exists. My script
    is:
    set s [socket localhost smtp] ; # connect to smtp mail
    package require Thread

    puts [info loaded]
    puts [pwd]

    set t [::thread::create]

    ::thread::transfer $t $s
    ::thread::send $t "puts $s line; flush $s"

    I.e. this is not interactive, but executed via

    tclsh ./testscript

    No problem. Then I added the following lines to the script,
    at the end:

    after 5000 "::thread::send $t exit"
    vwait forever

    Now every once in a while the script aborts. The error
    message is:

    Tcl_Release couldn't find reference for 0x80533f8
    Aborted (core dumped)

    In other words, a panic somewhere. The stack trace below
    indicates that it happens while handling the timer event:

    #0 0x4014b7b1 in kill () from /lib/libc.so.6
    #1 0x400f5e5e in pthread_kill () from /lib/libpthread.so.0
    #2 0x400f6339 in raise () from /lib/libpthread.so.0
    #3 0x4014cc11 in abort () from /lib/libc.so.6
    #4 0x4009f92e in Tcl_PanicVA (format=0x400d2d80 "Tcl_Release couldn't find reference for 0x%x", argList=0xbffff3b8) at ../../tcl/unix/../generic/tclPanic.c:106
    #5 0x4009f967 in Tcl_Panic (arg1=0x400d2d80 "Tcl_Release couldn't find reference for 0x%x") at ../../tcl/unix/../generic/tclPanic.c:134
    #6 0x400a897b in Tcl_Release (clientData=0x80533f8) at ../../tcl/unix/../generic/tclPreserve.c:255
    #7 0x400b2ff5 in AfterProc (clientData=0x80a95e0) at ../../tcl/unix/../generic/tclTimer.c:1054
    #8 0x400b2473 in TimerHandlerEventProc (evPtr=0x80a9660, flags=-3) at ../../tcl/unix/../generic/tclTimer.c:543
    #9 0x4009ce21 in Tcl_ServiceEvent (flags=-3) at ../../tcl/unix/../generic/tclNotify.c:618
    #10 0x4009d2a1 in Tcl_DoOneEvent (flags=-3) at ../../tcl/unix/../generic/tclNotify.c:921
    #11 0x4006a5ec in Tcl_VwaitObjCmd (clientData=0x0, interp=0x80533f8, objc=2, objv=0xbffff5b8) at ../../tcl/unix/../generic/tclEvent.c:990
    #12 0x4003b90c in TclEvalObjvInternal (interp=0x80533f8, objc=2, objv=0xbffff5b8, command=0x8052aea "\nvwait forever\n", length=15, flags=0) at ../../tcl/unix/../generic/tclBasic.c:3033
    #13 0x4003c5c0 in Tcl_EvalEx (interp=0x80533f8, script=0x80529f8 "\n\nset s [socket localhost smtp] ; # connect to smtp mail\npackage require Thread\n\nputs [info loaded]\nputs [pwd]\n\nset t [::thread::create]\n\n::thread::transfer $t $s\n::thread::send $t \"puts $s line; flus"..., numBytes=257, flags=0) at ../../tcl/unix/../generic/tclBasic.c:3631
    #14 0x40090044 in Tcl_FSEvalFile (interp=0x80533f8, pathPtr=0x80589e8) at ../../tcl/unix/../generic/tclIOUtil.c:1371
    #15 0x40097e83 in Tcl_Main (argc=1, argv=0xbffffaf8, appInitProc=0x80486d8 <Tcl_AppInit>) at ../../tcl/unix/../generic/tclMain.c:292
    #16 0x080486cc in main (argc=2, argv=0xbffffaf4) at ../../tcl/unix/../unix/tclAppInit.c:90
    #17 0x4013b17f in __libc_start_main () from /lib/libc.so.6
     
  • Andreas Kupries

    Andreas Kupries - 2002-08-20

    Logged In: YES
    user_id=75003

    The resource which can no longer be found is an interpreter,
    possibly the interpreter performing the after script. ...
    Just checked: this happens without sockets and without
    transferring them. In other words, this is a different problem
    from the one shown in this report. Creating a new SF entry: #597575.

    I will have to check this on a Windows platform.

     
  • Andreas Kupries

    Andreas Kupries - 2002-08-20

    Logged In: YES
    user_id=75003

    Confirmed for Win'2K. Stack trace:

    TcpOutputProc(void * 0x007d5fc0, const char * 0x008db0d0, int 6, int * 0x0150f994) line 1803 + 14 bytes
    FlushChannel(Tcl_Interp * 0x00000000, Channel * 0x007d5f70, int 0) line 2066 + 38 bytes
    Tcl_Flush(Tcl_Channel_ * 0x007d5f70) line 5104 + 13 bytes
    Tcl_FlushObjCmd(void * 0x00000000, Tcl_Interp * 0x007d5490, int 2, Tcl_Obj * const * 0x0150fbec) line 194 + 9 bytes
    TclEvalObjvInternal(Tcl_Interp * 0x007d5490, int 2, Tcl_Obj * const * 0x0150fbec, const char * 0x007d87f3, int 14, int 0) line 3033 + 25 bytes
    Tcl_EvalEx(Tcl_Interp * 0x007d5490, const char * 0x007d87e0, int 33, int 131072) line 3632 + 42 bytes
    ThreadSendEval(Tcl_Interp * 0x007d5490, void * 0x007d9610) line 1250 + 27 bytes
    ThreadEventProc(Tcl_Event * 0x007d89a0, int -3) line 2386 + 13 bytes
    Tcl_ServiceEvent(int -3) line 618 + 11 bytes
    Tcl_DoOneEvent(int -3) line 921 + 9 bytes
    ThreadWait() line 2189 + 14 bytes
    ThreadWaitObjCmd(void * 0x00000000, Tcl_Interp * 0x007d5490, int 1, Tcl_Obj * const * 0x0150ff08) line 955
    TclEvalObjvInternal(Tcl_Interp * 0x007d5490, int 1, Tcl_Obj * const * 0x0150ff08, const char * 0x007d9b90, int 12, int 0) line 3033 + 25 bytes
    Tcl_EvalEx(Tcl_Interp * 0x007d5490, const char * 0x007d9b90, int 12, int 0) line 3632 + 42 bytes
    Tcl_Eval(Tcl_Interp * 0x007d5490, const char * 0x007d9b90) line 3796 + 17 bytes
    NewThread(void * 0x0012f640) line 1472 + 23 bytes
    KERNEL32! 77e8758a()

    Dereferencing a NULL pointer in TcpOutputProc.
    tsdPtr is NULL. tsd = Thread-specific Data. That is something
    which should never be NULL.

     
  • Andreas Kupries

    Andreas Kupries - 2002-08-20
    • priority: 8 --> 9
     
  • Andreas Kupries

    Andreas Kupries - 2002-08-20

    Logged In: YES
    user_id=75003

    Attaching the scripts I used for testing.

     
  • Andreas Kupries

    Andreas Kupries - 2002-08-20

    Crashing script

     
  • Andreas Kupries

    Andreas Kupries - 2002-08-20

    Non-crashing script

     
  • Andreas Kupries

    Andreas Kupries - 2002-08-20

    Logged In: YES
    user_id=75003

    New datapoint:
    Start thread-enabled tclsh (in MSVC++ debugger), set a
    breakpoint in file tclWinSock, line 1776. This is where the
    core retrieves the tsdPtr for the SendMessage stuff later.

    Source non-crashing script, step into the tsd retrieval. The
    ultimate routine is TlsGetValue, presumably provided by
    Windows. I can't step into it. Here things are ok.

    Now source the crashing script, do not change interpreters.
    Step into the retrieval again. All arguments etc. are the same
    as before, but now TlsGetValue returns NULL. The reason is
    unknown. I suspect that some memory went haywire somewhere,
    but could not prove it: 'memory validate on' does not trigger
    anything before we hit the crash (yes, I used tcl/threads
    compiled with TCL_MEM_DEBUG).

    I declare this a Windows-specific bug for now, because of
    TlsGetValue and the information from Don Porter that the
    attached crashing script runs fine on his Linux/Alpha.

     
  • Andreas Kupries

    Andreas Kupries - 2002-08-20

    Logged In: YES
    user_id=75003

    Found the problem. When a socket is created in a thread the
    socket driver will be initialized for that thread, especially its
    TSD slot.

    Call sequence:
    SocketObjCmd => TclpHasSockets => InitSocket

    Now if a thread is created and no socket is created in it,
    nothing is initialized. The channel transfer then inserts a socket
    into the thread, but this does not run any code to initialize
    the driver. Hence the TSD slot is uninitialized, and thus the
    crash.

    Because a workaround exists (create and destroy a temporary
    socket in the thread before transferring sockets), the priority
    will go down.

    The true fix however is to extend the channel driver with an
    init-function which can be used by channels during
    registration in an interp to ensure that their driver is initialized
    in the thread of said interp.
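
The workaround mentioned above can be sketched as follows. This is a minimal illustration, not code from the report: `transferSocket` is a hypothetical helper name, and the throwaway server socket on port 0 is just one way of forcing the socket driver to initialize in the target thread. A later comment in this thread reports that the workaround is incomplete (a `gets` on the transferred channel stalls), so treat it as a stopgap only.

```tcl
package require Thread

# Hypothetical helper (not part of the Thread extension): make sure the
# socket driver has been initialized in thread $tid, then hand over $chan.
proc transferSocket {tid chan} {
    # Creating any socket inside the target thread runs the per-thread
    # socket initialization. A server socket on port 0 needs no peer and
    # is closed again immediately.
    thread::send $tid {close [socket -server puts 0]}

    # Only now move the channel into the target thread.
    thread::transfer $tid $chan
}
```

With this helper, a call such as `transferSocket $t $s` would stand in for the bare `::thread::transfer $t $s` in the test scripts.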

     
  • Andreas Kupries

    Andreas Kupries - 2002-08-20
    • priority: 9 --> 7
     
  • Wojciech Kocjan

    Wojciech Kocjan - 2002-11-08

    Logged In: YES
    user_id=191529

    The workaround for the problem Andreas describes is unfortunately
    incomplete. Below is a test script that SHOULD work. It
    stalls when trying to do a gets on the channel, whereas with
    my *sockPtr=0 hack everything seems to work OK.

    package require Thread
    set id [thread::create]
    thread::send $id {
        close [socket -server puts 0]
    }

    proc d {sock args} {
        after idle [list d0 $sock]
    }
    proc d0 {sock} {
        global id
        thread::send $id [list set sock $sock]
        thread::send $id [list set tid [thread::id]]

        thread::transfer $id $sock

        thread::send -async $id {
            puts $sock "HI"
            flush $sock
            thread::send -async $tid [list puts SENTHI]
            puts $sock [gets $sock]
            thread::send -async $tid [list puts SENTLINE]
            flush $sock
            close $sock
            thread::send -async $tid [list puts DONE]
        }
    }

    socket -server d 12345

    set next [thread::create]
    thread::send $next [list set tid [thread::id]]
    thread::send -async $next {
        package require Thread
        if {[catch {
            after 2000
            set s [socket 127.0.0.1 12345]
            puts $s TEST; flush $s
        } err]} {
            thread::send -async $::tid [list puts "ERROR: $::errorInfo"]
        }
    }

     
  • David Gravereaux

    Logged In: YES
    user_id=7549

    *** generic/tclIO.c 30 Jul 2002 18:36:25 -0000 1.57
    --- generic/tclIO.c 10 Nov 2002 10:30:52 -0000
    ***************
    *** 771,776 ****
    --- 771,785 ----
                  panic("Tcl_RegisterChannel: duplicate channel names");
              }
              Tcl_SetHashValue(hPtr, (ClientData) chanPtr);
    + #ifdef __WIN32__
    +         if (! strcmp(chanPtr->typePtr->typeName, "tcp")) {
    +             /*
    +              * Just in case, force per-thread initialization to happen
    +              * so the socket event handler thread gets created.
    +              */
    +             TclpHasSockets(NULL);
    +         }
    + #endif
          }
          statePtr->refCount++;
      }

    That seems to do it, but is rather "bad style".

     
  • David Gravereaux

    Dave's idea

     
  • David Gravereaux

    Logged In: YES
    user_id=7549

    Andreas:
    >The true fix however is to extend the channel driver with an
    >init-function which can be used by channels during
    >registration in an interp to ensure that their driver is initialized
    >in the thread of said interp.

    Yes, exactly. Does this issue exist in the other channel
    types, too? Should we generalize this by adding another entry
    to the Tcl_ChannelType struct just for this purpose?

    See uploaded patch file for my idea in code.

     
  • Andreas Kupries

    Andreas Kupries - 2003-04-22

    Logged In: YES
    user_id=75003

    See also [ 718045 ] Closing transferred channel crashes app.

     
  • Andreas Kupries

    Andreas Kupries - 2003-04-22
    • priority: 7 --> 9
     
  • Andreas Kupries

    Andreas Kupries - 2003-04-22
    • assigned_to: andreas_kupries --> das
    • status: open --> open-fixed
     
  • Andreas Kupries

    Andreas Kupries - 2003-04-22

    Logged In: YES
    user_id=75003

    Reassigning to Daniel for test of Mac changes.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2003-07-18

    Logged In: YES
    user_id=72656

    moved to pending since we haven't heard - assuming
    functional on Mac.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2003-07-18
    • status: open-fixed --> pending-fixed
     
  • Andreas Kupries

    Andreas Kupries - 2003-07-18

    Logged In: YES
    user_id=75003

    Daniel, did you test the changes?

     
  • Andreas Kupries

    Andreas Kupries - 2003-07-18
    • status: pending-fixed --> open-fixed
     
  • Andreas Kupries

    Andreas Kupries - 2003-07-18
    • status: open-fixed --> closed-fixed
     
