From: SourceForge.net <no...@so...> - 2009-05-06 20:27:35
Bugs item #486399, was opened at 2001-11-28 10:04
Message generated for change (Comment added) made by ferrieux
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=486399&group_id=10894

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: 80. Thread Package
Group: obsolete: 8.4a3
Status: Open
Resolution: None
Priority: 9
Private: No
Submitted By: Chris Hall (chrishall)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Panic on exit before all threads returned

Initial Comment:
Solaris 2.7, 2 CPU machine

[belinda 14]--> gcc -v
Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.7/2.95.3/specs
gcc version 2.95.3 20010315 (release)

Tcl 8.4a3, built thus:

./configure\
    --enable-threads\
    --enable-shared\
    --enable-gcc\
    --prefix=/opt/tcl8.4a3

Thread package built thus:

./configure\
    --enable-gcc\
    --with-tcl=/opt/tcl8.4a3/lib\
    --enable-shared\
    --enable-threads\
    --prefix=/opt/tcl8.4a3

The attached tar file contains three small scripts. If you run:

tclsh8.4 boss.tcl

You should be able to get it to do this:

[belinda 66]--> tclsh8.4 boss.tcl
task_done 6
task_done 7
task_done 8
task_done 9
task_done 10
task_done 6
task_done 7
task_done 8
task_done 9
task_done 6
10 exiting...
BOSS: worker_exit 10
8 exiting...
7 exiting...
BOSS: worker_exit 8
BOSS: worker_exit 7
9 exiting...
BOSS: worker_exit 9
6 exiting...
BOSS: worker_exit 6
BOSS: done...
Tcl_Release couldn't find reference for 0x43938
Abort (core dumped)

My stack trace is:

(gdb) where
#0  0xff059968 in __sigprocmask () from /usr/lib/libthread.so.1
#1  0xff04f1f0 in _resetsig () from /usr/lib/libthread.so.1
#2  0xff04e93c in _sigon () from /usr/lib/libthread.so.1
#3  0xff0517b4 in _thrp_kill () from /usr/lib/libthread.so.1
#4  0xff0b9450 in abort () from /usr/lib/libc.so.1
#5  0xff2ee028 in Tcl_PanicVA () from /opt/tcl8.4a3/lib/libtcl8.4.so
#6  0xff2ee054 in Tcl_Panic () from /opt/tcl8.4a3/lib/libtcl8.4.so
#7  0xff2f57f0 in Tcl_Release () from /opt/tcl8.4a3/lib/libtcl8.4.so
#8  0xfe7c347c in ThreadEventProc () from /opt/tcl8.4a3/lib/thread2.2/../libthread2.2.so
#9  0xff2ec1e0 in Tcl_ServiceEvent () from /opt/tcl8.4a3/lib/libtcl8.4.so
#10 0xff2ec538 in Tcl_DoOneEvent () from /opt/tcl8.4a3/lib/libtcl8.4.so
#11 0xfe7c3148 in ThreadWait () from /opt/tcl8.4a3/lib/thread2.2/../libthread2.2.so
#12 0xfe7c1b28 in ThreadWaitObjCmd () from /opt/tcl8.4a3/lib/thread2.2/../libthread2.2.so
#13 0xff2eebec in EvalObjv () from /opt/tcl8.4a3/lib/libtcl8.4.so
#14 0xff2ef230 in Tcl_EvalEx () from /opt/tcl8.4a3/lib/libtcl8.4.so
#15 0xff2e33dc in Tcl_FSEvalFile () from /opt/tcl8.4a3/lib/libtcl8.4.so
#16 0xff2b51ac in Tcl_SourceObjCmd () from /opt/tcl8.4a3/lib/libtcl8.4.so
#17 0xff2eebec in EvalObjv () from /opt/tcl8.4a3/lib/libtcl8.4.so
#18 0xff2ef230 in Tcl_EvalEx () from /opt/tcl8.4a3/lib/libtcl8.4.so
#19 0xff2ef48c in Tcl_Eval () from /opt/tcl8.4a3/lib/libtcl8.4.so
#20 0xfe7c2238 in NewThread () from /opt/tcl8.4a3/lib/thread2.2/../libthread2.2.so

----------------------------------------------------------------------

>Comment By: Alexandre Ferrieux (ferrieux)
Date: 2009-05-06 22:27

Message:
Here's a copy of a message I sent to tclcore regarding these issues:

from Alexandre Ferrieux <ale...@gm...>
to TclCore <tcl...@li...>
date Wed, May 6, 2009 at 12:42 AM
subject Finalization vs. Exit
mailed-by gmail.com

Hi,

As exhibited e.g.
in https://sourceforge.net/tracker/?func=detail&aid=486399&group_id=10894&atid=110894 , the Tcl finalization sequence is a complex thing. In this specific bug, the problem involves avoiding bad interaction of second-class threads with the main (first-class) one during the very last steps of teardown, when the ground starts to dissolve under everybody's feet. In other instances, it can be argued that we're doing way too much "administrative" cleanup (like freeing memory) just before exiting.

Now a simple approach seems to be viable: make a clear distinction between the [exit] path and full finalization (e.g. in embedded scenarios, since Donal dislikes scenarii ;-).

The idea, then, is to make clearer in the code the dichotomy between the exit handlers that are "administrative" (meaning they have no effect outside the dying process) and can thus be skipped in the [exit] case, and those that are really compulsory in all cases because they have side-effects on the outside world, these side-effects being part of a documented or implicit contract that we simply cannot break. Then, for the [exit] path, just do the compulsory ones, and call the OS's exit() function.

Question: could you help me draw this dichotomy? Here is what I have spotted so far:

 - process-wide exit handlers registered with Tcl_CreateExitHandler, aka "early exit handlers" --> compulsory
 - per-thread exit handlers (freeing memory) --> skippable for [exit]

Still in the gray zone is FinalizeIOSubsystem. I know of cases where not calling it might have long-ranging effects (like RST sent on all non-closed sockets), but since it deals with per-thread/interp structures (like channel lists), it should either be entirely skipped or done for all threads... which is problematic at exit time if some threads are blocked or in an uncontrolled state.
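
A rough sketch of the dichotomy described above, in plain C rather than Tcl's real code. Everything here is hypothetical and only for illustration: the handler table, the `compulsory` flag, and the names `register_handler`, `run_handlers`, `count_call`, `EXIT_FAST` and `EXIT_FULL` are assumptions, not Tcl APIs. The newest-first order is also just a choice made for this sketch.

```c
#include <stddef.h>

/* Hypothetical sketch, not Tcl source: every name below is an assumption. */

typedef void (*handler_fn)(void *client_data);

typedef struct {
    handler_fn fn;
    void *client_data;
    int compulsory;  /* 1: side effects outside the process; 0: administrative */
} exit_handler;

/* EXIT_FAST models the [exit] path, EXIT_FULL models full finalization. */
enum exit_mode { EXIT_FAST, EXIT_FULL };

#define MAX_HANDLERS 32
static exit_handler handlers[MAX_HANDLERS];
static size_t handler_count = 0;

/* Registers a handler. Returns 0 on success, -1 if the table is full. */
int register_handler(handler_fn fn, void *client_data, int compulsory)
{
    if (handler_count >= MAX_HANDLERS) {
        return -1;
    }
    handlers[handler_count].fn = fn;
    handlers[handler_count].client_data = client_data;
    handlers[handler_count].compulsory = compulsory;
    handler_count++;
    return 0;
}

/*
 * Pops every registered handler, newest first. In EXIT_FAST mode the
 * administrative handlers are dropped without being called; in EXIT_FULL
 * mode all of them run. Returns the number of handlers actually called.
 */
size_t run_handlers(enum exit_mode mode)
{
    size_t ran = 0;
    while (handler_count > 0) {
        exit_handler h = handlers[--handler_count];
        if (mode == EXIT_FULL || h.compulsory) {
            h.fn(h.client_data);
            ran++;
        }
    }
    return ran;
}

/* Example handler: bumps the int counter passed as client data. */
void count_call(void *client_data)
{
    ++*(int *)client_data;
}
```

In this sketch the [exit] path would call `run_handlers(EXIT_FAST)` and then the OS's exit(), while an embedding application would call `run_handlers(EXIT_FULL)`. Where FinalizeIOSubsystem belongs is exactly the open question raised above, and the sketch does not settle it.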
Thanks in advance for any insight on this,

-Alex

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2009-05-04 19:33

Message:
Upping the prio because it aborts on very simple scripts even on Windows (where threading is not so alien ;-). Will look at this shortly in the light of recent work on exit handlers.

The current abort message on Win32/mingw is:

exit handlers were created during Tcl_Finalizecalled Tcl_FindHashEntry on deleted table
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

----------------------------------------------------------------------

Comment By: miguel sofer (msofer)
Date: 2006-03-15 11:06

Message:
Logged In: YES
user_id=148712

Bug #597997 (closed as a dup of this one) has a small test script to trigger the panic:

    package require Thread
    set i 0
    while {[incr i] <= 2} {
        thread::create "puts $i"
    }

Avoid the panic by adding the following at the end:

    # make sure to wait until all threads have returned
    while {[llength [thread::names]] > 1} {
        after 100 set x 1
        vwait x
    }

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2006-03-10 14:02

Message:
Logged In: YES
user_id=95086

Yes. This is still valid. The problem is that by the concept of the Tcl lib, there is an implicit distinction between the main thread (the startup thread) doing the Tcl_Exit and other threads doing Tcl_ThreadExit. Actually the entire cleanup/teardown is not thread-friendly and relies on the fact that the startup thread must exit last, which is not always true. Rather, the thread which exits AND is the LAST thread in the process must initiate teardown. This requires quite a lot of plumbing here and there and it is questionable if it is worth the effort in the 8.x branch.
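
The "last thread out initiates teardown" rule can be sketched with a shared counter. This is a minimal illustration using POSIX threads and C11 atomics, not code from Tcl or the Thread package; `thread_enter`, `thread_leave`, `global_teardown` and `run_demo` are hypothetical names, and the teardown body is only a counter so that the effect can be observed.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical sketch: none of these names exist in Tcl. */

#define MAX_WORKERS 16

static atomic_int live_threads = 0;   /* threads that have not left yet */
static atomic_int teardown_runs = 0;  /* how many times teardown ran */

/* Stand-in for process-wide finalization. */
static void global_teardown(void)
{
    atomic_fetch_add(&teardown_runs, 1);
}

/* Called on behalf of a thread before it starts running. */
void thread_enter(void)
{
    atomic_fetch_add(&live_threads, 1);
}

/*
 * Called by a thread as its final action. Whichever thread brings the
 * count to zero runs the teardown, whether or not it is the main thread.
 * Returns 1 if the caller ran the teardown, 0 otherwise.
 */
int thread_leave(void)
{
    if (atomic_fetch_sub(&live_threads, 1) == 1) {
        global_teardown();
        return 1;
    }
    return 0;
}

static void *worker(void *arg)
{
    (void)arg;
    thread_leave();
    return NULL;
}

/*
 * The main thread registers itself, starts up to MAX_WORKERS workers,
 * and leaves without waiting for them. Returns how often teardown ran.
 */
int run_demo(int n_workers)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;

    if (n_workers > MAX_WORKERS) {
        n_workers = MAX_WORKERS;
    }
    atomic_store(&live_threads, 0);
    atomic_store(&teardown_runs, 0);

    thread_enter();                    /* the main thread itself */
    for (int i = 0; i < n_workers; i++) {
        thread_enter();                /* count the worker before it runs */
        if (pthread_create(&tids[started], NULL, worker, NULL) == 0) {
            started++;
        } else {
            thread_leave();            /* creation failed: undo the count */
        }
    }
    thread_leave();                    /* main leaves; may not be last */

    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);   /* join only to read the result */
    }
    return atomic_load(&teardown_runs);
}
```

Because every worker is counted before the main thread leaves, the counter reaches zero exactly once, so `run_demo` should report a single teardown regardless of which thread finishes last. The joins at the end exist only so the demo can read the result safely. The plumbing in a real interpreter would be considerably more involved.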
I could imagine closing this bug, yet opening another RFE to make the Tcl finalization more thread-compatible.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2006-03-10 06:31

Message:
Logged In: YES
user_id=80530

Is this still valid? Much revision of finalization has been done since this was reported.

----------------------------------------------------------------------

Comment By: Andreas Kupries (andreas_kupries)
Date: 2002-08-21 23:24

Message:
Logged In: YES
user_id=75003

See also
 * [ 597997 ] async+thread panic
 * [ 597575 ] [exit] in sub-thread may crash.

----------------------------------------------------------------------

Comment By: Andreas Kupries (andreas_kupries)
Date: 2002-01-11 22:33

Message:
Logged In: YES
user_id=75003

I am not satisfied with the proposed solution of using TclInExit (as public API) to avoid the problem. Threads are usually unwound (thread::unwind) from nested event loops and can process arbitrary commands before actually exiting. This means that this code may use any global subsystem of the tcl-library, including the Preserve-subsystem, and not only by calling Tcl_Release, but Tcl_Preserve as well. As all these global subsystems are effectively already shut down, anything may happen, most likely crashes in other places. And TclInExit will tide us over these only so far.

The main problem I see exposed through this bug is IMHO that the main thread is special. Shutting down a thread created by the Thread package finalizes this thread and all data pertaining to it. But shutting down the main thread additionally finalizes all global information shared across all the existing threads, thus rendering all other threads unable to run.

So, what possible solutions do we have to this?

1) Have the thread package register a global exit handler.
As these are called before everything else, this will allow the thread package to simply kill all threads still running (this would also be the approved and public method of getting the information that we are exiting the process). The downside would be that the killed threads may not have completed their actions. In the specific case of this bug it would be most likely that the logger thread is killed and unable to print the log messages of the last-closing worker thread. But this is something which can be worked around.

2) Find a way to block the finalization of the main thread and global information until after all the other threads are finalized. The downside is that an improperly coded application may simply hang on exit because of one or more threads not shutting down, or not being shut down.

Of the solutions above I prefer the first one, as its downside is less troubling to me than the downside of the second.

If someone else sees other solutions I would be happy to hear about them.

----------------------------------------------------------------------

Comment By: Andreas Kupries (andreas_kupries)
Date: 2002-01-11 22:10

Message:
Logged In: YES
user_id=75003

The main thread breaks out of its event loop only after all worker threads have signaled that they are done. As they still have to talk to the logger thread, this means that the main thread can exit while one worker thread is in the later stages of cleanup. Even more so, the logger thread is told to terminate just before the main thread truly exits.

I guess that it is the logger thread which tries to release its interp after PreserveExitProc was called.

----------------------------------------------------------------------

Comment By: Andreas Kupries (andreas_kupries)
Date: 2002-01-11 21:35

Message:
Logged In: YES
user_id=75003

Zoran,

There is an internal function "TclInExit" (tclEvent.c, line 939). I don't know if we can use this one.
Looking at its code I see that it looks for thread-specific data first before returning global information.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2001-12-07 08:12

Message:
Logged In: YES
user_id=95086

Andreas,

I've invested a couple of minutes in this issue. It is pretty simple. The main application (tclsh) thread exits before one of the worker threads gets the chance to exit. The main app thread calls PreserveExitProc, which finalizes the preserved array, and the exiting thread bombs since its preserved data chunk is not found.

This is a Tcl core problem, not a thread extension problem. The whole scenario can be avoided by simply putting an "after 1000" at the end of the application to allow worker threads to properly clean up and terminate.

The question is: how can such a scenario be avoided? Is there something like Tcl_InExit which could be checked by *any* thread, so that it just aborts any processing if the application is about to exit (or is in the middle of exiting)?

Zoran

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2001-12-07 06:16

Message:
Logged In: YES
user_id=95086

Andreas,

thanks for offering help. At the moment I'm *really* busy, so if you can jump in on this, it would be great.

BTW, I expect to close most of the open stuff, related to docs, etc. towards the end of the year. I can also attend to this problem in the same go, but if you can do it earlier, it would be better.

----------------------------------------------------------------------

Comment By: Andreas Kupries (andreas_kupries)
Date: 2001-12-06 22:08

Message:
Logged In: YES
user_id=75003

David is more concerned with expect, I believe. I am giving this to Zoran now. Zoran, if you don't have time, assign it back to me.
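
The race described in the 2001-12-07 comment, and the idea of an "in exit" check that any thread can consult, can be sketched as follows. This is a simplified stand-in and not Tcl's Preserve/Release implementation: the fixed-size table, the `in_exit` flag, and the names `ref_preserve`, `ref_release` and `ref_finalize` are all assumptions made for illustration. As the later comments in this thread point out, such a flag only covers part of the problem.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sketch: a toy preserve table, not Tcl's real one. */

#define MAX_REFS 64

enum release_result { RELEASED, NOT_FOUND, SKIPPED_IN_EXIT };

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static const void *refs[MAX_REFS];  /* preserved pointers */
static size_t ref_count = 0;        /* guarded by table_lock */
static int in_exit = 0;             /* guarded by table_lock */

/* Records ptr. Returns 0 on success, -1 if full or already finalized. */
int ref_preserve(const void *ptr)
{
    int rc = -1;
    pthread_mutex_lock(&table_lock);
    if (!in_exit && ref_count < MAX_REFS) {
        refs[ref_count++] = ptr;
        rc = 0;
    }
    pthread_mutex_unlock(&table_lock);
    return rc;
}

/* Run by the exiting thread: marks the exit and discards the table. */
void ref_finalize(void)
{
    pthread_mutex_lock(&table_lock);
    in_exit = 1;
    ref_count = 0;
    pthread_mutex_unlock(&table_lock);
}

/*
 * Drops one reference to ptr. Once ref_finalize has run, a late caller
 * gets SKIPPED_IN_EXIT instead of hitting a "not found" panic.
 */
enum release_result ref_release(const void *ptr)
{
    enum release_result result = NOT_FOUND;
    pthread_mutex_lock(&table_lock);
    if (in_exit) {
        result = SKIPPED_IN_EXIT;
    } else {
        for (size_t i = 0; i < ref_count; i++) {
            if (refs[i] == ptr) {
                refs[i] = refs[--ref_count];  /* fill the gap with the last entry */
                result = RELEASED;
                break;
            }
        }
    }
    pthread_mutex_unlock(&table_lock);
    return result;
}
```

Checking the flag under the same lock as the table means a worker can never see a half-cleared table. It does nothing for other global subsystems that the late thread might still touch.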
----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2001-11-29 08:16

Message:
Logged In: YES
user_id=80530

Looks like a bad call to Tcl_Release in the thread package. Is davygrvy still the maintainer of that package, or is he too busy with the new Expect port these days?

----------------------------------------------------------------------