From: SourceForge.net <no...@so...> - 2009-11-02 02:57:09
Bugs item #2869384, was opened at 2009-09-28 18:33
Message generated for change (Comment added) made by mistachkin
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2869384&group_id=10894

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: 04. Async Events
Group: obsolete: 8.5.5
Status: Open
Resolution: None
Priority: 6
Private: No
Submitted By: Yevgen Ryazanov (eugene_cdn)
Assigned to: Joe Mistachkin (mistachkin)
Summary: Tcl_AsyncMark does not work with --enable-threads

Initial Comment:
OS: sparc Solaris 10

Tcl_AsyncMark cannot be called in a signal handler because it locks the same mutex (notifierMutex) as Tcl_WaitForEvent. If a signal is handled in the main thread while that thread is servicing Tcl events, a deadlock is very likely.

The attached example will produce a stack like this within a few seconds:

----------------- lwp# 1 / thread# 1 --------------------
 feb40408 lwp_park (0, 0, 0)
 ff0385fc Tcl_MutexLock (ff08247c, 0, 0, 1000, 0, 0) + b4
 ff039624 Tcl_AlertNotifier (2bda8, 0, 0, 1000, 0, 0) + 34
 fefd83a8 Tcl_ThreadAlert (1, 0, 0, 0, 0, 0) + a8
 feedf7e0 Tcl_AsyncMark (1a80d8, 0, 0, 0, 0, 0) + 60
 fee306a8 GotAlarm (e, 0, ffbfeb10, 1, 21c30, 21c2c) + 30
 feb40494 __sighndlr (e, 0, ffbfeb10, fee30678, 0, 1) + c
 feb3558c call_user_handler (e, 0, 4, 0, ff0c2000, ffbfeb10) + 3b8
 feb4187c _write (4, ff05ce88, 1, 0, 0, 2c0b5) + c
 ff03a2f0 Tcl_WaitForEvent (224e8, fffffffd, ff1966e4, ff3ee0f8, ff3f06e0, 0) + 3c0
 fefd7f74 Tcl_DoOneEvent (0, 6e7f8, 0, 0, 1, 0) + 1ec
 ff17c738 Tk_MainLoop (2c1a8, 31188, 0, 1, 1, 5400) + 48
 ff1966e4 Tk_MainEx (2, ffbff2ac, 10e90, 2c1a8, 4, 4) + b1c
 00010e60 main (2, ffbff2ac, ffbff2b8, 21000, ff0c0100, ff0c0140) + 50
 000109e0 _start (0, 0, 0, 0, 0, 0) + 108
----------------- lwp# 2 / thread# 2 --------------------
 feb40408 lwp_park (0, 0, 0)
 ff0385fc Tcl_MutexLock (ff08247c, fea7bf1c, fea7be9c, fea7be1c, 0, fea7bea5) + b4
 ff03a8e4 NotifierThreadProc (0, fea7c000, 0, 0, ff03a4a8, 1) + 43c
 feb40368 _lwp_start (0, 0, 0, 0, 0, 0)

To reproduce the problem, use a regular wish built with --enable-threads and give it the attached script as input:

 wish run.tcl

It will hang shortly. To make it hang sooner, decrease the timeouts. Tk must be servicing events for the application to hang quickly, so the script modifies a text widget in a loop.

The build script is attached (the paths need to be modified).

----------------------------------------------------------------------

>Comment By: Joe Mistachkin (mistachkin)
Date: 2009-11-01 18:56

Message:
The extensive use of locking in the Unix notifier seems to be the cause of this issue. I believe that, with careful analysis of the locks in the Unix notifier, they can be minimized. Of special concern is waiting for (an|any) event while holding any locks, because we have no way of knowing precisely when an event will be triggered.

----------------------------------------------------------------------

Comment By: Yevgen Ryazanov (eugene_cdn)
Date: 2009-10-28 13:47

Message:
Any update? It is not a minor issue. One of the oldest groups of functions does not work in the situation it was primarily designed for, and there is no workaround.

----------------------------------------------------------------------

Comment By: Yevgen Ryazanov (eugene_cdn)
Date: 2009-09-29 05:46

Message:
I don't think a recursive mutex will help. You cannot safely modify global data in a signal handler without being sure that you are alone. I agree that the problem cannot be solved without the help of another thread that does not receive signals. Please also note that several signals may arrive before the "end of special handling".

Draft idea: we need a thread that blocks all signals (except maybe one, usr123) or does not use notifierMutex. Let's call it right_thread.
AsyncMark:
 - if (trylock(&notifierMutex) == 0)
 -   // normal code
 - lock (&another_mutex)
 - queue async_tls_data
 - unlock (&another_mutex)
 - wake up right_thread (using pthread_kill(right_thread, usr123) or pthread_cond_signal)

right_thread:
 - lock (&another_mutex)
 - take async_tls_data from queue
 - save async_tls_data to the right interp
 - unlock (&another_mutex)

There are also non-blocking techniques for modifying global data; they could be used in any solution.

----------------------------------------------------------------------

Comment By: Konstantin Khomoutov (flatworm)
Date: 2009-09-29 05:19

Message:
It seems this function is actually named pthread_mutex_trylock(). The idea is interesting, but I don't quite see how to implement it. In the default mode of operation, pthread_mutex_trylock() is defined to return immediately with EBUSY if the mutex is already locked, and the Tcl core doesn't expect this behaviour from the mutex API it uses (and exposes). If we instead make all mutexes "recursive" (in pthreads terms), this will fix deadlocks occurring within the same thread because of signal handling, but it will break expectations in all other cases where a mutex locking function is supposed to wait on a mutex already locked by some other thread. It needs further thought (taking into account the threading subsystem on Windows as well), since the Tcl mutex API hides platform details from the programmer.

----------------------------------------------------------------------

Comment By: Yevgen Ryazanov (eugene_cdn)
Date: 2009-09-29 04:35

Message:
pthread_trylock may help (together with a global flag or something similar). I don't have a clear idea yet, though.
----------------------------------------------------------------------

Comment By: Konstantin Khomoutov (flatworm)
Date: 2009-09-29 04:27

Message:
The problem with this approach is that it implies we control all the threads that exist when our setup code is being run, and it postulates the policy that no thread shall ever be created with signals unmasked, except the signal-processing thread. I reckon this is impossible to achieve from an extension implementing signal handling; it could only possibly be done if integrated directly into the core, which is questionable, as the concept of signals only exists on Unix.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2009-09-29 01:54

Message:
I seem to recall (see that 12-year-old paper http://www.linuxjournal.com/article/2121 ) that the recommended way of combining pthreads and signals is to block all signals in all threads except one, thus effectively directing signals to one specific thread that you control. Does this still work with modern pthreads?

----------------------------------------------------------------------

Comment By: Konstantin Khomoutov (flatworm)
Date: 2009-09-29 01:17

Message:
I've hit this while implementing an extension to handle POSIX signals (it is unfinished, so not yet released). The problem appears to be more complex, and I doubt it can be fixed at all for every possible deadlock case, simply because a pthreads mutex is allowed to deadlock when locked twice from the same thread, and that's what it does on Linux 2.6 at least. The problem is that the signal handler intrudes on the thread's execution stack at unpredictable moments, so signal handling can be considered a "superthread" which doesn't play by the usual rules for threads.
Conceptually, to make Tcl_AsyncMark() (and any other function which tries to lock the target thread's LTS) not deadlock, we would have to make the locking function attempt the lock only if some flag is not raised (meaning that no lock is held). But using a flag means sharing mutable state, which inherently implies using another mutex, so we're back at the starting point. Masking all signals before locking the LTS mutex is also a bad idea:
 a) it takes too many syscalls;
 b) it doesn't help prevent mutual deadlocks between two threads.

So, in the package mentioned above, I finally implemented a solution involving a special manager thread with which signal handlers interact and which dispatches events to target threads. The solution is complicated but works. As I intend to release the code under a Tcl-like license, I will be happy to share it if you wish.

----------------------------------------------------------------------

Comment By: Yevgen Ryazanov (eugene_cdn)
Date: 2009-09-28 18:54

Message:
Tried on Linux, Red Hat 4 update 4. Same problem.

----------------------------------------------------------------------