|
From: Doug R. <df...@nl...> - 2004-02-14 17:26:28
Attachments:
signal2.c
|
I've been trying to sort out a problem found by one of the people testing the FreeBSD valgrind and it looks like a nice tricky one. They have one thread which blocks itself in pthread_cond_wait. Thats the first problem, valgrind immediately dies thinking there is a deadlock. I can 'fix' that by not panicing if all threads are stuck waiting on CVs. The next part of the test isn't so easy. The user issues a SIGINT (via ^C or similar) which is caught by a signal handler which calls pthread_cond_broadcast. This is supposed to allow the program to exit. Unfortunately, the broadcast is ignored because while the thread is in the signal handler it isn't in WaitCV state. Anyway, the testcase is attached for your amusement. |
|
From: Jeremy F. <je...@go...> - 2004-02-14 18:47:54
|
On Sat, 2004-02-14 at 09:23, Doug Rabson wrote: > I've been trying to sort out a problem found by one of the people > testing the FreeBSD valgrind and it looks like a nice tricky one. They > have one thread which blocks itself in pthread_cond_wait. Thats the > first problem, valgrind immediately dies thinking there is a deadlock. I > can 'fix' that by not panicing if all threads are stuck waiting on CVs. > > The next part of the test isn't so easy. The user issues a SIGINT (via > ^C or similar) which is caught by a signal handler which calls > pthread_cond_broadcast. This is supposed to allow the program to exit. > Unfortunately, the broadcast is ignored because while the thread is in > the signal handler it isn't in WaitCV state. > > Anyway, the testcase is attached for your amusement. I was wondering when people would try something insane like this. The deadlock test isn't completely correct, because as you say, most blocking functions can be interrupted by a signal (though is it OK to use pthread_cond_broadcast in a signal handler?). >From the scheduler's perspective, it's hard to see how you can come up with a sane notion of being both blocked in a CV and actually executing instructions at the same time... J |
|
From: Doug R. <df...@nl...> - 2004-02-14 19:18:06
|
On Sat, 2004-02-14 at 18:45, Jeremy Fitzhardinge wrote: > On Sat, 2004-02-14 at 09:23, Doug Rabson wrote: > > I've been trying to sort out a problem found by one of the people > > testing the FreeBSD valgrind and it looks like a nice tricky one. They > > have one thread which blocks itself in pthread_cond_wait. Thats the > > first problem, valgrind immediately dies thinking there is a deadlock. I > > can 'fix' that by not panicing if all threads are stuck waiting on CVs. > > > > The next part of the test isn't so easy. The user issues a SIGINT (via > > ^C or similar) which is caught by a signal handler which calls > > pthread_cond_broadcast. This is supposed to allow the program to exit. > > Unfortunately, the broadcast is ignored because while the thread is in > > the signal handler it isn't in WaitCV state. > > > > Anyway, the testcase is attached for your amusement. > > I was wondering when people would try something insane like this. The > deadlock test isn't completely correct, because as you say, most > blocking functions can be interrupted by a signal (though is it OK to > use pthread_cond_broadcast in a signal handler?). > > >From the scheduler's perspective, it's hard to see how you can come up > with a sane notion of being both blocked in a CV and actually executing > instructions at the same time... Still, this behaviour works 'correctly' on FreeBSD, Linux, Solaris and probably most other pthreads implementations. This testcase is a distillation of how they get their application to shutdown cleanly. The main problem is that while it runs the signal handler, the thread isn't in WaitCV state (so the pthread_cond_broadcast doesn't do anything) but when the signal frame is popped, it goes right back to sleep. Somehow I need to be able to tweak the saved thread state in the signal frame (or something). |
|
From: Jeremy F. <je...@go...> - 2004-02-16 07:01:09
|
On Sat, 2004-02-14 at 11:15, Doug Rabson wrote: > Still, this behaviour works 'correctly' on FreeBSD, Linux, Solaris and > probably most other pthreads implementations. This testcase is a > distillation of how they get their application to shutdown cleanly. > > The main problem is that while it runs the signal handler, the thread > isn't in WaitCV state (so the pthread_cond_broadcast doesn't do > anything) but when the signal frame is popped, it goes right back to > sleep. Somehow I need to be able to tweak the saved thread state in the > signal frame (or something). I think the problem is that Valgrind is conflating two levels of abstraction. There's the core scheduler's notion of a thread being "running" or "blocked", and there's the pthreads library's view of what state the thread is in with respect to pthreads objects. I assume that "real" pthreads blocks in a syscall whenever it can be interrupted by a signal, and the syscall restart stuff (or whatever) decides what to do if it has been interrupted, what the new state of the pthreads object is, etc. Which raises a question: what happens if a thread blocks in a CV, is interrupted by a signal, and the signal handler does a longjmp to somewhere else? What state is the CV left in? J |
|
From: Doug R. <df...@nl...> - 2004-02-16 09:35:27
|
On Mon, 2004-02-16 at 06:57, Jeremy Fitzhardinge wrote: > On Sat, 2004-02-14 at 11:15, Doug Rabson wrote: > > Still, this behaviour works 'correctly' on FreeBSD, Linux, Solaris and > > probably most other pthreads implementations. This testcase is a > > distillation of how they get their application to shutdown cleanly. > > > > The main problem is that while it runs the signal handler, the thread > > isn't in WaitCV state (so the pthread_cond_broadcast doesn't do > > anything) but when the signal frame is popped, it goes right back to > > sleep. Somehow I need to be able to tweak the saved thread state in the > > signal frame (or something). > > I think the problem is that Valgrind is conflating two levels of > abstraction. There's the core scheduler's notion of a thread being > "running" or "blocked", and there's the pthreads library's view of what > state the thread is in with respect to pthreads objects. I assume that > "real" pthreads blocks in a syscall whenever it can be interrupted by a > signal, and the syscall restart stuff (or whatever) decides what to do > if it has been interrupted, what the new state of the pthreads object > is, etc. I have about half an answer (which I haven't thought through) where the actual state transition from WaitCV to Runnable is made by the scheduler (i.e. in the context of the waiting thread) rather than by the signaller. It would go something like this: If a thread is in WaitCV but has no associated_cv If associated_mx is not held lock associated_mx clear associated_mx state is Runnable else state is WaitMX Then in release_N_threads_waiting_on_cond, just check the value of associated_cv (not the thread state). It would simply clear the associated_cv variable and leave the mutex manipulation to the scheduler. It would mean being squeaky clean about nulling associated_cv when not waiting on a cond but I think that is done already. > > Which raises a question: what happens if a thread blocks in a CV, is > interrupted by a signal, and the signal handler does a longjmp to > somewhere else? What state is the CV left in? A silly one. FreeBSD's libc_r woule probably have the thread still on the cond's queue from a quick look at the code. |
|
From: Doug R. <df...@nl...> - 2004-02-16 11:51:10
|
On Mon, 2004-02-16 at 09:31, Doug Rabson wrote:
> On Mon, 2004-02-16 at 06:57, Jeremy Fitzhardinge wrote:
> > On Sat, 2004-02-14 at 11:15, Doug Rabson wrote:
> > > Still, this behaviour works 'correctly' on FreeBSD, Linux, Solaris and
> > > probably most other pthreads implementations. This testcase is a
> > > distillation of how they get their application to shutdown cleanly.
> > >
> > > The main problem is that while it runs the signal handler, the thread
> > > isn't in WaitCV state (so the pthread_cond_broadcast doesn't do
> > > anything) but when the signal frame is popped, it goes right back to
> > > sleep. Somehow I need to be able to tweak the saved thread state in the
> > > signal frame (or something).
> >
> > I think the problem is that Valgrind is conflating two levels of
> > abstraction. There's the core scheduler's notion of a thread being
> > "running" or "blocked", and there's the pthreads library's view of what
> > state the thread is in with respect to pthreads objects. I assume that
> > "real" pthreads blocks in a syscall whenever it can be interrupted by a
> > signal, and the syscall restart stuff (or whatever) decides what to do
> > if it has been interrupted, what the new state of the pthreads object
> > is, etc.
>
> I have about half an answer (which I haven't thought through) where the
> actual state transition from WaitCV to Runnable is made by the scheduler
> (i.e. in the context of the waiting thread) rather than by the
> signaller. It would go something like this:
>
> If a thread is in WaitCV but has no associated_cv
> If associated_mx is not held
> lock associated_mx
> clear associated_mx
> state is Runnable
> else
> state is WaitMX
>
> Then in release_N_threads_waiting_on_cond, just check the value of
> associated_cv (not the thread state). It would simply clear the
> associated_cv variable and leave the mutex manipulation to the
> scheduler. It would mean being squeaky clean about nulling associated_cv
> when not waiting on a cond but I think that is done already.
I'm going with this change for now (which at least gets my tester's
program running properly). Sorry about the FreeBSD-isms - this diff
isn't taken from the valgrind cvs.
--- valgrind/branches/dfr/coregrind/vg_scheduler.c 2004-02-12
20:13:40 UTC (rev 270)
+++ valgrind/branches/dfr/coregrind/vg_scheduler.c 2004-02-16
11:43:41 UTC (rev 271)
@@ -31,6 +31,7 @@
#include "valgrind.h" /* for VG_USERREQ__RUNNING_ON_VALGRIND and
VG_USERREQ__DISCARD_TRANSLATIONS, and
others */
#include "vg_include.h"
+#include <pthread.h>
/* BORKAGE/ISSUES as of 29 May 02
@@ -134,6 +135,16 @@
static void scheduler_sanity ( void );
static void do_pthread_cond_timedwait_TIMEOUT ( ThreadId tid );
+#ifdef __linux__
+#define MUTEX(mutex) (mutex)
+#endif
+#ifdef __FreeBSD__
+#define MUTEX(mutex) (*mutex)
+typedef int _pthread_descr;
+#define PTHREAD_MUTEX_ERRORCHECK_NP PTHREAD_MUTEX_ERRORCHECK
+#define PTHREAD_MUTEX_RECURSIVE_NP PTHREAD_MUTEX_RECURSIVE
+#endif
+
/*
---------------------------------------------------------------------
Helper functions for the scheduler.
------------------------------------------------------------------
*/
@@ -916,6 +927,38 @@
while (True) {
tid_next++;
if (tid_next >= VG_N_THREADS) tid_next = 1;
+
+ /* Deal with threads woken up from waiting on a cond. */
+ if (VG_(threads)[tid_next].status == VgTs_WaitCV
+ && VG_(threads)[tid_next].associated_cv == NULL) {
+ pthread_mutex_t* mx;
+
+ mx = VG_(threads)[tid_next].associated_mx;
+ vg_assert(mx != NULL);
+
+ VG_TRACK( pre_mutex_lock, tid_next, mx );
+ if (MUTEX(mx)->__m_owner == VG_INVALID_THREADID) {
+ /* Currently unheld; hand it out to thread i. */
+ vg_assert(MUTEX(mx)->__m_count == 0);
+ VG_(threads)[tid_next].status = VgTs_Runnable;
+ VG_(threads)[tid_next].associated_cv = NULL;
+ VG_(threads)[tid_next].associated_mx = NULL;
+ MUTEX(mx)->__m_owner = (_pthread_descr)tid_next;
+ MUTEX(mx)->__m_count = 1;
+ /* .m_edx already holds pth_cond_wait success value
(0) */
+
+ VG_TRACK( post_mutex_lock, tid_next, mx );
+
+ } else {
+ /* Currently held. Make thread i be blocked on it. */
+ vg_assert(MUTEX(mx)->__m_count > 0);
+ VG_(threads)[tid_next].status = VgTs_WaitMX;
+ VG_(threads)[tid_next].associated_cv = NULL;
+ VG_(threads)[tid_next].associated_mx = mx;
+ SET_PTHREQ_RETVAL(tid_next, 0); /* pth_cond_wait
success value */
+ }
+ }
+
if (VG_(threads)[tid_next].status == VgTs_Sleeping
|| VG_(threads)[tid_next].status == VgTs_WaitSys
|| VG_(threads)[tid_next].status == VgTs_WaitCV)
@@ -1312,7 +1355,6 @@
The pthread implementation.
------------------------------------------------------------------
*/
-#include <pthread.h>
#include <errno.h>
#define VG_PTHREAD_STACK_MIN \
@@ -1999,16 +2041,6 @@
deals with that for us.
*/
-#ifdef __linux__
-#define MUTEX(mutex) (mutex)
-#endif
-#ifdef __FreeBSD__
-#define MUTEX(mutex) (*mutex)
-typedef int _pthread_descr;
-#define PTHREAD_MUTEX_ERRORCHECK_NP PTHREAD_MUTEX_ERRORCHECK
-#define PTHREAD_MUTEX_RECURSIVE_NP PTHREAD_MUTEX_RECURSIVE
-#endif
-
/* Helper fns ... */
static
void release_one_thread_waiting_on_mutex ( pthread_mutex_t* mutex,
@@ -2355,8 +2387,7 @@
for (i = 1; i < VG_N_THREADS; i++) {
if (VG_(threads)[i].status == VgTs_Empty)
continue;
- if (VG_(threads)[i].status == VgTs_WaitCV
- && VG_(threads)[i].associated_cv == cond)
+ if (VG_(threads)[i].associated_cv == cond)
break;
}
vg_assert(i <= VG_N_THREADS);
@@ -2366,43 +2397,22 @@
return;
}
+ VG_(threads)[i].associated_cv = NULL;
mx = VG_(threads)[i].associated_mx;
vg_assert(mx != NULL);
- VG_TRACK( pre_mutex_lock, i, mx );
-
if (MUTEX(mx)->__m_owner == VG_INVALID_THREADID) {
- /* Currently unheld; hand it out to thread i. */
- vg_assert(MUTEX(mx)->__m_count == 0);
- VG_(threads)[i].status = VgTs_Runnable;
- VG_(threads)[i].associated_cv = NULL;
- VG_(threads)[i].associated_mx = NULL;
- MUTEX(mx)->__m_owner = (_pthread_descr)i;
- MUTEX(mx)->__m_count = 1;
- /* .m_edx already holds pth_cond_wait success value (0) */
-
- VG_TRACK( post_mutex_lock, i, mx );
-
if (VG_(clo_trace_pthread_level) >= 1) {
VG_(sprintf)(msg_buf, "%s cv %p: RESUME with mx %p",
caller, cond, mx );
print_pthread_event(i, msg_buf);
}
-
} else {
- /* Currently held. Make thread i be blocked on it. */
- vg_assert(MUTEX(mx)->__m_count > 0);
- VG_(threads)[i].status = VgTs_WaitMX;
- VG_(threads)[i].associated_cv = NULL;
- VG_(threads)[i].associated_mx = mx;
- SET_PTHREQ_RETVAL(i, 0); /* pth_cond_wait success value */
-
if (VG_(clo_trace_pthread_level) >= 1) {
VG_(sprintf)(msg_buf, "%s cv %p: BLOCK for mx %p",
caller, cond, mx );
print_pthread_event(i, msg_buf);
}
-
}
n_to_release--;
@@ -3359,7 +3369,6 @@
#endif
} else
if (VG_(threads)[i].status == VgTs_WaitCV) {
- vg_assert(cv != NULL);
vg_assert(mx != NULL);
} else {
/* Unfortunately these don't hold true when a sighandler is
|