From: Julian S. <js...@ac...> - 2006-07-06 14:22:10
|
There's an interesting conceptual hole in Valgrind's handling of client syscalls in the presence of threads, which may have practical consequences. I don't know what to do about it.

Anyway: V allows native thread libraries to run, but serialises execution using a pipe-based lock (in scheduler/sema.c), so that only one thread ever runs at once, even on SMPs. The general case for handling a syscall is:

   do all preparations for the syscall
   drop the lock
   do the syscall
   reacquire the lock

Dropping the lock is necessary so that other thread(s) can get the lock, and hence run, if this syscall should block. Failing to do so quickly leads to deadlocks.

Dropping the lock (and the associated sigmask futzing) is expensive, so for syscalls we're sure won't block (e.g. getpid) there is a cheaper route:

   do all preparations for the syscall
   do the syscall

i.e. the same, except that we retain the lock the whole time.

---

You'd think the general case is always safe to use, but not so. Consider a syscall which causes some other thread to die instantly, which is probably possible with thread_kill(tid, 9) and is certainly possible on AIX. Then the slow route allows the following sequence:

   (target thread: waiting to acquire the lock)
   (running thread has the lock)
   running thread: do all preparations for the syscall
   running thread: drop the lock
   target thread:  acquire the lock
   running thread: do the syscall
   (target thread dies)
   running thread: (try to) reacquire the lock

Now we're hosed: the target thread picked up the lock and then died. This really happens, especially on SMPs, where the target thread has a good opportunity to run the instant the lock is dropped.

---

So: the moral is: if doing a syscall which may cause _some other_ thread to die, I must retain the lock to avoid such a deadlock. OTOH, if doing a syscall which may cause _me_ to die, I must release the lock so that others may pick it up in that case.
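A minimal sketch of the scheme and the race described above, assuming the lock is a pipe holding a single token byte (read to acquire, write to release). This is a hypothetical reconstruction, not the code in scheduler/sema.c; every name in it (`big_lock`, `lock_acquire`, `lock_release`, `run_syscall`, `may_block`, `target_grabs_and_dies`) is made up for illustration. A forked child stands in for the target thread, since it shares the pipe and can genuinely die while holding the token.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical pipe-based big lock: the lock is free exactly when one
   token byte sits in the pipe. Acquire = read it, release = write it back. */
typedef struct { int fds[2]; } big_lock;

static void lock_release(big_lock *l) {
    char tok = 'T';
    ssize_t n = write(l->fds[1], &tok, 1);
    assert(n == 1);
    (void)n;
}

static void lock_init(big_lock *l) {
    int rc = pipe(l->fds);
    assert(rc == 0);
    (void)rc;
    lock_release(l);                  /* the lock starts out free */
}

static void lock_acquire(big_lock *l) {
    char tok;
    ssize_t n;
    do {
        n = read(l->fds[0], &tok, 1); /* blocks until the token is available */
    } while (n < 0 && errno == EINTR);
    assert(n == 1);
    (void)n;
}

/* Non-blocking probe: 1 if the lock is currently free, 0 if someone holds it. */
static int token_present(big_lock *l) {
    int flags = fcntl(l->fds[0], F_GETFL);
    char tok;
    fcntl(l->fds[0], F_SETFL, flags | O_NONBLOCK);
    ssize_t n = read(l->fds[0], &tok, 1);
    fcntl(l->fds[0], F_SETFL, flags);
    if (n == 1) {
        lock_release(l);              /* put the token back */
        return 1;
    }
    return 0;
}

static long sys_getpid(void) { return (long)getpid(); }

/* Hypothetical syscall wrapper; the caller holds the lock on entry and exit. */
static long run_syscall(big_lock *l, int may_block, long (*sys)(void)) {
    /* ... all preparations for the syscall happen here, under the lock ... */
    if (!may_block)
        return sys();                 /* fast route: keep the lock throughout */
    lock_release(l);                  /* slow route: let other threads run */
    long res = sys();
    lock_acquire(l);                  /* blocks forever if the token has vanished */
    return res;
}

/* The race: a "target" picks up the token once it is dropped and dies
   before releasing it. A forked child plays the target thread here. */
static void target_grabs_and_dies(big_lock *l) {
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        lock_acquire(l);
        _exit(0);                     /* dies still holding the lock */
    }
    int status;
    waitpid(pid, &status, 0);
}
```

After `target_grabs_and_dies` the pipe is empty and nothing alive will ever write the token back, so the `lock_acquire` at the end of the slow route in `run_syscall` would block forever: exactly the hang in the sequence above.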
So what do we do for a syscall which causes some arbitrarily (kernel-) chosen thread in the process to die? Do any such syscalls exist?

All very ugly. Our locking strategy imposes the non-obvious requirement of knowing how each syscall affects the liveness of each thread in the process.

J
|
From: Tom H. <to...@co...> - 2006-07-06 14:48:34
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> So: the moral is: if doing a syscall which may cause _some other_
> thread to die, I must retain the lock to avoid such a deadlock.
> OTOH, if doing a syscall which may cause _me_ to die, I must release
> the lock so that others may pick it up in that case.
But all syscalls may cause you to die if you allow for being kill -9'd
in the middle of one... I guess that is only likely for a blocking
call?
> So what do we do for a syscall which causes some arbitrarily (kernel-)
> chosen thread in the process to die? Do any such syscalls exist?
That would be an evil system call.
> All very ugly. Our locking strategy imposes the non-obvious
> requirement of knowing how each syscall affects the liveness of
> each thread in the process.
I don't believe any threaded program is robust in the face of hard
killing of threads in this way - even without valgrind you would be
hosed if the thread that was killed held a lock at the time.
Interestingly, on Linux at least, there is a possible solution: if we
were using futexes instead of the pipe (there was such an
implementation at one point) then we could use the new robust
futexes stuff to make sure the lock is released if it was held
by a thread that died.
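At user level, robust futexes surface as POSIX robust mutexes, which glibc builds on the Linux robust-futex mechanism. The sketch below is an illustration under that assumption, not a proposal for Valgrind's lock: it uses the POSIX.1-2008 names (older glibc releases spelled them with an `_np` suffix), and `big_mutex`, `lock_and_exit`, and `robust_acquire` are hypothetical. A thread takes the lock and exits without unlocking; the next locker gets `EOWNERDEAD` instead of hanging, and must call `pthread_mutex_consistent` before the mutex is usable again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t big_mutex;     /* hypothetical stand-in for the big lock */

static void init_robust(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Thread body standing in for a thread that dies while holding the lock. */
static void *lock_and_exit(void *arg) {
    (void)arg;
    pthread_mutex_lock(&big_mutex);
    return NULL;                      /* exits without unlocking */
}

/* Acquire the lock, taking it over if the previous owner died.
   Returns 1 if the owner had died, 0 on a normal acquisition. */
static int robust_acquire(pthread_mutex_t *m) {
    int rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        /* We own the mutex now, but the data it protects may be
           half-updated; it must be repaired (or judged safe) first. */
        pthread_mutex_consistent(m);
        return 1;
    }
    assert(rc == 0);
    return 0;
}
```

The `EOWNERDEAD` branch is where the question below bites: the lock itself is recovered automatically, but deciding whether the state it guarded is still usable is left entirely to the caller.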
I'm not sure that is safe, though. Is there any guarantee that
Valgrind's internal data structures are safe to handle in such a
circumstance?
Catching SIGCHLD would potentially allow us to do something similar
but would face similar problems with data structure state.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/