From: Julian S. <js...@ac...> - 2006-07-06 14:22:10
There's an interesting conceptual hole in Valgrind's handling of client syscalls in the presence of threads, which may have practical consequences. I don't know what to do about it. Anyway:

V allows native thread libraries to run, but serialises execution using a pipe-based lock (in scheduler/sema.c), so that only one thread ever runs at once, even on SMPs.

The general case for handling a syscall is:

   do all preparations for the syscall
   drop the lock
   do the syscall
   reacquire the lock

Dropping the lock is necessary so that other thread(s) can get the lock and hence run, if this syscall should block. Not doing so rapidly leads to deadlocks.

Dropping the lock (and associated sigmask futzing) is expensive, and so for syscalls we're sure won't block (eg getpid) there is a cheaper route:

   do all preparations for the syscall
   do the syscall

ie, the same except we retain the lock the whole time.

---

You'd think the general case is always safe to use, but not so. Consider a syscall which causes some other thread to die instantly, which is probably possible with thread_kill(tid, 9) and is certainly possible on AIX. Then the slow route gives a possible sequence:

   (target thread: waiting to acquire the lock)
   (running thread has the lock)

   running thread: do all preparations for the syscall
   running thread: drop the lock
   target thread:  acquire the lock
   running thread: do the syscall
   (target thread dies)
   running thread: (try to) reacquire the lock

Now we're hosed; the target thread picked up the lock and then died. This really happens, especially on SMPs where the target thread has a good opportunity to run the instant the lock is dropped.

---

So: the moral is: if doing a syscall which may cause _some other_ thread to die, I must retain the lock to avoid such a deadlock.

OTOH, if doing a syscall which may cause _me_ to die, I must release the lock so that others may pick it up in that case.
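
To make the two routes and the moral concrete, here is a rough sketch in C. It is not the code in scheduler/sema.c: the names pipe_lock, sys_class, must_hold_lock and do_client_syscall are all made up for illustration, the sigmask handling is left out entirely, and the four-way classification is only one guess at how the rule above might be encoded. The lock is modelled as a pipe holding a single token byte, where reading the byte acquires the lock and writing it back releases it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pipe-based lock: one token byte in the pipe means "free". */
typedef struct {
    int fds[2];                 /* fds[0] = read end, fds[1] = write end */
} pipe_lock;

static int pipe_lock_init(pipe_lock *l) {
    if (pipe(l->fds) != 0) return -1;
    /* Deposit the single token, so the lock starts out free. */
    return write(l->fds[1], "T", 1) == 1 ? 0 : -1;
}

static void pipe_lock_acquire(pipe_lock *l) {
    char tok;
    ssize_t n;
    /* Blocks until the token is available; retry if interrupted. */
    do { n = read(l->fds[0], &tok, 1); } while (n < 0 && errno == EINTR);
    assert(n == 1);
}

static void pipe_lock_release(pipe_lock *l) {
    ssize_t n;
    /* Put the token back so some other thread can pick it up. */
    do { n = write(l->fds[1], "T", 1); } while (n < 0 && errno == EINTR);
    assert(n == 1);
}

/* Hypothetical classification of a syscall by its effect on threads. */
typedef enum {
    CLS_NONBLOCKING,   /* e.g. getpid: cheap route, keep the lock        */
    CLS_MAY_BLOCK,     /* general case: drop the lock around the call    */
    CLS_KILLS_OTHER,   /* may kill another thread: must keep the lock    */
    CLS_KILLS_SELF     /* may kill the caller: must drop the lock first  */
} sys_class;

static int must_hold_lock(sys_class c) {
    return c == CLS_NONBLOCKING || c == CLS_KILLS_OTHER;
}

/* Caller is assumed to hold the lock on entry, and holds it on return. */
static long do_client_syscall(pipe_lock *l, sys_class c, long (*call)(void)) {
    long res;
    if (must_hold_lock(c)) {
        res = call();            /* lock retained the whole time */
    } else {
        pipe_lock_release(l);    /* let other threads run meanwhile */
        res = call();
        pipe_lock_acquire(l);    /* can hang if the new holder died */
    }
    return res;
}

/* Stand-in for a real client syscall, used only for demonstration. */
static long call_getpid(void) { return (long)getpid(); }
```

A thread holding the lock would call something like do_client_syscall(&lock, CLS_NONBLOCKING, call_getpid). Note that the enum assumes each syscall falls into exactly one class; a call that both may block and may kill another thread has no safe entry here, which is the same hole described above.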
So what do we do for a syscall which causes some arbitrarily (kernel-) chosen thread in the process to die? Do any such syscalls exist?

All very ugly. Our locking strategy imposes the non-obvious requirement of knowing how each syscall affects the liveness of each thread in the process.

J