From: Julian S. <js...@ac...> - 2005-05-24 10:11:11
Just before 2.4.0 went out, there was a long discussion about how thread exiting should work, and the deadlocks that could result. In the end we settled on an inherently deadlockful scheme (master thread waits for everybody else) but modified the getppid wrapper so as to sidestep the deadlocks.

Unfortunately the problem is back, in a hard-to-reproduce way. It afflicts both 2.4.0 and the 3 line. Reproducing it requires a machine with a Quadrics Elan3 network card and the relevant user-space driver and (presumably) kernel module.

When a program using this driver starts up, it creates a child thread using clone. No problem. The child hangs around and basically doesn't do anything much (purpose is unclear, but that doesn't matter). It calls a custom ioctl which communicates with the Elan3 kernel module. The ioctl doesn't return until (I assume) the parent thread tells the kernel module that it is done with the card. Then the ioctl returns and the child exits. Hence the child waits for the parent to exit, then exits itself.

Running on V, the result is a deadlock at exit, since the parent, being the master thread, is now also waiting for the child to exit.

Suggestions on how to fix this? I've been playing with a hacked version of the head, which implements the "last-one-out" strategy we discussed before. I haven't got it working reliably yet, though.

J
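
For concreteness, here is a rough sketch of what I mean by "last-one-out". It is not the hacked head, just a standalone pthreads toy under stated assumptions: every name in it is made up for illustration, and a spin on a flag stands in for the blocking Elan3 ioctl. The point is only that no thread waits for any other at exit; whichever thread drops the live count to zero does the final cleanup.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* All names here are hypothetical; none of this is Valgrind code. */
static atomic_int live_threads;  /* threads that have not exited yet */
static atomic_int parent_done;   /* stand-in for "parent released the card" */
static atomic_int finaliser_id;  /* records which thread ran final cleanup */

/* Called by every thread as it exits. Nobody waits for anybody here. */
static void thread_exit_hook(int id)
{
    /* atomic_fetch_sub returns the previous value: 1 means we are last out. */
    if (atomic_fetch_sub(&live_threads, 1) == 1)
        atomic_store(&finaliser_id, id);  /* final cleanup would go here */
}

static void *child(void *arg)
{
    (void)arg;
    /* Stand-in for the blocking ioctl: returns only once the parent is done. */
    while (!atomic_load(&parent_done))
        sched_yield();
    thread_exit_hook(1);
    return NULL;
}

/* Returns the id of the thread that performed the final cleanup. */
int run_last_one_out_demo(void)
{
    pthread_t t;
    atomic_store(&live_threads, 2);
    atomic_store(&parent_done, 0);
    atomic_store(&finaliser_id, -1);

    int rc = pthread_create(&t, NULL, child, NULL);
    assert(rc == 0);

    /* The parent (master) exits first and does NOT wait for the child. */
    thread_exit_hook(0);
    atomic_store(&parent_done, 1);  /* its exit is what unblocks the child */

    /* Join only so the demo can read the result; not part of the scheme. */
    pthread_join(t, NULL);
    return atomic_load(&finaliser_id);
}
```

Called from a main() and built with -pthread, run_last_one_out_demo() should report the child (id 1) as the one doing the final cleanup, since it cannot leave its wait until the parent has already gone through the exit hook. Whether this maps cleanly onto our scheduler and signal handling is exactly the part I have not got working.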