From: <er...@he...> - 2003-10-21 20:25:19
On Tue, Oct 14, 2003 at 12:30:41PM -0400, Nicholas Henke wrote:
> On Tue, 2003-10-14 at 11:56, er...@he... wrote:
> >
> > Hrm. I'm glad you brought this up. I've recently seen a similar
> > problem. I thought it had something to do with the recent network
> > upgrade we did. Sounds like probably not.
>
> Fun ;/ Thanks for taking a look at it.
>
> >
> > It's pretty mysterious to me. BProc really doesn't do much with
> > interrupts turned off. I've been working on reproducing it more
> > reliably here.
> >
> > I don't know a good way to shake the kernel loose in that case. One
> > thing we were going to try was to instrument it a bit with POST codes
> > or try and poke around in memory a bit with a bus analyzer.
>
> Ok -- way over my head there. I wonder if a hardware watchdog card would
> help, or if that would give the same results as the nmi_watchdog, aka
> nothing.

I got an oops out of the NMI watchdog, which was enlightening (or at
least indicated which code was at fault). The following patch may have
fixed it for me. I say "may have" because I've had some trouble
reproducing the problem reliably.

This patch turns off "sigbypass", a small optimization where a process
sending a signal to a ghost doesn't bother the ghost; instead, it puts a
signal-forwarding message directly on the message queue. I'm not sure
yet how that code is broken; I haven't had time to look into it.

Please give it a try and let me know if you still see the deadlock.

--- hooks.c 29 Aug 2003 21:46:57 -0000 1.53
+++ hooks.c 21 Oct 2003 15:41:44 -0000
@@ -314,6 +314,11 @@
      * t->sigmask */
     struct bproc_krequest_t *req;
     struct siginfo tmpinfo;
+
+    return 0; /* XXX disable sigbypass for now.
+               * There seems to be something busted or
+               * unsafe about this code... */
+
     if (!BPROC_ISGHOST(t) || !t->bproc.ghost->sigbypass) return 0;
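The sigbypass idea described above can be illustrated with a small user-space model. This is a hypothetical sketch, not BProc's kernel code: every name in it (`struct ghost`, `try_sigbypass`, `sigbypass_disabled`, the bounded message queue) is invented for illustration, and the reading that a hook return value of 0 means "fall back to normal delivery" is inferred from the patch rather than confirmed.

```c
/* Hypothetical user-space model of "sigbypass": a signal sent to a ghost is
 * either queued directly as a forwarding message (ghost not disturbed) or
 * delivered normally (ghost woken to forward it itself).
 * Illustrative names only; this is NOT the BProc kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MSGQ_CAP 16

/* One queued signal-forwarding message (hypothetical layout). */
struct fwd_msg {
    int pid; /* remote process the ghost stands in for */
    int sig; /* signal number to forward */
};

/* Bounded FIFO standing in for the forwarding message queue. */
struct msg_queue {
    struct fwd_msg buf[MSGQ_CAP];
    size_t head, count;
};

struct ghost {
    int pid;
    bool sigbypass;        /* per-ghost opt-in, modeled on ghost->sigbypass */
    unsigned long pending; /* bitmask of signals waiting for the ghost */
    int wakeups;           /* how many times the ghost was disturbed */
};

/* Global kill switch, analogous to the patch's early "return 0". */
static bool sigbypass_disabled = true;

static bool msgq_push(struct msg_queue *q, struct fwd_msg m)
{
    if (q->count == MSGQ_CAP)
        return false; /* queue full */
    q->buf[(q->head + q->count) % MSGQ_CAP] = m;
    q->count++;
    return true;
}

static bool msgq_pop(struct msg_queue *q, struct fwd_msg *out)
{
    if (q->count == 0)
        return false;
    *out = q->buf[q->head];
    q->head = (q->head + 1) % MSGQ_CAP;
    q->count--;
    return true;
}

/* Returns 1 if the bypass path handled the signal, 0 if the caller should
 * use normal delivery (assumed to mirror the hook's return convention). */
static int try_sigbypass(struct ghost *g, struct msg_queue *q, int sig)
{
    if (sigbypass_disabled || !g->sigbypass)
        return 0;
    struct fwd_msg m = { g->pid, sig };
    return msgq_push(q, m) ? 1 : 0; /* full queue: fall back */
}

/* Normal path: mark the signal pending and wake the ghost, which would
 * later forward the signal itself. */
static void deliver_normally(struct ghost *g, int sig)
{
    g->pending |= 1UL << (sig - 1);
    g->wakeups++;
}

static void send_signal(struct ghost *g, struct msg_queue *q, int sig)
{
    assert(sig >= 1 && sig <= 32); /* keep the bitmask shift in range */
    if (!try_sigbypass(g, q, sig))
        deliver_normally(g, sig);
}
```

With `sigbypass_disabled` set, which is effectively what the patch does, every signal takes the slower path that wakes the ghost; clearing it lets signals to a ghost whose per-ghost flag is set go straight onto the queue without disturbing it.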