From: <er...@he...> - 2003-10-14 16:24:13
On Mon, Oct 13, 2003 at 02:17:16PM -0400, Nicholas Henke wrote:
> Howdy~
> I am back to torturing bproc again, just trying to make sure a kernel
> upgrade is going to be stable. Attached is a tar.gz with a script to run
> remote_fork (.c included). There is a 'NODES=' section at the top to
> edit for your nodes to use.
>
> If you do a './run.sh 10000000' ( 10 million iterations ), at some
> point, usually 1.5 million, the head node will hard lock -- not even
> nmi_watchdog can rescue it.
>
> If you have a way to rescue a kernel from this hard of a lock, I would
> love to know about it, so I could give this bug a whirl, but otherwise I
> am pretty stuck.

Hrm. I'm glad you brought this up. I've recently seen a similar
problem. I thought it had something to do with the recent network
upgrade we did. Sounds like probably not.

It's pretty mysterious to me. BProc really doesn't do much with
interrupts turned off. I've been working on reproducing it more
reliably here.

I don't know a good way to shake the kernel loose in that case. One
thing we were going to try was to instrument it a bit with POST codes
or try to poke around in memory a bit with a bus analyzer.

Can you resend the attachment? I didn't get it for some reason.

- Erik