From: Walker, Bruce J <bruce.walker@hp...> - 2005-03-20 06:29:15
This panic results from some OpenSSI subsystem on the initnode hanging
in nodedown processing. Panicking is clearly not desirable here, but
the goal is to determine the bug causing the hang. If we take out the
panic, other things will start hanging later. If you can use kdb to
find some hung nodedown kernel threads and record their stacks, that
might be helpful.
Sorry for your experience and thanks for the help,
> -----Original Message-----
> From: ssic-linux-users-admin@...
> [mailto:ssic-linux-users-admin@...] On
> Behalf Of Ivan Krstic
> Sent: Friday, March 18, 2005 11:28 PM
> To: OpenSSI Users
> Subject: [SSI-users] Kernel panic: timed out waiting for nodedown
> I've just dealt with an evil OpenSSI problem. One of the nodes in our
> LTSP cluster went down, causing the initnode to kernpanic with error
> 'Timed out waiting for nodedown'. Needless to say, this pulled the
> entire cluster offline, cutting off service to thin clients.
> A quick scan of the code does not reveal an explanation. Do the
> developers know why this could have happened? It seems to me that a
> peripheral node's nodedown not completing should under no
> circumstances be allowed to kernpanic the initnode, since that would
> nullify the entire purpose of high availability.