#49 Nodedown daemons hang due to caredata

open
3
2004-04-20
2004-04-20
No

The CLMS failover daemon was hung trying to get a vproc
lock for pid 2. Pid 2 was waiting in an RPC to the down
node (node 1) because it was reaping a local process;
however, its pgrp was a node 1 pid, so it had caredata
that it needed to clean up. ICS nodedown would make the
RPC return -EREMOTE and let the daemon continue;
however, ICS nodedown can't get spawned.

A possible solution is to not make ICS nodedown
processing a spawned proc. Instead, it would be part of
the regular thread and would get called before the
spawning of the other failover/nodedown threads.
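
The ordering issue can be illustrated with a small user-space model. This is only a sketch, not kernel code: `PendingRpc`, `ics_nodedown`, and `run_nodedown` are hypothetical names, the numeric value of `EREMOTE` is just a label, and the model assumes that spawning depends on the same thread that is stuck in the RPC (the report implies this but does not state it). A timeout stands in for the hang.

```python
import threading

EREMOTE = 66  # illustrative constant; only used as a label in this model


class PendingRpc:
    """Hypothetical RPC to the down node; blocks until aborted."""

    def __init__(self):
        self._done = threading.Event()
        self.result = None

    def wait(self, timeout):
        # Return the result, or None if still blocked after `timeout` seconds.
        return self.result if self._done.wait(timeout) else None

    def abort(self, err):
        self.result = err
        self._done.set()


def ics_nodedown(rpcs):
    """Fail every RPC outstanding to the down node with -EREMOTE."""
    for rpc in rpcs:
        rpc.abort(-EREMOTE)


def run_nodedown(inline_ics, timeout=0.2):
    """Model one nodedown; return what the blocked thread sees from its RPC."""
    rpc = PendingRpc()
    spawn_requests = []  # work that only the blocked thread can start
    outcome = {}

    def blocked_thread():
        # Assumption: this thread must finish its RPC before it can spawn.
        outcome["rpc"] = rpc.wait(timeout)
        if outcome["rpc"] is not None:
            for request in spawn_requests:
                request()

    if inline_ics:
        # Proposed ordering: ICS nodedown runs in the regular thread first.
        ics_nodedown([rpc])
    else:
        # Reported ordering: ICS nodedown is itself queued as a spawn.
        spawn_requests.append(lambda: ics_nodedown([rpc]))

    worker = threading.Thread(target=blocked_thread)
    worker.start()
    worker.join()
    return outcome["rpc"]
```

In this model, `run_nodedown(inline_ics=True)` lets the blocked thread see -EREMOTE right away, while `run_nodedown(inline_ics=False)` times out because the only thing that could abort the RPC is waiting to be spawned.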

Bug #938838 is a fix for init to avoid this problem.
However, any user command doing an rexec to another
node which creates kernel daemons that are exiting
during a nodedown may still run into this bug.
