Thread: [SSI-devel] CLMS stuck in pre-root initialization on 2-node SSI cluster
From: Roger T. <rog...@gm...> - 2007-02-18 07:46:48
Hi,

I ran into the following problem on a 2-node DRBD-SSI cluster. The CLMS master was failing over, but panicked due to I/O errors. At the same time the client was booting, but it did not detect that the root node had gone down. The last console message was "waiting to join cluster". Not good, but hard to reproduce.

The CLMS master must have failed before the nodedown daemon was spawned, or the daemon never got the event. So the client got stuck in an infinite loop somewhere, I think in clms_client_sync_masterparams().

Roger

--- cluster/clms/clms_client.c	10 Feb 2005 01:05:32 -0000	1.7
+++ cluster/clms/clms_client.c	18 Feb 2007 06:57:02 -0000
@@ -158,7 +158,7 @@
 static void
 clms_client_sync_masterparams(void)
 {
-	int error;
+	int error, tries = 0;
 	int rval;
 	char *masterinfo;
 	int masterinfo_len;
@@ -185,11 +185,17 @@
 		}
 		if (error == -EREMOTE) {
 			nidelay(HZ);
+			if (++tries > CONFIG_NODE_MONITOR_TIMEOUT_MS / 1000 + 1) {
+				printk(KERN_WARNING
+				       "Failed reading master list, master went down!\n");
+				machine_restart(NULL);
+			}
 			continue;
 		}
 		if (error >= 0)
 			error = rval;
 		if (error < 0)
+			/* XXX: Never reached */
 			panic("%s:Error %d reading master list\n",
 			      __FUNCTION__, -error);
 		break;
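For readers who want to see the retry bound from the patch in isolation, here is a minimal userspace sketch. It is not the kernel code: `NODE_MONITOR_TIMEOUT_MS` (with an assumed value of 5000), `fake_read_master_list()` and `sync_with_retry_limit()` are hypothetical names made up for illustration, and the sketch returns `-ETIMEDOUT` where the patch calls `machine_restart(NULL)`. The one-second sleep between attempts (`nidelay(HZ)` in the patch) is left out so the example runs instantly.

```c
#include <errno.h>

/* Hypothetical stand-in for CONFIG_NODE_MONITOR_TIMEOUT_MS; the value is assumed. */
#define NODE_MONITOR_TIMEOUT_MS 5000

/* Same bound as in the patch: one try per second of timeout, plus one. */
#define MAX_TRIES (NODE_MONITOR_TIMEOUT_MS / 1000 + 1)

/*
 * Hypothetical stub for the remote read of the master list.
 * It fails with -EREMOTE for the first fail_count calls, then succeeds.
 */
static int fake_read_master_list(int *calls, int fail_count)
{
	return (++*calls <= fail_count) ? -EREMOTE : 0;
}

/*
 * Retry on -EREMOTE, but give up once more than MAX_TRIES attempts have
 * failed. Returns 0 on success and -ETIMEDOUT when the bound is exceeded
 * (the patch restarts the machine at that point instead of returning).
 */
static int sync_with_retry_limit(int fail_count)
{
	int tries = 0, calls = 0, error;

	for (;;) {
		error = fake_read_master_list(&calls, fail_count);
		if (error == -EREMOTE) {
			/* The kernel code sleeps about one second here. */
			if (++tries > MAX_TRIES)
				return -ETIMEDOUT;
			continue;
		}
		return error;
	}
}
```

With the assumed 5000 ms timeout, `MAX_TRIES` is 6, so the loop tolerates six consecutive `-EREMOTE` results and gives up on the seventh. A master that never comes back therefore ends the wait instead of leaving the client spinning forever.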