#71 Lost initnode during boot didn't form a cluster with others

open
nobody
3
2014-08-16
2004-07-31
David Zafman
No

Although the initial trigger was old shared disk hardware, the
actual problem is that the remaining nodes did not form a
cluster.

I booted a 4-node cluster simultaneously. For whatever reason,
node 1, which had been selected as the master node, crashed. The
remaining nodes are hung in the following states. No cluster
is forming.

Node 2:
Found node 1 as the root node.

ipcnameserver ready completed
This is a CI/OpenSSI kernel.
This Cluster Node: 2
Potential Initnode(s): 1:192.168.0.63,2:192.168.0.64,3:192.168.0.65,4:192.168.0.66

Name server registered with clms
ipcname_read completed
Mounting root in linuxrc
****HUNG*******

Node 3:
Found node 1 as the root node.
nm_add_node: Node 1 added
ipcnameserver ready Mounting root in linuxrc
completed

This is a CI/OpenSSI kernel.
This Cluster Node: 3
Potential Initnode(s): 1:192.168.0.63,2:192.168.0.64,3:192.168.0.65,4:192.168.0.66

Name server registered with clms
ipcname_read completed
Kernel panic: Lost CLMS master while trying to join cluster!
****I assume a production kernel would have rebooted here*****
*****Output after I manually rebooted the development kernel:*****
.....
Running preSearching for an existing root node...
-root cluster initialization
Can't contact given root node 1/192.168.0.63, error 5
Can't contact given root node 1/192.168.0.63, error 5
Can't contact given root node 1/192.168.0.63, error 5

Node 4:
Found node 1 as the root node.

ipcnameserver ready completed
This is a CI/OpenSSI kernel.
This Cluster Node: 4
Potential Initnode(s): 1:192.168.0.63,2:192.168.0.64,3:192.168.0.65,4:192.168.0.66

Name server registered with clms
ipcname_read completed
Mounting root in linuxrc
****HUNG*****
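
To make the expected behaviour concrete, here is a rough sketch of what
I would expect to happen once node 1 is lost: every surviving node
already knows the full "Potential Initnode(s)" list, so it could fall
back to another candidate. The function names and the "lowest-numbered
reachable candidate wins" rule are assumptions made for illustration
only. This is not the actual CLMS failover code.

```python
import re


def parse_initnodes(line):
    """Parse a 'Potential Initnode(s): n:ip,...' console line into {node: ip}.

    Hypothetical helper; whitespace (including a wrapped line) is ignored.
    """
    _, _, payload = line.partition(":")
    payload = re.sub(r"\s+", "", payload)
    pairs = re.findall(r"(\d+):(\d+\.\d+\.\d+\.\d+)", payload)
    return {int(node): ip for node, ip in pairs}


def pick_fallback(initnodes, unreachable):
    """Return the lowest-numbered candidate that is not known to be down.

    Assumed policy for illustration, not OpenSSI's real election rule.
    Returns None when no candidate is left.
    """
    alive = [node for node in initnodes if node not in unreachable]
    return min(alive) if alive else None


if __name__ == "__main__":
    line = ("Potential Initnode(s): 1:192.168.0.63,2:192.168.0.64,"
            "3:192.168.0.65,4:192.168.0.66")
    nodes = parse_initnodes(line)
    # Node 1 crashed during boot, as in this report.
    print(pick_fallback(nodes, unreachable={1}))
```

With node 1 marked unreachable, nodes 2, 3 and 4 would all arrive at
the same candidate under this rule, instead of hanging in linuxrc or
retrying node 1 indefinitely.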

Discussion