Need split brain avoidance
Brought to you by:
brucewalker,
rogertsang
While testing, I ended up causing the kernel on node 1 (the
CLMS master) to loop with interrupts disabled (or
something that kept it from handling node-monitoring
activity). Node 2 declared node 1 down and took over
the root filesystem. When node 1 finished its bad activity,
it declared node 2 down because node 2 was no longer
sending IMALIVE.
Now that we are getting into HA failover clusters with
shared disks, we need some kind of split-brain
avoidance mechanism.
Logged In: NO
Maybe we can require a serial null-modem cable, and possibly
a secondary network connection, and use STONITH and heartbeat
algorithms to detect split brain.
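A minimal sketch of the two-channel idea above (function name and timeout are hypothetical, not from any existing heartbeat implementation): declare the peer dead only when heartbeats are missing on both the network link and the serial null-modem link, since a single silent channel is more likely a broken link than a dead node.

```python
def peer_is_dead(net_silence_s, serial_silence_s, timeout_s=5.0):
    """Two-channel dead-peer detection (hypothetical sketch).

    net_silence_s / serial_silence_s: seconds since the last heartbeat
    was seen on the network and serial null-modem links respectively.
    Only when BOTH channels have been silent longer than the timeout
    do we treat the peer as dead and allow a STONITH reset; a single
    silent channel is assumed to be a failed link, not a failed node.
    """
    return net_silence_s > timeout_s and serial_silence_s > timeout_s
```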
Logged In: YES
user_id=495208
Perhaps a SCSI reserve on the shared disk could help with that
kind of problem. For example, a node using the shared disk could
try to re-reserve it once per second, and panic if that fails.
Or is this just a stupid idea?
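The once-per-second reserve-or-panic loop could look roughly like this (a hedged sketch; `try_reserve` and `panic` are hypothetical callables standing in for issuing a real SCSI reservation and for the kernel panic path):

```python
import time

def reservation_watchdog(try_reserve, panic, interval_s=1.0, rounds=None):
    """Re-acquire the shared-disk reservation periodically; panic on
    failure. Hypothetical sketch: in a real system try_reserve would
    issue a SCSI RESERVE / persistent reservation on the shared device.

    try_reserve: callable -> bool, True if the reservation is held.
    panic: callable invoked when the reservation is lost.
    rounds: stop after this many successful checks (None = run forever).
    """
    done = 0
    while rounds is None or done < rounds:
        if not try_reserve():
            panic()        # losing the reservation means another node
            return False   # fenced us off: stop touching the disk
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
    return True
```

The point of panicking rather than retrying is that a lost reservation means another node may already own the disk, so any further I/O risks corruption.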
Logged In: NO
I thought about using a null modem cable, too. I wrote a
simple server that listens on the serial port and sends a
response, if it is still alive, when the other node
connected to the storage asks. This is fairly simple. But
I think there is more to do than asking whether the server
is still alive. The question is what to do if it is alive:
should the primary initnode continue to provide the root fs,
or should the second node take over and shoot the first one?
What if it is the network connection of the primary node
that failed? In that case a failover to the second node would
be correct. But if it is the network connection of the second
node and it took over, the cluster would consist only of that
node. If there are only two nodes, this problem does not
exist. Does anyone have ideas about what the server should
check before telling the second node not to fail over?
I thought about a configuration where every initnode has two
network interfaces and a serial null-modem connection. If the
second initnode thinks the first one is down, it asks over
the serial connection whether the server is still alive. That
server then pings its primary network interface over its
secondary one to test itself. (The problem with this is that
the first and second network interfaces must be on the same
network.) If everything is all right, it tells the second
node to shut itself down. A test of whether the first and/or
second network interface of the second node is reachable
might also be of interest.
But what to do if there are more than two nodes? If the first
server is still alive, it should test whether it still sees
any other nodes in the cluster. If it does, the second node
should shut down. It would also be good if the second node
could test which nodes in the cluster it still sees, but I
think that would be a lot more difficult, because the second
node does not have a root fs at that moment and cannot call,
for example, 'ping'.
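The decision the still-alive first initnode makes before answering over the serial link could be sketched like this (all names are hypothetical, not part of any existing server): survive if its own self-ping works or it still sees other cluster nodes, otherwise concede.

```python
def serial_arbitration_reply(self_ping_ok, other_nodes_seen):
    """Reply the queried (first) initnode sends back over the
    null-modem link -- a hypothetical sketch of the scheme above.

    self_ping_ok: pinging our primary interface via the secondary
        one succeeded, so our own network connection looks healthy.
    other_nodes_seen: number of other cluster nodes we still see.

    Returns 'peer-shutdown' (the asking node must not fail over)
    or 'failover' (we concede; the asking node takes over root).
    """
    if self_ping_ok or other_nodes_seen > 0:
        return "peer-shutdown"
    return "failover"
```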
I am only writing down what I was thinking about the
problem. I hope somebody will respond with what they think
about it, or with another possible solution.
Andreas (roos@convis.de)
Logged In: YES
user_id=495208
IMHO, the important resources are:
1) Access to the shared disk (i.e., is the SCSI or SAN connection broken?)
2) Access to the cluster's internal network connection
If either of these is lost, the node should stop shared-disk
I/O, send 'go ahead', and reboot.
Otherwise the node can send 'don't failover'. (If an external
interface is down, services provided to the external world still
need to be relocated, or traffic routed via the cluster's
internal network.)
Some systems also use a SCSI ping to check whether the other
node is alive. That is valuable because if the other node has no
disk access, it cannot corrupt the shared disk (but it may
corrupt it if disk access returns -- therefore a SCSI reserve is
often also used to shut off possible disk access).
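The two-resource rule above can be sketched as a small decision function (a hypothetical sketch; message strings and names are made up for illustration):

```python
def heartbeat_reply(shared_disk_ok, internal_net_ok):
    """Decision rule over the two critical resources (hypothetical).

    Returns (message_to_peer, must_stop_io_and_reboot). Losing either
    the shared disk or the cluster-internal network means we tell the
    peer to go ahead, stop shared-disk I/O, and reboot; otherwise we
    send 'don't failover' and keep serving.
    """
    if shared_disk_ok and internal_net_ok:
        return ("don't failover", False)
    return ("go ahead", True)
```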
Logged In: NO
We need to establish quorum in a cluster. However, in the
special case of a 2-node cluster, we need to fence the other
node. Even assuming we already fence the other node, there
can still be a problem.
Since SSI also supports DRBD root failover, we also want to
handle the DRBD split-brain scenario. Node 2 takes over as CLMS
master and takes over the root fs. Node 1 is rebooting. Then
node 2 crashes before node 1 recovers. Now both DRBD nodes
think they have the most current data set. Node 1 should check
whether the root fs was taken over by another node and, if
not, whether it was in sync with node 2, and decide whether to
fence itself or take over the root fs.
One way of doing this is storing a timestamp that indicates
when the node last took over root. When DRBD detects split
brain, the nodes exchange this info before changing from
Secondary to Primary state and before mounting root. The node
with the newest timestamp recovers the root fs.
-Roger
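The timestamp arbitration described above could be sketched as follows (a hypothetical sketch, not existing DRBD or SSI code; the return values name the local node's action):

```python
def resolve_drbd_split_brain(local_ts, remote_ts):
    """Timestamp arbitration sketch (hypothetical): each node stores
    when it last took over root; on split brain the nodes exchange
    these timestamps before any Secondary -> Primary promotion.

    Returns what the local node should do: 'promote' (mount root) if
    its takeover timestamp is newest, 'fence-self' if the remote one
    is newer, or 'tiebreak' when equal (e.g. fall back to node id).
    """
    if local_ts > remote_ts:
        return "promote"
    if local_ts < remote_ts:
        return "fence-self"
    return "tiebreak"
```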
Logged In: YES
user_id=1246761
Originator: NO
Probably worth integrating heartbeat v2 from the Linux-HA project with OpenSSI CLMS.