
#3 Need split brain avoidance

Status: open
Owner: nobody
Labels: None
Priority: 6
Updated: 2014-08-25
Created: 2004-04-14
Private: No

While doing testing, I ended up causing node 1 (the CLMS
master) to loop in the kernel with interrupts disabled (or
something that caused it not to handle node-monitoring
activity). Node 2 declared node 1 down and took over the
root filesystem. When node 1 finished its bad activity, it
declared node 2 down because node 2 wasn't sending IMALIVE
anymore.

Now that we are getting into HA failover clusters with
shared disks, we need some kind of split-brain
avoidance mechanism.

Discussion

  • Nobody/Anonymous

    Logged In: NO

    Maybe we can require a serial null-modem cable, and
    possibly a secondary network connection, and use STONITH
    and heartbeat algorithms to detect split brain.

  • Kari Hurtta

    Kari Hurtta - 2004-04-16

    Logged In: YES
    user_id=495208

    Perhaps doing a SCSI reserve on the shared disk may help
    with that kind of problem. For example, the node that is
    using the shared disk may try to renew the reservation
    once per second, and panic if that fails.

    Or is this just a stupid idea?
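
    A minimal sketch of that reserve-once-per-second idea,
    assuming a Linux host where the shared disk is reachable
    through the SCSI generic (sg) driver; the device path, the
    RESERVE(6) command, and exiting instead of panicking are
    illustrative assumptions, not OpenSSI code:

        /* Renew a SCSI-2 RESERVE(6) on the shared disk once per
         * second; treat any failure as loss of the disk and
         * fence ourselves.  /dev/sg0 is an assumed device path. */
        #include <fcntl.h>
        #include <scsi/sg.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/ioctl.h>
        #include <unistd.h>

        static int scsi_reserve(int fd)
        {
            unsigned char cdb[6] = { 0x16, 0, 0, 0, 0, 0 };  /* RESERVE(6) */
            unsigned char sense[32];
            struct sg_io_hdr io;

            memset(&io, 0, sizeof(io));
            io.interface_id    = 'S';
            io.cmd_len         = sizeof(cdb);
            io.cmdp            = cdb;
            io.dxfer_direction = SG_DXFER_NONE;  /* command only, no data */
            io.sbp             = sense;
            io.mx_sb_len       = sizeof(sense);
            io.timeout         = 1000;           /* milliseconds */

            if (ioctl(fd, SG_IO, &io) < 0)
                return -1;
            /* Nonzero status, e.g. RESERVATION CONFLICT, means the
             * peer holds the disk. */
            return io.status == 0 ? 0 : -1;
        }

        int main(void)
        {
            int fd = open("/dev/sg0", O_RDWR);   /* shared disk (assumed) */
            if (fd < 0) { perror("open"); return 1; }

            for (;;) {
                if (scsi_reserve(fd) != 0) {
                    /* A real node would panic here to stop all disk I/O. */
                    fprintf(stderr, "reserve failed -- fencing self\n");
                    return 2;
                }
                sleep(1);
            }
        }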

  • Nobody/Anonymous

    Logged In: NO

    I thought about using a null modem cable, too. I wrote a
    simple server that listens on the serial port and sends a
    response, if it is still alive, when the other node
    connected to the storage asks. This is kind of simple. But
    I think there is more to do than asking whether the server
    is still alive. The question is what to do if it is alive.
    Should the primary initnode continue to provide the root
    fs, or should the second node take over and shoot the
    first one? What if it is the network connection of the
    primary node that failed? In this case a failover to the
    second node would be correct. But if it is the network
    connection of the second node that failed and it took
    over, the cluster would consist of only this node. If
    there are only two nodes this problem does not exist. Does
    anyone have ideas about what the server should check
    before it tells the second node not to fail over?

    I thought about a configuration where every initnode has
    two network interfaces and a serial null-modem connection.
    If the second initnode thinks the first one is down, it
    asks over the serial connection whether the server is
    still alive. This server tries to ping its primary network
    interface over its second one, to test itself. (The
    problem with this is that the first and second network
    cables must be on the same network.) If everything is all
    right, it responds telling the second node to shut itself
    down. Maybe a test of whether the first and/or second
    network interface of the second node is reachable would
    also be of interest.
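
    A hedged sketch of that serial "are you alive" query,
    assuming the null-modem link appears as /dev/ttyS0 and that
    the peer answers "YES" when its self-test passes; the
    device path, message strings, and timeout are made up for
    illustration:

        /* Ask the peer initnode over the null modem link whether
         * it is still alive, waiting at most two seconds. */
        #include <fcntl.h>
        #include <string.h>
        #include <sys/select.h>
        #include <termios.h>
        #include <unistd.h>

        int peer_alive(const char *dev)
        {
            int fd = open(dev, O_RDWR | O_NOCTTY);
            if (fd < 0)
                return -1;                  /* no serial link */

            struct termios t;
            tcgetattr(fd, &t);
            cfmakeraw(&t);                  /* raw 8N1 link */
            cfsetspeed(&t, B9600);
            tcsetattr(fd, TCSANOW, &t);

            write(fd, "ALIVE?\n", 7);       /* query the peer */

            fd_set rd;
            struct timeval tv = { 2, 0 };   /* 2 s answer window */
            FD_ZERO(&rd);
            FD_SET(fd, &rd);

            char buf[16] = { 0 };
            int alive = 0;
            if (select(fd + 1, &rd, NULL, NULL, &tv) > 0 &&
                read(fd, buf, sizeof(buf) - 1) > 0)
                alive = (strncmp(buf, "YES", 3) == 0);

            close(fd);
            return alive;   /* 1 alive, 0 silent, -1 no link */
        }

        int main(void)
        {
            return peer_alive("/dev/ttyS0") == 1 ? 0 : 1;
        }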

    But what to do if there are more than two nodes? If the
    first server is still alive, it should test which other
    nodes in the cluster it still sees. If it sees any, the
    second node should shut down. It would also be good if the
    second node could test which nodes in the cluster it still
    sees. But I think that would be a lot more difficult,
    because the second node does not have a root fs at that
    moment and is not able to call, for example, 'ping'.

    I am only writing down what I was thinking about the
    problem. I hope that somebody gives me a response on what
    they think about it, or what another solution to the
    problem could be.

    Andreas (roos@convis.de)

  • Kari Hurtta

    Kari Hurtta - 2004-04-23

    Logged In: YES
    user_id=495208

    IMHO, the important resources are
    1) access to the shared disk (i.e. is the SCSI or SAN
       connection broken)
    2) access to the cluster's internal network connection

    If either of these is lost, the node should stop
    shared-disk I/O, send 'go ahead', and reboot.

    Otherwise the node can send 'don't failover'. (If the
    external interface is down, however, services provided to
    the external world need to be relocated, or traffic routed
    via the cluster's internal network.)

    Some systems also use SCSI ping to check whether the other
    node is alive. That is valuable because if the other node
    has no disk access, it cannot corrupt the shared disk (but
    it may corrupt it if its disk access returns -- therefore
    a SCSI reserve is also often used to shut out possible
    disk access).
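
    This two-resource rule reduces to a small decision
    function. A toy sketch, where the two probes are
    hypothetical stubs standing in for real SCSI/SAN and
    interconnect checks:

        #include <stdbool.h>
        #include <stdio.h>
        #include <stdlib.h>

        /* Stubs: a real implementation would test the SCSI/SAN
         * path and the cluster interconnect. */
        static bool shared_disk_ok(void)  { return true; }
        static bool interconnect_ok(void) { return true; }

        static void check_resources(void)
        {
            if (!shared_disk_ok() || !interconnect_ok()) {
                /* Lost a critical resource: stop shared-disk
                 * I/O, tell the peer to go ahead, and reboot. */
                puts("GO-AHEAD");
                exit(1);        /* stand-in for a panic/reboot */
            }
            /* Still holding both resources: veto the takeover. */
            puts("DON'T-FAILOVER");
        }

        int main(void) { check_resources(); return 0; }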

  • Brian J. Watson

    Brian J. Watson - 2004-05-27
    • labels: 375977 -->
  • Nobody/Anonymous

    Logged In: NO

    We need to establish quorum in the cluster. However, in
    the special case of a 2-node cluster, we need to fence the
    other node. Even assuming we already fence the other node,
    there can still be a problem.

    Since SSI also supports DRBD root failover, we also want
    to handle the DRBD split-brain scenario. Node 2 takes over
    the CLMS master and the root fs. Node 1 is rebooting. Then
    node 2 crashes before node 1 recovers. Now both DRBD nodes
    think they have the most current data set. Node 1 should
    check whether the root fs was taken over by another node
    and, if not, whether it was in sync with node 2, then
    decide whether to fence itself or continue to take over
    the root fs.

    One way of doing this is storing a timestamp that
    indicates when a node last took over the root. When DRBD
    detects split brain, the nodes exchange this info before
    changing from Secondary to Primary state and before
    mounting root. The node with the newest timestamp recovers
    the root fs.
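
    A minimal sketch of that timestamp rule, assuming each
    node can persist the time of its last root takeover
    somewhere both sides can read during the DRBD handshake
    (the exchange itself and all names are hypothetical):

        #include <stdint.h>
        #include <stdio.h>
        #include <time.h>

        /* Recorded whenever this node takes over the root fs. */
        static uint64_t last_takeover;

        static void record_takeover(void)
        {
            last_takeover = (uint64_t)time(NULL);
        }

        /* On split brain, exchange timestamps before either side
         * goes Secondary -> Primary; newest takeover wins root. */
        static int should_recover_root(uint64_t mine, uint64_t peers)
        {
            return mine > peers;
        }

        int main(void)
        {
            record_takeover();
            uint64_t peer = 0;   /* value received from the peer */
            puts(should_recover_root(last_takeover, peer)
                 ? "become Primary and mount root"
                 : "fence self / stay Secondary");
            return 0;
        }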

    -Roger

  • Roger Tsang

    Roger Tsang - 2007-04-24
    • priority: 5 --> 6
  • Roger Tsang

    Roger Tsang - 2007-04-24

    Logged In: YES
    user_id=1246761
    Originator: NO

    Probably worth integrating heartbeat v2 from the Linux-HA
    project with OpenSSI CLMS.

