#439 Enhanced cluster management using quorum

5.17.08
accepted
#79 (2) #1170 (2)
enhancement
clm
-
major
2017-04-10
2013-05-31
No

The goal of this ticket is to address the requirements below. It should be read in conjunction with tickets #79 (spare SCs) and #1170 (multiple standbys).

Deploying large OpenSAF clusters in the cloud presents the following challenges:

  • Multiple nodes failing/faulting simultaneously (either in a cattle-class deployment, or when a host machine goes down and in turn pulls down its guest VM nodes)
  • Reliance on third-party or less reliable hardware, networks, and hosts
  • Dynamically changing cluster membership due to scale-out and scale-in operations
  • Multiple (or all) nodes can now become system controller nodes. This increases the probability of split brain and cluster partitioning.

These requirements are being addressed in a phased manner.
(1) As a first step, spare SCs (https://sourceforge.net/p/opensaf/tickets/79/) were implemented in 5.0, along with the headless cluster feature (multiple tickets).

(2) As a second step (this ticket, in 5.2), implement enhanced OpenSAF cluster management such that there is always consensus among the cluster nodes on:

  • current cluster members
  • the current active SC (leader election)
  • the order of member nodes joining/leaving the cluster

(3) As a last step, implement multiple standbys (https://sourceforge.net/p/opensaf/tickets/1170/) in 5.3.

This ticket addresses bullet (2) above.
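
To illustrate why consensus on the order of joins and leaves matters, here is a minimal sketch, not OpenSAF code. It assumes a hypothetical log of `(event, node)` tuples that every node has already agreed on. The `MembershipView` class, the event names, and the node names are illustrative assumptions. Any node that applies the same ordered log arrives at the same membership view, even if it applies the log in several batches.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical event kinds; the ticket does not define a log format.
JOIN = "join"
LEAVE = "leave"


@dataclass
class MembershipView:
    """Cluster view derived purely from an ordered log of join/leave events."""
    members: Set[str] = field(default_factory=set)
    applied: int = 0  # index of the next log entry to apply

    def apply(self, log: List[Tuple[str, str]]) -> None:
        # Apply only the entries not seen yet, strictly in log order.
        for kind, node in log[self.applied:]:
            if kind == JOIN:
                self.members.add(node)
            elif kind == LEAVE:
                self.members.discard(node)
            else:
                raise ValueError(f"unknown event kind: {kind}")
            self.applied += 1


# An agreed-upon log (assumed to come from the consensus layer).
log = [(JOIN, "SC-1"), (JOIN, "SC-2"), (JOIN, "PL-3"),
       (LEAVE, "SC-2"), (JOIN, "SC-2"), (LEAVE, "PL-3")]

view_a = MembershipView()
view_a.apply(log[:3])   # node A sees the first three entries early
view_a.apply(log)       # and catches up later

view_b = MembershipView()
view_b.apply(log)       # node B applies everything at once

print(sorted(view_a.members), sorted(view_b.members))
```

Both views end up with the same members because the only input is the agreed order. Without that agreement, a leave and a rejoin of `SC-2` could be applied in opposite orders on different nodes.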

Requirements:

  • As a part of this ticket, RAFT (see https://raft.github.io/) shall be used as the mechanism for:
    (a) achieving consensus among a set of the cluster nodes (including membership changes)
    (b) quorum-based leader election
    (c) split-brain avoidance
    The following deployment scenarios shall be supported when using RAFT:
    - classic 2 SC OpenSAF cluster (or)
    - all nodes are SCs (2N + the rest are all spares) (or)
    - 2N + spare SCs (2N + a smaller subset are spares) (or)
    - N-WAY (one active, the rest are all hot standbys) - 5.2
    Note: A mix of hot standbys and spares should also be possible.

  • RAFT shall be added as a new OpenSAF service.

  • OpenSAF shall either implement RAFT itself or re-use an existing RAFT implementation such as etcd.

  • A new topology service (TS) may be added, which shall use topology information (from TIPC) and MDS (in case of TCP) to determine cluster membership; see https://sourceforge.net/p/opensaf/tickets/1892/.

  • CLM is the single layer that interfaces with the underlying RAFT and TS.

  • All interactions with RAFT and TS shall go through a normalised cluster services adaptation interface, the OpenSAF cluster services library (CS). The CS library thereby enables OpenSAF to work with different implementations of RAFT. A plugin will be provided for a given implementation of RAFT.

  • CS and TS shall be added as libraries of the OpenSAF CLM service.
    (In the code structure, these shall be part of ....services/saf/clm/libcs and ....services/saf/clm/libts.
    The name of the library shall be libOsafClusterServices.so)

  • OpenSAF should work whether RAFT is enabled or disabled on a system, and should remain backward compatible with previous OpenSAF releases.
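
As a rough illustration of how quorum-based election avoids split brain, the sketch below computes which side of a network partition may elect a leader under plain majority voting. The function names are illustrative assumptions and are not part of OpenSAF or any RAFT implementation.

```python
from typing import List


def majority(cluster_size: int) -> int:
    """Smallest number of votes that is more than half of cluster_size."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def has_quorum(reachable: int, cluster_size: int) -> bool:
    """True if a partition that can reach `reachable` voters may elect a leader."""
    return reachable >= majority(cluster_size)


def partitions_with_quorum(partition_sizes: List[int]) -> List[int]:
    """Return the sizes of the partitions that hold quorum (at most one can)."""
    total = sum(partition_sizes)
    return [size for size in partition_sizes if has_quorum(size, total)]


print(partitions_with_quorum([3, 2]))  # [3]
print(partitions_with_quorum([2, 2]))  # []
print(partitions_with_quorum([1, 1]))  # []
```

Because two disjoint partitions cannot both contain more than half of the voters, at most one side can elect a leader. The last line shows the arithmetic for a cluster with only two voting members: the majority is two, so a lone survivor does not hold quorum on its own.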

The CS library shall provide a normalized set of APIs (and callback interfaces) such that OpenSAF can interact with different implementations of RAFT.

This ticket will implement the CS library and the associated plugin for a given implementation of RAFT.
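
The CS library API is not defined in this ticket, so the following is only a sketch of the plugin idea under stated assumptions. `ConsensusPlugin`, `FixedLeaderPlugin`, `ClusterServices`, and all of their methods are hypothetical names. The sketch shows a facade that delegates to whichever plugin is configured and falls back to legacy behaviour when no plugin is present (RAFT disabled).

```python
from abc import ABC, abstractmethod
from typing import Optional


class ConsensusPlugin(ABC):
    """Hypothetical plugin contract; one subclass per RAFT implementation."""

    @abstractmethod
    def leader(self) -> Optional[str]:
        """Return the node currently elected leader, or None if there is none."""


class FixedLeaderPlugin(ConsensusPlugin):
    """Stand-in backend for illustration; it does not talk to any real RAFT."""

    def __init__(self, leader_name: Optional[str]) -> None:
        self._leader_name = leader_name

    def leader(self) -> Optional[str]:
        return self._leader_name


class ClusterServices:
    """Facade that CLM would call; uses legacy data when no plugin is set."""

    def __init__(self, plugin: Optional[ConsensusPlugin] = None,
                 legacy_active: Optional[str] = None) -> None:
        self._plugin = plugin
        self._legacy_active = legacy_active

    @property
    def consensus_enabled(self) -> bool:
        return self._plugin is not None

    def active_controller(self) -> Optional[str]:
        # With RAFT disabled, keep answering from the legacy mechanism.
        if self._plugin is None:
            return self._legacy_active
        return self._plugin.leader()


with_raft = ClusterServices(plugin=FixedLeaderPlugin("SC-2"), legacy_active="SC-1")
without_raft = ClusterServices(legacy_active="SC-1")
print(with_raft.active_controller())     # SC-2
print(without_raft.active_controller())  # SC-1
```

Keeping the caller's interface identical in both modes is one way to meet the backward-compatibility requirement. The actual design may differ.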

The CS library API definitions will follow soon.

Discussion

  • Mathi Naickan

    Mathi Naickan - 2013-11-10
    • Milestone: future --> 4.5.FC
     
  • Mathi Naickan

    Mathi Naickan - 2014-07-21
    • Milestone: 4.5.FC --> 4.6.FC
     
  • Mathi Naickan

    Mathi Naickan - 2014-10-15
    • Milestone: 4.6.FC --> 5.0
     
  • Mathi Naickan

    Mathi Naickan - 2015-11-25

    RAFT shall be used for enhanced SC active election and cluster membership. The following is the scope of this ticket:
    (a) Implement RAFT and/or RAFT adaptation layer that provides interfaces for

    • adding/removing nodes to the cluster membership
    • querying leader
    • callbacks notifying about new leader
    • read/write interface
      Note: Yet to be seen if a leader yield interface is necessary

    (b) an interface that allows invoking a fencing mechanism
    (c) an interface that allows invoking an arbitration mechanism
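
The interfaces listed in (a) could be pictured roughly as follows. This is an in-memory toy under stated assumptions, not a RAFT implementation: `InMemoryAdaptation` and its method names are hypothetical, and the "lowest node name wins" rule is a placeholder for a real quorum vote. The fencing and arbitration hooks from (b) and (c) are left out.

```python
from typing import Callable, Dict, List, Optional, Set

LeaderCallback = Callable[[Optional[str]], None]


class InMemoryAdaptation:
    """Toy adaptation layer: membership, leader query, callbacks, read/write."""

    def __init__(self) -> None:
        self._members: Set[str] = set()
        self._leader: Optional[str] = None
        self._store: Dict[str, str] = {}
        self._callbacks: List[LeaderCallback] = []

    def on_new_leader(self, callback: LeaderCallback) -> None:
        self._callbacks.append(callback)

    def add_node(self, node: str) -> None:
        self._members.add(node)
        if self._leader is None:
            self._set_leader(node)

    def remove_node(self, node: str) -> None:
        self._members.discard(node)
        if node == self._leader:
            # Placeholder rule; a real layer would run a quorum-based election.
            self._set_leader(min(self._members) if self._members else None)

    def query_leader(self) -> Optional[str]:
        return self._leader

    def write(self, key: str, value: str) -> None:
        self._store[key] = value

    def read(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def _set_leader(self, node: Optional[str]) -> None:
        self._leader = node
        for callback in self._callbacks:
            callback(node)


events: List[Optional[str]] = []
layer = InMemoryAdaptation()
layer.on_new_leader(events.append)
layer.add_node("SC-1")
layer.add_node("SC-2")
layer.remove_node("SC-1")
layer.write("active_sc", "SC-2")
print(events, layer.query_leader(), layer.read("active_sc"))
```

The callback list records each leader change in order, which is the kind of notification a service such as CLM would presumably subscribe to.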

     
  • Mathi Naickan

    Mathi Naickan - 2015-11-25
    • labels: --> #79, #1170
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,22 +1,18 @@
    -Cloned from http://devel.opensaf.org/ticket/1766
    +The purpose of this enhancement is to propose enhancing the OpenSAF cluster management/membership approach to:
    +- Perform high level node monitoring(heartbeating)
    +- Enhanced split-brain avoidance techniques.
    +RAFT is being considered for implementing the above cluster management enhancements.
    +. 
    +The scope of this ticket includes the following:
    
    -Some text in the original ticket description is questionable. But, migrating this ticket for reference. This ticket has also to be read in parallel with https://sourceforge.net/p/opensaf/tickets/220/.
    +(a) Implement RAFT and/or RAFT adaptation layer that provides interfaces for
    +- adding/removing nodes to the cluster membership
    +- querying leader
    +- callbacks notifying about new leader
    +- read/write interface
    +- notification of nodes joining/leaving the cluster membership
    +Note: Yet to be seen if a leader yield interface is necessary
    
    +(b) an interface that alows invoking a fencing mechanism
    
    -
    -The current OpenSAF cluster management takes a fairly minimalistic approach in that it only requires that a node have connectivity to the current active controller and there is no active health monitoring performed of the current member cluster nodes.
    -The purpose of this enhancement is to propose enhancing the OpenSAF cluster management/membership approach to:
    -- Ensure a prospective node has network connectivity to all current cluster members
    -- Perform minimal health monitoring of cluster node members to detect fault conditions not currently detected with today's approach
    -- Better handle split-brain conditions
    -
    -
    -Changed 2 years ago by dfick
    -
    -
    -owner changed from murthy to mathi
    -component changed from opensaf to saf/clmsv
    -Changed 2 years ago by mathi
    -
    -
    -Full mesh connectivity determination and quorum based indication for enhanced cluster 'management' should either come in as a new service/layer and/or eventually as an enhanced NODE_UP/NODE_DOWN message into CLM, thus keeping intact the current mechanism to feed CLM with connection evidences.
    +(c) an interface that allows invoking an arbitration mechanism
    
    • status: unassigned --> assigned
     
  • Mathi Naickan

    Mathi Naickan - 2015-11-25
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,5 @@
    -The purpose of this enhancement is to propose enhancing the OpenSAF cluster management/membership approach to:
    +The purpose of this ticket is to achieve the following enhancements to OpenSAF cluster management/membership:
    +
    
     - Perform high level node monitoring(heartbeating)
     - Enhanced split-brain avoidance techniques.
     RAFT is being considered for implementing the above cluster management enhancements.
    
     
  • Mathi Naickan

    Mathi Naickan - 2016-04-11
    • status: assigned --> accepted
    • Milestone: 5.0.FC --> 5.1.FC
     
  • Mathi Naickan

    Mathi Naickan - 2016-05-31
    • summary: Enhanced cluster management --> Enhanced cluster management using RAFT
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,19 +1,54 @@
    -The purpose of this ticket is to achieve the following enhancements to OpenSAF cluster management/membership:
    +The goal of this ticket is to address the following requirements.This ticket should be read in conjunction with ticket #79 (spare SCs) and #1170 (multiple standbys):
    
    -- Perform high level node monitoring(heartbeating)
    -- Enhanced split-brain avoidance techniques.
    -RAFT is being considered for implementing the above cluster management enhancements.
    -. 
    -The scope of this ticket includes the following:
    +Deployment of large OpenSAF clusters in the cloud presents with the following challenges:
    +- Multiple nodes failing/faulting simultaneously (either in a cattle class deployment OR the host machine going down which inturn will pull down the guest VM nodes)
    +- Relying on 3rd party OR less reliable - hardware/network/hosts
    +- Dynamically changing cluster membership due to scale-out and scale-in operations
    +- Multiple (or all) nodes can now become system controller nodes. This increases the probability of split brain and cluster partitioning.
    
    -(a) Implement RAFT and/or RAFT adaptation layer that provides interfaces for
    -- adding/removing nodes to the cluster membership
    -- querying leader
    -- callbacks notifying about new leader
    -- read/write interface
    -- notification of nodes joining/leaving the cluster membership
    -Note: Yet to be seen if a leader yield interface is necessary
    +These requirements are being addressed in a phased manner.
    +(1) As a first step, https://sourceforge.net/p/opensaf/tickets/79/ was implemented in 5.0. (And the headless cluster feature)
    
    -(b) an interface that alows invoking a fencing mechanism
    +(2) As a second step, implement (this ticket in 5.1)  - 
    +Enhanced OpenSAF cluster management such that there is always consensus (among the cluster nodes) on the 
    +- current cluster members
    +- the current active SC, leader election
    +- the order of member nodes joining/leaving the cluster
    
    -(c) an interface that allows invoking an arbitration mechanism
    +
    +(3) As a last step implement https://sourceforge.net/p/opensaf/tickets/1170/ in 5.2?)
    +
    +
    +This ticket addresses bullet (2) above.
    +
    +Requirements:
    +
    +* As a part of this ticket RAFT (see https://raft.github.io/) shall be used as the mechanism for 
    +(a) achieving consensus among a set of the cluster nodes (and the membership changes)
    +(b) quorum based leader election
    +(c) split brain avoidance
    +The following deployment scenarios shall be supported when using RAFT:
    +- classic 2 SC OpenSAF cluster (or)
    +- when all nodes are SCs (2N + the rest are all spares) (or)
    +- 2N + spare SCs (2N + a smaller subset are spares) (or)
    +- N-WAY (a active, the rest are all hot standbys) - 5.2
    +Note: A mix of hot standbys and spares should also be possible.
    +
    +
    +* RAFT shall be a added as a new OpenSAF service. 
    +
    +* OpenSAF shall either implement RAFT or re-use existing RAFT implementations like logcabin or etcd, etc.
    +
    +* A new topology service(TS) *may* be added which shall use the topology information (from TIPC) and MDS (in case of TCP) to determine cluster membership
    +
    +* CLM is the single layer that interfaces with the underlying RAFT and TS
    +
    +* All interactions to RAFT and TS shall be via the normalised cluster services adaptation interface called as OpenSAF cluster services library (CS).  The CS library thereby shall enable OpenSAF to work with different implementations of RAFT.
    +
    +* CS and TS shall be added as libraries of OpenSAF CLM service. 
    +(In the code structure, these shall be part of ....services/saf/clm/libcs and ....services/saf/clm/libts.
    +The name of the library shall be libOsafClusterServices.so)
    +
    +The CS library shall provide a normalized set of APIs (and callback interfaces) such that OpenSAF can interact with different implementations of RAFT. 
    +
    +API and High level design details to follow:
    
     
  • Mathi Naickan

    Mathi Naickan - 2016-05-31

    This provides a gist of what the OpenSAF startup would look like after introducing RAFT and CS/TS.

     
  • Mathi Naickan

    Mathi Naickan - 2016-05-31
     
  • Mathi Naickan

    Mathi Naickan - 2016-05-31

    API and high level design details to follow soon.

     
  • Mathi Naickan

    Mathi Naickan - 2016-05-31
    • summary: Enhanced cluster management using RAFT --> Enhanced cluster management using RAFT consensus algorithm
     
  • Mathi Naickan

    Mathi Naickan - 2016-05-31
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -28,10 +28,10 @@
     (b) quorum based leader election
     (c) split brain avoidance
     The following deployment scenarios shall be supported when using RAFT:
    -- classic 2 SC OpenSAF cluster (or)
    -- when all nodes are SCs (2N + the rest are all spares) (or)
    -- 2N + spare SCs (2N + a smaller subset are spares) (or)
    -- N-WAY (a active, the rest are all hot standbys) - 5.2
    +-classic 2 SC OpenSAF cluster (or)
    +-when all nodes are SCs (2N + the rest are all spares) (or)
    +-2N + spare SCs (2N + a smaller subset are spares) (or)
    +-N-WAY (a active, the rest are all hot standbys) - 5.2
     Note: A mix of hot standbys and spares should also be possible.
    
     
  • Mathi Naickan

    Mathi Naickan - 2016-05-31
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -49,6 +49,8 @@
     (In the code structure, these shall be part of ....services/saf/clm/libcs and ....services/saf/clm/libts.
     The name of the library shall be libOsafClusterServices.so)
    
    +* OpenSAF should work both when RAFT is enabled or disabled on that system and should be backward compatible to previous OpenSAF releases!
    +
     The CS library shall provide a normalized set of APIs (and callback interfaces) such that OpenSAF can interact with different implementations of RAFT. 
    
     API and High level design details to follow:
    
     
  • Mathi Naickan

    Mathi Naickan - 2016-06-20
    • summary: Enhanced cluster management using RAFT consensus algorithm --> Enhanced cluster management using quorum
     
  • Mathi Naickan

    Mathi Naickan - 2016-06-22
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -7,17 +7,16 @@
    
     - Multiple (or all) nodes can now become system controller nodes. This increases the probability of split brain and cluster partitioning.
    
     These requirements are being addressed in a phased manner.
    -(1) As a first step, https://sourceforge.net/p/opensaf/tickets/79/ was implemented in 5.0. (And the headless cluster feature)
    +(1) As a first step, https://sourceforge.net/p/opensaf/tickets/79/ - spares was implemented in 5.0. (And the headless cluster feature - multiple tickets)
    
    -(2) As a second step, implement (this ticket in 5.1)  - 
    +(2) As a second step, implement (this ticket in 5.2)  - 
     Enhanced OpenSAF cluster management such that there is always consensus (among the cluster nodes) on the 
    
     - current cluster members
     - the current active SC, leader election
     - the order of member nodes joining/leaving the cluster
    
    
    -(3) As a last step implement https://sourceforge.net/p/opensaf/tickets/1170/ in 5.2?)
    -
    +(3) As a last step implement https://sourceforge.net/p/opensaf/tickets/1170/ - multiple standbys in 5.3)
    
     This ticket addresses bullet (2) above.
    
    @@ -37,7 +36,7 @@
    
    
     * RAFT shall be a added as a new OpenSAF service. 
    
    -* OpenSAF shall either implement RAFT or re-use existing RAFT implementations like logcabin or etcd, etc.
    +* OpenSAF shall either implement RAFT or re-use existing RAFT implementations like etcd, etc.
    
    
     * A new topology service(TS) *may* be added which shall use the topology information (from TIPC) and MDS (in case of TCP) to determine cluster membership
    
     
  • Mathi Naickan

    Mathi Naickan - 2016-06-22
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -34,15 +34,15 @@
     Note: A mix of hot standbys and spares should also be possible.
    
    
    -* RAFT shall be a added as a new OpenSAF service. 
    +* RAFT shall be added as a new OpenSAF service.
    
    -* OpenSAF shall either implement RAFT or re-use existing RAFT implementations like etcd, etc.
    +* OpenSAF shall either implement RAFT or re-use existing RAFT implementations like etcd, etc. 
    
    
     * A new topology service(TS) *may* be added which shall use the topology information (from TIPC) and MDS (in case of TCP) to determine cluster membership
    
    
     * CLM is the single layer that interfaces with the underlying RAFT and TS
    
    -* All interactions to RAFT and TS shall be via the normalised cluster services adaptation interface called as OpenSAF cluster services library (CS).  The CS library thereby shall enable OpenSAF to work with different implementations of RAFT.
    +* All interactions to RAFT and TS shall be via the normalised cluster services adaptation interface called as OpenSAF cluster services library (CS).  The CS library thereby shall enable OpenSAF to work with different implementations of RAFT. A plugin will be provided for a given implementation of RAFT.
    
    
     * CS and TS shall be added as libraries of OpenSAF CLM service. 
     (In the code structure, these shall be part of ....services/saf/clm/libcs and ....services/saf/clm/libts.
    @@ -52,4 +52,6 @@
    
     The CS library shall provide a normalized set of APIs (and callback interfaces) such that OpenSAF can interact with different implementations of RAFT. 
    
    -API and High level design details to follow:
    +This ticket will implement the CS library and the associated plugin for a given implementation of RAFT.
    +
    +The CS library API definitions to follow soon. 
    
     
  • Mathi Naickan

    Mathi Naickan - 2016-06-22
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -38,7 +38,7 @@
    
    
     * OpenSAF shall either implement RAFT or re-use existing RAFT implementations like etcd, etc. 
    
    -* A new topology service(TS) *may* be added which shall use the topology information (from TIPC) and MDS (in case of TCP) to determine cluster membership
    +* A new topology service(TS) *may* be added which shall use the topology information (from TIPC) and MDS (in case of TCP) to determine cluster membership - https://sourceforge.net/p/opensaf/tickets/1892/.
    
    
     * CLM is the single layer that interfaces with the underlying RAFT and TS
    
     
  • Mathi Naickan

    Mathi Naickan - 2016-08-29
    • Milestone: 5.1.FC --> 5.2.FC
     
  • Anders Widell

    Anders Widell - 2017-02-24
    • Milestone: 5.2.FC --> next
     
