The goal of this ticket is to address the following requirements. This ticket should be read in conjunction with tickets #79 (spare SCs) and #1170 (multiple standbys):
Deployment of large OpenSAF clusters in the cloud presents the following challenges:
These requirements are being addressed in a phased manner.
(1) As a first step, spares (https://sourceforge.net/p/opensaf/tickets/79/) were implemented in 5.0, along with the headless cluster feature (multiple tickets).
(2) As a second step, implement (this ticket in 5.2) -
Enhanced OpenSAF cluster management such that there is always consensus (among the cluster nodes) on the
(3) As a last step, implement https://sourceforge.net/p/opensaf/tickets/1170/ - multiple standbys (in 5.3)
This ticket addresses bullet (2) above.
Requirements:
As a part of this ticket RAFT (see https://raft.github.io/) shall be used as the mechanism for
(a) achieving consensus among a set of the cluster nodes (and the membership changes)
(b) quorum based leader election
(c) split brain avoidance
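To make the quorum requirement in (b) and (c) concrete, here is a minimal sketch of the majority rule that RAFT-style leader election relies on. The function names are hypothetical and are not part of OpenSAF or of any RAFT implementation; the sketch only shows the arithmetic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; not an OpenSAF or RAFT API. */

/* Returns true when `votes` is a strict majority of `cluster_size`
 * voting members. Two disjoint partitions can never both satisfy this,
 * which is what prevents two leaders being elected (split brain). */
bool has_quorum(size_t votes, size_t cluster_size)
{
    if (cluster_size == 0 || votes > cluster_size)
        return false;
    return votes > cluster_size / 2;
}

/* Largest number of voting members that can be lost while the
 * remaining members still form a majority. */
size_t tolerated_failures(size_t cluster_size)
{
    return cluster_size == 0 ? 0 : (cluster_size - 1) / 2;
}
```

Under this rule a cluster with only two voting members tolerates no failures, since a majority of two is two. A cluster with three voting members tolerates one failure, and a cluster with five tolerates two.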
The following deployment scenarios shall be supported when using RAFT:
-classic 2 SC OpenSAF cluster (or)
-when all nodes are SCs (2N + the rest are all spares) (or)
-2N + spare SCs (2N + a smaller subset are spares) (or)
-N-WAY (one active, the rest are all hot standbys) - 5.2
Note: A mix of hot standbys and spares should also be possible.
RAFT shall be added as a new OpenSAF service.
OpenSAF shall either implement RAFT or re-use existing RAFT implementations like etcd, etc.
A new topology service (TS) may be added which shall use the topology information (from TIPC) and MDS (in case of TCP) to determine cluster membership - https://sourceforge.net/p/opensaf/tickets/1892/.
CLM is the single layer that interfaces with the underlying RAFT and TS.
All interactions with RAFT and TS shall be via the normalised cluster services adaptation interface, called the OpenSAF cluster services library (CS). The CS library shall thereby enable OpenSAF to work with different implementations of RAFT. A plugin will be provided for a given implementation of RAFT.
CS and TS shall be added as libraries of OpenSAF CLM service.
(In the code structure, these shall be part of ....services/saf/clm/libcs and ....services/saf/clm/libts.
The name of the library shall be libOsafClusterServices.so)
OpenSAF should work whether RAFT is enabled or disabled on the system, and should be backward compatible with previous OpenSAF releases.
The CS library shall provide a normalized set of APIs (and callback interfaces) such that OpenSAF can interact with different implementations of RAFT.
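Since the CS API has not been published yet, the following is only a sketch of how a normalised plugin interface could look in C, using a table of function pointers that each RAFT implementation would fill in. Every name here (cs_plugin_t, cs_register_plugin, cs_join, cs_get_leader, the status codes and the stub plugin) is an assumption made for illustration and is not the actual CS library interface. Callback interfaces are left out for brevity.

```c
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; the ticket has not defined the CS API. */
typedef enum { CS_OK = 0, CS_ERR_NO_PLUGIN, CS_ERR_INVALID } cs_status_t;

/* Operations every RAFT plugin would have to provide. */
typedef struct cs_plugin {
    const char *name;
    cs_status_t (*init)(void);
    cs_status_t (*join)(const char *node_name);
    cs_status_t (*get_leader)(char *buf, size_t len);
} cs_plugin_t;

static const cs_plugin_t *active_plugin; /* NULL while no plugin is loaded */

/* Install a plugin and initialise it; rejects incomplete tables. */
cs_status_t cs_register_plugin(const cs_plugin_t *p)
{
    if (p == NULL || p->init == NULL || p->join == NULL ||
        p->get_leader == NULL)
        return CS_ERR_INVALID;
    active_plugin = p;
    return p->init();
}

/* Normalised entry points: callers never see the plugin directly. */
cs_status_t cs_join(const char *node_name)
{
    if (active_plugin == NULL)
        return CS_ERR_NO_PLUGIN;
    if (node_name == NULL)
        return CS_ERR_INVALID;
    return active_plugin->join(node_name);
}

cs_status_t cs_get_leader(char *buf, size_t len)
{
    if (active_plugin == NULL)
        return CS_ERR_NO_PLUGIN;
    return active_plugin->get_leader(buf, len);
}

/* Stub plugin for demonstration: the first node to join becomes leader.
 * A real plugin would delegate to an actual RAFT implementation. */
static char stub_leader[64];

static cs_status_t stub_init(void)
{
    stub_leader[0] = '\0';
    return CS_OK;
}

static cs_status_t stub_join(const char *node_name)
{
    if (stub_leader[0] == '\0') {
        strncpy(stub_leader, node_name, sizeof(stub_leader) - 1);
        stub_leader[sizeof(stub_leader) - 1] = '\0';
    }
    return CS_OK;
}

static cs_status_t stub_get_leader(char *buf, size_t len)
{
    if (buf == NULL || len <= strlen(stub_leader))
        return CS_ERR_INVALID;
    strcpy(buf, stub_leader);
    return CS_OK;
}

const cs_plugin_t cs_stub_plugin = {
    "stub", stub_init, stub_join, stub_get_leader
};
```

In this sketch a caller would register a plugin once with cs_register_plugin(&cs_stub_plugin) and then use only cs_join() and cs_get_leader(). When no plugin is registered the calls return CS_ERR_NO_PLUGIN, which a caller could treat as "RAFT disabled" and fall back to the existing behaviour.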
This ticket will implement the CS library and the associated plugin for a given implementation of RAFT.
The CS library API definitions to follow soon.
RAFT shall be used for enhanced SC active election and cluster membership. The following is the scope of this ticket:
(a) Implement RAFT and/or RAFT adaptation layer that provides interfaces for
Note: Yet to be seen if a leader yield interface is necessary
(b) an interface that allows invoking a fencing mechanism
(c) an interface that allows invoking an arbitration mechanism
This provides a gist of how the OpenSAF startup would look like after introducing RAFT and CS/TS.

API and high level design details to follow soon.