
#68 make RSCT_mem_has_quorum() functional

Version_2.0
open
5
2005-08-24
2005-08-24
Jeff Li
No

The function RSCT_mem_has_quorum() in
http://cvs.sourceforge.net/viewcvs.py/evms/evms2/engine/plugins/rsct/rsct_mem_info.c?view=markup
does not actually work.

One way to check quorum from the command line is:

lsrsrc -dx IBM.PeerDomain OpState OpQuorumState | grep "1:0:"

The domain has no quorum if the output is empty, and has quorum
otherwise. The only non-empty output it can produce is "1:0:",
where 1 means that the domain is online and 0 means that it has
quorum.

A simpler alternative is:

lsrsrc -dx IBM.PeerDomain OpQuorumState | grep "0"

The output is either empty or "0:"; the latter means that the
domain has quorum.
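The first check can also be done from C. The following is a minimal sketch, assuming a POSIX system with `lsrsrc` on the PATH; it runs the same `lsrsrc -dx` query without `grep` and looks for a line that is exactly `1:0:`. The function names `output_has_quorum()` and `rsct_query_has_quorum()` are hypothetical and are not part of the EVMS RSCT plugin.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if any line of `out` is exactly "1:0:"
 * (OpState 1 = online, OpQuorumState 0 = has quorum), else 0. */
int output_has_quorum(const char *out)
{
    const char *line = out;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Ignore trailing blanks or a carriage return, if any. */
        while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t' ||
                           line[len - 1] == '\r'))
            len--;

        if (len == 4 && memcmp(line, "1:0:", 4) == 0)
            return 1;
        if (end == NULL)
            break;
        line = end + 1;
    }
    return 0;
}

/* Hypothetical wrapper: runs the lsrsrc query and checks its output.
 * Returns 1 for quorum, 0 for no quorum (including empty output), and
 * -1 if popen() or pclose() itself fails. */
int rsct_query_has_quorum(void)
{
    char buf[256];
    int found = 0;
    FILE *fp = popen("lsrsrc -dx IBM.PeerDomain OpState OpQuorumState", "r");

    if (fp == NULL)
        return -1;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (output_has_quorum(buf))
            found = 1;      /* keep draining the output before pclose() */
    }
    if (pclose(fp) == -1)
        return -1;
    return found;
}
```

Keeping the parsing separate from the `popen()` call lets it be tested without an RSCT installation. Note that if `lsrsrc` is missing, the shell only prints an error to stderr, so this sketch reports "no quorum" rather than -1, which matches the behavior of the `grep` pipeline.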

Discussion

  • Jeff Li

    Jeff Li - 2005-08-24

    user_id=1330993

    This function's current implementation is the following:

    boolean
    RSCT_mem_has_quorum(){
            return TRUE;
    }

     
  • Guochun Shi

    Guochun Shi - 2005-08-24

    user_id=712485

    It was written mainly for two-node clusters, which is why quorum
    is always granted.

    The RSCT group API does not provide quorum support:
    "
    Quorum
    Many applications require a form of quorum to ensure that the [...]
    available before the application begins operation. For example, [...]
    require a certain percentage of nodes to be up and running before [...]
    another requires particular nodes.
    Because groups have significantly different requirements for quorum,
    [Group Services] does not provide a predefined quorum as part of its
    support. It [...] of the application that is using the GSAPI to form
    groups that [...] required quorum mechanisms. By manipulating the
    state information, an application can build the required quorum
    mechanism.
    "

    I believe returning "total nodes == 2 || active nodes > half of
    all nodes" should be OK in our case.
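    A minimal C sketch of the rule proposed above, written as a standalone function. The name `proposed_has_quorum()` and its parameters are hypothetical; how the plugin would obtain the total and active node counts is left open.

```c
#include <stdbool.h>

/* Hypothetical sketch of the proposed rule: grant quorum if the cluster
 * has exactly two nodes, or if strictly more than half of all configured
 * nodes are active. */
bool proposed_has_quorum(unsigned int total_nodes, unsigned int active_nodes)
{
    if (total_nodes == 2)
        return true;                        /* two-node case: always grant */
    return 2u * active_nodes > total_nodes; /* same as active > total / 2 */
}
```

    Comparing `2 * active` with `total` is equivalent to `active > total / 2` for integers but avoids any doubt about rounding. Note that in a partitioned two-node cluster, both nodes would grant themselves quorum under this rule.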

     
  • Jeff Li

    Jeff Li - 2005-08-25

    user_id=1330993

    INSTALL.RSCT describes the mkrpdomain command, which can be
    used to choose a quorum mechanism. The man page says:

    -Q quorum_type | quorum_type_name
    Specifies the quorum rules that are used for start-up, operational,
    and configuration quorum. Start-up quorum defines how many nodes are
    contacted to obtain configuration information before starting the
    peer domain. Operational quorum defines how many nodes must be online
    in order to start and stop resources and how tie breaking is used.
    Configuration quorum defines how many nodes must be online to make
    changes to the peer domain (adding or removing a node, for example).
    To see what quorum rule types are available on a node, run:
    lsrsrc -c IBM.PeerDomain AvailableQuorumTypes

    The valid values are:

    0 | normal
    Specifies normal quorum rules. This is the default. For start-up
    quorum, at least half of the nodes will be contacted for
    configuration information. For configuration quorum, more than half
    of the nodes must be online to make configuration changes. For
    operational quorum, the cluster or subcluster must have a majority of
    the nodes in the peer domain. If a tie exists between subclusters,
    the subcluster that holds the tiebreaker has operational quorum.

    1 | quick
    Specifies quick quorum rules. For start-up quorum, even if no other
    nodes can be contacted, the node will still come online. For
    configuration quorum, more than half of the nodes must be online to
    make configuration changes. For operational quorum, the cluster or
    subcluster must have a majority of the nodes in the peer domain. If
    a tie exists between subclusters, the subcluster that holds the
    tiebreaker has operational quorum.

     
  • Jeff Li

    Jeff Li - 2005-08-25

    user_id=1330993

    The man page is available online:

    http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_linux141/bl5trl0836.html

    More details are available at
    http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_aix5l53/bl5adm0822.html:

    What is quorum?
    Quorum refers to the minimum number of nodes within the
    peer domain that are required to carry out a particular
    operation. There are three kinds of quorum that specify the
    number of nodes required for different types of operations.
    These are startup quorum, configuration quorum, and
    operational quorum.

    What is startup quorum?
    Startup quorum refers to the number of nodes needed to bring
    a peer domain online. If the configuration resource manager is
    unable to reach this minimum number of nodes, it will not be
    able to start the peer domain.

    What is configuration quorum?
    Configuration quorum refers to the minimum number of nodes,
    or a certain peer-domain state, needed to perform operations
    that modify the peer domain's configuration information. If you
    issue a command that will modify a peer domain's
    configuration, and the configuration resource manager is
    unable to reach this minimum number of nodes, the
    command will fail.

    What is operational quorum?
    Operational quorum refers to the minimum number of nodes, or
    a certain peer-domain state, needed to safely activate
    resources without creating conflicts with another subdomain.
    It is used to protect data following domain partitioning.

    What is domain partitioning?
    Domain partitioning is when a peer domain is inadvertently
    divided into two or more sub-domains.

    How does operational quorum help the configuration resource
    manager protect data following domain partitioning?
    Following domain partitioning when critical resources are
    active on nodes, the configuration resource manager needs to
    determine which sub-domain can continue operating and
    which other(s) should be dissolved. This is especially
    important when there are applications running on the domain
    that employ shared resource access. If the peer domain is
    partitioned, nodes in one sub-domain are no longer aware of
    nodes in any other sub-domain. Data corruption can occur if
    nodes in different sub-domains try to access the same
    shared resource. The configuration resource manager
    prevents this situation by deciding which sub-domain has
    operational quorum and can continue operating, thus
    becoming the peer domain. Usually, the sub-domain with the
    majority of nodes will have operational quorum.

    What is a tie breaker?
    After domain partitioning, it is usually the sub-domain with the
    majority of nodes that has operational quorum. However,
    sometimes there is a tie in which multiple sub-domains have
    exactly half of the defined nodes. A "tie" situation also occurs
    when exactly half the nodes of a domain are online, and the
    other half are inaccessible. When there is a tie, the
    configuration resource manager uses a tie breaker to
    determine which sub-domain has operational quorum. A tie
    breaker is an RMC resource defined by the configuration
    resource manager that specifies how tie situations should be
    resolved. It is the tie breaker that determines which sub-domain
    will have operational quorum and so will survive, and which
    sub-domain will be dissolved.

    For more information, refer to Determining how the
    configuration resource manager will resolve tie situations
    when calculating operational quorum.
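    To make the majority and tie-breaker rules above concrete, here is a small C model of the operational quorum decision. It is a sketch, not RSCT's implementation: `op_quorum()` is a hypothetical name, and whether this side won the tie breaker is passed in as a flag rather than obtained from an actual tie-breaker resource.

```c
#include <stdbool.h>

/* Hypothetical model: operational quorum for a sub-domain that can see
 * online_nodes of the defined_nodes in the peer domain. */
bool op_quorum(unsigned int defined_nodes, unsigned int online_nodes,
               bool won_tiebreaker)
{
    if (defined_nodes == 0)
        return false;                    /* no defined nodes, no quorum */
    if (2u * online_nodes > defined_nodes)
        return true;                     /* strict majority */
    if (2u * online_nodes == defined_nodes)
        return won_tiebreaker;           /* tie: exactly half are online */
    return false;                        /* minority sub-domain */
}
```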

    What is a critical resource protection method?
    When a sub-domain that has critical resources loses quorum,
    the configuration resource manager uses a critical resource
    protection method on each node of the sub-domain to ensure
    that critical resources will not be corrupted. A critical
    resource protection method is simply software that determines
    how the configuration resource manager will respond when
    quorum is lost in a sub-domain. A critical resource protection
    method will also be used on a node whose configuration
    resource manager, group services, or topology services
    daemon hangs. There are a number of critical resource
    protection methods defined by the configuration resource
    manager. You can specify a critical resource protection
    method for the entire peer domain, or specify one to be used
    on just one particular node. The critical resource protection
    methods do such things as halt the system, reset and reboot
    the system, and so on.

    For more information, refer to Setting the critical resource
    protection method for a peer domain or a node in a peer
    domain.

    What are quorum types?
    A peer domain's quorum type specifies how startup quorum,
    configuration quorum, and operational quorum will be
    calculated for the peer domain. The quorum types are:

    Normal
    Normal mode, which is the default for an AIX/Linux cluster. In
    this mode:
    StartupQuorum = N/2
    ConfigQuorum = N/2 + 1
    OpQuorum = Majority + TieBreaker

    Quick
    Quick startup mode, which is useful for large clusters. In this
    mode:
    StartupQuorum = 1
    ConfigQuorum = N/2 + 1
    OpQuorum = Majority + TieBreaker

    Override
    Override mode. Available only for OS/400 environments, and
    the default for such environments. In this mode:
    StartupQuorum = 1
    ConfigQuorum = 1
    OpQuorum is externally provided by the RMC exploiter.

    SANFS
    SANFS mode. Available only for environments with the IBM
    TotalStorage SAN File System, and the default for such
    environments. In this mode:
    StartupQuorum = 1
    ConfigQuorum is externally provided by a designated group
    state value.
    OpQuorum = Majority + TieBreaker
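    The start-up and configuration quorum formulas above can be expressed as a small C lookup. This is a sketch under stated assumptions: the enum names and the `QUORUM_EXTERNAL`/`QUORUM_INVALID` markers are hypothetical, and `N/2` for start-up quorum is rounded up on the assumption that it means "at least half" (as the mkrpdomain man page puts it). Operational quorum is omitted because it is Majority + TieBreaker for every type except Override.

```c
/* Quorum type codes as numbered in the RSCT documentation. */
enum quorum_type {
    QT_NORMAL = 0,
    QT_QUICK = 1,
    QT_OVERRIDE = 2,
    QT_SANFS = 3
};

/* Hypothetical markers for values not computed from N. */
#define QUORUM_EXTERNAL (-1)  /* supplied from outside the formula */
#define QUORUM_INVALID  (-2)  /* unknown quorum type */

/* Nodes needed to bring a peer domain of n nodes online. */
int startup_quorum(enum quorum_type t, int n)
{
    switch (t) {
    case QT_NORMAL:
        return (n + 1) / 2;   /* N/2 rounded up: "at least half" (assumption) */
    case QT_QUICK:
    case QT_OVERRIDE:
    case QT_SANFS:
        return 1;
    }
    return QUORUM_INVALID;
}

/* Nodes that must be online to change the configuration of an n-node domain. */
int config_quorum(enum quorum_type t, int n)
{
    switch (t) {
    case QT_NORMAL:
    case QT_QUICK:
        return n / 2 + 1;        /* more than half of the nodes */
    case QT_OVERRIDE:
        return 1;
    case QT_SANFS:
        return QUORUM_EXTERNAL;  /* from a designated group state value */
    }
    return QUORUM_INVALID;
}
```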

    More advanced related topics (including what happens when
    exactly half of all nodes are available):

    http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_aix5l53/bl5adm0835.html

     
  • Jeff Li

    Jeff Li - 2005-08-25

    user_id=1330993

    A quorum mechanism can also be chosen with the startrpdomain
    command, which INSTALL.RSCT also covers.

    The following details are from
    http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_aix5l53/bl5adm0827.html:

    The peer domain's quorum type (as described in What are
    quorum types?) will determine the startup quorum needed for
    bringing the peer domain online. The cluster's quorum type
    will either be the default for your environment, or one you
    specified using the mkrpdomain command's -Q flag (as
    described in Step 2: create a new peer domain). When
    starting a peer domain, you can also, if the quorum type is
    set to 0 (Normal) or 1 (Quick), override the quorum type to
    specify a different one for calculating startup quorum. Using
    the startrpdomain command's -Q flag, you can specify the
    startup quorum type to be either:

    0 or "Normal"
    1 or "Quick"

    For example, if the quorum type is 0 (Normal), you could
    override that quorum type to specify that quick startup mode
    should be used to calculate startup quorum:

    startrpdomain -Q 1 ApplDomain

    or

    startrpdomain -Q Quick ApplDomain

    Notes:
    You cannot modify the startup quorum type if it has been
    implicitly set to 2 (Override) or 3 (SANFS).
    You cannot specify the startup quorum type to be 2 (Override)
    or 3 (SANFS).
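    The two notes above amount to a simple validity check on `startrpdomain -Q`. A hypothetical C sketch (`startup_override_allowed()` is not an RSCT function) using the numeric quorum type codes:

```c
#include <stdbool.h>

/* Hypothetical check mirroring the startrpdomain -Q notes: the startup
 * quorum type may be overridden only when the domain's quorum type is
 * 0 (Normal) or 1 (Quick), and the requested type must be 0 or 1. */
bool startup_override_allowed(int domain_type, int requested_type)
{
    bool domain_ok = (domain_type == 0 || domain_type == 1);
    bool request_ok = (requested_type == 0 || requested_type == 1);

    return domain_ok && request_ok;
}
```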

     
  • Jeff Li

    Jeff Li - 2005-10-07

    user_id=1330993

    Is there any plan or schedule to fix this bug? If not, what's the
    right way to get such a plan or schedule?

     

