This function in
http://cvs.sourceforge.net/viewcvs.py/evms/evms2/engine/plugins/rsct/rsct_mem_info.c?view=markup
is not really working.
One way to check the quorum from the command line is to run
lsrsrc -dx IBM.PeerDomain OpState OpQuorumState | grep "1:0:"
The domain has no quorum if the output is empty. It has quorum
otherwise. The only non-empty output it can produce is "1:0:",
where 1 means that the domain is online and 0 means that it has
quorum.
A simpler alternative is to run
lsrsrc -dx IBM.PeerDomain OpQuorumState | grep "0"
The output is either empty or "0:", and the latter means that
the domain has quorum.
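For illustration, here is a rough C sketch of how the first command-line check could be wrapped in code. It is not the EVMS implementation. The function names are made up for this example, and the only thing it assumes about lsrsrc is the "1:0:" output format described above. It uses POSIX popen() and mirrors the grep by looking for "1:0:" anywhere in a line.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Returns 1 if a line of lsrsrc output contains "1:0:" (domain online,
 * has quorum), mirroring `grep "1:0:"`. Hypothetical helper name. */
int line_reports_quorum(const char *line)
{
    return strstr(line, "1:0:") != NULL;
}

/* Scans an already-open stream line by line; returns 1 on the first
 * matching line, 0 if no line matches (including empty output). */
int stream_reports_quorum(FILE *fp)
{
    char line[256];

    while (fgets(line, sizeof line, fp) != NULL) {
        if (line_reports_quorum(line))
            return 1;
    }
    return 0;
}

/* Runs lsrsrc through the shell and checks its output.
 * Returns 1 (quorum), 0 (no matching output), or -1 if popen() failed. */
int rsct_cli_has_quorum(void)
{
    FILE *fp = popen("lsrsrc -dx IBM.PeerDomain OpState OpQuorumState", "r");
    int result;

    if (fp == NULL)
        return -1;
    result = stream_reports_quorum(fp);
    pclose(fp);
    return result;
}
```

Note that if lsrsrc is not installed, popen() still succeeds and the output is simply empty, so this sketch would report "no quorum" rather than an error. A real implementation would probably want to inspect the pclose() status as well.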
Logged In: YES
user_id=1330993
This function's current implementation is the following:
boolean
RSCT_mem_has_quorum(){
}
Logged In: YES
user_id=712485
It was mainly for two nodes and that's why quorum is always
granted.
The RSCT Group Services API (GSAPI) does not provide quorum support:
"
Quorum
Many applications require a form of quorum to ensure that the
available before the application begins operation. For example,
require a certain percentage of nodes to be up and running
before
another requires particular nodes.
Because groups have significantly different requirements for
quorum,
does not provide a predefined quorum as part of its support. It
of the application that is using the GSAPI to form groups that
required quorum mechanisms. By manipulating the state
information
an application can build the required quorum mechanism.
"
I believe returning "total nodes == 2 || active nodes > half of
all nodes" should be OK in our case.
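To make the proposed rule concrete, a minimal sketch in C could look like the following. The function name and parameters are hypothetical (the real function takes no arguments and would have to obtain the node counts itself), and the two-node special case is taken literally from the expression above.

```c
#include <stdbool.h>

/* Hypothetical helper implementing
 * "total nodes == 2 || active nodes > half of all nodes". */
bool simple_has_quorum(unsigned int total_nodes, unsigned int active_nodes)
{
    /* Two-node clusters are always granted quorum under this rule. */
    if (total_nodes == 2)
        return true;

    /* Integer division rounds down, so this is a strict majority:
     * 2 of 4 is not enough, 3 of 4 and 3 of 5 are. */
    return active_nodes > total_nodes / 2;
}
```

With this rule, a 4-node cluster split 2/2 has quorum on neither side, which is the case that RSCT itself resolves with a tie breaker.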
Logged In: YES
user_id=1330993
INSTALL.RSCT mentions the mkrpdomain command. This
command can be used to pick a quorum mechanism. The
man page says:
-Q quorum_type | quorum_type_name
    Specifies the quorum rules that are used for start-up,
    operational, and configuration quorum. Start-up quorum
    defines how many nodes are contacted to obtain
    configuration information before starting the peer
    domain. Operational quorum defines how many nodes must be
    online in order to start and stop resources and how tie
    breaking is used. Configuration quorum defines how many
    nodes must be online to make changes to the peer domain
    (adding or removing a node, for example). To see what
    quorum rule types are available on a node, run:
        lsrsrc -c IBM.PeerDomain AvailableQuorumTypes
    The valid values are:
    0 | normal
        Specifies normal quorum rules. This is the
        default. For start-up quorum, at least half of
        the nodes will be contacted for configuration
        information. For configuration quorum, more than
        half of the nodes must be online to make
        configuration changes. For operational quorum,
        the cluster or subcluster must have a majority
        of the nodes in the peer domain. If a tie exists
        between subclusters, the subcluster that holds
        the tiebreaker has operational quorum.
    1 | quick
        Specifies quick quorum rules. For start-up
        quorum, even if no other nodes can be contacted,
        the node will still come online. For
        configuration quorum, more than half of the
        nodes must be online to make configuration
        changes. For operational quorum, the cluster or
        subcluster must have a majority of the nodes in
        the peer domain. If a tie exists between
        subclusters, the subcluster that holds the
        tiebreaker has operational quorum.
Logged In: YES
user_id=1330993
The man page is available on the internet:
http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_linux141/bl5trl0836.html
More details are available at
http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_aix5l53/bl5adm0822.html:
What is quorum?
Quorum refers to the minimum number of nodes within the
peer domain that are required to carry out a particular
operation. There are three kinds of quorum that specify the
number of nodes required for different types of operations.
These are startup quorum, configuration quorum, and
operational quorum.
What is startup quorum?
Startup quorum refers to the number of nodes needed to bring
a peer domain online. If the configuration resource manager is
unable to reach this minimum number of nodes, it will not be
able to start the peer domain.
What is configuration quorum?
Configuration quorum refers to the minimum number of nodes,
or a certain peer-domain state, needed to perform operations
that modify the peer domain's configuration information. If you
issue a command that will modify a peer domain's
configuration, and the configuration resource manager is
unable to reach this minimum number of nodes, the
command will fail.
What is operational quorum?
Operational quorum refers to the minimum number of nodes, or
a certain peer-domain state, needed to safely activate
resources without creating conflicts with another subdomain.
It is used to protect data following domain partitioning.
What is domain partitioning?
Domain partitioning is when a peer domain is inadvertently
divided into two or more sub-domains.
How does operational quorum help the configuration resource
manager protect data following domain partitioning?
Following domain partitioning when critical resources are
active on nodes, the configuration resource manager needs to
determine which sub-domain can continue operating and
which other(s) should be dissolved. This is especially
important when there are applications running on the domain
that employ shared resource access. If the peer domain is
partitioned, nodes in one sub-domain are no longer aware of
nodes in any other sub-domain. Data corruption can occur if
nodes in different sub-domains try to access the same
shared resource. The configuration resource manager
prevents this situation by deciding which sub-domain has
operational quorum and can continue operating, thus
becoming the peer domain. Usually, the sub-domain with the
majority of nodes will have operational quorum.
What is a tie breaker?
After domain partitioning, it is usually the sub-domain with the
majority of nodes that will have operational quorum. However,
sometimes there is a tie in which multiple sub-domains have
exactly half of the defined nodes. A "tie" situation also occurs
when exactly half the nodes of a domain are online, and the
other half are inaccessible. When there is a tie, the
configuration resource manager uses a tie breaker to
determine which sub-domain has operational quorum. A tie
breaker is an RMC resource defined by the configuration
resource manager that specifies how tie situations should be
resolved. It is the tie breaker that determines which
sub-domain will have operational quorum and so will survive, and
which sub-domain will be dissolved.
For more information, refer to Determining how the
configuration resource manager will resolve tie situations
when calculating operational quorum.
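As a rough illustration of the majority-plus-tie-breaker rule quoted above, the decision for a single sub-domain could be sketched in C as follows. This is only a model of the description, not RSCT code. The function name and parameters are hypothetical, and whether the sub-domain holds the tie breaker is treated as a plain input.

```c
#include <stdbool.h>

/* Hypothetical model of operational quorum for one sub-domain:
 * a strict majority of defined nodes wins outright, exactly half
 * wins only if this sub-domain holds the tie breaker. */
bool subdomain_has_op_quorum(unsigned int defined_nodes,
                             unsigned int online_nodes,
                             bool holds_tie_breaker)
{
    if (defined_nodes == 0)
        return false;                 /* nothing defined, nothing to grant */

    if (online_nodes > defined_nodes / 2)
        return true;                  /* strict majority */

    if (2 * online_nodes == defined_nodes)
        return holds_tie_breaker;     /* exact tie, tie breaker decides */

    return false;                     /* minority sub-domain */
}
```

For a two-node domain with one node reachable, this returns whatever the tie breaker says, which is different from always granting quorum.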
What is a critical resource protection method?
When a sub-domain that has critical resources loses quorum,
the configuration resource manager uses a critical resource
protection method on each node of the sub-domain to ensure
that critical resources will not be corrupted. A critical
resource protection method is simply software that determines
how the configuration resource manager will respond when
quorum is lost in a sub-domain. A critical resource protection
method will also be used on a node whose configuration
resource manager, group services, or topology services
daemon hangs. There are a number of critical resource
protection methods defined by the configuration resource
manager. You can specify a critical resource protection
method for the entire peer domain, or specify one to be used
on just one particular node. The critical resource protection
methods do such things as halt the system, reset and reboot
the system, and so on.
For more information, refer to Setting the critical resource
protection method for a peer domain or a node in a peer
domain.
What are quorum types?
A peer domain's quorum type specifies how startup quorum,
configuration quorum, and operational quorum will be
calculated for the peer domain. The quorum types are:
Normal
    Normal mode which is the default for an AIX/Linux cluster. In
    this mode:
    StartupQuorum = N/2
    ConfigQuorum = N/2 + 1
    OpQuorum = Majority + TieBreaker
Quick
    Quick startup mode, which is useful for large clusters. In this
    mode:
    StartupQuorum = 1
    ConfigQuorum = N/2 + 1
    OpQuorum = Majority + TieBreaker
Override
    Override mode. Available only for OS/400 environments, and
    the default for such environments. In this mode:
    StartupQuorum = 1
    ConfigQuorum = 1
    OpQuorum is externally provided by RMC exploiter.
SANFS
    SANFS mode. Available only for environments with the IBM
    TotalStorage SAN File System, and the default for such
    environments. In this mode:
    StartupQuorum = 1
    ConfigQuorum is externally provided by a designated group
    state value.
    OpQuorum = Majority + TieBreaker
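To put numbers on the Normal and Quick formulas, here is a small C sketch. The names are hypothetical. The rounding of N/2 for odd N is my assumption: the documentation only gives "N/2", so the sketch rounds startup quorum up to match the man page wording "at least half of the nodes", and uses integer division for N/2 + 1 so that it matches "more than half". Override and SANFS are left out because some of their values are provided externally.

```c
/* Numeric values follow the mkrpdomain man page: 0 = normal, 1 = quick. */
enum quorum_type { QUORUM_NORMAL = 0, QUORUM_QUICK = 1 };

/* Startup quorum for n defined nodes.
 * Assumption: in Normal mode, N/2 is rounded up ("at least half"). */
unsigned int startup_quorum(enum quorum_type type, unsigned int n)
{
    if (type == QUORUM_QUICK)
        return 1;
    return (n + 1) / 2;
}

/* Configuration quorum is N/2 + 1 in both Normal and Quick mode.
 * Integer division makes this the smallest strict majority. */
unsigned int config_quorum(unsigned int n)
{
    return n / 2 + 1;
}
```

For example, with 5 nodes in Normal mode this gives a startup quorum of 3 and a configuration quorum of 3, and with 4 nodes it gives 2 and 3.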
More advanced relevant topics (including what happens when
exactly 1/2 of all nodes are available):
http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_aix5l53/bl5adm0835.html
Logged In: YES
user_id=1330993
It is also possible to pick a quorum mechanism with the
startrpdomain command, which is also mentioned in INSTALL.RSCT.
The following details are from
http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_aix5l53/bl5adm0827.html:
The peer domain's quorum type (as described in What are
quorum types?) will determine the startup quorum needed for
bringing the peer domain online. The cluster's quorum type
will either be the default for your environment, or one you
specified using the mkrpdomain command's -Q flag (as
described in Step 2: create a new peer domain). When
starting a peer domain, you can also, if the quorum type is
set to 0 (Normal) or 1 (Quick), override the quorum type to
specify a different one for calculating startup quorum. Using
the startrpdomain command's -Q flag, you can specify the
startup quorum type to be either:
0 or "Normal"
1 or "Quick"
For example, if the quorum type is 0 (Normal), you could
override that quorum type to specify that quick startup mode
should be used to calculate startup quorum.
startrpdomain -Q 1 ApplDomain
or
startrpdomain -Q Quick ApplDomain
Notes:
You cannot modify the startup quorum type if it has been
implicitly set to 2 (Override) or 3 (SANFS).
You cannot specify the startup quorum type to be 2 (Override)
or 3 (SANFS).
Logged In: YES
user_id=1330993
Is there any plan or schedule to fix this bug? If not, what's the
right way to get such a plan or schedule?