Migrated from http://devel.opensaf.org/ticket/1734
There are a number of issues with how AMF sets some of the parameters of CSI set callbacks and with the sequencing of those callbacks, especially as they relate to the standbyRank field when making standby HA state assignments. Briefly, the intent of the standbyRank field is to indicate the current rank of the component to which the standby HA state assignment is being made. Note that this is not the configured rank of the containing SU (which is what is used in protection group track callbacks) but the rank of the component among all standby components for the CSI. So for the 2N and N+M redundancy models, the standbyRank field should always be 1, since there is exactly one assigned standby component for each CSI. For the N-way redundancy model, however, the standby rank should be 1 or higher, indicating the relative rank of the standby. For example, if the assignments of CSI1 were made in the following order, the standbyRank would be expected to be as shown:
And if Comp3 were then failed over, Comp4's standby rank would be elevated to 2.
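The rank semantics described above can be illustrated with a minimal sketch. This is not AMF code: the function names are hypothetical, and the ordering (Comp1 active, then Comp2, Comp3, Comp4 as standbys) is an assumption chosen only to be consistent with the Comp3/Comp4 remark.

```python
def standby_ranks(standbys):
    """Map each standby component to its 1-based rank, in assignment order."""
    return {comp: rank for rank, comp in enumerate(standbys, start=1)}


def fail_over(standbys, failed):
    """Return the standby list with the failed component removed."""
    return [comp for comp in standbys if comp != failed]


# Assumed ordering (hypothetical): Comp1 is active and is not ranked here;
# Comp2, Comp3 and Comp4 received the standby assignment in that order.
standbys = ["Comp2", "Comp3", "Comp4"]

before = standby_ranks(standbys)
after = standby_ranks(fail_over(standbys, "Comp3"))

print(before)  # Comp3 holds rank 2, Comp4 holds rank 3
print(after)   # Comp4 has moved up to rank 2
```

Under this model the rank is purely positional among the currently assigned standbys, which is why it differs from the configured SU rank.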
A detailed breakdown of the issues observed follows (the redundancy models under which each issue was observed are given in parentheses):
The standbyRank field should always be 1 for 2N and N+M, and should be 1 or higher, indicating the component's standby rank, under the N-way RM. See the following files from the attached tarball, which show this issue:
2N: 2N_comp2.log
N+M: NPM_comp3.log
N-way: Nway_comp*.log
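The expectation stated above can be written as a small check. This is a sketch only: `rank_is_valid` is a hypothetical helper, not part of AMF, and the redundancy model labels are simply the strings used in this ticket.

```python
def rank_is_valid(redundancy_model, standby_rank):
    """Return True if standby_rank matches the expectation for the model."""
    if redundancy_model in ("2N", "N+M"):
        # Exactly one standby per CSI, so the rank must be 1.
        return standby_rank == 1
    if redundancy_model == "N-way":
        # Several standbys may exist; ranks start at 1.
        return standby_rank >= 1
    raise ValueError(f"unknown redundancy model: {redundancy_model}")


print(rank_is_valid("2N", 1))     # True
print(rank_is_valid("N+M", 2))    # False
print(rank_is_valid("N-way", 3))  # True
```

A check of this kind could be applied to the standbyRank values seen by a component when reviewing its log output.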
The active component name for a standby HA state assignment is sometimes incorrect (N-way). See the following files for examples of this issue (the actual assignments are listed in NwayAssignments.txt):
Nway_comp1.log: Active component DN for the AmfDemo2 SI should be safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 rather than safComp=AmfDemo,safSu=SU3,safSg=AmfDemo,safApp=AmfDemo1.
Nway_comp2.log: Active component DN for the AmfDemo1 SI should be safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1 rather than safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1.
AMF is making multiple standby CSI assignments for the same CSI at the same time (N-way)
Specifically for the N-way redundancy model, the standby assignments for a particular SI should be performed sequentially since each standby should have a unique rank value and AMF cannot know what the actual standby rank is for a particular SU until the next highest ranked standby SU has already accepted its standby HA state assignment for the SI/CSI(s).
This should lead to a corresponding PG tracking callback which reflects the change as a STATE_CHANGE for the lower ranked standby component(s).
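The following sketch illustrates why sequential assignment matters and which components would see a rank change. All names here are hypothetical and the `accept` callback stands in for a component's response to the CSI set callback. It is a simplified model, not AMF's implementation.

```python
def assign_standbys_sequentially(candidates, accept):
    """Offer the standby state to one SU at a time.

    The rank offered to each SU depends on how many earlier SUs
    accepted, so it cannot be known before they have responded.
    """
    ranks = {}
    for su in candidates:
        proposed_rank = len(ranks) + 1
        if accept(su, proposed_rank):
            ranks[su] = proposed_rank
    return ranks


def rank_changes(old, new):
    """Components whose rank differs between two assignments.

    In this model, these are the components for which a STATE_CHANGE
    notification would be expected.
    """
    return {c: (old[c], new[c]) for c in new if c in old and old[c] != new[c]}


# Hypothetical case: SU3 rejects the assignment, so SU4 receives rank 2.
ranks = assign_standbys_sequentially(
    ["SU2", "SU3", "SU4"], lambda su, rank: su != "SU3"
)
print(ranks)

# Hypothetical failover of Comp3: only Comp4's rank changes.
changed = rank_changes(
    {"Comp2": 1, "Comp3": 2, "Comp4": 3}, {"Comp2": 1, "Comp4": 2}
)
print(changed)
```

Had the three assignments been issued in parallel, SU4 would have been offered rank 3 before it was known that SU3 would not take rank 2.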
NOTE: All files referenced are contained in the attached tarball file.
Reproduction procedure
1. Copy the modified AMF component template source code file to the avsv sample directory and rebuild the application
2. Use the appropriate information model XML file from the tarball based on the RM in question:
- 2N: imm.xml.amf_demo_1Node
- N+M: imm.xml.amf_demo_1Node-NPM
- N-way: imm.xml.amf_demo_1Node-Nway
NOTE: You will need to update the node names to match your test setup.
3. Start controller node
4. Once assignments have been made for the AmfDemo application, observe the described issues in the AMF component log files produced in /var/opt/amf_demo
5. (Failover case) Kill one of the component processes.
Attachment issueArtifacts-20110204.tgz added
Source code, information model XML, and component log files related to the described issues.
As part of ticket cleanup, evaluated and decided to keep it as enhancement.