Migrated from http://devel.opensaf.org/ticket/1726
NOTE: This issue is similar but different from the issue reported in #1725.
When a component is restarted via the COMPONENT_RESTART recovery policy and restart is DISABLED for the component (OR the recommended recovery action for the component is COMPONENT_FAILOVER and an error is detected for the component), there are two AMF issues observed:
Changes in member component presence states should always trigger a re-calculation of the SU's presence state and in this case should have led to the SU's presence state changing to uninstantiated.
The following are the state transitions from the AMFD trace file for an SU and component that demonstrate these issues (ignore the gaps in the timestamps):
Jan 28 11:42:05.809255 osafamfd [3687:avd_su.c:0660] >> avd_su_pres_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' INSTANTIATING => INSTANTIATED
Jan 28 11:43:26.635981 osafamfd [3687:avd_comp.c:0103] >> avd_comp_oper_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' ENABLED => DISABLED
Jan 28 11:43:26.637787 osafamfd [3687:avd_comp.c:0119] >> avd_comp_readiness_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' IN_SERVICE => OUT_OF_SERVICE
Jan 28 11:43:26.638326 osafamfd [3687:avd_su.c:0688] >> avd_su_oper_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' ENABLED => DISABLED
Jan 28 11:43:26.639225 osafamfd [3687:avd_su.c:0714] >> avd_su_readiness_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' OUT_OF_SERVICE
Jan 28 11:43:26.641025 osafamfd [3687:avd_comp.c:0103] >> avd_comp_oper_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' DISABLED => DISABLED
Jan 28 11:43:26.641994 osafamfd [3687:avd_comp.c:0085] >> avd_comp_pres_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' INSTANTIATED => TERMINATING
Jan 28 11:43:26.642393 osafamfd [3687:avd_su.c:0660] >> avd_su_pres_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' INSTANTIATED => TERMINATING
Jan 28 11:43:41.699869 osafamfd [3687:avd_comp.c:0085] >> avd_comp_pres_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' TERMINATING => UNINSTANTIATED
Jan 28 11:43:41.702213 osafamfd [3687:avd_comp.c:0085] >> avd_comp_pres_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' UNINSTANTIATED => INSTANTIATING
Jan 28 11:43:56.745597 osafamfd [3687:avd_comp.c:0103] >> avd_comp_oper_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' DISABLED => ENABLED
Jan 28 11:43:56.746225 osafamfd [3687:avd_comp.c:0085] >> avd_comp_pres_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' INSTANTIATING => INSTANTIATED
Jan 28 11:43:56.746672 osafamfd [3687:avd_su.c:0688] >> avd_su_oper_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' DISABLED => ENABLED
Jan 28 11:43:56.747567 osafamfd [3687:avd_su.c:0714] >> avd_su_readiness_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' IN_SERVICE
Jan 28 11:43:56.747912 osafamfd [3687:avd_comp.c:0119] >> avd_comp_readiness_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' OUT_OF_SERVICE => IN_SERVICE
Jan 28 11:43:56.750596 osafamfd [3687:avd_su.c:0660] >> avd_su_pres_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' TERMINATING => INSTANTIATED
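When reviewing a trace like the one above, it can help to pull out only the presence state transitions of the SU. The following is a minimal sketch of such a filter, assuming the trace line layout shown in this ticket (function name, quoted DN, then `OLD => NEW`). The helper name `parse_transition` and the `STATE_LEN` limit are hypothetical and are not part of OpenSAF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define STATE_LEN 32 /* hypothetical upper bound on a state name, incl. NUL */

/*
 * Extract "OLD => NEW" from one AMFD trace line logged by the given
 * function (for example "avd_su_pres_state_set").
 * Returns true and fills from/to when the line matches, false otherwise.
 */
static bool parse_transition(const char *line, const char *func,
                             char from[STATE_LEN], char to[STATE_LEN])
{
    assert(line != NULL && func != NULL);

    /* Only consider lines logged by the requested function. */
    const char *p = strstr(line, func);
    if (p == NULL)
        return false;

    /* The DN is enclosed in single quotes; skip past the closing quote. */
    p = strchr(p, '\'');
    if (p == NULL)
        return false;
    p = strchr(p + 1, '\'');
    if (p == NULL)
        return false;

    /* Lines that carry a single state (no "=>") do not match. */
    return sscanf(p + 1, " %31s => %31s", from, to) == 2;
}
```

Applied to every line of the trace with `"avd_su_pres_state_set"` as the function name, this yields only the SU presence state transitions, which makes the missing transition to UNINSTANTIATED easy to spot.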
Reproduction procedure:
0. Enable AMFD trace output
1. Start up the cluster with the 2N AMF sample configured (one component per SU) after setting the saAmfCtDefDisableRestart attribute to 1 or the saAmfCtDefRecoveryOnError attribute to 3.
2. Start the ntfsubscribe utility
3. Once AMF components are started and assigned workload, kill one of the AMF component processes
4. Note that there is no SU presence state change notification generated for the transition to the uninstantiated state.
5. Review the AMFD log and note the transitions of the SU are not as expected.
Changed 2 years ago by nagendra
milestone set to 4.2.GA
The SU FSM needs to be changed and tested for the case where a single component being restarted causes the SU to restart. This will require a good amount of effort, so pushing it to the next release.
Spec reference and clarification:
Table 5 on page 74 of the B.04.01 spec shows the possible presence states of the components of a service unit for each valid presence state of the service unit.
According to this table, when an SU is in the uninstantiated state, all of its components should be in the uninstantiated state. Besides the lock/lock-in operations, there are two more situations, both caused by component failover recovery, in which an SU can be in the uninstantiated state:
1) If an SU contains a single component and, due to a fault on the component, it enters the UNINSTANTIATED state, AMF should mark the SU as uninstantiated.
2) If an SU contains multiple components and, due to faults on the components, they all enter the UNINSTANTIATED state, AMF should mark the SU as uninstantiated.
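The rule described above can be sketched in C as follows. This is only an illustration of the check, not the code from the changesets listed below. The enum, its values, and the function names `all_comps_uninstantiated` and `su_pres_state_recalc` are hypothetical and do not match the OpenSAF definitions. Only the "all components UNINSTANTIATED" rule is modeled; every other SU state transition is left untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified presence states (not the OpenSAF/SAF enums). */
typedef enum {
    PRES_UNINSTANTIATED,
    PRES_INSTANTIATING,
    PRES_INSTANTIATED,
    PRES_TERMINATING
} pres_state_t;

/* True when every member component is in the UNINSTANTIATED state. */
static bool all_comps_uninstantiated(const pres_state_t *comps, size_t n)
{
    assert(comps != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (comps[i] != PRES_UNINSTANTIATED)
            return false;
    }
    return true;
}

/*
 * Re-evaluate the SU presence state after a component presence state
 * change. If the SU has at least one component and all of them are
 * UNINSTANTIATED, the SU becomes UNINSTANTIATED; otherwise the current
 * SU state is returned unchanged.
 */
static pres_state_t su_pres_state_recalc(pres_state_t current,
                                         const pres_state_t *comps,
                                         size_t n)
{
    if (n > 0 && all_comps_uninstantiated(comps, n))
        return PRES_UNINSTANTIATED;
    return current;
}
```

For the single-component SU in the trace, calling `su_pres_state_recalc` at the point where the component goes TERMINATING => UNINSTANTIATED would move the SU from TERMINATING to UNINSTANTIATED. With multiple components the SU state changes only once the last component has reached UNINSTANTIATED.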
changeset: 5559:9208fa713101
tag: tip
user: Praveen <praveen.malviya@oracle.com>
date: Tue Aug 12 20:26:21 2014 +0530
summary: amfnd : mark SU UNINSTANTIATED if all comps are UNINSTANTIATED during compfailover [#359]
[staging:9208fa]
Related Tickets: #359
changeset: 5618:8233b065c0a9
branch: opensaf-4.3.x
parent: 5616:afb35b4af8c5
user: praveen.malviya@oracle.com
date: Wed Aug 20 14:20:41 2014 +0530
summary: amfnd : mark SU UNINSTANTIATED if all comps are UNINSTANTIATED during compfailover [#359]
changeset: 5619:d828d1a414aa
branch: opensaf-4.4.x
parent: 5617:80d69568d9f7
user: praveen.malviya@oracle.com
date: Wed Aug 20 14:20:53 2014 +0530
summary: amfnd : mark SU UNINSTANTIATED if all comps are UNINSTANTIATED during compfailover [#359]