#359 AMF issues when re-instantiating SU/component

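Milestone: 4.3.3
Status: fixed
Owner: Praveen
Labels: None
Type: defect
Component: amf
Part: -
Version: 4.1.B2
Severity: major
Updated: 2014-08-20
Created: 2013-05-31
Private: No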

Migrated from http://devel.opensaf.org/ticket/1726

NOTE: This issue is similar to, but distinct from, the issue reported in #1725.

When a component is restarted via the COMPONENT_RESTART recovery policy and restart is DISABLED for the component (OR the recommended recovery action for the component is COMPONENT_FAILOVER and an error is detected for the component), there are two AMF issues observed:

  1. SU does not enter uninstantiated presence state when its single member component enters the uninstantiated presence state while the component is being re-instantiated.

Changes in member component presence states should always trigger a re-calculation of the SU's presence state and in this case should have led to the SU's presence state changing to uninstantiated.

  2. A side effect of the SU not transitioning to the uninstantiated presence state is that the expected SU presence state change notification for this transition is not generated.

The following are the state transitions from the AMFD trace file for an SU and component that demonstrate these issues (ignore the gaps in the timestamps):

Jan 28 11:42:05.809255 osafamfd [3687:avd_su.c:0660] >> avd_su_pres_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' INSTANTIATING => INSTANTIATED
Jan 28 11:43:26.635981 osafamfd [3687:avd_comp.c:0103] >> avd_comp_oper_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' ENABLED => DISABLED
Jan 28 11:43:26.637787 osafamfd [3687:avd_comp.c:0119] >> avd_comp_readiness_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' IN_SERVICE => OUT_OF_SERVICE
Jan 28 11:43:26.638326 osafamfd [3687:avd_su.c:0688] >> avd_su_oper_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' ENABLED => DISABLED
Jan 28 11:43:26.639225 osafamfd [3687:avd_su.c:0714] >> avd_su_readiness_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' OUT_OF_SERVICE
Jan 28 11:43:26.641025 osafamfd [3687:avd_comp.c:0103] >> avd_comp_oper_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' DISABLED => DISABLED
Jan 28 11:43:26.641994 osafamfd [3687:avd_comp.c:0085] >> avd_comp_pres_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' INSTANTIATED => TERMINATING
Jan 28 11:43:26.642393 osafamfd [3687:avd_su.c:0660] >> avd_su_pres_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' INSTANTIATED => TERMINATING
Jan 28 11:43:41.699869 osafamfd [3687:avd_comp.c:0085] >> avd_comp_pres_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' TERMINATING => UNINSTANTIATED
Jan 28 11:43:41.702213 osafamfd [3687:avd_comp.c:0085] >> avd_comp_pres_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' UNINSTANTIATED => INSTANTIATING
Jan 28 11:43:56.745597 osafamfd [3687:avd_comp.c:0103] >> avd_comp_oper_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' DISABLED => ENABLED
Jan 28 11:43:56.746225 osafamfd [3687:avd_comp.c:0085] >> avd_comp_pres_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' INSTANTIATING => INSTANTIATED
Jan 28 11:43:56.746672 osafamfd [3687:avd_su.c:0688] >> avd_su_oper_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' DISABLED => ENABLED
Jan 28 11:43:56.747567 osafamfd [3687:avd_su.c:0714] >> avd_su_readiness_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' IN_SERVICE
Jan 28 11:43:56.747912 osafamfd [3687:avd_comp.c:0119] >> avd_comp_readiness_state_set: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' OUT_OF_SERVICE => IN_SERVICE
Jan 28 11:43:56.750596 osafamfd [3687:avd_su.c:0660] >> avd_su_pres_state_set: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' TERMINATING => INSTANTIATED

Reproduction procedure:
0. Enable AMFD trace output
1. Start up the cluster with the 2N AMF sample configured (one component per SU) after changing the saAmfCtDefDisableRestart attribute to 1 or the saAmfCtDefRecoveryOnError attribute to 3 (SA_AMF_COMPONENT_FAILOVER).
2. Start the ntfsubscribe utility
3. Once AMF components are started and assigned workload, kill one of the AMF component processes
4. Note that there is no SU presence state change notification generated for the transition to the uninstantiated state.
5. Review the AMFD log and note the transitions of the SU are not as expected.
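
Step 5 can be partly automated. The following is a minimal sketch, not part of OpenSAF, that checks whether any SU presence state entry in a set of trace lines moves to a given state. It assumes the line layout shown in the trace excerpt above (the function name avd_su_pres_state_set followed by `OLD => NEW`). The helper names `su_pres_transition_to` and `count_su_transitions_to` are hypothetical, and reading the lines from the trace file is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `line` is an avd_su_pres_state_set trace entry whose
 * new state (the text after "=> ") is exactly `state`.
 * Assumption: the entry layout matches the excerpt in this ticket. */
bool su_pres_transition_to(const char *line, const char *state)
{
    assert(line != NULL && state != NULL);

    /* Only SU presence state entries are of interest. */
    if (strstr(line, "avd_su_pres_state_set:") == NULL)
        return false;

    const char *arrow = strstr(line, "=> ");
    if (arrow == NULL)
        return false;

    const char *new_state = arrow + 3;
    size_t len = strlen(state);
    if (strncmp(new_state, state, len) != 0)
        return false;

    /* Reject longer state names that merely share the prefix. */
    char next = new_state[len];
    return next == '\0' || next == '\n' || next == '\r' || next == ' ';
}

/* Counts the entries in `lines` that move the SU to `state`. */
size_t count_su_transitions_to(const char *const *lines, size_t n,
                               const char *state)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (su_pres_transition_to(lines[i], state))
            count++;
    }
    return count;
}
```

Run against the excerpt above, a count of zero for UNINSTANTIATED corresponds to the missing SU transition described in this ticket.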

Related

Tickets: #359
Wiki: ChangeLog-4.3.3
Wiki: ChangeLog-4.4.1

Discussion

  • Nagendra Kumar

    Nagendra Kumar - 2013-05-31

    Changed 2 years ago by nagendra
    milestone set to 4.2.GA
    The SU FSM needs to be changed and tested for the case where a single component restart causes the SU to restart. This will require a good amount of effort, so pushing it to the next release.

     
  • Praveen

    Praveen - 2014-06-16
    • status: unassigned --> assigned
    • assigned_to: Praveen
    • Milestone: future --> 4.3.3
     
  • Praveen

    Praveen - 2014-07-01

    Spec reference and clarification:

    Table 5 on page 74 of the B.04.01 spec shows the possible presence states of the components of a service unit for each valid presence state of the service unit.
    According to this table, when an SU is in the uninstantiated state, all of its components
    should be in the uninstantiated state. Besides the lock/lock-in operations, there are two more situations, both caused by component failover recovery, in which an SU can be in the uninstantiated state:

    1) If an SU contains a single component and, because of a fault
    on that component, the component enters the UNINSTANTIATED state, AMF should mark the SU as uninstantiated.

    2) If an SU contains multiple components and, because of faults
    on those components, all of them enter the UNINSTANTIATED state, AMF should mark the SU as uninstantiated.
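
The rule in the two cases above can be sketched as follows. This is an illustrative model only, not the code from the fix: the type and function names (`pres_state_t`, `su_t`, `su_pres_state_set`, `su_on_comp_pres_change`) are hypothetical, and the notification is modelled as a simple counter rather than a real NTF call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical presence states; the names mirror the states seen in the
 * trace, not the actual OpenSAF enum. */
typedef enum {
    PRES_UNINSTANTIATED,
    PRES_INSTANTIATING,
    PRES_INSTANTIATED,
    PRES_TERMINATING
} pres_state_t;

/* Hypothetical SU record: current presence state plus a counter that
 * stands in for presence state change notifications. */
typedef struct {
    pres_state_t pres_state;
    unsigned notifications_sent;
} su_t;

/* True when every component of the SU is UNINSTANTIATED. */
bool all_comps_uninstantiated(const pres_state_t *comps, size_t n)
{
    assert(comps != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (comps[i] != PRES_UNINSTANTIATED)
            return false;
    }
    return true;
}

/* Sets the SU presence state and counts a notification, but only when
 * the state actually changes. */
void su_pres_state_set(su_t *su, pres_state_t new_state)
{
    if (su->pres_state == new_state)
        return;
    su->pres_state = new_state;
    su->notifications_sent++;
}

/* Assumed hook, called after any component presence state change during
 * component failover: marks the SU UNINSTANTIATED once all of its
 * components are UNINSTANTIATED (covers both the single-component and
 * the multi-component case). */
void su_on_comp_pres_change(su_t *su, const pres_state_t *comps, size_t n)
{
    if (all_comps_uninstantiated(comps, n))
        su_pres_state_set(su, PRES_UNINSTANTIATED);
}
```

In this model, skipping the recalculation means `su_pres_state_set` is never called for the transition, which is why the missing notification is a side effect of the missing state change.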

     
  • Praveen

    Praveen - 2014-07-15
    • status: assigned --> review
     
  • Nagendra Kumar

    Nagendra Kumar - 2014-08-12

    changeset: 5559:9208fa713101
    tag: tip
    user: Praveen <praveen.malviya@oracle.com>
    date: Tue Aug 12 20:26:21 2014 +0530
    summary: amfnd : mark SU UNINSTANTIATED if all comps are UNINSTANTIATED during compfailover [#359]

    [staging:9208fa]

     


  • Praveen

    Praveen - 2014-08-20
    • status: review --> fixed
     
  • Praveen

    Praveen - 2014-08-20

    changeset: 5618:8233b065c0a9
    branch: opensaf-4.3.x
    parent: 5616:afb35b4af8c5
    user: praveen.malviya@oracle.com
    date: Wed Aug 20 14:20:41 2014 +0530
    summary: amfnd : mark SU UNINSTANTIATED if all comps are UNINSTANTIATED during compfailover [#359]

    changeset: 5619:d828d1a414aa
    branch: opensaf-4.4.x
    parent: 5617:80d69568d9f7
    user: praveen.malviya@oracle.com
    date: Wed Aug 20 14:20:53 2014 +0530
    summary: amfnd : mark SU UNINSTANTIATED if all comps are UNINSTANTIATED during compfailover [#359]

     


