#2047 amf: SG unstable when NPI comp in PI SU moves to TERM_FAILED state during fresh assignments.

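Milestone: 5.0.2
Status: fixed
Owner: Praveen
Labels: None
Type: defect
Component: amf
Part: nd
Severity: major
Updated: 2016-10-07
Created: 2016-09-19
Creator: Praveen
Private: No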

Configuration: 2N model, with one NPI and one PI comp in the SU.
Steps to reproduce:
1) Add the application using the immcfg command.
2) Lock the SG.
3) Unlock-in and unlock the SUs.
4) Arrange for the instantiation and cleanup scripts of the NPI comp to return a non-zero status.
5) Unlock the SG.
When the SG is unlocked, AMFND initiates active assignments by issuing a callback to the PI comp and by instantiating the NPI comp. After the NPI comp fails to instantiate, AMFND tries to clean it up, and the cleanup fails as well. AMFND marks the comp and the SU as TERM_FAILED and also terminates the PI comp, but it neither responds to AMFD to complete the assignment nor sends a recovery request. As a result the SG stays unstable in the REALIGN state, in which no admin operation is allowed.
Traces and configuration are attached.
Although this issue looks similar to #538, it differs in one respect. In #538 the SU moves to TERM_FAILED state while standby assignments are present, so failover/switchover is possible.
In the present case the failure happens during initial assignments, so there is no standby to switch over or fail over to.
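
The hang described above can be illustrated with a small model. This is only a sketch under stated assumptions, not OpenSAF code: `Director`, `NodeDirector`, `SgFsm`, `SuPresence`, the SU name "SU1" and every function name are hypothetical stand-ins for AMFD and AMFND, and it assumes that a recovery request is enough for the director to finish the assignment procedure and bring the SG back to a stable state.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model. None of these types are OpenSAF APIs.
enum class SuPresence { Uninstantiated, Instantiating, TermFailed };
enum class SgFsm { Stable, Realign };

// Stands in for AMFD: the SG leaves REALIGN only when the node reports back.
struct Director {
  SgFsm sg = SgFsm::Stable;
  std::vector<std::string> received;

  void start_assignment() { sg = SgFsm::Realign; }

  // Assumption: once told about the failed SU, the director can finish
  // the assignment procedure and return the SG to a stable state.
  void on_recovery_request(const std::string& su_name) {
    received.push_back("recovery-request:" + su_name);
    sg = SgFsm::Stable;
  }

  bool admin_op_allowed() const { return sg == SgFsm::Stable; }
};

// Stands in for AMFND handling a fresh active assignment in which the NPI
// comp fails to instantiate and its cleanup script fails as well.
struct NodeDirector {
  bool report_term_failed = false;  // false models the behaviour reported here
  SuPresence su = SuPresence::Uninstantiated;

  void fresh_assignment_with_failing_npi(Director& amfd,
                                         const std::string& su_name) {
    assert(amfd.sg == SgFsm::Realign);  // an assignment must be in progress
    su = SuPresence::Instantiating;
    su = SuPresence::TermFailed;  // instantiation failed, then cleanup failed
    if (report_term_failed) amfd.on_recovery_request(su_name);
    // Otherwise nothing is sent and the director keeps waiting.
  }
};

// Runs the scenario once and reports whether admin operations are possible.
bool admin_ops_possible_after_failure(bool report_term_failed) {
  Director amfd;
  NodeDirector amfnd;
  amfnd.report_term_failed = report_term_failed;
  amfd.start_assignment();
  amfnd.fresh_assignment_with_failing_npi(amfd, "SU1");  // "SU1" is made up
  return amfd.admin_op_allowed();
}
```

Calling `admin_ops_possible_after_failure(false)` models the reported behaviour, where the SG never leaves REALIGN; passing `true` models a node director that reports the TERM_FAILED SU.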

1 Attachments

Related

Tickets: #2047
Wiki: ChangeLog-5.0.2
Wiki: ChangeLog-5.1.1

Discussion

  • Praveen

    Praveen - 2016-09-19
    • status: unassigned --> assigned
    • assigned_to: Praveen
     
  • Anders Widell

    Anders Widell - 2016-09-20
    • Milestone: 4.7.2 --> 5.0.2
     
  • Praveen

    Praveen - 2016-09-22

    Observed an amfnd crash when a PI comp fails in the CSI SET callback during a fresh assignment.
    syslog:
    Sep 22 17:48:10 SC-1 root: INSTANTIATED : safComp=AmfDemo1,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1, PID :30566
    Sep 22 17:48:10 SC-1 osafamfnd[29154]: clc.cc:1170: avnd_comp_clc_st_chng_prc: Assertion 'csi' failed.
    Sep 22 17:48:10 SC-1 osafclmd[29132]: AL AMF Node Director is down, terminate this process
    Sep 22 17:48:10 SC-1 osafclmd[29132]: exiting for shutdown

    bt:
    (gdb) bt
    #0 0x00007fe6546d6cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
    #1 0x00007fe6546da0d8 in __GI_abort () at abort.c:89
    #2 0x00007fe655835b2e in __osafassert_fail (__file=<optimized out>, __line=<optimized out>,
    __func=<optimized out>, __assertion=<optimized out>) at sysf_def.c:281
    #3 0x000000000040cf89 in avnd_comp_clc_st_chng_prc (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x129a930,
    prv_st=prv_st@entry=SA_AMF_PRESENCE_TERMINATING, final_st=final_st@entry=SA_AMF_PRESENCE_UNINSTANTIATED)
    at clc.cc:1170
    #4 0x000000000040f590 in avnd_comp_clc_fsm_run (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x129a930,
    ev=AVND_COMP_CLC_PRES_FSM_EV_TERM_SUCC) at clc.cc:906
    #5 0x000000000040fe4a in avnd_evt_clc_resp_evh (cb=0x666940 <_avnd_cb>, evt=0x7fe63c0008c0) at clc.cc:414
    #6 0x000000000042651f in avnd_evt_process (evt=0x7fe63c0008c0) at main.cc:626
    #7 avnd_main_process () at main.cc:577
    #8 0x0000000000405a73 in main (argc=2, argv=0x7ffdfdd98368) at main.cc:202

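The trace shows the process aborting because a `csi` pointer was null while a comp moved from TERMINATING to UNINSTANTIATED. The pattern can be sketched as follows. This is a hypothetical model, not the code in clc.cc: `Csi`, `Comp`, `first_csi`, `Outcome` and `on_terminated` are invented for illustration, and the ticket does not say whether the actual fix guards the pointer or avoids reaching this path at all.

```cpp
#include <map>
#include <string>

// Hypothetical model; not the code in clc.cc.
struct Csi {
  std::string name;
};

struct Comp {
  std::string name;
  std::map<std::string, Csi> csis;  // CSIs currently assigned to the comp

  // Returns one assigned CSI, or nullptr when none is assigned.
  const Csi* first_csi() const {
    return csis.empty() ? nullptr : &csis.begin()->second;
  }
};

enum class Outcome { CsiHandled, NoCsiSkipped };

// Handles TERMINATING -> UNINSTANTIATED without assuming a CSI exists.
Outcome on_terminated(const Comp& comp) {
  const Csi* csi = comp.first_csi();
  if (csi == nullptr) {
    // An assert(csi) at this point would abort the whole process,
    // which is the kind of failure the trace above shows.
    return Outcome::NoCsiSkipped;
  }
  return Outcome::CsiHandled;
}
```

Returning a distinct outcome for the no-CSI case keeps the caller in control of what happens next, whereas an assertion takes the whole node director down.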
     
  • Praveen

    Praveen - 2016-09-27
    • status: assigned --> review
     
  • Praveen

    Praveen - 2016-10-07
    • status: review --> fixed
     
  • Praveen

    Praveen - 2016-10-07

    changeset: 8195:967e479b7c42
    branch: opensaf-5.0.x
    user: Praveen Malviya praveen.malviya@oracle.com
    date: Fri Oct 07 16:57:22 2016 +0530
    summary: amfnd: send recovery request to amfd for term-failed PI su [#2047] V2

    changeset: 8196:88f6b4d6e234
    branch: opensaf-5.1.x
    parent: 8193:78c53fa41138
    user: Praveen Malviya praveen.malviya@oracle.com
    date: Fri Oct 07 16:57:57 2016 +0530
    summary: amfnd: send recovery request to amfd for term-failed PI su [#2047] V2

    changeset: 8197:69f86b9bddb0
    tag: tip
    parent: 8192:c219f1b059cb
    user: Praveen Malviya praveen.malviya@oracle.com
    date: Fri Oct 07 16:58:07 2016 +0530
    summary: amfnd: send recovery request to amfd for term-failed PI su [#2047] V2

     


