Steps to reproduce:
1) Bring one controller up.
2) Add the attached configuration to the system.
3) Run unlock-in and then unlock on su1.
The attached configuration uses the amfpm command to start active monitoring. If the user configures this command incorrectly, AMF reports a fault on the component and AMFND restarts it. Because the active monitoring command fails every time, the component is faulted continuously. Finally, when OpenSAF was stopped on the node, AMFND asserted:
syslog:
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Removed 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Removed assignments from AMF components
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Component or SU restart probation timer expired
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO Terminating all AMF components
Jun 13 12:27:03 SC-1 osafamfnd[30287]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State RESTARTING => TERMINATING
Jun 13 12:27:03 SC-1 osafamfnd[30287]: src/amf/amfnd/susm.cc:1886: avnd_su_pres_st_chng_prc: Assertion 'si' failed.
Jun 13 12:27:03 SC-1 osafclmd[30264]: AL AMF Node Director is down, terminate this process
bt:
#0 0x00007f662fbe8cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f662fbec0d8 in __GI_abort () at abort.c:89
#2 0x00007f66306dedbe in __osafassert_fail (__file=<optimized out>, __line=<optimized out>, __func=<optimized out>,
__assertion=<optimized out>) at src/base/sysf_def.c:286
#3 0x00007f66313fff3f in avnd_su_pres_st_chng_prc (final_st=SA_AMF_PRESENCE_TERMINATING,
prv_st=SA_AMF_PRESENCE_RESTARTING, su=0x7f66324d33c0, cb=0x7f663161f240 <_avnd_cb>) at src/amf/amfnd/susm.cc:1886
#4 avnd_su_pres_fsm_run (cb=cb@entry=0x7f663161f240 <_avnd_cb>, su=0x7f66324d33c0, comp=comp@entry=0x7f66324d46b0,
ev=<optimized out>) at src/amf/amfnd/susm.cc:1610
#5 0x00007f66313caf58 in avnd_comp_clc_st_chng_prc (cb=cb@entry=0x7f663161f240 <_avnd_cb>,
comp=comp@entry=0x7f66324d46b0, prv_st=prv_st@entry=SA_AMF_PRESENCE_RESTARTING,
final_st=final_st@entry=SA_AMF_PRESENCE_TERMINATING) at src/amf/amfnd/clc.cc:1501
#6 0x00007f66313cf127 in avnd_comp_clc_fsm_run (cb=0x7f663161f240 <_avnd_cb>, comp=comp@entry=0x7f66324d46b0,
ev=ev@entry=AVND_COMP_CLC_PRES_FSM_EV_CLEANUP) at src/amf/amfnd/clc.cc:892
#7 0x00007f66314067e8 in avnd_comp_cleanup_launch (comp=comp@entry=0x7f66324d46b0) at src/amf/amfnd/util.cc:178
#8 0x00007f6631405beb in avnd_last_step_clean (cb=cb@entry=0x7f663161f240 <_avnd_cb>) at src/amf/amfnd/term.cc:76
#9 0x00007f66313e13b9 in avnd_di_msg_ack_process (cb=cb@entry=0x7f663161f240 <_avnd_cb>, mid=<optimized out>)
at src/amf/amfnd/di.cc:1264
#10 0x00007f66313e1484 in avnd_evt_avd_ack_evh (cb=0x7f663161f240 <_avnd_cb>, evt=0x7f6628001010)
at src/amf/amfnd/di.cc:411
#11 0x00007f66313ec9df in avnd_evt_process (evt=0x7f6628001010) at src/amf/amfnd/main.cc:658
#12 avnd_main_process () at src/amf/amfnd/main.cc:610
#13 0x00007f66313c261f in main (argc=2, argv=0x7ffc47fa34f8) at src/amf/amfnd/main.cc:203
Diff:
Escalation does not reach node failover in this case because both the component and SU restart probation timer values are very small (less than a nanosecond).
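To illustrate why a near-zero probation timer blocks escalation, here is a minimal Python sketch of a restart counter with a probation window. It is not the AMFND implementation: the class and function names, the threshold of 3 restarts, and treating the level after SU restart directly as node failover are all assumptions made for illustration only.

```python
from dataclasses import dataclass

SEC = 1_000_000_000  # nanoseconds per second
ACTIONS = ["component restart", "su restart", "node failover"]


@dataclass
class RestartLevel:
    """Hypothetical escalation level: counts restarts inside a probation window."""
    probation_ns: int          # length of the probation window
    max_restarts: int          # restarts tolerated inside one window
    window_start_ns: int = -1  # -1 means no window is running
    count: int = 0

    def record(self, now_ns: int) -> bool:
        """Record a restart at now_ns; return True if this level escalates."""
        expired = (self.window_start_ns < 0
                   or now_ns - self.window_start_ns >= self.probation_ns)
        if expired:
            # Probation timer expired (or never ran): start over from zero.
            self.window_start_ns = now_ns
            self.count = 0
        self.count += 1
        if self.count > self.max_restarts:
            # Too many restarts inside one window: escalate and reset.
            self.window_start_ns = -1
            self.count = 0
            return True
        return False


def simulate(fault_times_ns, comp_prob_ns, su_prob_ns, max_restarts=3):
    """Return the highest recovery action reached for a series of faults."""
    comp = RestartLevel(comp_prob_ns, max_restarts)
    su = RestartLevel(su_prob_ns, max_restarts)
    highest = 0
    for t in fault_times_ns:
        level = 0
        if comp.record(t):       # component restarts exhausted
            level = 1
            if su.record(t):     # SU restarts exhausted as well
                level = 2
        highest = max(highest, level)
    return ACTIONS[highest]


# One fault per second for 20 seconds (the monitoring command fails every time).
faults = [i * SEC for i in range(20)]
print(simulate(faults, comp_prob_ns=10 * SEC, su_prob_ns=60 * SEC))
print(simulate(faults, comp_prob_ns=0, su_prob_ns=0))
```

With windows of several seconds the counters accumulate and the model reaches node failover. With a probation time of 0 every fault starts a fresh window, so the counter never exceeds the threshold and the model stays at component restart indefinitely, which matches the continuous faulting described in this ticket.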
develop:
commit 126c7d9c59a41205ce16c2c9e8a7cae7457a0c2c
Author: Praveen <praveen.malviya@oracle.com>
Date: Tue Sep 12 17:08:11 2017 +0530
commit 74476b88a30c80c788e56b6ede2baea040e22c18
Author: Praveen <praveen.malviya@oracle.com>
Date: Tue Sep 12 17:08:11 2017 +0530