Changeset: 6377
Issue: amfd crashed during an SG unlock operation after saAmfSGMaxStandbySIsperSU was set to 0
Steps performed:
-> An AMF application is configured with the 2N redundancy model, with 2 SUs and 5 SIs (SI-SI dependency configured with 1 sponsor SI and dependent SIs).
-> Initially, the application is brought up by unlocking the SUs and the SG.
-> The SG is locked.
-> The following command is then run on the locked SG:
immcfg -a saAmfSGMaxStandbySIsperSU=0 safSg=SG,safApp=test2nApp
Apr 30 14:29:04 SYSTEST-CNTLR-1 osafimmnd[10286]: NO Ccb 95 COMMITTED (immcfg_SYSTEST-CNTLR-1_14095)
-> Finally, the SG is unlocked, and amfd crashes with the following syslog output:
SYSTEST-CNTLR-1:/opt/goahead/tetware/opensaffire/scripts # immadm -o 1 safSg=SG,safApp=test2nApp
Apr 30 14:29:14 SYSTEST-CNTLR-1 osafamfd[10349]: su.cc:1821: inc_curr_stdby_si: Assertion 'saAmfSUNumCurrStandbySIs <= sg_of_su->saAmfSGMaxStandbySIsperSU' failed.
Apr 30 14:29:14 SYSTEST-CNTLR-1 osafamfnd[10359]: ER AMF director unexpectedly crashed
Apr 30 14:29:14 SYSTEST-CNTLR-1 osafamfnd[10359]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131343, SupervisionTime = 60
Apr 30 14:29:14 SYSTEST-CNTLR-1 opensaf_reboot: Rebooting local node; timeout=60
Apr 30 14:29:14 SYSTEST-CNTLR-1 osafimmnd[10286]: NO Implementer locally disconnected. Marking it as doomed 4 <19, 2010f> (safAmfService)
Apr 30 14:29:14 SYSTEST-CNTLR-1 osafimmnd[10286]: NO Implementer disconnected 4 <19, 2010f> (safAmfService)
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)
Apr 30 14:29:15 SYSTEST-CNTLR-1 osafimmnd[10286]: WA Timeout on syncronous admin operation 1
-> The standby controller then also rebooted, followed by the whole cluster, with amfd crashing on the same assertion:
Apr 30 14:29:34 SYSTEST-CNTLR-2 osafimmnd[20291]: NO Implementer disconnected 25 <339, 2020f> (MsgQueueService131343)
Apr 30 14:29:34 SYSTEST-CNTLR-2 osafamfd[20347]: su.cc:1821: inc_curr_stdby_si: Assertion 'saAmfSUNumCurrStandbySIs <= sg_of_su->saAmfSGMaxStandbySIsperSU' failed.
Apr 30 14:29:34 SYSTEST-CNTLR-2 osafimmnd[20291]: NO Implementer connected: 26 (safSmfService) <335, 2020f>
Apr 30 14:29:34 SYSTEST-CNTLR-2 osafimmpbed: NO Successfully opened pre-existing sqlite pbe file /home/immPBE/imm.db
Apr 30 14:29:34 SYSTEST-CNTLR-2 osafamfnd[20357]: ER AMF director unexpectedly crashed
Apr 30 14:29:34 SYSTEST-CNTLR-2 osafamfnd[20357]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
Below is the backtrace.
#0 0x00007fdf22d02b55 in raise () from /lib64/libc.so.6
#1 0x00007fdf22d04131 in abort () from /lib64/libc.so.6
#2 0x00007fdf24aeb37a in __osafassert_fail () from /usr/lib64/libopensaf_core.so.0
#3 0x0000000000477454 in AVD_SU::inc_curr_stdby_si() () at su.cc:1821
#4 0x000000000046b168 in avd_susi_update_assignment_counters(avd_su_si_rel_tag, AVSV_SUSI_ACT, SaAmfHAStateT, SaAmfHAStateT) () at siass.cc:624
#5 0x000000000046bbc5 in avd_susi_create(cl_cb_tag, AVD_SI, AVD_SU, SaAmfHAStateT, bool) () at siass.cc:255
#6 0x0000000000464dd8 in avd_new_assgn_susi(cl_cb_tag, AVD_SU, AVD_SI, SaAmfHAStateT, bool, avd_su_si_rel_tag) () at sgproc.cc:111
#7 0x00000000004453b1 in avd_sg_2n_su_chose_asgn(cl_cb_tag, AVD_SG) () at sg_2n_fsm.cc:700
#8 0x0000000000449068 in SG_2N::susi_success_sg_realign(AVD_SU, avd_su_si_rel_tag, AVSV_SUSI_ACT, SaAmfHAStateT) () at sg_2n_fsm.cc:1814
#9 0x000000000044a5bc in SG_2N::susi_success(cl_cb_tag, AVD_SU, avd_su_si_rel_tag, AVSV_SUSI_ACT, SaAmfHAStateT) () at sg_2n_fsm.cc:2379
#10 0x0000000000467278 in avd_su_si_assign_evh(cl_cb_tag, avd_evt_tag) () at sgproc.cc:1251
#11 0x00000000004332b6 in process_event(cl_cb_tag, avd_evt_tag) () at main.cc:775
#12 0x0000000000407b3c in main () at main.cc:395
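The failing check in frame #3 can be reproduced in isolation. The sketch below is a minimal model, not amfd's code: SgSketch and SuSketch are hypothetical stand-ins for AVD_SG and AVD_SU that keep only the two fields named in the assertion, and it assumes inc_curr_stdby_si() increments the counter before checking it (which is consistent with the assertion failing when the limit is 0). Instead of aborting, the sketch returns false where osafassert() would fire.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for amfd's AVD_SG; only the standby
// limit named in the failing assertion is modeled.
struct SgSketch {
  uint32_t saAmfSGMaxStandbySIsperSU;
};

// Hypothetical, simplified stand-in for amfd's AVD_SU.
struct SuSketch {
  SgSketch *sg_of_su;
  uint32_t saAmfSUNumCurrStandbySIs;

  explicit SuSketch(SgSketch *sg) : sg_of_su(sg), saAmfSUNumCurrStandbySIs(0) {}

  // Assumed order of operations: bump the counter, then check the limit.
  // Returns false where the real amfd would fail osafassert() and abort.
  bool inc_curr_stdby_si() {
    saAmfSUNumCurrStandbySIs++;
    return saAmfSUNumCurrStandbySIs <= sg_of_su->saAmfSGMaxStandbySIsperSU;
  }
};
```

With the limit at 0, the very first standby SUSI created for the 2N SG (frames #5 to #7) makes the counter 1, and the check 1 <= 0 fails. This is why both controllers hit the same assertion as soon as they tried to assign a standby.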
When the configuration is brought up, the attributes saAmfSGMaxActiveSIsperSU and saAmfSGMaxStandbySIsperSU are ignored for the 2N model:
Apr 30 14:39:17 SYSTEST-CNTLR-1 osafamfd[2346]: NO 'safSg=SG,safApp=test2nApp' attribute saAmfSGMaxActiveSIsperSU ignored, not valid for red model
Apr 30 14:39:17 SYSTEST-CNTLR-1 osafamfd[2346]: NO 'safSg=SG,safApp=test2nApp' attribute saAmfSGMaxStandbySIsperSU ignored, not valid for red model
However, once the application is configured, a change to these attributes is accepted and applied.
The attributes should be either fully supported or consistently ignored for the 2N model.
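The fix committed for #1361 is summarized only as "ignore invalid modification of saAmfSGMaxActiveSIsperSU/saAmfSGMaxStandbySIsperSU". A minimal sketch of that idea treats a runtime modification the same way the configuration-time load does: if the attribute is not valid for the SG's redundancy model, the new value is dropped and the old one kept. All names here (RedModel, SgConfig, apply_max_si_modify) are hypothetical rather than amfd's real CCB handlers. The per-model validity rules are an assumption based on the SA Forum AMF specification and should be checked against it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical redundancy-model enum (amfd uses SaAmfRedundancyModelT).
enum class RedModel { TwoN, NPlusM, NWay, NWayActive, NoRedundancy };

// Hypothetical SG configuration holding only the two limits of interest.
struct SgConfig {
  RedModel model;
  uint32_t maxActiveSIsperSU;
  uint32_t maxStandbySIsperSU;
};

// Assumed validity rules: the active limit applies to N+M, N-Way and
// N-Way Active; the standby limit applies to N+M and N-Way only.
bool max_active_valid(RedModel m) {
  return m == RedModel::NPlusM || m == RedModel::NWay || m == RedModel::NWayActive;
}

bool max_standby_valid(RedModel m) {
  return m == RedModel::NPlusM || m == RedModel::NWay;
}

// Applies a modification of one of the two attributes. Returns false and
// leaves the stored value untouched when the attribute is not valid for
// the SG's model, mirroring the "ignored, not valid for red model" handling
// seen at configuration time.
bool apply_max_si_modify(SgConfig &sg, const std::string &attr, uint32_t value) {
  if (attr == "saAmfSGMaxActiveSIsperSU") {
    if (!max_active_valid(sg.model)) return false;  // ignored
    sg.maxActiveSIsperSU = value;
    return true;
  }
  if (attr == "saAmfSGMaxStandbySIsperSU") {
    if (!max_standby_valid(sg.model)) return false;  // ignored
    sg.maxStandbySIsperSU = value;
    return true;
  }
  return false;  // not one of the attributes handled by this sketch
}
```

Ignoring the value, rather than rejecting the CCB, keeps the runtime behavior identical to what amfd already does when loading the configuration. Rejecting the CCB with an error would be the alternative if the attributes are to be explicitly unsupported for 2N.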
It is reproducible with the following steps (2N redundancy model, 2 controllers and 1 payload):
1. Configure SU1 on SC-1 as Act.
2. Configure SU2 on SC-2 as Std.
3. Configure SU3 (unlocked) on PL-3.
4. Lock the SG, set saAmfSGMaxStandbySIsperSU to zero, and then unlock the SG.
The active amfd crashes, leading to a reboot of SC-1.
SC-2 becomes active, assigns the active role to SU2 and tries to assign the standby role to SU3; this amfd also crashes, leading to a cluster reboot.
Thanks
-Nagu
changeset: 6590:d775d8fb7951
branch: opensaf-4.5.x
parent: 6587:071c4ca7a679
user: Nagendra Kumar <nagendra.k@oracle.com>
date: Wed May 27 12:31:51 2015 +0530
summary: amfd: ignore invalid modification of saAmfSGMaxActiveSIsperSU/saAmfSGMaxStandbySIsperSU [#1361]
changeset: 6591:05d5ba64ae8a
branch: opensaf-4.6.x
parent: 6588:21730a950421
user: Nagendra Kumar <nagendra.k@oracle.com>
date: Wed May 27 12:32:04 2015 +0530
summary: amfd: ignore invalid modification of saAmfSGMaxActiveSIsperSU/saAmfSGMaxStandbySIsperSU [#1361]
changeset: 6592:17406d1e43d3
tag: tip
parent: 6589:d719ade2b028
user: Nagendra Kumar <nagendra.k@oracle.com>
date: Wed May 27 12:32:12 2015 +0530
summary: amfd: ignore invalid modification of saAmfSGMaxActiveSIsperSU/saAmfSGMaxStandbySIsperSU [#1361]
[staging:d775d8]
[staging:05d5ba]
[staging:17406d]
Related tickets: #1361