backtrace:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000561b1a8ca5b5 in avnd_su_si_oper_done (cb=cb@entry=0x561b1aae8240 <_avnd_cb>, su=su@entry=0x561b1acd23f0, si=0x561b1acdc8f0) at src/amf/amfnd/susm.cc:1197
[Current thread is 1 (Thread 0x7f1764548780 (LWP 193))]

Thread 1 (Thread 0x7f1764548780 (LWP 193)):
#0  0x0000561b1a8ca5b5 in avnd_su_si_oper_done (cb=cb@entry=0x561b1aae8240 <_avnd_cb>, su=su@entry=0x561b1acd23f0, si=0x561b1acdc8f0) at src/amf/amfnd/susm.cc:1197
        tmp = 0x38313333352e321a
        curr_si = <optimized out>
        curr_csi = <optimized out>
        t_csi = 0x0
        rc = <optimized out>
        opr_done = 26
        t_ = {trace_leave_called = false, file_ = 0x561b1a8db949 "src/amf/amfnd/susm.cc", function_ = 0x561b1a8dc450 <avnd_su_si_oper_done(avnd_cb_tag*, avnd_su_tag*, avnd_su_si_rec*)::__FUNCTION__> "avnd_su_si_oper_done"}
        __FUNCTION__ = "avnd_su_si_oper_done"
#1  0x0000561b1a8c8fda in avnd_su_pres_st_chng_prc (final_st=SA_AMF_PRESENCE_UNINSTANTIATED, prv_st=SA_AMF_PRESENCE_TERMINATING, su=0x561b1acd23f0, cb=0x561b1aae8240 <_avnd_cb>) at src/amf/amfnd/susm.cc:1959
#2  avnd_su_pres_fsm_run (cb=cb@entry=0x561b1aae8240 <_avnd_cb>, su=0x561b1acd23f0, comp=comp@entry=0x561b1ace7120, ev=<optimized out>) at src/amf/amfnd/susm.cc:1611
#3  0x0000561b1a8907b2 in avnd_comp_clc_st_chng_prc (cb=cb@entry=0x561b1aae8240 <_avnd_cb>, comp=comp@entry=0x561b1ace7120, prv_st=prv_st@entry=SA_AMF_PRESENCE_TERMINATING, final_st=final_st@entry=SA_AMF_PRESENCE_UNINSTANTIATED) at src/amf/amfnd/clc.cc:1501
#4  0x0000561b1a894cf3 in avnd_comp_clc_fsm_run (cb=cb@entry=0x561b1aae8240 <_avnd_cb>, comp=comp@entry=0x561b1ace7120, ev=AVND_COMP_CLC_PRES_FSM_EV_TERM_SUCC) at src/amf/amfnd/clc.cc:892
#5  0x0000561b1a89563b in avnd_evt_clc_resp_evh (cb=0x561b1aae8240 <_avnd_cb>, evt=0x7f174c009950) at src/amf/amfnd/clc.cc:414
#6  0x0000561b1a8b327a in avnd_evt_process (evt=0x7f174c009950) at src/amf/amfnd/main.cc:658
#7  avnd_main_process () at src/amf/amfnd/main.cc:610
#8  0x0000561b1a886b92 in main (argc=2, argv=0x7ffeca215898) at src/amf/amfnd/main.cc:203
Original syslog and amfnd trace are attached.
Suspicious code:
uint32_t avnd_su_si_oper_done(AVND_CB *cb, AVND_SU *su, AVND_SU_SI_REC *si) {
  ...
  if (tmp != nullptr) {
    uint32_t sirank = tmp->rank;
    for (; tmp && (tmp->rank == sirank); tmp = avnd_silist_getprev(tmp)) {  // line 1197
      uint32_t rc = avnd_su_si_remove(cb, tmp->su, tmp);
      osafassert(rc == NCSCC_RC_SUCCESS);
    }
  } else {
    LOG_NO("Removed assignments from AMF components");
    ...
  }
I tried a debug patch (included at the end) that prints the pointer @tmp before and after the call to avnd_su_si_remove(). syslog-1 is from a run where the segv occurs. In syslog-2 the segv happens not to occur, but @tmp->name points to a garbage string after the call.
The @tmp pointer is valid before avnd_su_si_remove() is called. However, avnd_su_si_remove() and avnd_su_si_oper_done() call each other recursively, and the object pointed to by @tmp has already been deleted by the time avnd_su_si_remove() returns. The loop then dereferences the freed @tmp (tmp->rank and avnd_silist_getprev(tmp)) at line 1197.
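The failure mode described above can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: SiRec, SiList, remove_si() and remove_lowest_rank() are stand-ins invented for illustration, not the amfnd types or functions, and it assumes the loop starts from the tail of the list, which the excerpt does not show. It demonstrates one possible way to avoid touching a record after it has been handed to a callee that frees it (re-reading the tail on every iteration); it is not necessarily the fix that was committed.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for AVND_SU_SI_REC; only the fields needed here.
struct SiRec {
  std::string name;
  uint32_t rank;
};

// Hypothetical stand-in for the SI list walked via avnd_silist_getprev().
using SiList = std::list<SiRec*>;

// Hypothetical stand-in for avnd_su_si_remove(). As in the syslogs, the
// record passed in is unlinked and freed before the call returns.
uint32_t remove_si(SiList& list, SiRec* rec,
                   std::vector<std::string>& removed) {
  removed.push_back(rec->name);
  list.remove(rec);  // unlink the pointer from the list
  delete rec;        // the caller's pointer is dangling from here on
  return 1;          // success
}

// Removes every record at the tail that shares the tail's rank, without
// dereferencing a record after it was passed to remove_si().
std::vector<std::string> remove_lowest_rank(SiList& list) {
  std::vector<std::string> removed;
  if (list.empty()) return removed;
  const uint32_t rank = list.back()->rank;  // read the rank once, up front
  // Re-read the tail each time instead of stepping from a pointer that the
  // callee may have freed. This terminates because remove_si() always
  // unlinks the record it is given.
  while (!list.empty() && list.back()->rank == rank) {
    SiRec* rec = list.back();
    uint32_t rc = remove_si(list, rec, removed);
    assert(rc == 1);
    (void)rc;
    // rec must not be dereferenced here: it has been deleted.
  }
  return removed;
}
```

The unsafe variant would be a for loop whose condition and step expression read rec->rank and the predecessor of rec after remove_si() has returned, which mirrors the loop at line 1197.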
syslog-1:
2018-07-13 07:46:35.760 PL-3 osafamfnd[193]: NO 'safSu=2,safSg=1,safApp=abcdtest' Presence State TERMINATING => UNINSTANTIATED
2018-07-13 07:46:35.761 PL-3 osafamfnd[193]: NO Removed 'safSi=2,safApp=abcdtest' from 'safSu=2,safSg=1,safApp=abcdtest'
2018-07-13 07:46:35.761 PL-3 osafamfnd[193]: NO before avnd_su_si_remove, tmp:0x558b4b0f5a20, su: safSu=1,safSg=1,safApp=abcdtest, si:safSi=1,safApp=abcdtest
2018-07-13 07:46:35.762 PL-3 osafamfnd[193]: NO 'safSu=1,safSg=1,safApp=abcdtest' Presence State INSTANTIATED => TERMINATING
2018-07-13 07:46:35.762 PL-3 osafamfnd[193]: NO 'safSu=1,safSg=1,safApp=abcdtest' Presence State TERMINATING => UNINSTANTIATED
2018-07-13 07:46:35.763 PL-3 osafamfnd[193]: NO Removed 'safSi=1,safApp=abcdtest' from 'safSu=1,safSg=1,safApp=abcdtest'
2018-07-13 07:46:35.763 PL-3 osafamfnd[193]: NO Removed assignments from AMF components
2018-07-13 07:46:35.764 PL-3 osafamfnd[193]: NO Terminating all AMF components
2018-07-13 07:46:35.767 PL-3 osafamfnd[193]: NO free si_rec:0x558b4b0f5a20, su:safSu=1,safSg=1,safApp=abcdtest, si:safSi=1,safApp=abcdtest
2018-07-13 07:46:35.776 PL-3 2[346]: AL AMF Node Director is down, terminate this process
2018-07-13 07:46:35.776 PL-3 osafclmna[168]: AL AMF Node Director is down, terminate this process
2018-07-13 07:46:35.778 PL-3 osafimmnd[178]: AL AMF Node Director is down, terminate this process
2018-07-13 07:46:35.783 PL-3 systemd[1]: opensafd.service: Main process exited, code=dumped, status=11/SEGV
2018-07-13 07:46:35.783 PL-3 osafsmfnd[205]: AL AMF Node Director is down, terminate this process
2018-07-13 07:46:35.784 PL-3 osafckptnd[229]: AL AMF Node Director is down, terminate this process
2018-07-13 07:46:35.784 PL-3 osafamfwd[239]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: AMF unexpectedly crashed, OwnNodeId = 131855, SupervisionTime = 60
syslog-2:
2018-07-13 15:37:04.456 PL-3 amfclccli[429]: DB CLEANUP request 'safComp=1,safSu=1,safSg=1,safApp=abcdtest'
2018-07-13 15:37:04.474 PL-3 amfclccli[429]: DB CLEANUP response 'kill(pid=327)'
2018-07-13 15:37:04.475 PL-3 amfclccli[429]: WA Failed to kill pid=327 with signal 9 - [Errno 3] No such process
2018-07-13 15:37:06.940 PL-3 osafamfnd[188]: NO 'safSu=2,safSg=1,safApp=abcdtest' Presence State TERMINATING => UNINSTANTIATED
2018-07-13 15:37:06.940 PL-3 osafamfnd[188]: NO Removed 'safSi=2,safApp=abcdtest' from 'safSu=2,safSg=1,safApp=abcdtest'
2018-07-13 15:37:06.940 PL-3 osafamfnd[188]: NO before avnd_su_si_remove, tmp:0x55c251477020, su: safSu=1,safSg=1,safApp=abcdtest, si:safSi=1,safApp=abcdtest
2018-07-13 15:37:06.940 PL-3 osafamfnd[188]: NO 'safSu=1,safSg=1,safApp=abcdtest' Presence State INSTANTIATED => TERMINATING
2018-07-13 15:37:06.941 PL-3 osafamfnd[188]: NO 'safSu=1,safSg=1,safApp=abcdtest' Presence State TERMINATING => UNINSTANTIATED
2018-07-13 15:37:06.941 PL-3 osafamfnd[188]: NO Removed 'safSi=1,safApp=abcdtest' from 'safSu=1,safSg=1,safApp=abcdtest'
2018-07-13 15:37:06.941 PL-3 osafamfnd[188]: NO Removed assignments from AMF components
2018-07-13 15:37:06.942 PL-3 osafamfnd[188]: NO Terminating all AMF components
2018-07-13 15:37:06.944 PL-3 osafamfnd[188]: NO free si_rec:0x55c251477020, su:safSu=1,safSg=1,safApp=abcdtest, si:safSi=1,safApp=abcdtest
2018-07-13 15:37:06.945 PL-3 osafamfnd[188]: NO after avnd_su_si_remove, tmp:0x55c251477020, su: safSu=1,safSg=1,safApp=abcdtest, si:��FQ�U
2018-07-13 15:37:06.945 PL-3 osafamfnd[188]: NO free si_rec:0x55c2514762f0, su:safSu=2,safSg=1,safApp=abcdtest, si:safSi=2,safApp=abcdtest
2018-07-13 15:37:06.959 PL-3 osafckptnd[224]: exiting for shutdown, (sigterm from pid 471 uid 0)
debug patch:
diff --git a/src/amf/amfnd/sidb.cc b/src/amf/amfnd/sidb.cc
index 9f11e65..731d08d 100644
--- a/src/amf/amfnd/sidb.cc
+++ b/src/amf/amfnd/sidb.cc
@@ -795,6 +795,7 @@ uint32_t avnd_su_si_rec_del(AVND_CB *cb, const std::string &su_name,
           si_name.c_str());
 
   /* free the memory */
+  LOG_NO("free si_rec:%p, su:%s, si:%s", si_rec, si_rec->su->name.c_str(), si_rec->name.c_str());
   delete si_rec;
 
   return rc;
diff --git a/src/amf/amfnd/susm.cc b/src/amf/amfnd/susm.cc
index c5f8240..d7cecd2 100644
--- a/src/amf/amfnd/susm.cc
+++ b/src/amf/amfnd/susm.cc
@@ -1194,8 +1194,11 @@ uint32_t avnd_su_si_oper_done(AVND_CB *cb, AVND_SU *su, AVND_SU_SI_REC *si) {
 
   if (tmp != nullptr) {
     uint32_t sirank = tmp->rank;
-    for (; tmp && (tmp->rank == sirank); tmp = avnd_silist_getprev(tmp)) {
+    for (; tmp && (tmp->rank == sirank);
+         tmp = avnd_silist_getprev(tmp)) {
+      LOG_NO("before avnd_su_si_remove, tmp:%p, su: %s, si:%s", tmp, tmp->su->name.c_str(), tmp->name.c_str());
       uint32_t rc = avnd_su_si_remove(cb, tmp->su, tmp);
+      LOG_NO("after avnd_su_si_remove, tmp:%p, su: %s, si:%s", tmp, tmp->su->name.c_str(), tmp->name.c_str());
       osafassert(rc == NCSCC_RC_SUCCESS);
     }
   } else {
commit b420e02f0d53f23ce1ff53cc65abc98d347332b5
Author: Minh Chau <minh.chau@dektech.com.au>
Date: Wed Jul 18 09:10:08 2018 +1000