amfnd fails to read the component capability (comp capability) from IMM for an unknown reason. The failed read causes an abort in immutil and a core dump, which in turn causes the AMF watchdog to reboot the node.
This particular IMM read is in the critical switch-over logic, executed when the application has already been added and is up providing service. The read of comp capability can easily be avoided by including some more information in an amfd-amfnd message.
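The idea of carrying the capability in the message can be sketched roughly as follows. This is a minimal illustration, not the actual OpenSAF patch: `CompCapability`, `CsiAssignParam`, `ImmStub` and `resolve_capability` are hypothetical names, and the IMM read is replaced by a stub that only counts how often it is called.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical capability values; NOT the real SAF AMF enum.
enum class CompCapability : std::uint32_t {
  kUnknown = 0,  // sender did not fill in the field
  kXActive,
  kOneActive,
  kOther
};

// Hypothetical per-CSI parameters sent from amfd to amfnd.
struct CsiAssignParam {
  std::string csi_name;
  std::string comp_name;
  CompCapability capability = CompCapability::kUnknown;  // the added field
};

// Stand-in for the IMM accessor read (assumption: real code would call IMM).
struct ImmStub {
  int reads = 0;
  CompCapability read_capability(const std::string& /*comp_name*/) {
    ++reads;
    return CompCapability::kOther;
  }
};

// Prefer the capability carried in the message; fall back to the IMM read
// only when the message does not carry one.
CompCapability resolve_capability(const CsiAssignParam& param, ImmStub& imm) {
  assert(!param.comp_name.empty());
  if (param.capability != CompCapability::kUnknown) {
    return param.capability;
  }
  return imm.read_capability(param.comp_name);
}

// Hypothetical counterpart of the "x_act or 1_act" check seen in the backtrace.
bool is_x_active_or_one_active(CompCapability capability) {
  return capability == CompCapability::kXActive ||
         capability == CompCapability::kOneActive;
}
```

With the field filled in by the sender, the assignment path never touches IMM, so a slow or restarting IMM node director cannot stall it. The fallback branch is only an assumption about how a receiver might handle a message without the field.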
==================================================================================
2013-09-09 11:49:52 osafamfnd SC-2-1 notice osafamfnd[5336]: NO Assigning 'all (37) SIs' STANDBY to 'safSu=1,safSg=2N,safApp=SomeApp'
2013-09-09 11:49:52 osafamfnd SC-2-1 notice osafamfnd[5336]: NO Assigning 'safSi=CS,safApp=SomeApp' STANDBY to 'safSu=1,safSg=2N,safApp=SomeApp'
2013-09-09 11:50:02 osafamfnd SC-2-1 err osafamfnd[5336]: saImmOmInitialize FAILED, rc = 5
2013-09-09 11:50:04 osafrded SC-2-1 alert osafrded[5113]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04 osaffmd SC-2-1 alert osaffmd[5122]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04 osafimmnd SC-2-1 alert osafimmnd[5142]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:06 osafpmnd SC-2-1 alert osafpmnd[5405]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04 osafpmd SC-2-1 alert osafpmd[5421]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04 osafamfwd SC-2-1 crit osafamfwd[5463]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: AMF unexpectedly crashed, OwnNodeId = 131343, SupervisionTime = 60
2013-09-09 11:50:04 osafckptd SC-2-1 alert osafckptd[5520]: AL AMF Node Director is down, terminate this process
(gdb) bt full
No symbol table info available.
No symbol table info available.
at ../../../../../osaf/tools/safimm/src/immutil.c:72
ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7fffa48dc300, reg_save_area = 0x7fffa48dc230}}
ap2 = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7fffa48dc300,
reg_save_area = 0x7fffa48dc230}}
version=0x7fffa48dc4b0) at ../../../../../osaf/tools/safimm/src/immutil.c:1127
localVer = {releaseCode = 65 'A', majorVersion = 2 '\002', minorVersion = 12 '\f'}
rc = SA_AIS_ERR_TIMEOUT
nTries = 6886170
at avnd_comp.c:911
rc = <optimized out>
error = <optimized out>
dn = {length = 97,
value = "safSupportedCsType=safVersion=1.0.0\,safCSType=X,safVersion=R2B,safCompType=X", '\000' <repeats 158 times>}
accessorHandle = 0
attributes = <optimized out>
comp_cap = <optimized out>
attributeNames = {0x443552 "ULL", 0x0}
immOmHandle = 0
immVersion = {releaseCode = 65 'A', majorVersion = 2 '\002', minorVersion = 1 '\001'}
__FUNCTION__ = "avnd_comp_cap_x_act_or_1_act_check"
npi_prv_inst = <optimized out>
npi_curr_inst = <optimized out>
curr_csi = 0x6f0010
comp_ev = <optimized out>
rc = <optimized out>
csiname = 0x4434f1 "%u"
__FUNCTION__ = "avnd_comp_csi_assign"
npi_prv_inst = <optimized out>
npi_curr_inst = 6
su_ev = 4294967295
rc = 6933746
curr_csi = 0x6f0010
__FUNCTION__ = "assign_si_to_su"
rc = <optimized out>
rank = <optimized out>
curr_si = <optimized out>
curr_csi = <optimized out>
__FUNCTION__ = "avnd_su_si_assign"
csi_param = 0x6f8df8
si = <optimized out>
rc = 1
csi = <optimized out>
__FUNCTION__ = "avnd_su_si_msg_prc"
info = <optimized out>
siq = <optimized out>
su = 0x66f770
rc = <optimized out>
__FUNCTION__ = "avnd_evt_avd_info_su_si_assign_evh"
ret = 0
mbx_fd = <optimized out>
fds = {{fd = 11, events = 1, revents = 1}, {fd = 15, events = 1, revents = 0}, {fd = 13, events = 1,
revents = 0}, {fd = 0, events = 0, revents = 0}}
evt = 0x6c5190
__FUNCTION__ = "avnd_main_process"
error = 32767
ret = <optimized out>
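The backtrace ends in immutil.c with rc = SA_AIS_ERR_TIMEOUT, matching the "saImmOmInitialize FAILED, rc = 5" log line. A rough sketch of that failure mode, assuming a wrapper that retries only on a "try again" result and aborts the process on any other error when errors are configured as fatal. The names `Rc`, `RetryProfile` and `call_with_retry` are hypothetical and this is not the immutil source; a real wrapper would also wait between tries.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>

// Hypothetical result codes standing in for the SA_AIS_* values.
enum class Rc { kOk, kTryAgain, kTimeout };

// Hypothetical wrapper settings.
struct RetryProfile {
  bool errors_are_fatal;  // abort the process on a final error
  unsigned max_tries;     // upper bound on calls for kTryAgain
};

// Calls op, repeating it while it reports kTryAgain and tries remain.
// Any other error is final: it is either returned to the caller or,
// when errors are fatal, turned into an abort (and hence a core dump).
Rc call_with_retry(const std::function<Rc()>& op, const RetryProfile& profile) {
  assert(profile.max_tries > 0);
  Rc rc = op();
  for (unsigned tries = 1;
       rc == Rc::kTryAgain && tries < profile.max_tries; ++tries) {
    rc = op();
  }
  if (rc != Rc::kOk && profile.errors_are_fatal) {
    std::abort();
  }
  return rc;
}
```

In this sketch a timeout is not retried, so with fatal errors enabled a single timed-out call is enough to bring the process down. Returning the error instead leaves the decision to the caller.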
changeset: 5110:87f6fa7ae4fe
branch: opensaf-4.3.x
parent: 5097:c8646b486147
user: Hans Feldt hans.feldt@ericsson.com
date: Sun Apr 06 11:45:40 2014 +0200
summary: avsv: cleanup dnd edu [#574]
changeset: 5111:419fae714e2c
branch: opensaf-4.4.x
parent: 5106:1a15aca138aa
user: Hans Feldt hans.feldt@ericsson.com
date: Sun Apr 06 11:50:14 2014 +0200
summary: amf: cleanup dnd edu [#574]
changeset: 5112:deeaff39c414
tag: tip
parent: 5109:8bca8f6f619c
user: Hans Feldt hans.feldt@ericsson.com
date: Sun Apr 06 11:50:14 2014 +0200
summary: amf: cleanup dnd edu [#574]
Wrong changesets, these never got pushed!
Version 3 for review
Also tested by killing immnd from amfnd for every SUSI Assign message received. It works although immnd gets to sync quite often...
diff --git a/osaf/services/saf/amf/amfnd/su.cc b/osaf/services/saf/amf/amfnd/su.cc
--- a/osaf/services/saf/amf/amfnd/su.cc
+++ b/osaf/services/saf/amf/amfnd/su.cc
@@ -402,6 +402,8 @@ uint32_t avnd_evt_avd_info_su_si_assign_
/* SI rank is uninitialized, read it from IMM */
info->si_rank = get_sirank(&info->si_name);
}
+ if (system("pkill -9 osafimmnd") == -1)
+ LOG_NO("cmd failed");
} else {
if (info->si_name.length > 0) {
if (avnd_su_si_rec_get(cb, &info->su_name, &info->si_name) == NULL)
changeset: 5130:b09e1b04be12
branch: opensaf-4.3.x
parent: 5125:bdd14796ea9e
user: Hans Feldt hans.feldt@ericsson.com
date: Tue Apr 08 16:54:41 2014 +0200
summary: avsv: include and use sirank in SUSI msg [#574]
changeset: 5131:c3ef2e28fccd
branch: opensaf-4.3.x
user: Hans Feldt hans.feldt@ericsson.com
date: Tue Apr 08 16:54:42 2014 +0200
summary: avsv: include and use comp capability in SUSI msg [#574]
changeset: 5132:c61bf9ae160d
branch: opensaf-4.3.x
user: Hans Feldt hans.feldt@ericsson.com
date: Tue Apr 08 16:54:43 2014 +0200
summary: avsv: include su_failover in REG_SU msg [#574]
changeset: 5133:2e0b5d8a20ef
branch: opensaf-4.4.x
parent: 5128:1c50b771e916
user: Hans Feldt hans.feldt@ericsson.com
date: Tue Apr 08 17:01:12 2014 +0200
summary: amf: include and use sirank in SUSI msg [#574]
changeset: 5134:5f977e71e149
branch: opensaf-4.4.x
user: Hans Feldt hans.feldt@ericsson.com
date: Tue Apr 08 17:02:30 2014 +0200
summary: amf: include and use comp capability in SUSI msg [#574]
changeset: 5135:c9db4217cadf
branch: opensaf-4.4.x
user: Hans Feldt hans.feldt@ericsson.com
date: Tue Apr 08 17:03:08 2014 +0200
summary: amf: include su_failover in REG_SU msg [#574]
changeset: 5136:0a74b2992ecc
parent: 5129:554cb2cd493f
user: Hans Feldt hans.feldt@ericsson.com
date: Tue Apr 08 17:01:12 2014 +0200
summary: amf: include and use sirank in SUSI msg [#574]
changeset: 5137:d1f91371d303
user: Hans Feldt hans.feldt@ericsson.com
date: Tue Apr 08 17:02:30 2014 +0200
summary: amf: include and use comp capability in SUSI msg [#574]
changeset: 5138:4ac18a003cdd
tag: tip
user: Hans Feldt hans.feldt@ericsson.com
date: Tue Apr 08 17:03:08 2014 +0200
summary: amf: include su_failover in REG_SU msg [#574]
Please review the 4.4 (and default) changes. Hm, it seems I left the debug logs in there...
I think we can close this one now. I will open a new enhancement ticket (or maybe one already exists) to handle the case when IMM is read in the context of receiving REG_SU and other events.
changeset: 5134:22fcb906e04e
branch: opensaf-4.3.x
parent: 5130:4419501105e0
user: Hans Feldt hans.feldt@ericsson.com
date: Thu Apr 10 13:22:41 2014 +0200
summary: avsv: include and use sirank in SUSI msg [#574]
changeset: 5135:1457cc1bc8eb
branch: opensaf-4.3.x
user: Hans Feldt hans.feldt@ericsson.com
date: Thu Apr 10 13:32:04 2014 +0200
summary: avsv: include and use comp capability in SUSI msg [#574]
changeset: 5136:46ea86024d6a
branch: opensaf-4.3.x
user: Hans Feldt hans.feldt@ericsson.com
date: Thu Apr 10 13:33:13 2014 +0200
summary: avsv: include su_failover in REG_SU msg [#574]
changeset: 5137:70fee6ec323f
branch: opensaf-4.4.x
parent: 5131:6a2171548ea4
user: Hans Feldt hans.feldt@ericsson.com
date: Thu Apr 10 13:36:39 2014 +0200
summary: amf: include and use sirank in SUSI msg [#574]
changeset: 5138:8eb969f21971
branch: opensaf-4.4.x
user: Hans Feldt hans.feldt@ericsson.com
date: Thu Apr 10 13:36:39 2014 +0200
summary: amf: include and use comp capability in SUSI msg [#574]
changeset: 5139:b7a8df1a58d6
branch: opensaf-4.4.x
user: Hans Feldt hans.feldt@ericsson.com
date: Thu Apr 10 13:36:40 2014 +0200
summary: amf: include and use su_failover in REG_SU msg [#574]
changeset: 5140:f90ee633a89f
parent: 5133:a2360a43963f
user: Hans Feldt hans.feldt@ericsson.com
date: Thu Apr 10 13:36:54 2014 +0200
summary: amf: include and use sirank in SUSI msg [#574]
changeset: 5141:d6e7d85efb8b
user: Hans Feldt hans.feldt@ericsson.com
date: Thu Apr 10 13:36:54 2014 +0200
summary: amf: include and use comp capability in SUSI msg [#574]
changeset: 5142:8dcb7f6a6762
tag: tip
user: Hans Feldt hans.feldt@ericsson.com
date: Thu Apr 10 13:36:55 2014 +0200
summary: amf: include and use su_failover in REG_SU msg [#574]