#574 amfnd abort causing node reboot

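Milestone: 4.3.3
Status: fixed
Owner: nobody
Labels: None
Type: defect
Component: amf
Part: -
Version: 4.3
Severity: major
Updated: 2014-08-07
Created: 2013-09-24
Creator: Hans Feldt
Private: No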

amfnd fails to read the component capability from IMM for an unknown reason. The failure triggers an abort in immutil and a core dump, which in turn causes the AMF watchdog to reboot the node.

This particular IMM read is in the critical switch-over logic, when the application is already added and up providing service. The read of comp capability can easily be avoided by including a little more information in an amfd-amfnd message.
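
As a rough illustration of that idea (not the actual OpenSAF message layout), the director could attach the capability to each CSI entry it already sends, so that the node director can decide from the message alone and never has to call IMM on this path. All type, field, and function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability values, loosely modelled on the AMF
 * component capability model; not the real OpenSAF enum. */
typedef enum {
    CAP_X_ACTIVE_AND_Y_STANDBY = 1,
    CAP_X_ACTIVE_OR_Y_STANDBY,
    CAP_1_ACTIVE_OR_Y_STANDBY,
    CAP_1_ACTIVE_OR_1_STANDBY,
    CAP_X_ACTIVE,
    CAP_1_ACTIVE,
    CAP_NON_PRE_INSTANTIABLE
} comp_cap_t;

/* Hypothetical per-CSI part of a director -> node director
 * assignment message. */
typedef struct {
    const char *csi_name;
    const char *comp_name;
    comp_cap_t capability; /* filled in by the director when it builds the message */
} csi_assign_param_t;

/* Answers the "x_active or 1_active" question from the message
 * contents only, with no IMM access. */
static bool cap_is_x_act_or_1_act(const csi_assign_param_t *p)
{
    assert(p != NULL);
    return p->capability == CAP_X_ACTIVE || p->capability == CAP_1_ACTIVE;
}
```

The trade-off in such a design is a slightly larger message in exchange for removing a blocking read from the assignment path.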

==================================================================================

2013-09-09 11:49:52 osafamfnd SC-2-1 notice osafamfnd[5336]: NO Assigning 'all (37) SIs' STANDBY to 'safSu=1,safSg=2N,safApp=SomeApp'
2013-09-09 11:49:52 osafamfnd SC-2-1 notice osafamfnd[5336]: NO Assigning 'safSi=CS,safApp=SomeApp' STANDBY to 'safSu=1,safSg=2N,safApp=SomeApp'
2013-09-09 11:50:02 osafamfnd SC-2-1 err osafamfnd[5336]: saImmOmInitialize FAILED, rc = 5
2013-09-09 11:50:04 osafrded SC-2-1 alert osafrded[5113]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04 osaffmd SC-2-1 alert osaffmd[5122]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04 osafimmnd SC-2-1 alert osafimmnd[5142]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:06 osafpmnd SC-2-1 alert osafpmnd[5405]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04 osafpmd SC-2-1 alert osafpmd[5421]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04 osafamfwd SC-2-1 crit osafamfwd[5463]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: AMF unexpectedly crashed, OwnNodeId = 131343, SupervisionTime = 60
2013-09-09 11:50:04 osafckptd SC-2-1 alert osafckptd[5520]: AL AMF Node Director is down, terminate this process

(gdb) bt full
#0  0x00007fab46742b35 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fab46744111 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00000000004051f8 in defaultImmutilError (fmt=0x43fef0 "rc = %d")
    at ../../../../../osaf/tools/safimm/src/immutil.c:72
        ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7fffa48dc300, reg_save_area = 0x7fffa48dc230}}
        ap2 = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7fffa48dc300,
            reg_save_area = 0x7fffa48dc230}}
#3  0x00000000004065f4 in immutil_saImmOmInitialize (immHandle=0x7fffa48dc490, immCallbacks=0x0,
    version=0x7fffa48dc4b0) at ../../../../../osaf/tools/safimm/src/immutil.c:1127
        localVer = {releaseCode = 65 'A', majorVersion = 2 '\002', minorVersion = 12 '\f'}
        rc = SA_AIS_ERR_TIMEOUT
        nTries = 6886170
#4  0x000000000041df97 in avnd_comp_cap_x_act_or_1_act_check (comp_type=0x69131a, csi_type=0x6f0142)
    at avnd_comp.c:911
        rc = <optimized out>
        error = <optimized out>
        dn = {length = 97,
          value = "safSupportedCsType=safVersion=1.0.0\,safCSType=X,safVersion=R2B,safCompType=X", '\000' <repeats 158 times>}
        accessorHandle = 0
        attributes = <optimized out>
        comp_cap = <optimized out>
        attributeNames = {0x443552 "ULL", 0x0}
        immOmHandle = 0
        immVersion = {releaseCode = 65 'A', majorVersion = 2 '\002', minorVersion = 1 '\001'}
        __FUNCTION__ = "avnd_comp_cap_x_act_or_1_act_check"
#5  0x000000000041e43b in avnd_comp_csi_assign (cb=0x6578c0, comp=0x6911e0, csi=0x0) at avnd_comp.c:1017
        npi_prv_inst = <optimized out>
        npi_curr_inst = <optimized out>
        curr_csi = 0x6f0010
        comp_ev = <optimized out>
        rc = <optimized out>
        csiname = 0x4434f1 "%u"
        __FUNCTION__ = "avnd_comp_csi_assign"
#6  0x0000000000436d9c in assign_si_to_su (si=0x69ccc0, su=0x66f770, single_csi=0) at avnd_susm.c:561
        npi_prv_inst = <optimized out>
        npi_curr_inst = 6
        su_ev = 4294967295
        rc = 6933746
        curr_csi = 0x6f0010
        __FUNCTION__ = "assign_si_to_su"
#7  0x0000000000437219 in avnd_su_si_assign (cb=<optimized out>, su=0x66f770, si=0x69ccc0) at avnd_susm.c:606
        rc = <optimized out>
        rank = <optimized out>
        curr_si = <optimized out>
        curr_csi = <optimized out>
        __FUNCTION__ = "avnd_su_si_assign"
#8  0x0000000000434b9d in avnd_su_si_msg_prc (cb=0x6578c0, su=0x66f770, info=<optimized out>) at avnd_susm.c:349
        csi_param = 0x6f8df8
        si = <optimized out>
        rc = 1
        csi = <optimized out>
        __FUNCTION__ = "avnd_su_si_msg_prc"
#9  0x000000000043216e in avnd_evt_avd_info_su_si_assign_evh (cb=0x6578c0, evt=<optimized out>) at avnd_su.c:258
        info = <optimized out>
        siq = <optimized out>
        su = 0x66f770
        rc = <optimized out>
        __FUNCTION__ = "avnd_evt_avd_info_su_si_assign_evh"
#10 0x0000000000430190 in avnd_main_process () at avnd_proc.c:218
        ret = 0
        mbx_fd = <optimized out>
        fds = {{fd = 11, events = 1, revents = 1}, {fd = 15, events = 1, revents = 0}, {fd = 13, events = 1,
            revents = 0}, {fd = 0, events = 0, revents = 0}}
        evt = 0x6c5190
        __FUNCTION__ = "avnd_main_process"
#11 0x0000000000408815 in main (argc=1, argv=0x7fffa48dc7a8) at amfnd_main.c:61
        error = 32767
        ret = <optimized out>
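
Frames 2 and 3 of the backtrace show the IMM wrapper ending with SA_AIS_ERR_TIMEOUT and then calling an error handler that aborts the process. A minimal sketch of the alternative pattern, where the final return code is handed back to the caller instead of aborting, could look like the following. The names `call_with_retry`, `op_fn`, and the `RC_*` constants are hypothetical stand-ins, not the immutil API, and a real wrapper would normally sleep between attempts.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in return codes for illustration; the values mirror
 * SA_AIS_OK (1), SA_AIS_ERR_TIMEOUT (5) and SA_AIS_ERR_TRY_AGAIN (6). */
typedef enum { RC_OK = 1, RC_ERR_TIMEOUT = 5, RC_ERR_TRY_AGAIN = 6 } rc_t;

/* Hypothetical operation signature: any call that may ask to be retried. */
typedef rc_t (*op_fn)(void *ctx);

/* Calls op until it returns something other than TRY_AGAIN or the
 * attempt budget is used up. Never aborts: the caller sees the last
 * return code and decides what to do. */
static rc_t call_with_retry(op_fn op, void *ctx, unsigned max_tries)
{
    rc_t rc = RC_ERR_TRY_AGAIN;
    assert(op != NULL);
    for (unsigned i = 0; i < max_tries && rc == RC_ERR_TRY_AGAIN; i++)
        rc = op(ctx);
    return rc;
}

/* Example operation: returns TRY_AGAIN until the counter reaches zero. */
static rc_t flaky_op(void *ctx)
{
    unsigned *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return RC_ERR_TRY_AGAIN;
    }
    return RC_OK;
}

/* Example operation: always times out, as in the log above. */
static rc_t timeout_op(void *ctx)
{
    (void)ctx;
    return RC_ERR_TIMEOUT;
}
```

With this shape, a caller in a critical path can treat a timeout as a recoverable condition rather than letting it take the whole process down.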

Related

Tickets: #574
Wiki: ChangeLog-4.3.3
Wiki: ChangeLog-4.4.1

Discussion

  • Hans Feldt

    Hans Feldt - 2013-09-24
    • status: unassigned --> accepted
    • assigned_to: Hans Feldt
     
  • Hans Feldt

    Hans Feldt - 2013-10-01
    • status: accepted --> review
     
  • Hans Feldt

    Hans Feldt - 2013-11-04
    • status: review --> accepted
     
  • Hans Feldt

    Hans Feldt - 2014-03-28
    • status: accepted --> review
     
  • Hans Feldt

    Hans Feldt - 2014-04-06

    changeset: 5110:87f6fa7ae4fe
    branch: opensaf-4.3.x
    parent: 5097:c8646b486147
    user: Hans Feldt hans.feldt@ericsson.com
    date: Sun Apr 06 11:45:40 2014 +0200
    summary: avsv: cleanup dnd edu [#574]

    changeset: 5111:419fae714e2c
    branch: opensaf-4.4.x
    parent: 5106:1a15aca138aa
    user: Hans Feldt hans.feldt@ericsson.com
    date: Sun Apr 06 11:50:14 2014 +0200
    summary: amf: cleanup dnd edu [#574]

    changeset: 5112:deeaff39c414
    tag: tip
    parent: 5109:8bca8f6f619c
    user: Hans Feldt hans.feldt@ericsson.com
    date: Sun Apr 06 11:50:14 2014 +0200
    summary: amf: cleanup dnd edu [#574]

     

    • Hans Feldt

      Hans Feldt - 2014-04-10

      wrong changesets, these never got pushed!

       
  • Hans Feldt

    Hans Feldt - 2014-04-07

    Version 3 for review

     
  • Hans Feldt

    Hans Feldt - 2014-04-07

    Also tested by killing immnd from amfnd for every SUSI Assign message received. It works although immnd gets to sync quite often...

    diff --git a/osaf/services/saf/amf/amfnd/su.cc b/osaf/services/saf/amf/amfnd/su.cc
    --- a/osaf/services/saf/amf/amfnd/su.cc
    +++ b/osaf/services/saf/amf/amfnd/su.cc
    @@ -402,6 +402,8 @@ uint32_t avnd_evt_avd_info_su_si_assign_
    /* SI rank is uninitialized, read it from IMM */
    info->si_rank = get_sirank(&info->si_name);
    }
    + if (system("pkill -9 osafimmnd") == -1);
    + LOG_NO("cmd failed");
    } else {
    if (info->si_name.length > 0) {
    if (avnd_su_si_rec_get(cb, &info->su_name, &info->si_name) == NULL)
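
In the hunk above, the `if` ends in a semicolon, so the `LOG_NO` line runs whether or not the command failed. If the intent is to log only real failures, one possible shape for such a fault-injection helper is sketched below. `run_cmd` is a hypothetical name, and the sketch assumes a POSIX system where `system()` runs the command through the shell.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Runs cmd through the shell. Returns the command's exit status,
 * or -1 if the command could not be run or was ended by a signal. */
static int run_cmd(const char *cmd)
{
    assert(cmd != NULL);
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A caller could then write `if (run_cmd("pkill -9 osafimmnd") != 0)` followed by the log call; note that `pkill` itself exits with status 1 when no process matched.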

     
  • Hans Feldt

    Hans Feldt - 2014-04-08

    changeset: 5130:b09e1b04be12
    branch: opensaf-4.3.x
    parent: 5125:bdd14796ea9e
    user: Hans Feldt hans.feldt@ericsson.com
    date: Tue Apr 08 16:54:41 2014 +0200
    summary: avsv: include and use sirank in SUSI msg [#574]

    changeset: 5131:c3ef2e28fccd
    branch: opensaf-4.3.x
    user: Hans Feldt hans.feldt@ericsson.com
    date: Tue Apr 08 16:54:42 2014 +0200
    summary: avsv: include and use comp capability in SUSI msg [#574]

    changeset: 5132:c61bf9ae160d
    branch: opensaf-4.3.x
    user: Hans Feldt hans.feldt@ericsson.com
    date: Tue Apr 08 16:54:43 2014 +0200
    summary: avsv: include su_failover in REG_SU msg [#574]

    changeset: 5133:2e0b5d8a20ef
    branch: opensaf-4.4.x
    parent: 5128:1c50b771e916
    user: Hans Feldt hans.feldt@ericsson.com
    date: Tue Apr 08 17:01:12 2014 +0200
    summary: amf: include and use sirank in SUSI msg [#574]

    changeset: 5134:5f977e71e149
    branch: opensaf-4.4.x
    user: Hans Feldt hans.feldt@ericsson.com
    date: Tue Apr 08 17:02:30 2014 +0200
    summary: amf: include and use comp capability in SUSI msg [#574]

    changeset: 5135:c9db4217cadf
    branch: opensaf-4.4.x
    user: Hans Feldt hans.feldt@ericsson.com
    date: Tue Apr 08 17:03:08 2014 +0200
    summary: amf: include su_failover in REG_SU msg [#574]

    changeset: 5136:0a74b2992ecc
    parent: 5129:554cb2cd493f
    user: Hans Feldt hans.feldt@ericsson.com
    date: Tue Apr 08 17:01:12 2014 +0200
    summary: amf: include and use sirank in SUSI msg [#574]

    changeset: 5137:d1f91371d303
    user: Hans Feldt hans.feldt@ericsson.com
    date: Tue Apr 08 17:02:30 2014 +0200
    summary: amf: include and use comp capability in SUSI msg [#574]

    changeset: 5138:4ac18a003cdd
    tag: tip
    user: Hans Feldt hans.feldt@ericsson.com
    date: Tue Apr 08 17:03:08 2014 +0200
    summary: amf: include su_failover in REG_SU msg [#574]

    Please review the 4.4 (and default) changes. Hm, it seems I left the debug logs in there...

     

  • Hans Feldt

    Hans Feldt - 2014-04-08

    I think we can close this one now. I will open a new enhancement ticket (or maybe one already exists) to handle the case when IMM is read in the context of receiving REG_SU and other events.

     
  • Hans Feldt

    Hans Feldt - 2014-04-08
    • status: review --> fixed
    • assigned_to: Hans Feldt --> nobody
    • Milestone: future --> 4.3.3
     
  • Hans Feldt

    Hans Feldt - 2014-04-10

    changeset: 5134:22fcb906e04e
    branch: opensaf-4.3.x
    parent: 5130:4419501105e0
    user: Hans Feldt hans.feldt@ericsson.com
    date: Thu Apr 10 13:22:41 2014 +0200
    summary: avsv: include and use sirank in SUSI msg [#574]

    changeset: 5135:1457cc1bc8eb
    branch: opensaf-4.3.x
    user: Hans Feldt hans.feldt@ericsson.com
    date: Thu Apr 10 13:32:04 2014 +0200
    summary: avsv: include and use comp capability in SUSI msg [#574]

    changeset: 5136:46ea86024d6a
    branch: opensaf-4.3.x
    user: Hans Feldt hans.feldt@ericsson.com
    date: Thu Apr 10 13:33:13 2014 +0200
    summary: avsv: include su_failover in REG_SU msg [#574]

    changeset: 5137:70fee6ec323f
    branch: opensaf-4.4.x
    parent: 5131:6a2171548ea4
    user: Hans Feldt hans.feldt@ericsson.com
    date: Thu Apr 10 13:36:39 2014 +0200
    summary: amf: include and use sirank in SUSI msg [#574]

    changeset: 5138:8eb969f21971
    branch: opensaf-4.4.x
    user: Hans Feldt hans.feldt@ericsson.com
    date: Thu Apr 10 13:36:39 2014 +0200
    summary: amf: include and use comp capability in SUSI msg [#574]

    changeset: 5139:b7a8df1a58d6
    branch: opensaf-4.4.x
    user: Hans Feldt hans.feldt@ericsson.com
    date: Thu Apr 10 13:36:40 2014 +0200
    summary: amf: include and use su_failover in REG_SU msg [#574]

    changeset: 5140:f90ee633a89f
    parent: 5133:a2360a43963f
    user: Hans Feldt hans.feldt@ericsson.com
    date: Thu Apr 10 13:36:54 2014 +0200
    summary: amf: include and use sirank in SUSI msg [#574]

    changeset: 5141:d6e7d85efb8b
    user: Hans Feldt hans.feldt@ericsson.com
    date: Thu Apr 10 13:36:54 2014 +0200
    summary: amf: include and use comp capability in SUSI msg [#574]

    changeset: 5142:8dcb7f6a6762
    tag: tip
    user: Hans Feldt hans.feldt@ericsson.com
    date: Thu Apr 10 13:36:55 2014 +0200
    summary: amf: include and use su_failover in REG_SU msg [#574]

     

