
#395 amfnd deadlocks with immnd

never
duplicate
None
defect
amf
-
4.2.0
major
2015-07-14
2013-05-31
No

Migrated from http://devel.opensaf.org/ticket/2601

We have seen a system crash in which amfnd was trying to read from IMM while immnd was trying to register with AMF.

Ticket http://devel.opensaf.org/ticket/1713 exists to improve things on the IMM side. This ticket should reduce the risk on the amfnd side.

With the current design there will always be a risk that an immnd crash leads to a system crash. In the normal case, however, the deadlock should be avoided by design.

The core dump from the crash (backtrace below) shows that amfnd is reading component-related information in the context of an API response. This information (the component capability) could instead be read when the component is initialized.

I also realized that there is a slight, unintended change in the protocol between amfd and amfnd, which probably reduces the risk: amfd immediately sends the instantiate request without waiting for the REG_SU response.

    #0  0x00007f35bc93efd3 in select () from /lib64/libc.so.6
    #1  0x00007f35bdae6b89 in ncs_sel_obj_select (highest_sel_obj=<optimized out>, rfds=0x7fff428fa1e0, wfds=0x0, efds=0x0, timeout_in_10ms=0x7fff428fa26c) at os_defs.c:2622
    #2  0x00007f35bd67111a in imma_sync_with_immnd (cb=<optimized out>) at imma_init.c:79
    #3  imma_create (sv_id=<optimized out>) at imma_init.c:165
    #4  imma_startup (sv_id=NCSMDS_SVC_ID_IMMA_OM) at imma_init.c:278
    #5  0x00007f35bd66d94d in initialize_common (immHandle=0x7fff428fa7c0, cl_node=0x695200, version=0x7fff428fa5e0) at imma_om_api.c:194
    #6  0x00007f35bd66e0b3 in saImmOmInitialize (immHandle=0x7fff428fa7c0, immCallbacks=0x0, inout_version=<optimized out>) at imma_om_api.c:177
    #7  0x00000000004067fd in immutil_saImmOmInitialize (immHandle=0x7fff428fa7c0, immCallbacks=0x0, version=0x7fff428fa7e0) at ../../../../../osaf/tools/safimm/src/immutil.c:1051
    #8  0x000000000043760d in avnd_imm_init (immHandle=0x7fff428fa7c0, immVersion=0x7fff428fa7e0) at avnd_util.c:199
    #9  0x000000000041df25 in avnd_comp_cap_x_act_or_1_act_check (comp_type=0x68cbca, csi_type=0x6999d2) at avnd_comp.c:1105
    #10 0x000000000041e3cb in avnd_comp_csi_assign (cb=0x657960, comp=0x68ca90, csi=0x6998a0) at avnd_comp.c:1210
    #11 0x000000000041e650 in assign_all_csis_at_rank (si=<optimized out>, rank=1, single_csi=true) at avnd_comp.c:1632
    #12 0x000000000041e7f0 in avnd_comp_csi_assign_done (cb=0x657960, comp=0x68f1c0, csi=0x6720e0) at avnd_comp.c:1751
    #13 0x000000000040abca in avnd_evt_ava_resp_evh (cb=0x657960, evt=<optimized out>) at avnd_cbq.c:440
    #14 0x000000000042fdb0 in avnd_evt_process (evt=<optimized out>) at avnd_proc.c:279
    #15 avnd_main_process () at avnd_proc.c:220
    #16 0x00000000004086b5 in main (argc=1, argv=0x7fff428fac58) at amfnd_main.c:53

Changed 14 months ago by hafe
* owner changed from ravisekhar to hafe
* status changed from new to accepted
Changed 14 months ago by hafe
The protocol change mentioned above was introduced between 3.0 and 4.0. In 4.0 the REG_COMP message is not used at all; it is dead code and should be removed from both amfd and amfnd.

Because REG_COMP is not used, code is triggered that instantiates SUs before they are even properly registered! See the bottom of avd_node_up_evh(): since comp_sent is always false, avd_nd_reg_comp_evt_hdl() is called at that point. Instead, amfd should wait for the REG_SU response and only then instantiate the SUs. The error handling when REG_SU fails is also interesting. Suppose immnd crashes so that amfnd cannot read from IMM while handling REG_SU: should amfnd crash, or respond to amfd with an error code? And what should amfd do with the failed REG_SU response?
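The ordering proposed above (send REG_SU, wait for its response, and only then instantiate SUs) can be sketched as a tiny per-node state machine. This is a minimal sketch under stated assumptions, not the actual amfd code: node_t, node_state_t, on_node_up() and on_reg_su_response() are hypothetical names, and the transport calls are left as comments. Since the ticket leaves the REG_SU failure policy open, the failure branch only records the state and instantiates nothing.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-node state kept by amfd. This is NOT the
 * real amfd data model; it only illustrates the message ordering. */
typedef enum {
    NODE_UP,       /* node has joined, nothing sent yet           */
    REG_SU_SENT,   /* REG_SU sent, waiting for the response       */
    REG_SU_OK,     /* amfnd confirmed the registration            */
    REG_SU_FAILED  /* amfnd reported an error (e.g. IMM unusable) */
} node_state_t;

typedef struct {
    node_state_t state;
    int instantiate_reqs_sent; /* counts instantiate requests sent */
} node_t;

/* Node up: send REG_SU only. Unlike the 4.0 behaviour described in the
 * ticket, no SU is instantiated at this point. */
void on_node_up(node_t *n)
{
    /* send_reg_su(n);  -- hypothetical transport call */
    n->state = REG_SU_SENT;
}

static void send_instantiate_requests(node_t *n)
{
    /* send_su_instantiate(n);  -- hypothetical transport call */
    n->instantiate_reqs_sent++;
}

/* REG_SU response: only now are the SUs registered on amfnd,
 * so it is safe to request instantiation. */
void on_reg_su_response(node_t *n, bool success)
{
    if (n->state != REG_SU_SENT)
        return; /* stale or duplicate response: ignore */

    if (!success) {
        /* Open question in the ticket: retry, or reboot the node?
         * This sketch only records the failure. */
        n->state = REG_SU_FAILED;
        return;
    }

    n->state = REG_SU_OK;
    send_instantiate_requests(n);
}
```

The state check at the top of on_reg_su_response() makes a duplicate response harmless: instantiation is requested at most once per registration.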

Reading from IMM in amfnd needs to be minimized and, if possible, confined to the handling of REG_SU. Today amfnd reads from IMM while handling an SI assignment. One problem is that the cstype of a CSI is unknown. Another problem is the component capability, which (in B.04) has moved into association objects that are children of comptype objects. Those objects can be read at REG_SU handling time and stored in the comp object. But to know the capability for a specific cstype, the cstype must be known when the assignment arrives.

Can the SI assignment message be extended with cstype information?
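One way to combine the two ideas above is to cache the capability per cstype in the component record while handling REG_SU, and to carry the cstype in the assignment message so the check becomes a pure in-memory lookup. The sketch below is hypothetical throughout: comp_t, csi_assign_msg_t, comp_cache_cap() and comp_lookup_cap() are invented names, the fixed-size arrays are a simplification, and the comp_cap_t enum is a local stand-in for the AMF capability model (real code would use SaAmfCompCapabilityModelT). comp_cap_x_act_or_1_act() is only meant as an IMM-free counterpart of the avnd_comp_cap_x_act_or_1_act_check() call seen in the backtrace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local stand-in for the AMF component capability model. */
typedef enum {
    CAP_UNKNOWN = 0,
    CAP_X_ACTIVE_AND_Y_STANDBY,
    CAP_X_ACTIVE,
    CAP_1_ACTIVE
    /* ...other models omitted... */
} comp_cap_t;

#define MAX_CSTYPES 8
#define NAME_LEN    64

/* One entry per comptype/cstype association object, read from IMM once. */
typedef struct {
    char cstype[NAME_LEN];
    comp_cap_t cap;
} cstype_cap_t;

/* Hypothetical amfnd-side component record holding the cached capabilities. */
typedef struct {
    cstype_cap_t caps[MAX_CSTYPES];
    size_t n_caps;
} comp_t;

/* Hypothetical SI assignment message extended with the cstype. */
typedef struct {
    char csi_name[NAME_LEN];
    char cstype[NAME_LEN]; /* new field: lets amfnd avoid reading IMM */
} csi_assign_msg_t;

/* Called while handling REG_SU, where IMM reads are acceptable.
 * Returns false if the cache is full or the name does not fit. */
bool comp_cache_cap(comp_t *c, const char *cstype, comp_cap_t cap)
{
    if (c->n_caps >= MAX_CSTYPES || strlen(cstype) >= NAME_LEN)
        return false;
    strcpy(c->caps[c->n_caps].cstype, cstype);
    c->caps[c->n_caps].cap = cap;
    c->n_caps++;
    return true;
}

/* Called during CSI assignment: in-memory lookup only, no IMM access. */
comp_cap_t comp_lookup_cap(const comp_t *c, const char *cstype)
{
    for (size_t i = 0; i < c->n_caps; i++)
        if (strcmp(c->caps[i].cstype, cstype) == 0)
            return c->caps[i].cap;
    return CAP_UNKNOWN;
}

/* IMM-free variant of the "x_active or 1_active" capability check. */
bool comp_cap_x_act_or_1_act(const comp_t *c, const char *cstype)
{
    comp_cap_t cap = comp_lookup_cap(c, cstype);
    return cap == CAP_X_ACTIVE || cap == CAP_1_ACTIVE;
}
```

With this split, an immnd crash can only affect REG_SU handling, where an error can be reported back to amfd, rather than an API response handler in the middle of an assignment.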

Changed 14 months ago by hafe
* patch_waiting changed from no to yes
Changed 13 months ago by hafe
changeset: 3523:b750a1a063cc
branch: opensaf-4.2.x
parent: 3521:ed09cbfa05dd
user: Hans Feldt hans.feldt@…
date: Fri Apr 27 15:41:43 2012 +0200
summary: avsv/avd: instantiate SUs after registration (#2601)

changeset: 3525:aa57d1e2ad6f
tag: tip
user: Hans Feldt hans.feldt@…
date: Fri Apr 27 15:41:43 2012 +0200
summary: avsv/avd: instantiate SUs after registration (#2601)

remote: rev b750a1a063cc1720b9fc8e433d5e9b5f0d1fe5da sent
remote: rev aa57d1e2ad6f6edbab077126e2e0ddad53b56fdc sent

Patch 2 does not build; I am looking into that and will push it separately.

Changed 13 months ago by hafe
* milestone changed from 4.2.1 to future_releases
The urgent problem has been solved, in http://devel.opensaf.org/ticket/1713 and in this ticket.

Work intended for after the 4.2.1 release:
* update and push "[PATCH 2 of 2] avsv: remove reg_comp code (#2601)"
* read from IMM only in the REG_SU context.

Discussion

Nagendra Kumar - 2013-08-30
* status: unassigned --> duplicate
* assigned_to: Nagendra Kumar

Nagendra Kumar - 2015-07-14
* Milestone: future --> never
