
#2296 imm: IMMND on payload crashes after SC absence

5.0.2
fixed
None
defect
imm
nd
major
2017-02-21
2017-02-09
Hung Nguyen
No

Removal of IMMND coordinator was introduced in [#1692].
Some cleanup actions are delayed until immnd_proc_server() is executed.

If the cluster comes back from the headless state too quickly, immnd_proc_server() is not executed in time, the delayed cleanup does not run, and IMMND crashes later.
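The ordering problem described above can be illustrated with a toy model. This is a minimal sketch and not OpenSAF code: the class `ToyImmnd`, its methods, and the `run()` helper are hypothetical names, and it assumes that the delayed cleanup is what returns the node to IMM_SERVER_READY/IMM_NODE_FULLY_AVAILABLE. Only the state names and the error text are taken from this ticket.

```python
from enum import Enum, auto


class ServerState(Enum):
    READY = auto()        # stands in for IMM_SERVER_READY
    SYNC_SERVER = auto()  # stands in for IMM_SERVER_SYNC_SERVER


class NodeState(Enum):
    FULLY_AVAILABLE = auto()  # stands in for IMM_NODE_FULLY_AVAILABLE
    R_AVAILABLE = auto()      # stands in for IMM_NODE_R_AVAILABLE


class ToyImmnd:
    """Toy model of the race (hypothetical, not the real IMMND)."""

    def __init__(self):
        self.server_state = ServerState.READY
        self.node_state = NodeState.FULLY_AVAILABLE
        self.cleanup_pending = False

    def announce_sync(self):
        # A sync begins: the node is read-only while it runs.
        self.server_state = ServerState.SYNC_SERVER
        self.node_state = NodeState.R_AVAILABLE

    def immd_down(self):
        # Headless: the cleanup is only scheduled here, not performed.
        self.cleanup_pending = True

    def periodic_tick(self):
        # Stand-in for immnd_proc_server(): performs the delayed cleanup.
        # Assumption: the cleanup includes reverting the sync states.
        if self.cleanup_pending:
            self.server_state = ServerState.READY
            self.node_state = NodeState.FULLY_AVAILABLE
            self.cleanup_pending = False

    def start_sync(self):
        # A new sync start is only acceptable from the fully available state.
        if self.node_state is not NodeState.FULLY_AVAILABLE:
            raise RuntimeError(
                "Node is in a state that cannot accept start of sync")
        self.announce_sync()


def run(tick_before_immd_returns: bool) -> str:
    """Replay the scenario with or without the periodic tick in between."""
    node = ToyImmnd()
    node.announce_sync()   # sync in progress
    node.immd_down()       # SC lost, cluster goes headless
    if tick_before_immd_returns:
        node.periodic_tick()
    try:
        node.start_sync()  # IMMD is back and starts a new sync
    except RuntimeError as exc:
        return f"abort: {exc}"
    return "sync accepted"
```

In this model `run(True)` accepts the new sync, while `run(False)` hits the guard, which corresponds to the case where the cluster returns before the periodic processing has had a chance to run.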

2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO Announce sync, epoch:28
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-02-05 21:36:41 PL-5 osafimmloadd: NO Sync starting
2017-02-05 21:36:42 PL-5 osafdtmd[393]: NO Lost contact with 'SC-1'
2017-02-05 21:36:42 PL-5 osafimmnd[406]: WA Director Service in NOACTIVE state - fevs replies pending:16 fevs highest processed:13154
2017-02-05 21:36:43 PL-5 osafimmnd[406]: WA SC Absence IS allowed:900 IMMD service is DOWN
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO IMMD SERVICE IS DOWN, HYDRA IS CONFIGURED => UNREGISTERING IMMND form MDS
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:290002050f sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:14d0002050f sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: WA Postponing hard delete of admin owner with id:41 when imm is not writable state
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:1530002050f sv_id:27
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 147 <339, 2050f> (OpenSafImmPBE)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Removing client id:1550002050f sv_id:26
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 144 <0, 2010f(down)> (safLogService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 145 <0, 2010f(down)> (@safLogService_appl)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 146 <0, 2010f(down)> (@OpenSafImmReplicatorA)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 143 <0, 2010f(down)> (safClmService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Implementer disconnected 142 <0, 2010f(down)> (safAmfService)
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO Impl Discarded node 2010f
2017-02-05 21:36:43 PL-5 osafimmnd[406]: NO MDS unregisterede. sleeping ...
2017-02-05 21:36:43 PL-5 osafimmpbed: WA PBE lost contact with parent IMMND - Exiting
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO Sleep done registering IMMND with MDS
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO SUCCESS IN REGISTERING IMMND WITH MDS
2017-02-05 21:36:44 PL-5 osafimmnd[406]: NO MDS: mds_register_callback: dest 2050f000001e8 already exist
2017-02-05 21:36:44 PL-5 osafimmnd[406]: WA IMMND - Client Node Get Failed for cli_hdl:1464583980303
2017-02-05 21:36:45 PL-5 osafdtmd[393]: NO Established contact with 'SC-1'
2017-02-05 21:36:49 PL-5 osafimmnd[406]: WA MDS Send Failed
2017-02-05 21:36:49 PL-5 osafimmnd[406]: WA Error code 2 returned for message type 17 - ignoring
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO IMMD service is UP ... ScAbsenseAllowed?:900 introduced?:2
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me highestProcessed:13154 highestReceived:13154
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Epoch set to 29 in ImmModel
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me highestProcessed:13154 highestReceived:13154
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO ERR_BAD_HANDLE: admin owner id 42 does not exist
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Implementer connected: 149 (OpenSafImmPBE) <0, 2040f>
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO Re-introduce-me highestProcessed:13157 highestReceived:13158
2017-02-05 21:36:49 PL-5 osafimmnd[406]: ER Node is in a state that cannot accept start of sync, will terminate
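How short the headless window was can be read off the log above with a small script. This is a rough sketch: the helper names `timestamp_of` and `headless_seconds` are made up for illustration, the marker strings are copied from this excerpt, and whether they stay the same in other OpenSAF versions is an assumption.

```python
from datetime import datetime

# A few lines copied verbatim from the log excerpt in this ticket.
LOG = """\
2017-02-05 21:36:41 PL-5 osafimmnd[406]: NO Announce sync, epoch:28
2017-02-05 21:36:43 PL-5 osafimmnd[406]: WA SC Absence IS allowed:900 IMMD service is DOWN
2017-02-05 21:36:49 PL-5 osafimmnd[406]: NO IMMD service is UP ... ScAbsenseAllowed?:900 introduced?:2
2017-02-05 21:36:49 PL-5 osafimmnd[406]: ER Node is in a state that cannot accept start of sync, will terminate
"""


def timestamp_of(lines, needle):
    """Return the timestamp of the first line containing needle, or None."""
    for line in lines:
        if needle in line:
            # The first 19 characters hold "YYYY-MM-DD HH:MM:SS".
            return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    return None


def headless_seconds(text):
    """Seconds between the IMMD DOWN and IMMD UP messages, or None."""
    lines = text.splitlines()
    down = timestamp_of(lines, "IMMD service is DOWN")
    up = timestamp_of(lines, "IMMD service is UP")
    if down is None or up is None:
        return None
    return (up - down).total_seconds()


if __name__ == "__main__":
    print(headless_seconds(LOG))
```

For the lines quoted here the gap between the DOWN and UP messages is 6 seconds.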

IMMND failed to revert to IMM_SERVER_READY/IMM_NODE_FULLY_AVAILABLE after the interrupted sync, so it aborted when the next sync start arrived.

#0  0x00007f23733bdc37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 406
        selftid = 406
#1  0x00007f23733c1028 in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x152d00000009, sa_sigaction = 0x152d00000009}, sa_mask = {__val = {93865551367896, 30, 54, 139790248362720, 139790245522487, 17179869186, 139790248362720, 140726076478512, 0, 139790250985925, 54, 30, 54, 140726076478560, 139790245475049, 140726076478560}}, sa_flags = 0, sa_restorer = 0x2c774d2a0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x0000555ec6cac677 in ImmModel::prepareForSync (this=0x555ec774db30, isJoining=false) at src/imm/immnd/ImmModel.cc:2637
        __FUNCTION__ = "prepareForSync"
#3  0x0000555ec6caa696 in immModel_prepareForSync (cb=0x555ec6ff8a60 <_immnd_cb>, isJoining=false) at src/imm/immnd/ImmModel.cc:2193
No locals.
#4  0x0000555ec6c8373e in immnd_evt_proc_start_sync (cb=0x555ec6ff8a60 <_immnd_cb>, evt=0x7f236c002990, sinfo=0x7f236c002ad0) at src/imm/immnd/immnd_evt.c:8739
        __FUNCTION__ = "immnd_evt_proc_start_sync"
#5  0x0000555ec6c61d01 in immnd_process_evt () at src/imm/immnd/immnd_evt.c:666
        cb = 0x555ec6ff8a60 <_immnd_cb>
        rc = 1
        evt = 0x7f236c002980
        __FUNCTION__ = "immnd_process_evt"
#6  0x0000555ec6c8cc1c in main (argc=1, argv=0x7ffd57cc9698) at src/imm/immnd/immnd_main.c:369
        wasCoord = 0 '\000'
        now = {tv_sec = 897603, tv_nsec = 56765584}
        passed_time = {tv_sec = 7, tv_nsec = 104432087}
        passed_time_ms = 7104
        ret = 1
        mbx_fd = {raise_obj = 12, rmv_obj = 13}
        error = SA_AIS_OK
        timeout = 100
        eventCount = 13
        maxEvt = 50
        start_time = {tv_sec = 897595, tv_nsec = 952333497}
        fds = {{fd = 17, events = 1, revents = 0}, {fd = 15, events = 1, revents = 0}, {fd = 13, events = 1, revents = 1}}
        term_fd = 17
        __FUNCTION__ = "main"

Related

Tickets: #1692
Tickets: #2296
Tickets: #2420
Wiki: ChangeLog-5.0.2
Wiki: ChangeLog-5.1.1

Discussion

  • Hung Nguyen - 2017-02-10
    • status: accepted --> review

  • Hung Nguyen - 2017-02-21
    • status: review --> fixed
