OpenSAF / Tickets / #3219 imm: immnd crash in multi partitioned clusters rejoin

Milestone: 5.20.11

Status: fixed

Owner: Thuan Tran

Labels: None

Type: defect

Component: imm

Part: nd

Version:

Priority: minor

Blocker: False

Updated: 2020-09-18

Created: 2020-09-16

Creator: Thuan Tran

Private: No

In a multi-partitioned cluster rejoin scenario, IMMND crashes as follows:

2020-09-15 18:39:59.310 SC-6 osafimmnd[195]: NO Re-introduce-me highestProcessed:4358 highestReceived:4358 ex_immd_node_id=2070f
2020-09-15 18:39:59.310 SC-6 osafimmnd[195]: src/imm/immnd/immnd_evt.c:10158: immnd_evt_proc_pbe_prto_purge_mutations: Assertion 'cb->mRulingEpoch <= evt->info.ctrl.rulingEpoch' failed.

As a result, the node does not reboot as expected, even though it was previously on a different partition from the current coordinator.

Thuan Tran - 2020-09-16

status: assigned --> review

Thuan Tran - 2020-09-18

status: review --> fixed

commit d2bbaba8c28f68f5cb1a5e620022d673cc03600e
Author: thuan.tran <thuan.tran@dektech.com.au>
Date: Wed Sep 16 14:18:33 2020 +0700

imm: fix immnd crash in multi partitioned clusters rejoin [#3219]

Before IMMND gets the re-intro response from IMMD, it may receive a
broadcast event from IMMD (e.g. IMMND_EVT_D2ND_PBE_PRTO_PURGE_MUTATIONS)
and then crash, because it used to be on a different partition. Because
IMMND has crashed, it cannot reboot the node as expected. Solution:
- IMMND prioritizes the re-introduce response message from IMMD.
- IMMND ignores broadcast events from IMMD while re-introduce is ongoing.
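
The two points of the solution can be illustrated with a minimal, self-contained sketch. This is not the code from the commit: the types `evt_t` and `node_cb_t`, the `reintro_pending` flag, and the functions `pick_next` and `handle_evt` are hypothetical names invented for illustration, and the real IMMND event handling is considerably more involved. What happens after the re-introduce response is accepted (for example, a node reboot) is also left out; the sketch simply assumes the response carries the ruling epoch to adopt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event kinds; not the real IMMND event enum. */
typedef enum {
    EVT_REINTRO_RESPONSE, /* IMMD's answer to "Re-introduce-me" */
    EVT_BROADCAST_PURGE   /* stands in for IMMND_EVT_D2ND_PBE_PRTO_PURGE_MUTATIONS */
} evt_kind_t;

typedef struct {
    evt_kind_t kind;
    unsigned ruling_epoch; /* epoch carried by the event */
} evt_t;

/* Hypothetical stand-in for the node director's control block. */
typedef struct {
    unsigned ruling_epoch; /* epoch this node last knew about */
    bool reintro_pending;  /* true until the re-intro response arrives */
} node_cb_t;

typedef enum { EVT_HANDLED, EVT_IGNORED } evt_result_t;

/* Point 1: choose the re-intro response ahead of anything queued before it.
 * Returns the index of the event to process next (0, the FIFO head, if the
 * queue holds no response). */
size_t pick_next(const evt_t *queue, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (queue[i].kind == EVT_REINTRO_RESPONSE)
            return i;
    }
    return 0;
}

/* Point 2: drop broadcast events while re-introduce is still ongoing. */
evt_result_t handle_evt(node_cb_t *cb, const evt_t *evt)
{
    assert(cb != NULL && evt != NULL);

    if (evt->kind == EVT_REINTRO_RESPONSE) {
        /* Assumption: the response tells the node which epoch now rules. */
        cb->reintro_pending = false;
        cb->ruling_epoch = evt->ruling_epoch;
        return EVT_HANDLED;
    }

    if (cb->reintro_pending) {
        /* The node may hold a stale epoch from the other partition, so the
         * epoch check below could fail; ignore the broadcast instead. */
        return EVT_IGNORED;
    }

    /* Same shape as the assertion that failed in the ticket's log. */
    assert(cb->ruling_epoch <= evt->ruling_epoch);
    return EVT_HANDLED;
}
```

In this sketch, a node that still holds epoch 10 from its old partition and receives a purge broadcast with epoch 7 returns `EVT_IGNORED` instead of hitting the assertion. Once the re-introduce response has been handled, the same broadcast passes the epoch check.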
