Sometimes, the NCSMDS_UP event comes after the messages.
In this case, IMMD received the IMMD_EVT_ND2D_INTRO message before the NCSMDS_UP event.
IMMD failed to process the intro message because the node info had not been added to cb->immnd_tree.
Aug 12 08:13:53 SC-1 osafimmd[11184]: WA Node not found 566314186398634
Aug 12 08:13:53 SC-1 osafimmd[11184]: WA Error returned from processing message err:2 msg-type:2
Aug 12 08:13:53 SC-1 osafimmnd[11199]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Aug 12 08:13:53 SC-1 osafimmd[11184]: NO New IMMND process is on ACTIVE Controller at 2010f
Aug 12 08:13:53 SC-1 osafimmd[11184]: NO Extended intro from node 2010f
Aug 12 08:13:53 SC-1 osafimmd[11184]: NO First SC IMMND (OpenSAF 4.4 or later) attached 2010f
Aug 12 08:13:53 SC-1 osafimmd[11184]: NO Attached Nodes:2 Accepted nodes:1 KnownVeteran:0 doReply:1
Aug 12 08:13:53 SC-1 osafimmd[11184]: NO First IMMND on SC found at 2010f this IMMD at 2010f. Cluster is loading, *not* 2PBE => designating that IMMND as coordinator
Aug 12 08:13:53 SC-1 osafimmnd[11199]: NO This IMMND is now the NEW Coord
IMMND on SC-1 was elected as coordinator instead of the veteran.
The MDS messages come from 'Dsock' socket and MDS events come from 'BSRsock'.
Since MDS uses two different sockets, I think we can't fix this problem in MDS.
IMM has to somehow handle this case.
System: 10 nodes (with the HEADLESS feature), TIPC version 2.0.
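To make the two-socket point concrete, here is a minimal sketch in C of a receiver that polls two descriptors. The names `serve_one`, `SRC_EVENT` and `SRC_DATA` are hypothetical stand-ins for the 'BSRsock' and 'Dsock' handling; this is not the MDS code. It assumes only that events and messages arrive on separate descriptors.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical labels for the two sources named in this ticket. */
enum source { SRC_NONE = 0, SRC_EVENT = 'E', SRC_DATA = 'D' };

/* Wait up to timeout_ms, consume one byte from whichever descriptor is
 * readable and report which one it was. If both are readable, the event
 * descriptor is served first, but that only helps when the event has
 * already arrived by the time poll() returns. */
enum source serve_one(int event_fd, int data_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = event_fd, .events = POLLIN },
        { .fd = data_fd,  .events = POLLIN },
    };
    char byte;

    assert(event_fd >= 0 && data_fd >= 0);

    if (poll(fds, 2, timeout_ms) <= 0)
        return SRC_NONE;            /* timeout or error */
    if ((fds[0].revents & POLLIN) && read(event_fd, &byte, 1) == 1)
        return SRC_EVENT;
    if ((fds[1].revents & POLLIN) && read(data_fd, &byte, 1) == 1)
        return SRC_DATA;
    return SRC_NONE;
}
```

If a message is written to the data descriptor before anything shows up on the event descriptor, `serve_one` returns `SRC_DATA` first. A single-threaded reader cannot restore an ordering that the two descriptors never carried.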
- Is the issue in your case observed in the headless case?
- Based on the bug description above, it looks like a simple PL is joining the cluster:
the SC with the coordinator is in a stable state, and just because of out-of-order delivery of the control event (NCSMDS_UP) and the normal message (IMMD_EVT_ND2D_INTRO),
the existing active IMMD failed to process the intro message because the node info had not been added to cb->immnd_tree.
- If so, the analysis below does not match the bug description above.
You have an existing stable coordinator, so why would immsv elect a new coordinator?
MDS is single-threaded, so unless TIPC delivers the control event (high priority in TIPC)
ahead of the normal message (IMMD_EVT_ND2D_INTRO), the ordering is out of MDS's control.
Does the IMM process keep all MDS events and messages in a mailbox and process them from there? Please confirm.
-AVM
After headless, if an SC is starting,
I can see some code in IMMD waiting 3 seconds to allow IMMND MDS attachments to get processed:
============================================================================
if (cb->mScAbsenceAllowed && cb->ha_state == SA_AMF_HA_ACTIVE) {
  /* If this IMMD has active role, wait for veteran payloads.
   * Give up after 3 seconds if there's no veteran payloads. */
  LOG_NO("Waiting 3 seconds to allow IMMND MDS attachments to get processed.");
}
============================================================================
Ideally this should be a timer instead of a poll, like the AMFD node_sync_tmr:
============================================================================
if (rc_node_up == sync_nd_size) {
  if (cb->node_sync_tmr.is_active) {
    avd_stop_tmr(cb, &cb->node_sync_tmr);
    TRACE("stop NodeSync timer");
  }
  cb->all_nodes_synced = true;
  LOG_NO("Received node_up_msg from all nodes");
} else {
  if (avnd->node_up_msg_count == 1 &&
      (act_nd || n2d_msg->msg_info.n2d_node_up.leds_set)) {
============================================================================
if (cb->mScAbsenceAllowed &&
    pEvt->info.immd.type == IMMD_EVT_ND2D_INTRO &&
    pEvt->info.immd.info.ctrl_msg.refresh == 2) {
When the payload IMMND sends the intro message, the priority of this message is raised to NCS_IPC_PRIORITY_HIGH. The service event that should have arrived earlier is also waiting in the mailbox, with NCS_IPC_PRIORITY_VERY_HIGH. Since both the INTRO message and the service event now have elevated priority, there is a chance that the message is processed before the service event.
The service event is PRIORITY_VERY_HIGH and the intro message is PRIORITY_HIGH.
So the service event should always be processed before the intro message.
I think the problem comes from MDS (or TIPC), not from the way IMMD puts messages to the mailbox.
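The priority argument can be checked with a small model of a priority mailbox. This is a sketch in C with hypothetical names (`struct mailbox`, `mbx_send`, `mbx_receive`, `PRIO_*`); it only mirrors the NCS_IPC_PRIORITY_* levels mentioned here and is not the real NCS IPC implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical priority levels, highest first. */
enum prio { PRIO_VERY_HIGH, PRIO_HIGH, PRIO_NORMAL, PRIO_LOW, PRIO_COUNT };

#define QUEUE_CAP 16

/* One FIFO per priority level. Slots are never reused, which keeps the
 * sketch short and is enough for a demonstration. */
struct mailbox {
    int items[PRIO_COUNT][QUEUE_CAP];
    size_t head[PRIO_COUNT];
    size_t tail[PRIO_COUNT];
};

/* Append an item to the FIFO of its priority; returns 0 on success. */
int mbx_send(struct mailbox *m, enum prio p, int item)
{
    assert(m != NULL && p < PRIO_COUNT);
    if (m->tail[p] == QUEUE_CAP)
        return -1;                      /* this level is full */
    m->items[p][m->tail[p]++] = item;
    return 0;
}

/* Pop the oldest item of the highest non-empty priority level;
 * returns 0 on success and -1 when the mailbox is empty. */
int mbx_receive(struct mailbox *m, int *item)
{
    assert(m != NULL && item != NULL);
    for (int p = 0; p < PRIO_COUNT; p++) {
        if (m->head[p] < m->tail[p]) {
            *item = m->items[p][m->head[p]++];
            return 0;
        }
    }
    return -1;
}
```

When the service event (VERY_HIGH) and the intro message (HIGH) are both queued before the receiver runs, the event is always dequeued first. Priority cannot help if the intro message is dequeued before the event has been put into the mailbox at all, which is the case when the event arrives late from the transport.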
Then the problem can be isolated by keeping the priority at NCS_IPC_PRIORITY_NORMAL for the IMMD_EVT_ND2D_INTRO event; the problem should be reproducible even after this change.
Can you please elaborate on why the IMMD_EVT_ND2D_INTRO event gets a special priority?
We don't set PRIORITY_HIGH for all intro messages, that priority is set for intro messages from veteran nodes only.
Since there's no way to distinguish "cluster start" from "back from headless", IMM has to wait 3 seconds to receive all the intro messages (from SC-based IMMNDs (PRIO_NORMAL) and veteran IMMNDs (PRIO_HIGH)).
That way IMMD will always process the intro message from veterans first.
Since the service event is set to PRIORITY_VERY_HIGH, there will be no difference between using PRIORITY_HIGH or PRIORITY_NORMAL for intro messages.
Exactly. We are getting the issue with intro messages from veteran nodes only; on the same setup, the issue is not reproducible for a new payload joining. If the issue were in MDS/TIPC, you would see the same problem on the same setup for both veteran nodes and new nodes.
Isolate the problem by keeping priority NCS_IPC_PRIORITY_NORMAL for IMMD_EVT_ND2D_INTRO event.
TIPC delivers topology events and data messages asynchronously through two separate channels, so you can't assume any specific ordering between topology events (e.g. service UP) and data messages.
Then we will fix this in IMM.
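One way for a receiver to tolerate either arrival order is a get-or-create lookup, in the spirit of the changeset summary for this ticket ("Create missing IMMND node when processing intro messages"). The sketch below is in C and uses hypothetical names throughout (`struct node_info`, `struct node_db`, `on_intro`, `on_service_up`) with a plain linked list in place of cb->immnd_tree; it is not the committed patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified node record. */
struct node_info {
    uint64_t dest;          /* destination id of the IMMND */
    int up_seen;            /* set once the UP event has been handled */
    int intro_seen;         /* set once the intro message has been handled */
    struct node_info *next;
};

struct node_db { struct node_info *head; };

/* Return the record for dest, or NULL if it is not known yet. */
struct node_info *node_find(struct node_db *db, uint64_t dest)
{
    assert(db != NULL);
    for (struct node_info *n = db->head; n != NULL; n = n->next)
        if (n->dest == dest)
            return n;
    return NULL;
}

/* Return the record for dest, creating it if it is missing. */
struct node_info *node_get_or_create(struct node_db *db, uint64_t dest)
{
    struct node_info *n = node_find(db, dest);
    if (n != NULL)
        return n;
    n = calloc(1, sizeof(*n));
    if (n == NULL)
        return NULL;        /* out of memory */
    n->dest = dest;
    n->next = db->head;
    db->head = n;
    return n;
}

/* Both handlers use get-or-create, so neither depends on the other
 * having run first. Each returns 0 on success, -1 on allocation failure. */
int on_service_up(struct node_db *db, uint64_t dest)
{
    struct node_info *n = node_get_or_create(db, dest);
    if (n == NULL)
        return -1;
    n->up_seen = 1;
    return 0;
}

int on_intro(struct node_db *db, uint64_t dest)
{
    struct node_info *n = node_get_or_create(db, dest);
    if (n == NULL)
        return -1;
    n->intro_seen = 1;
    return 0;
}
```

With this shape, an intro message that arrives before the UP event no longer fails with "Node not found"; the later UP event finds the existing record and marks it instead of adding a duplicate.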
default (5.2) [staging:e8d47b]
changeset: 7996:e8d47b7395b3
user: Hung Nguyen hung.d.nguyen@dektech.com.au
date: Sun Aug 28 12:36:37 2016 +0700
summary: imm: Create missing IMMND node when processing intro messages [#1955]
opensaf-5.1.x [staging:a961d4]
changeset: 7997:a961d435de91
user: Hung Nguyen hung.d.nguyen@dektech.com.au
date: Sun Aug 28 12:36:37 2016 +0700
summary: imm: Create missing IMMND node when processing intro messages [#1955]
opensaf-5.0.x [staging:db6126]
changeset: 7998:db61263a1e92
user: Hung Nguyen hung.d.nguyen@dektech.com.au
date: Sun Aug 28 12:36:37 2016 +0700
summary: imm: Create missing IMMND node when processing intro messages [#1955]
Related
Commit: [a961d4]
Commit: [db6126]
Commit: [e8d47b]
Tickets: #1955