#1698 imm: IMMD process the intro msg from newly-joined IMMND before veterans

5.0.FC
fixed
None
defect
imm
d
major
2016-03-22
2016-03-09
Hung Nguyen
No

While the cluster is headless (no active IMMD), a new IMMND joins the cluster.
When IMMD comes back, all IMMNDs (newly joined and veteran) send intro messages (ND2D_INTRO) to the active IMMD.

If the intro message from the newly joined IMMND reaches IMMD before the intro messages from the veterans,
IMMD orders the newly joined IMMND to load (LOADING_CLIENT) instead of sync (SYNC_CLIENT).

Mar  9 17:04:39 SC-1 osafimmd[1029]: NO Extended intro from node 2040f
Mar  9 17:04:39 SC-1 osafimmd[1029]: NO Payload node 2040f introduced before first SC, can not yet verify File/Directory base matches SC.

When IMMD then processes the intro message from a veteran, it sets that IMMND as coordinator.
When the new IMMND, still in IMM_SERVER_LOADING_CLIENT, later receives the sync start message (D2ND_SYNC_START), it crashes (abort).

Mar  9 17:04:39 SC-1 osafimmd[1029]: NO Sc Absence Allowed is configured (1800) => IMMND coord at payload node:2030f dest566313288150651
Mar  9 17:04:39 SC-1 osafimmd[1029]: NO Node 2010f request sync sync-pid:1039 epoch:0 
Mar  9 17:04:40 SC-1 osafimmd[1029]: NO Successfully announced sync. New ruling epoch:3

Mar  9 17:04:39 PL-4 osafimmnd[400]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Mar  9 17:04:39 PL-4 osafimmnd[400]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_LOADING_CLIENT
Mar  9 17:04:40 PL-4 osafimmnd[400]: WA Imm at this node has epoch 0, appears to be a stragler in wrong state 5


Newly joined and veteran IMMNDs send their intro messages from different places:
- veteran IMMND: on receiving the NCSMDS_UP event for IMMD
- newly-joined IMMND: in immnd_proc_server()

So a veteran normally sends its intro message earlier than the newly joined IMMND, because veterans send right after receiving the NCSMDS_UP event.
That is why this problem is hard to reproduce.
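
A rough timing model makes the race window visible. It assumes, as a simplification, that immnd_proc_server() runs on a fixed periodic tick of the IMMND main loop; the period and phase parameters, and the helper names `next_tick_at_or_after` and `veteran_sends_first`, are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* First tick time (ms) at or after t, for ticks at phase, phase+period, ...
 * The periodic-tick model is an assumption, not taken from the IMMND code. */
static long next_tick_at_or_after(long t, long period_ms, long phase_ms)
{
    assert(period_ms > 0 && phase_ms >= 0 && phase_ms < period_ms && t >= 0);
    if (t <= phase_ms)
        return phase_ms;
    long k = (t - phase_ms + period_ms - 1) / period_ms; /* ceiling division */
    return phase_ms + k * period_ms;
}

/* The veteran sends at the NCSMDS_UP event plus an optional artificial delay;
 * the newly joined node sends at its next tick after the same event. */
static bool veteran_sends_first(long up_ms, long veteran_delay_ms,
                                long period_ms, long phase_ms)
{
    long veteran_send = up_ms + veteran_delay_ms;
    long newcomer_send = next_tick_at_or_after(up_ms, period_ms, phase_ms);
    return veteran_send < newcomer_send;
}
```

With a hypothetical 100 ms tick at phase 50 ms and IMMD coming up at t = 1000 ms, `veteran_sends_first(1000, 0, 100, 50)` is true, but a 100 ms delay on the veteran (`veteran_sends_first(1000, 100, 100, 50)`) makes it false, so the newly joined node gets its intro in first.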

To reproduce it, I had to add a short sleep (100 ms) on the veteran path:

@@ -10191,6 +10191,7 @@ static uint32_t immnd_evt_proc_mds_evt(I
        } else if ((evt->info.mds_info.change == NCSMDS_UP) && (evt->info.mds_info.svc_id == NCSMDS_SVC_ID_IMMD)) {
                LOG_NO("IMMD service is UP ... ScAbsenseAllowed?:%u introduced?:%u",
                           cb->mScAbsenceAllowed, cb->mIntroduced);
+               if(cb->mIntroduced == 2) usleep(100000);
                if((cb->mIntroduced==2) && (immnd_introduceMe(cb) != NCSCC_RC_SUCCESS)) {
                        LOG_WA("IMMND re-introduceMe after IMMD restart failed, will retry");
                }
1 Attachments

Related

Tickets: #1698
Tickets: #1896

Discussion

  • Hung Nguyen

    Hung Nguyen - 2016-03-10
    • status: accepted --> review
     
  • Hung Nguyen

    Hung Nguyen - 2016-03-22
    • status: review --> fixed
     
  • Hung Nguyen

    Hung Nguyen - 2016-03-22

    default 5.0

    changeset: 7344:7ecbd2a7b7a7 [7ecbd2]
    tag: tip
    user: Hung Nguyen <hung.d.nguyen@dektech.com.au>
    date: Tue Mar 22 17:22:35 2016 +0700
    summary: imm: Wait for veterans when IMMD starts [#1698]

     

    Related

    Tickets: #1698
