OpenSAF / Tickets / #2418 imm: Info of dead IMMND remains in standby IMMD

#2418 imm: Info of dead IMMND remains in standby IMMD

Milestone: 5.17.07

Status: fixed

Owner: Hung Nguyen

Labels: None

Type: defect

Component: imm

Part: d

Version:

Priority: major

Blocker: False

Updated: 2017-07-27

Created: 2017-04-10

Creator: Hung Nguyen

Private: No

When the Standby IMMD comes up at the same time as an IMMND is exiting, the info of that IMMND might not be removed from the immnd_tree of the Standby IMMD.

Details of the problem are explained in the sequence diagram below.
[Image: sequence diagram]

SC-5 was Active, SC-2 was Standby, IMMND on SC-1 was exiting

18:35:03 SC-1 osafimmnd[441]: exiting for shutdown

18:35:03 SC-2 osafrded[413]: NO RDE role set to STANDBY
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:568511936070075)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:567412424442298)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:566312912814523)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:565213401186744)

18:35:03 SC-5 osafimmd[433]: NO MDS event from svc_id 25 (change:4, dest:564113889558969)

Down event for IMMND@SC-1 was received on SC-5 but not on SC-2.
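
The asymmetry in the excerpt above can be checked mechanically. The following is a minimal Python sketch, not part of OpenSAF, that groups the `MDS event` lines by reporting node and change code. The regular expression and the name `mds_events_by_node` are illustrative assumptions. Reading `change:4` as a down event follows from the statement above; reading `change:3` as an up event is an assumption.

```python
import re
from collections import defaultdict

# Log lines copied from the ticket excerpt.
LOG = """\
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:568511936070075)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:567412424442298)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:566312912814523)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:565213401186744)
18:35:03 SC-5 osafimmd[433]: NO MDS event from svc_id 25 (change:4, dest:564113889558969)
"""

# Illustrative pattern for the osafimmd "MDS event" lines shown in the ticket.
MDS_EVENT = re.compile(
    r"^\S+ (?P<node>SC-\d+) osafimmd\[\d+\]: NO MDS event from svc_id \d+ "
    r"\(change:(?P<change>\d+), dest:(?P<dest>\d+)\)$"
)


def mds_events_by_node(log_text):
    """Return {node: {change_code: [dest, ...]}} for every MDS event line."""
    events = defaultdict(lambda: defaultdict(list))
    for line in log_text.splitlines():
        match = MDS_EVENT.match(line.strip())
        if match:
            events[match["node"]][int(match["change"])].append(int(match["dest"]))
    return events


events = mds_events_by_node(LOG)
# Destinations for which SC-5 logged change:4 but SC-2 did not.
missed_on_standby = set(events["SC-5"][4]) - set(events["SC-2"][4])
print(missed_on_standby)  # prints {564113889558969}
```

Run against a full syslog, the same set difference would list every IMMND whose down event reached the active IMMD but not the standby one.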

The symptoms:

If the IMMND that went down is the coordinator, then when that Standby IMMD becomes Active it fails to elect a new coordinator, because there is already a coordinator in its immnd_tree.

18:35:11 SC-2 osafimmd[430]: WA IMMND coordinator at 2050f apparently crashed => electing new coord

No more logs about newly elected coordinator were printed out.
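
To illustrate this symptom, here is a toy Python model, not OpenSAF code. It assumes the election is skipped whenever some immnd_tree entry is still flagged as coordinator, which is a simplification of the behaviour described above. `ImmndInfo`, `elect_coordinator` and the node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ImmndInfo:
    """Hypothetical, simplified stand-in for one immnd_tree entry."""
    node: str
    is_coord: bool = False
    is_up: bool = True  # whether the IMMND process is actually running


def elect_coordinator(immnd_tree):
    """Toy election: do nothing if an entry is already flagged as coordinator."""
    for info in immnd_tree.values():
        if info.is_coord:
            return None  # an existing coordinator entry blocks the election
    for info in immnd_tree.values():
        if info.is_up:
            info.is_coord = True
            return info.node
    return None


# Standby view: node_a was the coordinator and has exited, but its down
# event was missed, so the entry is still present and still flagged.
tree = {
    "node_a": ImmndInfo("node_a", is_coord=True, is_up=False),
    "node_b": ImmndInfo("node_b"),
    "node_c": ImmndInfo("node_c"),
}
stale_result = elect_coordinator(tree)

# Same election once the stale entry has been removed.
del tree["node_a"]
clean_result = elect_coordinator(tree)
print(stale_result, clean_result)  # prints None node_b
```

In this model the stale entry alone is enough to leave the cluster without a live coordinator, which matches the absence of any "newly elected" log line.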

When IMMND@SC-1 comes up again, it fails to introduce itself to the IMMD, because the IMMD already has IMMND@SC-1 in its immnd_tree with a wrong epoch.

18:35:29 SC-1 osafimmnd[441]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
18:35:29 SC-1 osafimmnd[441]: NO This IMMND is now the NEW Coord
18:35:29 SC-1 osafimmnd[441]: ER 3 > 0, exiting

1 Attachments

log.tgz

Zoran Milinkovic - 2017-04-13

status: accepted --> review

Anders Bjornerstedt - 2017-04-13

If the defect only occurs in a headless system, then I think the ticket slogan, or at least the description, should say so.


Hung Nguyen - 2017-04-25

Blocker: --> False

Milestone: 5.0.2 --> 5.17.06

5.17.08 (develop) [code:85c90b]

commit 85c90b4abead8bd66e1f20be3f84255645880597
Author: Hung Nguyen <hung.d.nguyen@dektech.com.au>
Date:   Tue Apr 25 13:24:29 2017 +0700

    imm: Ignore the sync'ed IMMND nodes that are not up [#2418]

5.17.06 (release) [code:c1a37f]

commit c1a37fb5032c0e63165bc36e79d5a79be3fd19dd
Author: Hung Nguyen <hung.d.nguyen@dektech.com.au>
Date:   Tue Apr 25 13:24:29 2017 +0700

    imm: Ignore the sync'ed IMMND nodes that are not up [#2418]

default (mercurial) [staging:dc6067]

changeset:   8777:dc60670bfd3b
user:        Hung Nguyen <hung.d.nguyen@dektech.com.au>
date:        Tue Apr 25 13:40:04 2017 +0700
summary:     imm: Ignore the sync'ed IMMND nodes that are not up [#2418]

Commit: [dc6067]
Tickets: ~~#2418~~
Commit: [85c90b]
Commit: [c1a37f]

Hung Nguyen - 2017-04-25

status: review --> fixed

Hung Nguyen - 2017-05-17

status: fixed --> review

Hung Nguyen - 2017-05-17

Re-opening this ticket since the new active IMMD (switched from the STANDBY role) has a problem with dead IMMNDs in the immnd_tree. The dead IMMND info should be cleaned up before switching to ACTIVE.
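
The cleanup step can be sketched as follows. This is a minimal Python illustration of the idea only, not the actual patch. It assumes the IMMD keeps its own record of which IMMNDs it has seen come up; `clear_dead_immnds`, `switch_to_active`, the dictionary layout and the node names are all hypothetical.

```python
def clear_dead_immnds(immnd_tree, up_nodes):
    """Delete entries whose node is not in up_nodes; return the removed keys."""
    dead = [node for node in immnd_tree if node not in up_nodes]
    for node in dead:
        del immnd_tree[node]
    return dead


def switch_to_active(immd):
    """Purge dead IMMND info first, then take the ACTIVE role."""
    removed = clear_dead_immnds(immd["immnd_tree"], immd["up_nodes"])
    immd["role"] = "ACTIVE"
    return removed


# Standby state after cold-sync: the tree holds an entry for node_a, but
# this IMMD never saw node_a come up, so the entry is treated as dead.
immd = {
    "role": "STANDBY",
    "immnd_tree": {"node_a": {"epoch": 3}, "node_b": {"epoch": 3}},
    "up_nodes": {"node_b"},
}
removed = switch_to_active(immd)
print(removed, sorted(immd["immnd_tree"]))  # prints ['node_a'] ['node_b']
```

Doing the purge inside the role switch means the newly active IMMD never operates on entries it cannot vouch for.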


Hung Nguyen - 2017-05-26

status: review --> fixed

5.17.08 (develop) [code:ff044b]

commit ff044b93c3182997cbe9ab318245846c876ecd02
Author: Hung Nguyen <hung.d.nguyen@dektech.com.au>
Date:   Mon May 15 14:09:06 2017 +0700

    imm: Clear dead IMMND info before switching to ACTIVE role [#2418]

    During cold-sync, standby IMMD may receive info of dead IMMND.
    Before switching to active, the IMMD should clear those dead IMMND info.

5.17.06 (release) [code:b6d724]

commit b6d724a849988ef91dcfad4c0267df7a8ea96e4b
Author: Hung Nguyen <hung.d.nguyen@dektech.com.au>
Date:   Mon May 15 14:09:06 2017 +0700

    imm: Clear dead IMMND info before switching to ACTIVE role [#2418]

    During cold-sync, standby IMMD may receive info of dead IMMND.
    Before switching to active, the IMMD should clear those dead IMMND info.

Commit: [b6d724]
Commit: [ff044b]

Anders Widell - 2017-07-01

Milestone: 5.17.06 --> 5.17.08
