#2456 mds: NCSMDS_DOWN is sent with adest

Milestone: future

Status: unassigned

Owner: nobody

Labels: None

Type: defect

Component: mds

Part: -

Version:

Priority: major

Blocker: False

Updated: 2019-01-09

Created: 2017-05-09

Creator: Long H Buu Nguyen

Private: No

Description:
When the SCs are rebooted repeatedly, there are cases where NCSMDS_DOWN is sent to the PLs with only an adest. As a result, amfnd cannot detect that both SCs are down and does not enter the headless state.

As observed in the logs:
At 10:48:00, SC1 was off, while SC2 was still alive:
2017-03-09 10:47:58 SC-1 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="273" x-info="http://www.rsyslog.com"] exiting on signal 15.
2017-03-09 10:48:06 SC-1 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="268" x-info="http://www.rsyslog.com"] start

SC2 went down at 10:48:02:
2017-03-09 10:48:02 SC-2 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="239" x-info="http://www.rsyslog.com"] exiting on signal 15.
2017-03-09 10:48:03 SC-2 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="270" x-info="http://www.rsyslog.com"] start

So after 10:48:04, the cluster had actually gone headless. However, the NCSMDS_DOWN events received by amfnd were ADEST, so they were ignored. As a result, @is_avd_down remained FALSE:

Mar 9 10:48:00.930820 osafamfnd [422:src/amf/amfnd/di.cc:0602] >> avnd_evt_mds_avd_dn_evh
Mar 9 10:48:00.930826 osafamfnd [422:src/amf/amfnd/di.cc:0609] << avnd_evt_mds_avd_dn_evh
...
Mar 9 10:48:04.517247 osafamfnd [422:src/amf/amfnd/di.cc:0602] >> avnd_evt_mds_avd_dn_evh
Mar 9 10:48:04.517254 osafamfnd [422:src/amf/amfnd/di.cc:0609] << avnd_evt_mds_avd_dn_evh
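To make the failure mode concrete, here is a minimal C++ sketch of the DOWN handling as this ticket describes it. It is a simplified model, not the real amfnd code (which lives in `avnd_evt_mds_avd_dn_evh`): the types `DestKind`, `AvdDownEvent`, `NodeDirectorState` and the functions `HandleAvdDown`/`DetectsHeadless` are hypothetical. It only illustrates why a sequence of ADEST-only DOWN events never sets the flag.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model; the real MDS callback data differs.
enum class DestKind { kAdest, kVdest };

struct AvdDownEvent {
  DestKind kind;  // does this DOWN refer to an ADEST or to the director's VDEST?
};

struct NodeDirectorState {
  bool is_avd_down = false;  // stands in for @is_avd_down in the ticket
};

// As described in the ticket: a DOWN carrying only an ADEST is ignored,
// while a VDEST DOWN marks the director (both SCs) as down.
void HandleAvdDown(NodeDirectorState& state, const AvdDownEvent& ev) {
  if (ev.kind == DestKind::kAdest) {
    return;  // ignored: matches the immediate ">> ... <<" pairs in the log
  }
  state.is_avd_down = true;
}

// Replays a sequence of DOWN events and reports whether headless is detected.
bool DetectsHeadless(const std::vector<AvdDownEvent>& events) {
  NodeDirectorState state;
  for (const AvdDownEvent& ev : events) {
    HandleAvdDown(state, ev);
  }
  return state.is_avd_down;
}
```

In this model, replaying two ADEST DOWN events (the 10:48:00 and 10:48:04 entries above) leaves `is_avd_down` false; a single VDEST DOWN anywhere in the sequence would flip it to true.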

At the next SC restart, amfnd received NCSMDS_NEW_ACTIVE:
Mar 9 10:48:10.187820 osafamfnd [422:src/amf/amfnd/mds.cc:0540] NO AVD NEW_ACTIVE, adest:1
But with @is_avd_down still FALSE, amfnd did not send the sync state info, because it believed the cluster had NOT gone headless.
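The consequence at NEW_ACTIVE time can be sketched the same way. This is a hypothetical model, not OpenSAF code: `HandleAvdNewActive` and the `sent` vector stand in for amfnd's real message path, and the message name `"sync_state_info"` is invented for illustration. The point is only that the sync is gated on the flag, so a missed VDEST DOWN means no sync.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model, not OpenSAF code.
struct NodeDirectorState {
  bool is_avd_down = false;       // true only if a VDEST DOWN was seen earlier
  std::vector<std::string> sent;  // messages "sent" to the new active director
};

// On NCSMDS_NEW_ACTIVE, state is re-synchronized only when the node director
// believes both SCs went down (headless).
void HandleAvdNewActive(NodeDirectorState& state) {
  if (state.is_avd_down) {
    // Placeholder name for the sync state info amfnd sends after headless.
    state.sent.push_back("sync_state_info");
  }
  // If only ADEST DOWNs were seen, is_avd_down is still false and nothing
  // is sent: the situation described in this ticket.
}
```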

Compare with the previous SC restart in the same test, where one of the NCSMDS_DOWN events must have been a VDEST:

Mar 9 10:47:50.565629 osafamfnd [422:src/amf/amfnd/di.cc:0602] >> avnd_evt_mds_avd_dn_evh
Mar 9 10:47:50.565685 osafamfnd [422:src/amf/amfnd/di.cc:0617] WA AMF director unexpectedly crashed
...
Mar 9 10:47:50.565874 osafamfnd [422:src/amf/amfnd/di.cc:0602] >> avnd_evt_mds_avd_dn_evh
Mar 9 10:47:50.565879 osafamfnd [422:src/amf/amfnd/di.cc:0609] << avnd_evt_mds_avd_dn_evh

1 Attachment

PL-4.zip

Discussion

Anders Widell - 2017-07-01

Milestone: 5.17.06 --> 5.17.08

Anders Widell - 2017-07-28

Milestone: 5.17.07 --> 5.17.10

Gary Lee - 2017-07-31

Seen again. I think normally there should be 2 x adest down and 1 x vdest down.
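The expected pattern from this comment (two ADEST DOWN events plus one VDEST DOWN when both SCs go down) could be checked when replaying a trace. A rough sketch, assuming the DOWN events have already been extracted from the amfnd logs as a list of destination kinds; `DestKind`, `CountDowns`, `MatchesExpectedHeadless` and `IsAdestOnly` are all hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical: one entry per NCSMDS_DOWN seen by amfnd, in arrival order.
enum class DestKind { kAdest, kVdest };

struct DownCounts {
  int adest = 0;
  int vdest = 0;
};

DownCounts CountDowns(const std::vector<DestKind>& downs) {
  DownCounts counts;
  for (DestKind kind : downs) {
    if (kind == DestKind::kAdest) {
      ++counts.adest;
    } else {
      ++counts.vdest;
    }
  }
  return counts;
}

// Expected pattern when both SCs go down: 2 x ADEST DOWN and 1 x VDEST DOWN.
bool MatchesExpectedHeadless(const std::vector<DestKind>& downs) {
  const DownCounts c = CountDowns(downs);
  return c.adest == 2 && c.vdest == 1;
}

// The faulty pattern from this ticket: DOWNs arrived but none was a VDEST.
bool IsAdestOnly(const std::vector<DestKind>& downs) {
  const DownCounts c = CountDowns(downs);
  return c.adest > 0 && c.vdest == 0;
}
```

Counting rather than matching an exact order keeps the check tolerant of the ADEST and VDEST DOWNs arriving interleaved, as in the 10:47:50 log excerpt.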


Anders Widell - 2017-11-03

Milestone: 5.17.11 --> 5.18.01

Anders Widell - 2018-02-02

Milestone: 5.18.01 --> 5.18.04

Gary Lee - 2018-09-29

Priority: minor --> major

Milestone: 5.18.04 --> 5.18.12

Gary Lee - 2019-01-09

Milestone: 5.19.01 --> future