#2456 mds: NCSMDS_DOWN is sent with adest

future
unassigned
nobody
None
defect
mds
-
major
False
2019-01-09
2017-05-09
No
  • Description:
    When the SCs are rebooted repeatedly, there is a case in which NCSMDS_DOWN is sent to the PLs with only the adest. As a result, amfnd cannot detect that both SCs are down and does not enter the headless state.

As observed in the logs:
At 10:48:00, SC1 was off while SC2 was still alive:
2017-03-09 10:47:58 SC-1 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="273" x-info="http://www.rsyslog.com"] exiting on signal 15.
2017-03-09 10:48:06 SC-1 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="268" x-info="http://www.rsyslog.com"] start

SC2 was going down at 10:48:02
2017-03-09 10:48:02 SC-2 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="239" x-info="http://www.rsyslog.com"] exiting on signal 15.
2017-03-09 10:48:03 SC-2 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="270" x-info="http://www.rsyslog.com"] start

So after 10:48:04 the cluster was actually headless. However, the NCSMDS_DOWN events received by amfnd were for the ADEST, so they were ignored. As a result, @is_avd_down remained FALSE:

Mar 9 10:48:00.930820 osafamfnd [422:src/amf/amfnd/di.cc:0602] >> avnd_evt_mds_avd_dn_evh
Mar 9 10:48:00.930826 osafamfnd [422:src/amf/amfnd/di.cc:0609] << avnd_evt_mds_avd_dn_evh
...
Mar 9 10:48:04.517247 osafamfnd [422:src/amf/amfnd/di.cc:0602] >> avnd_evt_mds_avd_dn_evh
Mar 9 10:48:04.517254 osafamfnd [422:src/amf/amfnd/di.cc:0609] << avnd_evt_mds_avd_dn_evh
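
The behaviour described above can be sketched roughly as follows. This is a minimal model written for illustration, not the actual amfnd code: `DestKind`, `NodeDirectorState`, `on_avd_down` and `replay_down_events` are hypothetical names, and only the `is_avd_down` flag is taken from the ticket.

```cpp
#include <vector>

// Hypothetical: the kind of address carried by an NCSMDS_DOWN event.
enum class DestKind { kAdest, kVdest };

// Hypothetical stand-in for the amfnd state discussed in this ticket.
struct NodeDirectorState {
  bool is_avd_down = false;  // flag named in the ticket
};

// Models the reported behaviour: a DOWN event for an adest is ignored,
// and only a DOWN event for the vdest marks the director as down.
void on_avd_down(NodeDirectorState& state, DestKind kind) {
  if (kind == DestKind::kAdest) {
    return;  // ignored, is_avd_down is left unchanged
  }
  state.is_avd_down = true;
}

// Replays a sequence of DOWN events and returns the resulting flag.
bool replay_down_events(const std::vector<DestKind>& events) {
  NodeDirectorState state;
  for (DestKind kind : events) {
    on_avd_down(state, kind);
  }
  return state.is_avd_down;
}
```

Under this model, a sequence containing only adest DOWN events (as in the trace above) leaves `is_avd_down` false, whereas a sequence that includes a vdest DOWN event sets it.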

On the next SC restart, amfnd received NCSMDS_NEW_ACTIVE:
Mar 9 10:48:10.187820 osafamfnd [422:src/amf/amfnd/mds.cc:0540] NO AVD NEW_ACTIVE, adest:1
But with @is_avd_down still FALSE, amfnd did not send the sync state info, because it believed that the cluster had NOT gone headless.
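
The consequence for the NEW_ACTIVE handling can be illustrated with a small sketch. It assumes, as the ticket describes, that the decision to send sync state info depends only on the `is_avd_down` flag; `NewActiveAction` and `on_avd_new_active` are hypothetical names and do not come from the amfnd sources.

```cpp
// Hypothetical: what the node director does when NCSMDS_NEW_ACTIVE arrives.
enum class NewActiveAction {
  kSkipSync,          // director was never considered down, nothing to resync
  kSendSyncStateInfo  // cluster was headless, push state to the new director
};

// Models the decision described in the ticket: sync state info is sent
// only if the director had previously been marked as down.
NewActiveAction on_avd_new_active(bool is_avd_down) {
  return is_avd_down ? NewActiveAction::kSendSyncStateInfo
                     : NewActiveAction::kSkipSync;
}
```

With the flag stuck at false after two adest-only DOWN events, this model returns `kSkipSync`, which matches the missing sync reported here.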

Compare this with the previous SC restart in the same test, where one of the NCSMDS_DOWN events must have been for the vdest:

Mar 9 10:47:50.565629 osafamfnd [422:src/amf/amfnd/di.cc:0602] >> avnd_evt_mds_avd_dn_evh
Mar 9 10:47:50.565685 osafamfnd [422:src/amf/amfnd/di.cc:0617] WA AMF director unexpectedly crashed
...
Mar 9 10:47:50.565874 osafamfnd [422:src/amf/amfnd/di.cc:0602] >> avnd_evt_mds_avd_dn_evh
Mar 9 10:47:50.565879 osafamfnd [422:src/amf/amfnd/di.cc:0609] << avnd_evt_mds_avd_dn_evh
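
To spot this pattern in other traces, a small helper along the following lines might be useful. It is only a sketch: it assumes that the "AMF director unexpectedly crashed" warning is logged when a vdest DOWN event is processed (as the comparison above suggests), and `DownTraceSummary`, `summarize_down_trace` and `looks_like_missed_headless` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Counts of the two trace patterns quoted in this ticket.
struct DownTraceSummary {
  int handler_entries = 0;  // lines containing ">> avnd_evt_mds_avd_dn_evh"
  int crash_warnings = 0;   // lines containing "AMF director unexpectedly crashed"
};

// Scans trace lines and counts handler entries and crash warnings.
DownTraceSummary summarize_down_trace(const std::vector<std::string>& lines) {
  DownTraceSummary summary;
  for (const std::string& line : lines) {
    if (line.find(">> avnd_evt_mds_avd_dn_evh") != std::string::npos) {
      ++summary.handler_entries;
    }
    if (line.find("AMF director unexpectedly crashed") != std::string::npos) {
      ++summary.crash_warnings;
    }
  }
  return summary;
}

// True when DOWN events were handled but none of them produced the crash
// warning, which is the pattern seen in the failing restart.
bool looks_like_missed_headless(const DownTraceSummary& summary) {
  return summary.handler_entries > 0 && summary.crash_warnings == 0;
}
```

Fed with the 10:48:00 to 10:48:04 excerpt, this reports two handler entries and no warning; fed with the 10:47:50 excerpt, it reports two handler entries and one warning.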

1 Attachments

Discussion

  • Anders Widell - 2017-07-01
    • Milestone: 5.17.06 --> 5.17.08

  • Anders Widell - 2017-07-28
    • Milestone: 5.17.07 --> 5.17.10

  • Gary Lee - 2017-07-31

    Seen again. I think normally there should be 2 x adest down and 1 x vdest down.

  • Anders Widell - 2017-11-03
    • Milestone: 5.17.11 --> 5.18.01

  • Anders Widell - 2018-02-02
    • Milestone: 5.18.01 --> 5.18.04

  • Gary Lee - 2018-09-29
    • Priority: minor --> major
    • Milestone: 5.18.04 --> 5.18.12

  • Gary Lee - 2019-01-09
    • Milestone: 5.19.01 --> future