Stop both SCs so that the cluster goes headless. Trigger an SU failover, so that the su_oper message is buffered and is supposed to be sent to the active amfd when an SC comes back. However, if the cluster stays headless for 3 minutes, which is exactly the MDS_AWAIT_ACTIVE_TMR_VAL timeout, amfnd receives another NCSMDS_DOWN. At that point amfnd deletes all pending messages, which makes headless recovery impossible.
Outline of the logs:
Apr 18 16:49:09.749428 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0603] >> avnd_evt_mds_avd_dn_evh
Apr 18 16:49:09.750094 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0618] WA AMF director unexpectedly crashed
Apr 18 16:49:09.750103 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0662] TR Delete all pending messages to be sent to AMFD
Apr 18 16:49:09.796138 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0756] NO avnd_di_oper_send() deferred as AMF director is offline(1), or sync is required(1)
Apr 18 16:49:09.797440 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0756] NO avnd_di_oper_send() deferred as AMF director is offline(1), or sync is required(1)
Apr 18 16:52:09.825457 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0603] >> avnd_evt_mds_avd_dn_evh
Apr 18 16:52:09.825489 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0618] WA AMF director unexpectedly crashed
Apr 18 16:52:09.825495 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0662] TR Delete all pending messages to be sent to AMFD
Apr 18 16:52:09.825498 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1273] >> avnd_diq_rec_del
Apr 18 16:52:09.825505 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1290] << avnd_diq_rec_del
Apr 18 16:52:09.825508 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1273] >> avnd_diq_rec_del
Apr 18 16:52:09.825512 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1290] << avnd_diq_rec_del
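The sequence in the logs, and the idea of ignoring a repeated NCSMDS_DOWN, can be illustrated with a small standalone sketch. This is not the amfnd code: `DirectorLink`, `queue_message`, `on_director_down`, `on_director_up`, `pending_count` and `demo_headless_sequence` are hypothetical names invented for illustration, and the sketch assumes that the first down event may still purge the queue while any later down event received while already headless is a no-op.

```cpp
// Hypothetical sketch only; none of these names exist in OpenSAF.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for amfnd's view of the director and its outgoing message queue.
class DirectorLink {
 public:
  // Buffer a message; it is handed over when the director comes back.
  void queue_message(const std::string& msg) { pending_.push_back(msg); }

  // Handle a director-down event. Returns true if the event was acted on,
  // false if it was ignored because the director is already known to be down.
  bool on_director_down() {
    if (!director_up_) {
      // Repeated down event while headless: keep the buffered messages.
      return false;
    }
    director_up_ = false;
    pending_.clear();  // Assumption: only the first down event purges the queue.
    return true;
  }

  // Director is back: mark it up and return the messages buffered so far.
  std::vector<std::string> on_director_up() {
    director_up_ = true;
    std::vector<std::string> flushed;
    flushed.swap(pending_);
    return flushed;
  }

  std::size_t pending_count() const { return pending_.size(); }

 private:
  bool director_up_ = true;
  std::vector<std::string> pending_;
};

// Replays the scenario from the ticket and returns how many buffered
// messages survive until the director returns.
std::size_t demo_headless_sequence() {
  DirectorLink link;
  link.on_director_down();        // first NCSMDS_DOWN (16:49:09 in the logs)
  link.queue_message("su_oper");  // SU failover while headless
  link.on_director_down();        // second NCSMDS_DOWN (16:52:09), ignored
  return link.on_director_up().size();
}
```

Without the early return in `on_director_down`, the second event would clear the queue and the buffered su_oper message would be lost, which is the failure shown in the logs.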
develop:
commit 4cb4351920a16284ac3dfb40f055bab455e760dc
Author: Minh Chau minh.chau@dektech.com.au
Date: Wed Apr 26 15:02:48 2017 +1000
release:
commit ee0ae69f29bfd3672a4bfa3a55154d07948962ea
Author: Minh Chau minh.chau@dektech.com.au
Date: Wed Apr 26 15:02:48 2017 +1000
changeset: 8790:c95a64cc4940
user: Minh Hon Chau minh.chau@dektech.com.au
date: Thu May 04 15:05:26 2017 +1000
summary: amfnd: Ignore second NCSMDS_DOWN [#2436]
Related tickets: #2436