This issue happened when we tested the system in a split-brain scenario. We split the cluster into partitions as follows: [[SC-1(ACT), SC-2(STB), SC-3], [SC-4(ACT), SC-5(STB), SC-6], [SC-7, SC-8, SC-9(STB), SC-10(ACT)]], then merged all nodes back. The quiesced SC-3 detected that the active nodes in the other partitions were up while the active SC-1 in its own partition was still alive, so no service events were raised for the active nodes in the other partitions. When SC-1 went down, one of the other active SCs was notified as the new active. After that new active SC went down, the other active SC was notified. Finally, the SC-3 amfnd detected two NEW_ACTIVE amfd instances and rebooted the node.
Log analysis:
<143>1 2022-05-31T05:34:56.169467+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25315"] >> mds_mcm_svc_up
<143>1 2022-05-31T05:34:56.169469+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25316"] MCM:API: LOCAL SVC INFO : svc_id = AVND(13) | PWE id = 1 | VDEST id = 65535 |
<143>1 2022-05-31T05:34:56.16947+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25317"] MCM:API: REMOTE SVC INFO : svc_id = AVD(12) | PWE id = 1 | VDEST id = 1 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 |
<143>1 2022-05-31T05:34:56.169472+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25318"] >> mds_svc_tbl_query
<143>1 2022-05-31T05:34:56.169474+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25319"] << mds_svc_tbl_query
<143>1 2022-05-31T05:34:56.169476+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25320"] >> mds_subtn_tbl_get_details
<143>1 2022-05-31T05:34:56.169477+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25321"] << mds_subtn_tbl_get_details
<143>1 2022-05-31T05:34:56.169479+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25322"] >> mds_mcm_validate_scope
<143>1 2022-05-31T05:34:56.16948+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25323"] << mds_mcm_validate_scope
<143>1 2022-05-31T05:34:56.169482+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25324"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2022-05-31T05:34:56.169484+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25325"] MDS:DB: Subscription Result not present
<143>1 2022-05-31T05:34:56.169486+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25326"] << mds_get_subtn_res_tbl_by_adest
<143>1 2022-05-31T05:34:56.169487+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25327"] >> mds_subtn_res_tbl_get
<143>1 2022-05-31T05:34:56.169489+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25328"] << mds_subtn_res_tbl_get
<143>1 2022-05-31T05:34:56.169491+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25329"] >> mds_subtn_res_tbl_add
<143>1 2022-05-31T05:34:56.169493+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25330"] MDS:DB: adest_details: <rem_node[0x20a0f]:dest_pid[441]>
<143>1 2022-05-31T05:34:56.169494+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25331"] << get_subtn_adest_details
<143>1 2022-05-31T05:34:56.169496+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25332"] MDS:DB: sub_adest_details: <rem_node[0x20a0f]:dest_pid[441]>
<143>1 2022-05-31T05:34:56.169498+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25333"] << mds_subtn_res_tbl_add
<143>1 2022-05-31T05:34:56.169499+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25334"] << mds_mcm_svc_up
...
<143>1 2022-05-31T05:34:56.175867+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25497"] >> mds_mcm_svc_up
<143>1 2022-05-31T05:34:56.175869+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25498"] MCM:API: LOCAL SVC INFO : svc_id = AVND(13) | PWE id = 1 | VDEST id = 65535 |
<143>1 2022-05-31T05:34:56.17587+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25499"] MCM:API: REMOTE SVC INFO : svc_id = AVD(12) | PWE id = 1 | VDEST id = 1 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 |
<143>1 2022-05-31T05:34:56.175872+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25500"] >> mds_svc_tbl_query
<143>1 2022-05-31T05:34:56.175874+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25501"] << mds_svc_tbl_query
<143>1 2022-05-31T05:34:56.175875+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25502"] >> mds_subtn_tbl_get_details
<143>1 2022-05-31T05:34:56.175877+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25503"] << mds_subtn_tbl_get_details
<143>1 2022-05-31T05:34:56.175879+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25504"] >> mds_mcm_validate_scope
<143>1 2022-05-31T05:34:56.175881+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25505"] << mds_mcm_validate_scope
<143>1 2022-05-31T05:34:56.175882+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25506"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2022-05-31T05:34:56.175885+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25507"] MDS:DB: Subscription Result not present
<143>1 2022-05-31T05:34:56.175887+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25508"] << mds_get_subtn_res_tbl_by_adest
<143>1 2022-05-31T05:34:56.175888+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25509"] >> mds_subtn_res_tbl_get
<143>1 2022-05-31T05:34:56.17589+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25510"] << mds_subtn_res_tbl_get
<143>1 2022-05-31T05:34:56.175891+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25511"] >> mds_subtn_res_tbl_add
<143>1 2022-05-31T05:34:56.175893+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25512"] MDS:DB: adest_details: <rem_node[0x2040f]:dest_pid[441]>
<143>1 2022-05-31T05:34:56.175895+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25513"] << get_subtn_adest_details
<143>1 2022-05-31T05:34:56.175897+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25514"] MDS:DB: sub_adest_details: <rem_node[0x2040f]:dest_pid[441]>
<143>1 2022-05-31T05:34:56.175898+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25515"] << mds_subtn_res_tbl_add
<143>1 2022-05-31T05:34:56.1759+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25516"] << mds_mcm_svc_up
2022-05-31T05:34:56.214424+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25653"] MCM:API: svc_down : svc_id = AVND(13) on DEST id = 65535 got NCSMDS_DOWN for svc_id = AVD(12) on Vdest id = 1 Adest = <rem_node[0x2010f]:dest_pid[466]>, rem_svc_pvt_ver=7
...
2022-05-31T05:34:56.21448+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25662"] MCM:API: svc_up : svc_id = AVND(13) on DEST id = 65535 got NCSMDS_NEW_ACTIVE for svc_id = AVD(12) on Vdest id = 1 Adest = <rem_node[0x2040f]:dest_pid[441]>, rem_svc_pvt_ver=7
2022-05-31T05:34:56.214606+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25731"] MCM:API: svc_down : svc_id = AVND(13) on DEST id = 65535 got NCSMDS_DOWN for svc_id = AVD(12) on Vdest id = 1 Adest = <rem_node[0x2040f]:dest_pid[441]>, rem_svc_pvt_ver=7
...
2022-05-31T05:34:56.214626+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25740"] MCM:API: svc_up : svc_id = AVND(13) on DEST id = 65535 got NCSMDS_NEW_ACTIVE for svc_id = AVD(12) on Vdest id = 1 Adest = <rem_node[0x20a0f]:dest_pid[441]>, rem_svc_pvt_ver=7
2022-05-23 14:41:16.878 SC-3 osafamfnd[454]: Rebooting OpenSAF NodeId = 2030f EE Name = , Reason: AVD already up, OwnNodeId = 2030f, SupervisionTime = 60
2022-05-23 14:41:16.890 SC-3 opensaf_reboot: Rebooting local node; timeout=60
In the normal case, the up flag is reset by the MDS thread whenever amfnd receives a NEW_ACTIVE event. The same flag is also set by the amfnd main thread when it processes the AVND_EVT_MDS_AVD_UP event. The flag is not protected, so access to it is not thread-safe.
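To make the window concrete, here is a minimal, self-contained sketch of the interleaving in C. The names (avd_up, on_new_active, on_avd_up_event) are illustrative only, not the real amfnd symbols:

```c
/* Sketch only: a plain bool with no lock/atomic, and the two threads
 * shown as plain function calls in the problematic order. */
#include <stdbool.h>
#include <stdio.h>

static bool avd_up;                  /* shared, unprotected flag */

static void on_new_active(void)      /* runs in the MDS thread */
{
        avd_up = false;              /* reset on NEW_ACTIVE... */
        /* ...then an AVD_UP event is queued to the main thread */
}

static void on_avd_up_event(void)    /* runs in the amfnd main thread */
{
        if (avd_up) {
                /* corresponds to the "AVD already up" reboot path */
                printf("AVD already up -> opensaf_reboot()\n");
                return;
        }
        avd_up = true;
}

int main(void)
{
        on_new_active();    /* NEW_ACTIVE from the first new active SC */
        on_new_active();    /* second NEW_ACTIVE lands before the main
                               thread has handled the first AVD_UP */
        on_avd_up_event();  /* first AVD_UP: flag false -> set true */
        on_avd_up_event();  /* second AVD_UP: flag already true ->
                               "AVD already up" and the node reboots */
        return 0;
}
```

With the normal interleaving, each reset is consumed by exactly one AVD_UP event; protecting the flag, or carrying the state inside the event itself, would presumably close this window.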
This is what happened when the issue occurred:
And this is the normal case:
Hi Hieu,
Just to reproduce this ticket, can you please let me know how you are splitting the network into separate partitions, and then how you are able to merge them? I assume you are doing this on virtual machines?
Thanks
Hi Mohan Kanakam,
I'm using "lxc" to run the cluster and the "iptables" tool to split/merge the network. I reproduced this ticket as follows:
(ACT: active, STB: standby, QSC: quiesced)
Another important condition is that the two NEW_ACTIVE events must be processed at nearly the same time. However, amfnd processes a NEW_ACTIVE event very quickly, so the issue is hard to reproduce; I therefore changed the source code to slow amfnd down. The details are in the attached file.
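For illustration only (the actual change is in the attached file), the idea is just to add an artificial delay in amfnd's event handling so that both NEW_ACTIVE events are pending at the same time; something along these lines:

```c
/* Hypothetical sketch -- the real patch is in the attached file.
 * Delaying the main thread's handling of each event widens the race
 * window described above. */
#include <unistd.h>

static void handle_event_slowly(void)
{
        usleep(500 * 1000);  /* hypothetical 500 ms delay per event */
        /* ...then the normal event handling runs */
}
```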
Best regards,
Hieu
Hi Hieu,
Thanks for the steps to reproduce the issue.
Hi Hieu,
I tested with 2 SCs [SC-1, SC-2] and 2 PLs. I created 2 partitions: [SC-1 (Active)] and [SC-2 (Active) plus the 2 payloads]. When I merged the partitions, both SC-1 and SC-2 rebooted because of the split brain, and then both payloads rebooted because there was no controller.
1. Is this the expected behaviour? Or
2. Should only one SC (either SC-1 or SC-2) reboot, with the payloads continuing to run with that SC?
Thanks
Hi Mohan,
Yes, it is the expected behavior. The active SCs rebooted because each active SC saw another active SC in the cluster. The PLs rebooted because they lost the connection to both the active SC and the standby SC.
As an enhancement, OpenSAF introduced the SC absence feature, documented in "src/imm/README.SC_ABSENCE". If the SC absence feature is enabled, the PLs will not reboot immediately. After the SCs have rebooted and one of them has become active, all PLs that are not in the same partition as the new active SC are rebooted.
For example:
-- We split the cluster into 2 partitions: [SC-1, PL-1] [SC-2, PL-2, PL-3].
-- After merging the partitions, SC-1 and SC-2 reboot.
-- SC-1 becomes the active SC and SC-2 becomes the standby SC.
-- SC-1 requires PL-2 and PL-3 to reboot. PL-1 survives the network merge.
We do not support this, but it is a promising approach. To implement it, we would need a strategy to choose one SC among the active SCs, reassign the SUs on the PLs, synchronize the IMM data on the PLs, and so on. The surviving SC would have to verify the information of all unknown PLs and resolve conflicts between the PLs. However, I think it is easier to restart the PLs.
Best regards,
Hieu
Hi Hieu,
Awesome, thanks for your response, appreciate it.
I ran the following test case with your patch for #3317. We enabled the SC_ABSENCE feature in both partitions, i.e. in SC-1's and SC-2's immd.conf.
1. [SC-1, PL-3] in one partition, with SC-1 active.
2. [SC-2, PL-4] in the other partition, with SC-2 active.
3. Then I merged the two partitions. SC-1 and SC-2 detected each other, reported split brain, and both rebooted. PL-3 and PL-4 kept running because SC_ABSENCE is enabled.
4. SC-1 came back as active; both payloads PL-3 and PL-4 detected SC-1 and synced up with it, and no payload rebooted. Later on, SC-2 came back as standby.
Here, according to your description, payload PL-4, which was running in the other partition, should have rebooted, but it didn't. Can you please suggest whether we are doing something wrong?
Thanks
-Mohan
Hi Mohan,
Please check the variable "IMMSV_COORD_SELECT_NODE" in the file "immd.conf". It must be enabled for the partition check of a node to run. That code is in the file "src/imm/immd/immd_evt.c".
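For reference, enabling it should look roughly like this in immd.conf (a sketch only: I'm assuming the usual "export" syntax of OpenSAF .conf files and that the value 1 enables the check):

```
# immd.conf -- assumed syntax/value; verify against your build
export IMMSV_COORD_SELECT_NODE=1
```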
Best regards,
Hieu
Hello Hieu,
Do you have any documentation on split brain and the expected results, with some sample use cases? Can I find it in the PR documents?
If you have any such document, please share it; or, if you can, please write a small 1-2 page document and share it with me. I will help you test the upcoming patches.
Thanks
Paul
Hi Paul,
OpenSAF has several ways to handle split-brain. They are mentioned in the following documents:
However, there are no use cases in those documents. Because there are many use cases related to these features, please let me know if you are interested in any of them.
Best regards,
Hieu
commit 629f41983434332c732bca7f11362e5f5942a96e (HEAD -> develop, origin/develop, ticket-3317)
Author: hieu.h.hoang <hieu.h.hoang@dektech.com.au>
Date: Tue Jun 14 08:51:33 2022 +0700