
#3317 amfnd: two NEW_ACTIVE amfd in split-brain scenario

5.22.11
fixed
None
defect
amf
nd
minor
False
2022-08-15
2022-06-10
No

This issue happened when we tested the system in a split-brain scenario. We split the cluster into partitions as follows: [[SC-1(ACT), SC-2(STB), SC-3], [SC-4(ACT), SC-5(STB), SC-6], [SC-7, SC-8, SC-9(STB), SC-10(ACT)]], then merged all nodes back. The quiesced SC-3 detected the active nodes in the other partitions coming up while the active SC-1 in its own partition was still alive, so no service events were raised for the active nodes in the other partitions. When SC-1 went down, one of the other active nodes was notified as the new active. After that new active SC went down, the other active node was notified. Finally, the SC-3 "amfnd" detected two NEW_ACTIVE amfd and rebooted.

Log analysis:

  • SC-3 detected the active amfd in the other partitions coming up:
<143>1 2022-05-31T05:34:56.169467+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25315"] >> mds_mcm_svc_up
<143>1 2022-05-31T05:34:56.169469+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25316"] MCM:API: LOCAL SVC INFO  : svc_id = AVND(13) | PWE id = 1 | VDEST id = 65535 |
<143>1 2022-05-31T05:34:56.16947+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25317"] MCM:API: REMOTE SVC INFO : svc_id = AVD(12) | PWE id = 1 | VDEST id = 1 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 |
<143>1 2022-05-31T05:34:56.169472+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25318"] >> mds_svc_tbl_query
<143>1 2022-05-31T05:34:56.169474+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25319"] << mds_svc_tbl_query
<143>1 2022-05-31T05:34:56.169476+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25320"] >> mds_subtn_tbl_get_details
<143>1 2022-05-31T05:34:56.169477+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25321"] << mds_subtn_tbl_get_details
<143>1 2022-05-31T05:34:56.169479+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25322"] >> mds_mcm_validate_scope
<143>1 2022-05-31T05:34:56.16948+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25323"] << mds_mcm_validate_scope
<143>1 2022-05-31T05:34:56.169482+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25324"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2022-05-31T05:34:56.169484+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25325"] MDS:DB: Subscription Result not present
<143>1 2022-05-31T05:34:56.169486+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25326"] << mds_get_subtn_res_tbl_by_adest
<143>1 2022-05-31T05:34:56.169487+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25327"] >> mds_subtn_res_tbl_get
<143>1 2022-05-31T05:34:56.169489+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25328"] << mds_subtn_res_tbl_get
<143>1 2022-05-31T05:34:56.169491+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25329"] >> mds_subtn_res_tbl_add
<143>1 2022-05-31T05:34:56.169493+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25330"] MDS:DB: adest_details: <rem_node[0x20a0f]:dest_pid[441]> 
<143>1 2022-05-31T05:34:56.169494+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25331"] << get_subtn_adest_details
<143>1 2022-05-31T05:34:56.169496+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25332"] MDS:DB: sub_adest_details: <rem_node[0x20a0f]:dest_pid[441]>
<143>1 2022-05-31T05:34:56.169498+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25333"] << mds_subtn_res_tbl_add
<143>1 2022-05-31T05:34:56.169499+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25334"] << mds_mcm_svc_up
...

<143>1 2022-05-31T05:34:56.175867+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25497"] >> mds_mcm_svc_up
<143>1 2022-05-31T05:34:56.175869+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25498"] MCM:API: LOCAL SVC INFO  : svc_id = AVND(13) | PWE id = 1 | VDEST id = 65535 |
<143>1 2022-05-31T05:34:56.17587+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25499"] MCM:API: REMOTE SVC INFO : svc_id = AVD(12) | PWE id = 1 | VDEST id = 1 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 |
<143>1 2022-05-31T05:34:56.175872+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25500"] >> mds_svc_tbl_query
<143>1 2022-05-31T05:34:56.175874+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25501"] << mds_svc_tbl_query
<143>1 2022-05-31T05:34:56.175875+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25502"] >> mds_subtn_tbl_get_details
<143>1 2022-05-31T05:34:56.175877+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25503"] << mds_subtn_tbl_get_details
<143>1 2022-05-31T05:34:56.175879+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25504"] >> mds_mcm_validate_scope
<143>1 2022-05-31T05:34:56.175881+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25505"] << mds_mcm_validate_scope
<143>1 2022-05-31T05:34:56.175882+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25506"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2022-05-31T05:34:56.175885+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25507"] MDS:DB: Subscription Result not present
<143>1 2022-05-31T05:34:56.175887+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25508"] << mds_get_subtn_res_tbl_by_adest
<143>1 2022-05-31T05:34:56.175888+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25509"] >> mds_subtn_res_tbl_get
<143>1 2022-05-31T05:34:56.17589+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25510"] << mds_subtn_res_tbl_get
<143>1 2022-05-31T05:34:56.175891+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25511"] >> mds_subtn_res_tbl_add
<143>1 2022-05-31T05:34:56.175893+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25512"] MDS:DB: adest_details: <rem_node[0x2040f]:dest_pid[441]> 
<143>1 2022-05-31T05:34:56.175895+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25513"] << get_subtn_adest_details
<143>1 2022-05-31T05:34:56.175897+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25514"] MDS:DB: sub_adest_details: <rem_node[0x2040f]:dest_pid[441]>
<143>1 2022-05-31T05:34:56.175898+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25515"] << mds_subtn_res_tbl_add
<143>1 2022-05-31T05:34:56.1759+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25516"] << mds_mcm_svc_up
  • SC-1 went down and SC-4 was notified as new active:
2022-05-31T05:34:56.214424+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25653"] MCM:API: svc_down : svc_id = AVND(13) on DEST id = 65535 got NCSMDS_DOWN for svc_id = AVD(12) on Vdest id = 1 Adest = <rem_node[0x2010f]:dest_pid[466]>, rem_svc_pvt_ver=7
...
2022-05-31T05:34:56.21448+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25662"] MCM:API: svc_up : svc_id = AVND(13) on DEST id = 65535 got NCSMDS_NEW_ACTIVE for svc_id = AVD(12) on Vdest id = 1 Adest = <rem_node[0x2040f]:dest_pid[441]>, rem_svc_pvt_ver=7
  • SC-4 went down and SC-10 was notified as new active:
2022-05-31T05:34:56.214606+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25731"] MCM:API: svc_down : svc_id = AVND(13) on DEST id = 65535 got NCSMDS_DOWN for svc_id = AVD(12) on Vdest id = 1 Adest = <rem_node[0x2040f]:dest_pid[441]>, rem_svc_pvt_ver=7
...
2022-05-31T05:34:56.214626+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25740"] MCM:API: svc_up : svc_id = AVND(13) on DEST id = 65535 got NCSMDS_NEW_ACTIVE for svc_id = AVD(12) on Vdest id = 1 Adest = <rem_node[0x20a0f]:dest_pid[441]>, rem_svc_pvt_ver=7
  • SC-3 rebooted because it detected two active amfd:
2022-05-23 14:41:16.878 SC-3 osafamfnd[454]: Rebooting OpenSAF NodeId = 2030f EE Name = , Reason: AVD already up, OwnNodeId = 2030f, SupervisionTime = 60
2022-05-23 14:41:16.890 SC-3 opensaf_reboot: Rebooting local node; timeout=60

Related

Wiki: ChangeLog-5.22.11

Discussion

  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-06-10

    In the normal case, the up flag is reset by the MDS thread whenever amfnd receives a NEW_ACTIVE event:

        case NCSMDS_NEW_ACTIVE:
          if (evt_info->i_svc_id == NCSMDS_SVC_ID_AVD) {
            LOG_NO("AVD NEW_ACTIVE, adest:%" PRIu64, evt_info->i_dest);
    
            // sometimes NEW_ACTIVE director is received before
            // DOWN is received for the old director ..
            if (m_AVND_CB_IS_AVD_UP(cb)) {
              m_AVND_CB_AVD_UP_RESET(cb);
            }
    

    That flag is also set by the amfnd main thread when amfnd processes the AVND_EVT_MDS_AVD_UP event. The up flag isn't protected, so it is not thread-safe.

    This is what happened when the issue occurred:

    Amfnd main thread           | Amfnd mds thread                                   | Up flag
    ----------------------------+----------------------------------------------------+--------
                                | Receive NEW_ACTIVE event                           | True
                                | Reset up flag, and send AVND_EVT_MDS_AVD_UP event  | False
                                | Receive NEW_ACTIVE event                           | False
                                | Reset up flag, and send AVND_EVT_MDS_AVD_UP event  | False
    Receive AVND_EVT_MDS_AVD_UP |                                                    | False
    Process AVND_EVT_MDS_AVD_UP |                                                    | True
    Receive AVND_EVT_MDS_AVD_UP |                                                    | True
    Process AVND_EVT_MDS_AVD_UP |                                                    | True

    And this is the normal case:

    Amfnd main thread           | Amfnd mds thread                                   | Up flag
    ----------------------------+----------------------------------------------------+--------
                                | Receive NEW_ACTIVE event                           | True
                                | Reset up flag, and send AVND_EVT_MDS_AVD_UP event  | False
    Receive AVND_EVT_MDS_AVD_UP |                                                    | False
    Process AVND_EVT_MDS_AVD_UP |                                                    | True
                                | Receive NEW_ACTIVE event                           | True
                                | Reset up flag, and send AVND_EVT_MDS_AVD_UP event  | False
    Receive AVND_EVT_MDS_AVD_UP |                                                    | False
    Process AVND_EVT_MDS_AVD_UP |                                                    | True
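
    Below is a minimal standalone sketch of that unprotected-flag pattern (hypothetical names and a simplified std::queue mailbox, not the actual amfnd code). When both NEW_ACTIVE notifications are handled by the "mds" thread before the "main" thread processes the first up event -- the timing that had to be forced to reproduce the ticket -- the second up event finds the flag already true, which is the "AVD already up" condition that reboots the node:

        #include <condition_variable>
        #include <iostream>
        #include <mutex>
        #include <queue>
        #include <thread>

        static bool avd_up = true;          // unprotected flag, written by both threads
        static std::queue<int> mailbox;     // stands in for the amfnd mailbox
        static std::mutex mbx_mutex;
        static std::condition_variable mbx_cv;

        // "MDS thread": delivers two NEW_ACTIVE notifications back to back.
        static void mds_thread() {
          for (int i = 0; i < 2; ++i) {
            if (avd_up) avd_up = false;     // reset the flag on NEW_ACTIVE (no lock)
            {
              std::lock_guard<std::mutex> lk(mbx_mutex);
              mailbox.push(1);              // send an "AVD up" event to the mailbox
            }
            mbx_cv.notify_one();
          }
        }

        // "Main thread": processes the mailbox events.
        static void main_thread() {
          for (int i = 0; i < 2; ++i) {
            std::unique_lock<std::mutex> lk(mbx_mutex);
            mbx_cv.wait(lk, [] { return !mailbox.empty(); });
            mailbox.pop();
            lk.unlock();
            if (avd_up)                     // the second event can find the flag still set
              std::cout << "AVD already up -> node would reboot\n";
            avd_up = true;                  // set the flag while processing the up event
          }
        }

        int main() {
          std::thread t1(mds_thread), t2(main_thread);
          t1.join();
          t2.join();
          return 0;
        }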
     
  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-06-14
    • status: accepted --> review
     
  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-07-21
    • status: review --> accepted
     
  • Mohan  Kanakam

    Mohan Kanakam - 2022-07-22

    Hi Hieu,
    Just to reproduce this ticket, can you please let me know how you are splitting the network into separate partitions and how you then merge the networks back? I assume you are doing it on virtual machines?
    Thanks

     
  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-07-25

    Hi Mohan Kanakam,

    I'm using "lxc" to run the cluster and the "iptables" tool to split/merge the network. I reproduced this ticket as follows:

    • Add a 5-second delay before promoting the active by changing the fmd configuration (/etc/opensaf/fmd.conf):
    export FMS_PROMOTE_ACTIVE_TIMER=500             # 5 seconds
    
    • The cluster has 5 SCs:
    SC-1 ACT : SC-2 QSC : SC-3 QSC : SC-4 QSC : SC-5 STB
       ^          ^            ^         ^         ^
       |          |            |         |         |
       ---------------------------------------------
    

    (ACT: active, STB: standby, QSC: quiesced)

    • Isolate SC-2 and SC-3. We will have two more active SCs because each isolated SC will soon be promoted to active:
    SC-1 ACT : SC-2 ACT : SC-3 ACT : SC-4 QSC : SC-5 STB
       ^                                ^          ^
       |                                |          |
       ---------------------------------------------
    
    • Unblock the connection from SC-4 to SC-2, then unblock the connection from SC-4 to SC-3. SC-4 won't receive any up events, as a consequence of ticket 3281:
    SC-1 ACT : SC-2 ACT : SC-3 ACT : SC-4 QSC : SC-5 STB
      ^           ^          ^          ^ ^ ^      ^
      |           |          |          | | |      |
      |           |          ------------ | |      |
      |           ------------------------- |      |
       ---------------------------------------------
    
    • Stop SC-1, SC-2 and SC-3 sequentially. Now SC-4 receives a NEW_ACTIVE event for SC-2 and then receives another NEW_ACTIVE event for SC-3. Note: SC-5 won't become active until the promote timer expires.

    Another important condition is that the two NEW_ACTIVE events must be processed at nearly the same time. However, amfnd processes a NEW_ACTIVE event very quickly, so the issue is hard to reproduce; therefore I changed the source code to slow down amfnd. The details are in the attached file.

    Best regards,
    Hieu

     

    Last edit: Hieu Hong Hoang 2022-07-26
  • Mohan  Kanakam

    Mohan Kanakam - 2022-07-27

    Hi Hieu,
    Thanks for the steps to reproduce issue.

     
  • Mohan  Kanakam

    Mohan Kanakam - 2022-07-29

    Hi Hieu,
    I tested with 2 SCs [SC-1, SC-2] and 2 PLs. I created 2 partitions, [SC-1 (Active)] and [SC-2 (Active) plus the 2 payloads]. When I merged the partitions, both SC-1 and SC-2 rebooted because of the split brain, and then both payloads rebooted because there was no controller.
    1. Is this the expected behaviour? Or
    2. Should only one SC (either SC-1 or SC-2) reboot, with the payloads continuing to run with that SC?
    Thanks

     
    • Hieu Hong Hoang

      Hieu Hong Hoang - 2022-08-01

      Hi Mohan,

      1. Yes, it is the expected behavior. The active SCs rebooted because each active SC saw another active SC in the cluster. The PLs rebooted because they lost connection with both the active SC and the standby SC.
        As an enhancement, OpenSAF introduced the SC absence feature, documented in "src/imm/README.SC_ABSENCE". If the SC absence feature is enabled, the PLs will not reboot immediately. After the SCs have rebooted and one of them has become active,
        all PLs that are not in the same partition as the new active SC will be rebooted.
        For example:
        -- We split the cluster into 2 partitions [SC-1, PL-1] [SC-2, PL-2, PL-3].
        -- After merging the partitions, SC-1 and SC-2 reboot.
        -- SC-1 becomes the active SC and SC-2 becomes the standby SC.
        -- SC-1 requires PL-2 and PL-3 to reboot. PL-1 will survive the network merge.

      2. We do not support this, but it is a promising approach. To implement it, we would need a strategy to choose one SC among the active SCs, reassign the SUs on the PLs, synchronize the IMM data on the PLs, etc. The surviving SC must verify the information of all unknown PLs and resolve conflicts between the PLs. However, I think it's easier to restart the PLs.

      Best regards,
      Hieu

       
      • Mohan  Kanakam

        Mohan Kanakam - 2022-08-01

        Hi Hieu,
        Awesome, thanks for your response, appreciate it.
        I ran the following test case with your patch for #3317. We had enabled the SC_ABSENCE feature in both partitions, i.e. in SC-1's and SC-2's immd.conf.
        1. [SC-1, PL-3] in one cluster, with SC-1 active.
        2. [SC-2, PL-4] in another cluster, with SC-2 active.
        3. Then I merged both partitions. SC-1 and SC-2 detect each other, report a split brain, and both reboot. PL-3 and PL-4 keep running because SC_ABSENCE is enabled.
        4. SC-1 comes back as active, both payloads PL-3 and PL-4 detect SC-1 and sync up with it, and no payload reboots. Later on, SC-2 comes back as standby.
        Here, according to your explanation, payload PL-4, which was running in the other partition, should have rebooted, but it didn't. Can you please suggest if we are doing something wrong?
        Thanks
        -Mohan

         
        • Hieu Hong Hoang

          Hieu Hong Hoang - 2022-08-02

          Hi Mohan,

          Please check the variable "IMMSV_COORD_SELECT_NODE" in the file "immd.conf". It must be enabled for the partition of a node to be checked. That check is in the file "src/imm/immd/immd_evt.c":

          static bool is_on_same_partition_with_coord(
              IMMD_CB *cb,
              const IMMD_IMMND_INFO_NODE *node_info) {
              assert(cb->immnd_coord && "No coordinator existing");
              // Same partition with the current IMMND coord
              if ((cb->coord_select_node == false) ||
                  (cb->ex_immd_node_id == node_info->ex_immd_node_id))
                  return true;
          
              LOG_WA("Node %x ex-IMMD=%x != current IMMND coord %x ex-IMMD=%x",
                     node_info->immnd_key, node_info->ex_immd_node_id,
                     cb->immnd_coord, cb->ex_immd_node_id);
          
              return false;
          }
          

          Best regards,
          Hieu

           
  • Paul

    Paul - 2022-07-29

    Hello Hieu,
    Do you have any documentation on split brain and the expected results, with some sample use cases? Can I find it in the PR documents?
    If you have such a document, please share it, or if you can, please write a small one- or two-page document and share it with me. I will help you with testing the upcoming patches.
    Thanks
    Paul

     
    • Hieu Hong Hoang

      Hieu Hong Hoang - 2022-08-02

      Hi Paul,

      OpenSAF has several ways to handle split-brain. They are described in the following documents:

      • sc absence: src/imm/README.SC_ABSENCE
      • remote fencing: docs/OpenSAF_Overview_PR.odt 3.7.5
      • split-brain prevention: docs/OpenSAF_Overview_PR.odt 3.7.6
      • split-brain recovery: docs/OpenSAF_Overview_PR.odt 3.7.7

      However, there are no use cases in those documents. Because there are many use cases related to those features, please let me know if you are interested in any of them.

      Best regards,
      Hieu

       
  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-08-15

    commit 629f41983434332c732bca7f11362e5f5942a96e (HEAD -> develop, origin/develop, ticket-3317)
    Author: hieu.h.hoang hieu.h.hoang@dektech.com.au
    Date: Tue Jun 14 08:51:33 2022 +0700

    amf: Update handling mds event in amfnd [#3317]
    
    In amfnd, there is a flag represent for the amfd service state (up/down).
    That flag was set by amfnd main thread and amfnd mds thread. If two amfd
    NEW_ACTIVE events come at almost the same time, the up flag value will be
    inccorect due to the setting conflict between two threads. Solution is to
    set that flag in main thread only and check the amfd NO_ACTIVE event.
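
    A rough sketch of the single-writer direction the commit message describes (hypothetical names such as MdsEvent and NodeDirectorState, not the actual patch): the MDS callback only translates service events, including NO_ACTIVE, into mailbox events, and the main thread becomes the only writer of the up flag:

        // Hypothetical sketch, not the amfnd sources: the MDS thread does pure
        // translation and never touches shared state.
        enum class MdsEvent { AVD_UP, AVD_NEW_ACTIVE, AVD_NO_ACTIVE, AVD_DOWN };
        enum class MbxEvent { AVD_UP, AVD_DOWN };

        inline MbxEvent map_mds_event(MdsEvent ev) {
          switch (ev) {
            case MdsEvent::AVD_UP:
            case MdsEvent::AVD_NEW_ACTIVE:
              return MbxEvent::AVD_UP;    // a (new) active director is reachable
            default:
              return MbxEvent::AVD_DOWN;  // NO_ACTIVE / DOWN clear the state
          }
        }

        // The main thread is the single place where the flag is modified, so
        // two threads can no longer race on it.
        struct NodeDirectorState {
          bool avd_up = false;
          void on_mbx_event(MbxEvent ev) { avd_up = (ev == MbxEvent::AVD_UP); }
        };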
    
     
  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-08-15
    • status: accepted --> fixed
     
