
#3317 amfnd: two NEW_ACTIVE amfd in split-brain scenario

5.22.11
fixed
None
defect
amf
nd
minor
False
2022-08-15
2022-06-10
No

This issue happened when we tested the system in a split-brain scenario. We split the cluster into partitions as follows: [[SC-1(ACT), SC-2(STB), SC-3], [SC-4(ACT), SC-5(STB), SC-6], [SC-7, SC-8, SC-9(STB), SC-10(ACT)]], then merged all nodes back. The quiesced SC-3 detected the active nodes in the other partitions coming up while the active SC-1 in its own partition was still alive, so no service events were raised for the active nodes in the other partitions. When SC-1 went down, one of the other active nodes was notified as the new active. After that new active SC went down, the other active node was notified. Finally, the SC-3 "amfnd" detected two NEW_ACTIVE amfd and rebooted.

Log analysis:

  • SC-3 detected the active amfd in the other partitions coming up:
<143>1 2022-05-31T05:34:56.169467+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25315"] >> mds_mcm_svc_up
<143>1 2022-05-31T05:34:56.169469+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25316"] MCM:API: LOCAL SVC INFO  : svc_id = AVND(13) | PWE id = 1 | VDEST id = 65535 |
<143>1 2022-05-31T05:34:56.16947+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25317"] MCM:API: REMOTE SVC INFO : svc_id = AVD(12) | PWE id = 1 | VDEST id = 1 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 |
<143>1 2022-05-31T05:34:56.169472+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25318"] >> mds_svc_tbl_query
<143>1 2022-05-31T05:34:56.169474+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25319"] << mds_svc_tbl_query
<143>1 2022-05-31T05:34:56.169476+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25320"] >> mds_subtn_tbl_get_details
<143>1 2022-05-31T05:34:56.169477+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25321"] << mds_subtn_tbl_get_details
<143>1 2022-05-31T05:34:56.169479+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25322"] >> mds_mcm_validate_scope
<143>1 2022-05-31T05:34:56.16948+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25323"] << mds_mcm_validate_scope
<143>1 2022-05-31T05:34:56.169482+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25324"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2022-05-31T05:34:56.169484+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25325"] MDS:DB: Subscription Result not present
<143>1 2022-05-31T05:34:56.169486+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25326"] << mds_get_subtn_res_tbl_by_adest
<143>1 2022-05-31T05:34:56.169487+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25327"] >> mds_subtn_res_tbl_get
<143>1 2022-05-31T05:34:56.169489+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25328"] << mds_subtn_res_tbl_get
<143>1 2022-05-31T05:34:56.169491+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25329"] >> mds_subtn_res_tbl_add
<143>1 2022-05-31T05:34:56.169493+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25330"] MDS:DB: adest_details: <rem_node[0x20a0f]:dest_pid[441]> 
<143>1 2022-05-31T05:34:56.169494+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25331"] << get_subtn_adest_details
<143>1 2022-05-31T05:34:56.169496+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25332"] MDS:DB: sub_adest_details: <rem_node[0x20a0f]:dest_pid[441]>
<143>1 2022-05-31T05:34:56.169498+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25333"] << mds_subtn_res_tbl_add
<143>1 2022-05-31T05:34:56.169499+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25334"] << mds_mcm_svc_up
...

<143>1 2022-05-31T05:34:56.175867+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25497"] >> mds_mcm_svc_up
<143>1 2022-05-31T05:34:56.175869+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25498"] MCM:API: LOCAL SVC INFO  : svc_id = AVND(13) | PWE id = 1 | VDEST id = 65535 |
<143>1 2022-05-31T05:34:56.17587+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25499"] MCM:API: REMOTE SVC INFO : svc_id = AVD(12) | PWE id = 1 | VDEST id = 1 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 |
<143>1 2022-05-31T05:34:56.175872+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25500"] >> mds_svc_tbl_query
<143>1 2022-05-31T05:34:56.175874+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25501"] << mds_svc_tbl_query
<143>1 2022-05-31T05:34:56.175875+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25502"] >> mds_subtn_tbl_get_details
<143>1 2022-05-31T05:34:56.175877+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25503"] << mds_subtn_tbl_get_details
<143>1 2022-05-31T05:34:56.175879+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25504"] >> mds_mcm_validate_scope
<143>1 2022-05-31T05:34:56.175881+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25505"] << mds_mcm_validate_scope
<143>1 2022-05-31T05:34:56.175882+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25506"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2022-05-31T05:34:56.175885+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25507"] MDS:DB: Subscription Result not present
<143>1 2022-05-31T05:34:56.175887+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25508"] << mds_get_subtn_res_tbl_by_adest
<143>1 2022-05-31T05:34:56.175888+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25509"] >> mds_subtn_res_tbl_get
<143>1 2022-05-31T05:34:56.17589+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25510"] << mds_subtn_res_tbl_get
<143>1 2022-05-31T05:34:56.175891+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25511"] >> mds_subtn_res_tbl_add
<143>1 2022-05-31T05:34:56.175893+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25512"] MDS:DB: adest_details: <rem_node[0x2040f]:dest_pid[441]> 
<143>1 2022-05-31T05:34:56.175895+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25513"] << get_subtn_adest_details
<143>1 2022-05-31T05:34:56.175897+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25514"] MDS:DB: sub_adest_details: <rem_node[0x2040f]:dest_pid[441]>
<143>1 2022-05-31T05:34:56.175898+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25515"] << mds_subtn_res_tbl_add
<143>1 2022-05-31T05:34:56.1759+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25516"] << mds_mcm_svc_up
  • SC-1 went down and SC-4 was notified as new active:
2022-05-31T05:34:56.214424+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25653"] MCM:API: svc_down : svc_id = AVND(13) on DEST id = 65535 got NCSMDS_DOWN for svc_id = AVD(12) on Vdest id = 1 Adest = <rem_node[0x2010f]:dest_pid[466]>, rem_svc_pvt_ver=7
...
2022-05-31T05:34:56.21448+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25662"] MCM:API: svc_up : svc_id = AVND(13) on DEST id = 65535 got NCSMDS_NEW_ACTIVE for svc_id = AVD(12) on Vdest id = 1 Adest = <rem_node[0x2040f]:dest_pid[441]>, rem_svc_pvt_ver=7
  • SC-4 went down and SC-10 was notified as new active:
2022-05-31T05:34:56.214606+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25731"] MCM:API: svc_down : svc_id = AVND(13) on DEST id = 65535 got NCSMDS_DOWN for svc_id = AVD(12) on Vdest id = 1 Adest = <rem_node[0x2040f]:dest_pid[441]>, rem_svc_pvt_ver=7
...
2022-05-31T05:34:56.214626+02:00 SC-3 osafamfnd 454 mds.log [meta sequenceId="25740"] MCM:API: svc_up : svc_id = AVND(13) on DEST id = 65535 got NCSMDS_NEW_ACTIVE for svc_id = AVD(12) on Vdest id = 1 Adest = <rem_node[0x20a0f]:dest_pid[441]>, rem_svc_pvt_ver=7
  • SC-3 rebooted because it detected two active amfd:
2022-05-23 14:41:16.878 SC-3 osafamfnd[454]: Rebooting OpenSAF NodeId = 2030f EE Name = , Reason: AVD already up, OwnNodeId = 2030f, SupervisionTime = 60
2022-05-23 14:41:16.890 SC-3 opensaf_reboot: Rebooting local node; timeout=60

Related

Wiki: ChangeLog-5.22.11

Discussion

  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-06-10

    In the normal case, the up flag is reset by the MDS thread whenever amfnd receives a NEW_ACTIVE event:

        case NCSMDS_NEW_ACTIVE:
          if (evt_info->i_svc_id == NCSMDS_SVC_ID_AVD) {
            LOG_NO("AVD NEW_ACTIVE, adest:%" PRIu64, evt_info->i_dest);
    
            // sometimes NEW_ACTIVE director is received before
            // DOWN is received for the old director ..
            if (m_AVND_CB_IS_AVD_UP(cb)) {
              m_AVND_CB_AVD_UP_RESET(cb);
            }
    

    That flag is also set by the amfnd main thread when amfnd processes the AVND_EVT_MDS_AVD_UP event. The up flag isn't protected, so it is not thread-safe.

    This is what happened when the issue occurred:

    Amfnd main thread           | Amfnd mds thread                                   | Up flag
    ----------------------------+----------------------------------------------------+--------
                                | Receive NEW_ACTIVE event                           | True
                                | Reset up flag, and send AVND_EVT_MDS_AVD_UP event  | False
                                | Receive NEW_ACTIVE event                           | False
                                | Reset up flag, and send AVND_EVT_MDS_AVD_UP event  | False
    Receive AVND_EVT_MDS_AVD_UP |                                                    | False
    Process AVND_EVT_MDS_AVD_UP |                                                    | True
    Receive AVND_EVT_MDS_AVD_UP |                                                    | True
    Process AVND_EVT_MDS_AVD_UP |                                                    | True

    And this is the normal case:

    Amfnd main thread           | Amfnd mds thread                                   | Up flag
    ----------------------------+----------------------------------------------------+--------
                                | Receive NEW_ACTIVE event                           | True
                                | Reset up flag, and send AVND_EVT_MDS_AVD_UP event  | False
    Receive AVND_EVT_MDS_AVD_UP |                                                    | False
    Process AVND_EVT_MDS_AVD_UP |                                                    | True
                                | Receive NEW_ACTIVE event                           | True
                                | Reset up flag, and send AVND_EVT_MDS_AVD_UP event  | False
    Receive AVND_EVT_MDS_AVD_UP |                                                    | False
    Process AVND_EVT_MDS_AVD_UP |                                                    | True
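
    Below is a minimal standalone sketch of that unprotected-flag pattern (hypothetical names and a simplified std::queue mailbox, not the actual amfnd code). When both NEW_ACTIVE notifications are handled by the "mds" thread before the "main" thread processes the first up event -- the timing that had to be forced to reproduce the ticket -- the second up event finds the flag already true, which is the "AVD already up" condition that reboots the node:

        #include <condition_variable>
        #include <iostream>
        #include <mutex>
        #include <queue>
        #include <thread>

        static bool avd_up = true;          // unprotected flag, written by both threads
        static std::queue<int> mailbox;     // stands in for the amfnd mailbox
        static std::mutex mbx_mutex;
        static std::condition_variable mbx_cv;

        // "MDS thread": delivers two NEW_ACTIVE notifications back to back.
        static void mds_thread() {
          for (int i = 0; i < 2; ++i) {
            if (avd_up) avd_up = false;     // reset the flag on NEW_ACTIVE (no lock)
            {
              std::lock_guard<std::mutex> lk(mbx_mutex);
              mailbox.push(1);              // send an "AVD up" event to the mailbox
            }
            mbx_cv.notify_one();
          }
        }

        // "Main thread": processes the mailbox events.
        static void main_thread() {
          for (int i = 0; i < 2; ++i) {
            std::unique_lock<std::mutex> lk(mbx_mutex);
            mbx_cv.wait(lk, [] { return !mailbox.empty(); });
            mailbox.pop();
            lk.unlock();
            if (avd_up)                     // the second event can find the flag still set
              std::cout << "AVD already up -> node would reboot\n";
            avd_up = true;                  // set the flag while processing the up event
          }
        }

        int main() {
          std::thread t1(mds_thread), t2(main_thread);
          t1.join();
          t2.join();
          return 0;
        }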
     
  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-06-14
    • status: accepted --> review
     
  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-07-21
    • status: review --> accepted
     
  • Mohan  Kanakam

    Mohan Kanakam - 2022-07-22

    Hi Hieu,
    Just to reproduce this ticket, can you please let me know how you are splitting the network into separate partitions and how you then merge the networks back? I assume you are doing it on virtual machines?
    Thanks

     
  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-07-25

    Hi Mohan Kanakam,

    I'm using "lxc" to run the cluster and the "iptables" tool to split/merge the network. I reproduced this ticket as follows:

    • Add a 5-second delay before promoting the active by changing the fmd configuration (/etc/opensaf/fmd.conf):
    export FMS_PROMOTE_ACTIVE_TIMER=500             # 5 seconds
    
    • The cluster has 5 SCs:
    SC-1 ACT : SC-2 QSC : SC-3 QSC : SC-4 QSC : SC-5 STB
       ^          ^            ^         ^         ^
       |          |            |         |         |
       ---------------------------------------------
    

    (ACT: active, STB: standby, QSC: quiesced)

    • Isolate SC-2 and SC-3. We will have two more active SCs because each isolated SC will soon be promoted to active:
    SC-1 ACT : SC-2 ACT : SC-3 ACT : SC-4 QSC : SC-5 STB
       ^                                ^          ^
       |                                |          |
       ---------------------------------------------
    
    • Unblock the connection from SC-4 to SC-2, then unblock the connection from SC-4 to SC-3. SC-4 won't receive any up events, as a consequence of ticket 3281:
    SC-1 ACT : SC-2 ACT : SC-3 ACT : SC-4 QSC : SC-5 STB
      ^           ^          ^          ^ ^ ^      ^
      |           |          |          | | |      |
      |           |          ------------ | |      |
      |           ------------------------- |      |
       ---------------------------------------------
    
    • Stop SC-1, SC-2 and SC-3 sequentially. Now SC-4 receives a NEW_ACTIVE event for SC-2 and then receives another NEW_ACTIVE event for SC-3. Note: SC-5 won't become active until the promote timer expires.

    Another important condition is that the two NEW_ACTIVE events must be processed at nearly the same time. However, amfnd processes a NEW_ACTIVE event very quickly, so the issue is hard to reproduce; therefore I changed the source code to slow down amfnd. The details are in the attached file.

    Best regards,
    Hieu

     

    Last edit: Hieu Hong Hoang 2022-07-26
  • Mohan  Kanakam

    Mohan Kanakam - 2022-07-27

    Hi Hieu,
    Thanks for the steps to reproduce issue.

     
  • Mohan  Kanakam

    Mohan Kanakam - 2022-07-29

    Hi Hieu,
    I tested with 2 SCs [SC-1, SC-2] and 2 PLs. I created 2 partitions, [SC-1 (Active)] and [SC-2 (Active) plus the 2 payloads]. When I merged the partitions, both SC-1 and SC-2 rebooted because of the split brain, and then both payloads rebooted because there was no controller.
    1. Is this the expected behaviour? Or
    2. Should only one SC (either SC-1 or SC-2) reboot, with the payloads continuing to run with that SC?
    Thanks

     
    • Hieu Hong Hoang

      Hieu Hong Hoang - 2022-08-01

      Hi Mohan,

      1. Yes, it is the expected behavior. The active SCs rebooted because each active SC saw another active SC in the cluster. The PLs rebooted because they lost connection with both the active SC and the standby SC.
        As an enhancement, OpenSAF introduced the SC absence feature, documented in "src/imm/README.SC_ABSENCE". If the SC absence feature is enabled, the PLs will not reboot immediately. After the SCs have rebooted and one of them has become active,
        all PLs that are not in the same partition as the new active SC will be rebooted.
        For example:
        -- We split the cluster into 2 partitions [SC-1, PL-1] [SC-2, PL-2, PL-3].
        -- After merging the partitions, SC-1 and SC-2 reboot.
        -- SC-1 becomes the active SC and SC-2 becomes the standby SC.
        -- SC-1 requires PL-2 and PL-3 to reboot. PL-1 will survive the network merge.

      2. We do not support this, but it is a promising approach. To implement it, we would need a strategy to choose one SC among the active SCs, reassign the SUs on the PLs, synchronize the IMM data on the PLs, etc. The surviving SC must verify the information of all unknown PLs and resolve conflicts between the PLs. However, I think it's easier to restart the PLs.

      Best regards,
      Hieu

       
      • Mohan  Kanakam

        Mohan Kanakam - 2022-08-01

        Hi Hieu,
        Awesome, thanks for your response, appreciate it.
        I ran the following test case with your patch for #3317. We had enabled the SC_ABSENCE feature in both partitions, i.e. in SC-1's and SC-2's immd.conf.
        1. [SC-1, PL-3] in one cluster, with SC-1 active.
        2. [SC-2, PL-4] in another cluster, with SC-2 active.
        3. Then I merged both partitions. SC-1 and SC-2 detect each other, report a split brain, and both reboot. PL-3 and PL-4 keep running because SC_ABSENCE is enabled.
        4. SC-1 comes back as active, both payloads PL-3 and PL-4 detect SC-1 and sync up with it, and no payload reboots. Later on, SC-2 comes back as standby.
        Here, according to your explanation, payload PL-4, which was running in the other partition, should have rebooted, but it didn't. Can you please suggest if we are doing something wrong?
        Thanks
        -Mohan

         
        • Hieu Hong Hoang

          Hieu Hong Hoang - 2022-08-02

          Hi Mohan,

          Please check the variable "IMMSV_COORD_SELECT_NODE" in the file "immd.conf". It must be enabled for the partition of a node to be checked. That check is in the file "src/imm/immd/immd_evt.c":

          static bool is_on_same_partition_with_coord(
              IMMD_CB *cb,
              const IMMD_IMMND_INFO_NODE *node_info) {
              assert(cb->immnd_coord && "No coordinator existing");
              // Same partition with the current IMMND coord
              if ((cb->coord_select_node == false) ||
                  (cb->ex_immd_node_id == node_info->ex_immd_node_id))
                  return true;
          
              LOG_WA("Node %x ex-IMMD=%x != current IMMND coord %x ex-IMMD=%x",
                     node_info->immnd_key, node_info->ex_immd_node_id,
                     cb->immnd_coord, cb->ex_immd_node_id);
          
              return false;
          }
          

          Best regards,
          Hieu

           
  • Paul

    Paul - 2022-07-29

    Hello Hieu,
    Do you have any documentation on split brain and the expected results, with some sample use cases? Can I find it in the PR documents?
    If you have such a document, please share it, or if you can, please write a small one- or two-page document and share it with me. I will help you with testing the upcoming patches.
    Thanks
    Paul

     
    • Hieu Hong Hoang

      Hieu Hong Hoang - 2022-08-02

      Hi Paul,

      OpenSAF has several ways to handle split-brain. They are described in the following documents:

      • sc absence: src/imm/README.SC_ABSENCE
      • remote fencing: docs/OpenSAF_Overview_PR.odt 3.7.5
      • split-brain prevention: docs/OpenSAF_Overview_PR.odt 3.7.6
      • split-brain recovery: docs/OpenSAF_Overview_PR.odt 3.7.7

      However, there are no use cases in those documents. Because there are many use cases related to those features, please let me know if you are interested in any of them.

      Best regards,
      Hieu

       
  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-08-15

    commit 629f41983434332c732bca7f11362e5f5942a96e (HEAD -> develop, origin/develop, ticket-3317)
    Author: hieu.h.hoang hieu.h.hoang@dektech.com.au
    Date: Tue Jun 14 08:51:33 2022 +0700

    amf: Update handling mds event in amfnd [#3317]
    
    In amfnd, there is a flag represent for the amfd service state (up/down).
    That flag was set by amfnd main thread and amfnd mds thread. If two amfd
    NEW_ACTIVE events come at almost the same time, the up flag value will be
    inccorect due to the setting conflict between two threads. Solution is to
    set that flag in main thread only and check the amfd NO_ACTIVE event.
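
    A rough sketch of the single-writer direction the commit message describes (hypothetical names such as MdsEvent and NodeDirectorState, not the actual patch): the MDS callback only translates service events, including NO_ACTIVE, into mailbox events, and the main thread becomes the only writer of the up flag:

        // Hypothetical sketch, not the amfnd sources: the MDS thread does pure
        // translation and never touches shared state.
        enum class MdsEvent { AVD_UP, AVD_NEW_ACTIVE, AVD_NO_ACTIVE, AVD_DOWN };
        enum class MbxEvent { AVD_UP, AVD_DOWN };

        inline MbxEvent map_mds_event(MdsEvent ev) {
          switch (ev) {
            case MdsEvent::AVD_UP:
            case MdsEvent::AVD_NEW_ACTIVE:
              return MbxEvent::AVD_UP;    // a (new) active director is reachable
            default:
              return MbxEvent::AVD_DOWN;  // NO_ACTIVE / DOWN clear the state
          }
        }

        // The main thread is the single place where the flag is modified, so
        // two threads can no longer race on it.
        struct NodeDirectorState {
          bool avd_up = false;
          void on_mbx_event(MbxEvent ev) { avd_up = (ev == MbxEvent::AVD_UP); }
        };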
    
     
  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-08-15
    • status: accepted --> fixed
     
