
#2918 amf: delay node failover of nodes that are separated from the main network partition

5.19.01
fixed
Gary Lee
None
enhancement
amf
-
major
False
2019-01-09
2018-08-24
Gary Lee
No

Tickets [#64] and [#2795] added support to prevent multiple active controllers in a split network scenario. However, nodes residing in the smaller network partitions can remain running. Meanwhile, the active SC residing in the largest partition may fail over assignments from the unreachable nodes to other reachable nodes, causing conflicts when the partitions are merged.

The original proposal involved two parts, a CLM part and an AMF part. CLM would not announce that a node has left the cluster until the fencing of the node has completed successfully. However, some users rely on timely CLM notifications to send out node-related events and alarms. Thus the proposal has been changed to be done in AMF only.

AMF should not perform a node failover until the node has been fenced.

When using remote fencing, this means that the fencing API has reported that the fencing was completed. When remote fencing is disabled, we need to wait for at least IMMSV_SC_ABSENCE_ALLOWED seconds (the configuration in immd.conf) before considering the fencing to be completed.

If MDS connectivity is re-established while waiting, AMF can wait a few seconds for a node_up (with leds_set == false) message to indicate the node has already been rebooted. Otherwise, AMF can send a message to the node asking it to reboot itself. When AMF sees that the MDS connectivity is lost again, it can consider the fencing to be complete without the need to wait the full IMMSV_SC_ABSENCE_ALLOWED time.
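
The decision rules above can be sketched as a small state machine. This is only an illustration of the described behaviour, not the AMF implementation: the class, event names, and the 5-second grace period are hypothetical, and treating a node_up with leds_set == false as "fencing complete" is one reading of the description.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Event(Enum):
    MDS_DOWN = auto()          # connectivity to the node was lost
    MDS_UP = auto()            # connectivity was re-established
    NODE_UP_FRESH = auto()     # node_up received with leds_set == false
    FENCING_API_DONE = auto()  # remote fencing API reported completion
    TICK = auto()              # time has passed, nothing else happened


@dataclass
class FailoverGate:
    """Tracks whether one node may be treated as fenced (hypothetical sketch)."""
    remote_fencing: bool
    absence_allowed: float            # stands in for IMMSV_SC_ABSENCE_ALLOWED (seconds)
    node_up_grace: float = 5.0        # "a few seconds"; the value is an assumption
    lost_at: Optional[float] = None
    reconnected_at: Optional[float] = None
    reboot_requested: bool = False
    fenced: bool = False

    def handle(self, event: Event, now: float) -> bool:
        """Feed one event; returns True once node failover may proceed."""
        if self.fenced:
            return True
        if event is Event.MDS_DOWN:
            if self.lost_at is None:
                self.lost_at = now            # start waiting
            elif self.reboot_requested:
                self.fenced = True            # link lost again after reboot request
            self.reconnected_at = None
        elif event is Event.MDS_UP:
            if self.lost_at is not None:
                self.reconnected_at = now
        elif event is Event.NODE_UP_FRESH:
            if self.reconnected_at is not None:
                self.fenced = True            # node came back freshly rebooted
        elif event is Event.FENCING_API_DONE:
            if self.remote_fencing:
                self.fenced = True
        # Time-based rules are evaluated on every event, including TICK.
        if not self.fenced and self.lost_at is not None:
            if (self.reconnected_at is not None
                    and not self.reboot_requested
                    and now - self.reconnected_at >= self.node_up_grace):
                self.reboot_requested = True  # here AMF would ask the node to reboot
            if (not self.remote_fencing
                    and now - self.lost_at >= self.absence_allowed):
                self.fenced = True            # waited the full absence time
        return self.fenced
```

With remote fencing disabled and an absence time of 900 seconds, a node that reconnects, receives a reboot request, and then drops off again is treated as fenced well before the 900 seconds elapse; a node that never reconnects is treated as fenced only at the 900-second mark.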

Potentially waiting up to IMMSV_SC_ABSENCE_ALLOWED seconds will affect availability. This option must be configurable via IMM and take effect without a restart. It is up to the user to turn it on if node disturbances are planned or expected in the environment due to poor-quality links between the nodes.

Additionally, we should allow the user to set this 'node failover' timer to a smaller value than IMMSV_SC_ABSENCE_ALLOWED, with the understanding that this introduces the risk of duplicate assignments.
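
As a rough illustration of how the option and the shorter timer could combine when remote fencing is disabled, the sketch below picks the effective wait and flags when it is shorter than IMMSV_SC_ABSENCE_ALLOWED. The function and parameter names are hypothetical and do not correspond to actual IMM attribute names.

```python
from typing import Optional, Tuple


def node_failover_delay(enabled: bool,
                        configured: Optional[float],
                        absence_allowed: float) -> Tuple[float, bool]:
    """Return (seconds to wait before node failover, duplicate-assignment risk).

    enabled         -- whether the user has turned the delay option on
    configured      -- user-set 'node failover' timer in seconds, or None
    absence_allowed -- stands in for IMMSV_SC_ABSENCE_ALLOWED in seconds
    """
    if not enabled:
        delay = 0.0                 # option off: fail over immediately
    elif configured is None:
        delay = absence_allowed     # default: wait the full absence time
    else:
        delay = configured          # user override, may be shorter
    # Any wait shorter than the absence time leaves a window in which the
    # separated node may still be running its assignments.
    at_risk = delay < absence_allowed
    return delay, at_risk
```

A caller could, for example, log a warning whenever the second element of the result is true, so that the trade-off the user accepted is visible in the logs.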

1 Attachments

Related

Tickets: #2795
Tickets: #2920
Tickets: #2952
Tickets: #2957
Tickets: #64
Wiki: ChangeLog-5.19.01
Wiki: NEWS-5.19.01

Discussion

  • Anders Widell

    Anders Widell - 2018-08-24
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,5 +1,7 @@
     Tickets [#64] and [#2795] added support to prevent multiple active controllers in a split network scenario. However, nodes residing in the smaller network partitions can remain running.  Meanwhile the active SC residing in the largest partition may failover assignments at the unreachable nodes to other reachable nodes,  causing conflicts when the partitions are merged.
    
    -We also need to consider when remote fencing is not available.
    +There are two parts needed for this; a CLM part and an AMF part:
    
    +* CLM should not announce that a node has left the cluster until the fencing of the node has completed successfully. When using remote fencing, this means that the fencing API has reported that the fencing was completed. When remote fencing is disabled, we need to wait for at least IMMSV_SC_ABSENCE_ALLOWED seconds (the configuration in immd.conf) before considering the fencing to be completed.
    
    +* AMF should use CLM (only) as source of information regarding which nodes are up or down. AMF should not use MDS link notifications or MDS service notifications for this purpose.
    
     

    Related

    Tickets: #2795
    Tickets: #64

  • Anders Widell

    Anders Widell - 2018-08-24
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -2,6 +2,6 @@
    
     There are two parts needed for this; a CLM part and an AMF part:
    
    -* CLM should not announce that a node has left the cluster until the fencing of the node has completed successfully. When using remote fencing, this means that the fencing API has reported that the fencing was completed. When remote fencing is disabled, we need to wait for at least IMMSV_SC_ABSENCE_ALLOWED seconds (the configuration in immd.conf) before considering the fencing to be completed.
    +* CLM should not announce that a node has left the cluster until the fencing of the node has completed successfully. When using remote fencing, this means that the fencing API has reported that the fencing was completed. When remote fencing is disabled, we need to wait for at least IMMSV_SC_ABSENCE_ALLOWED seconds (the configuration in immd.conf) before considering the fencing to be completed. If MDS connectivity is re-established while waiting, CLM can send an MDS message to the node asking it to reboot itself. When CLM has received a reply to the reboot request (over MDS) and then later sees that the MDS connectivity is lost again, it can consider the fencing to be complete witout the need to wait the full IMMSV_SC_ABSENCE_ALLOWED time.
    
    
     * AMF should use CLM (only) as source of information regarding which nodes are up or down. AMF should not use MDS link notifications or MDS service notifications for this purpose.
    
     
  • Gary Lee

    Gary Lee - 2018-08-31
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -5,3 +5,5 @@
    
     * CLM should not announce that a node has left the cluster until the fencing of the node has completed successfully. When using remote fencing, this means that the fencing API has reported that the fencing was completed. When remote fencing is disabled, we need to wait for at least IMMSV_SC_ABSENCE_ALLOWED seconds (the configuration in immd.conf) before considering the fencing to be completed. If MDS connectivity is re-established while waiting, CLM can send an MDS message to the node asking it to reboot itself. When CLM has received a reply to the reboot request (over MDS) and then later sees that the MDS connectivity is lost again, it can consider the fencing to be complete witout the need to wait the full IMMSV_SC_ABSENCE_ALLOWED time.
    
    
     * AMF should use CLM (only) as source of information regarding which nodes are up or down. AMF should not use MDS link notifications or MDS service notifications for this purpose.
    +
    +Waiting for IMMSV_SC_ABSENCE_ALLOWED seconds will affect availability. Perhaps this should be configurable and only take effect if node disturbances are expected.
    
     
  • Gary Lee

    Gary Lee - 2018-08-31
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -6,4 +6,4 @@
    
    
     * AMF should use CLM (only) as source of information regarding which nodes are up or down. AMF should not use MDS link notifications or MDS service notifications for this purpose.
    
    -Waiting for IMMSV_SC_ABSENCE_ALLOWED seconds will affect availability. Perhaps this should be configurable and only take effect if node disturbances are expected.
    +Waiting for IMMSV_SC_ABSENCE_ALLOWED seconds will affect availability. Perhaps this should be configurable and only take effect if node disturbances are planned / expected.
    
     
  • Gary Lee

    Gary Lee - 2018-09-07
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -6,4 +6,6 @@
    
    
     * AMF should use CLM (only) as source of information regarding which nodes are up or down. AMF should not use MDS link notifications or MDS service notifications for this purpose.
    
    -Waiting for IMMSV_SC_ABSENCE_ALLOWED seconds will affect availability. Perhaps this should be configurable and only take effect if node disturbances are planned / expected.
    +Potentially waiting up to IMMSV_SC_ABSENCE_ALLOWED seconds will affect availability.
    +This option must be configurable via IMM and take effect without a restart.
    +It is up to the user to turn on, if node disturbances are planned or expected in the environment due to poor quality links between the nodes.
    
     
  • Gary Lee

    Gary Lee - 2018-09-19
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -9,3 +9,5 @@
     Potentially waiting up to IMMSV_SC_ABSENCE_ALLOWED seconds will affect availability.
     This option must be configurable via IMM and take effect without a restart.
     It is up to the user to turn on, if node disturbances are planned or expected in the environment due to poor quality links between the nodes.
    +
    +Additionally, we should allow the user to set this 'node failover' timer to a smaller value than IMMSV_SC_ABSENCE_ALLOWED, with the understanding that this introduces the risk of duplicate assignments.
    
     
  • Gary Lee

    Gary Lee - 2018-09-20

    Other applications are probably also using CLM notifications and rely on timely notifications. Perhaps we should do this in AMF only.

     

    Last edit: Gary Lee 2018-09-20
  • Gary Lee

    Gary Lee - 2018-09-25
    • status: unassigned --> accepted
    • assigned_to: Gary Lee
     
  • Gary Lee

    Gary Lee - 2018-09-29
    • Milestone: 5.18.09 --> 5.18.12
     
  • Gary Lee

    Gary Lee - 2018-10-04
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,13 +1,13 @@
     Tickets [#64] and [#2795] added support to prevent multiple active controllers in a split network scenario. However, nodes residing in the smaller network partitions can remain running.  Meanwhile the active SC residing in the largest partition may failover assignments at the unreachable nodes to other reachable nodes,  causing conflicts when the partitions are merged.
    
    -There are two parts needed for this; a CLM part and an AMF part:
    +The original proposal involved two parts, a CLM part and an AMF part. CLM would not announce a node has left the cluster  until the fencing of the node has completed successfully. However, some users rely on timely CLM notifications to send out node related events and alarms. Thus the proposal has been changed to be done in AMF only.
    
    -* CLM should not announce that a node has left the cluster until the fencing of the node has completed successfully. When using remote fencing, this means that the fencing API has reported that the fencing was completed. When remote fencing is disabled, we need to wait for at least IMMSV_SC_ABSENCE_ALLOWED seconds (the configuration in immd.conf) before considering the fencing to be completed. If MDS connectivity is re-established while waiting, CLM can send an MDS message to the node asking it to reboot itself. When CLM has received a reply to the reboot request (over MDS) and then later sees that the MDS connectivity is lost again, it can consider the fencing to be complete witout the need to wait the full IMMSV_SC_ABSENCE_ALLOWED time.
    +AMF should not perform a node failover, until a node has been fenced.
    
    -* AMF should use CLM (only) as source of information regarding which nodes are up or down. AMF should not use MDS link notifications or MDS service notifications for this purpose.
    +When using remote fencing, this means that the fencing API has reported that the fencing was completed. When remote fencing is disabled, we need to wait for at least IMMSV_SC_ABSENCE_ALLOWED seconds (the configuration in immd.conf) before considering the fencing to be completed.
    
    -Potentially waiting up to IMMSV_SC_ABSENCE_ALLOWED seconds will affect availability.
    -This option must be configurable via IMM and take effect without a restart.
    -It is up to the user to turn on, if node disturbances are planned or expected in the environment due to poor quality links between the nodes.
    +If MDS connectivity is re-established while waiting, AMF can wait a few seconds for a node_up (with leds_set == false) message to indicate the node has been rebooted. Otherwise, AMF can send a message to the node asking it to reboot itself. When AMF sees that the MDS connectivity is lost again, it can consider the fencing to be complete witout the need to wait the full IMMSV_SC_ABSENCE_ALLOWED time.
    +
    +Potentially waiting up to IMMSV_SC_ABSENCE_ALLOWED seconds will affect availability. This option must be configurable via IMM and take effect without a restart. It is up to the user to turn on, if node disturbances are planned or expected in the environment due to poor quality links between the nodes.
    
     Additionally, we should allow the user to set this 'node failover' timer to a smaller value than IMMSV_SC_ABSENCE_ALLOWED, with the understanding that this introduces the risk of duplicate assignments.
    
     

    Related

    Tickets: #2795
    Tickets: #64

  • Gary Lee

    Gary Lee - 2018-10-04
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -6,7 +6,7 @@
    
     When using remote fencing, this means that the fencing API has reported that the fencing was completed. When remote fencing is disabled, we need to wait for at least IMMSV_SC_ABSENCE_ALLOWED seconds (the configuration in immd.conf) before considering the fencing to be completed.
    
    -If MDS connectivity is re-established while waiting, AMF can wait a few seconds for a node_up (with leds_set == false) message to indicate the node has been rebooted. Otherwise, AMF can send a message to the node asking it to reboot itself. When AMF sees that the MDS connectivity is lost again, it can consider the fencing to be complete witout the need to wait the full IMMSV_SC_ABSENCE_ALLOWED time.
    +If MDS connectivity is re-established while waiting, AMF can wait a few seconds for a node_up (with leds_set == false) message to indicate the node has been already rebooted. Otherwise, AMF can send a message to the node asking it to reboot itself. When AMF sees that the MDS connectivity is lost again, it can consider the fencing to be complete witout the need to wait the full IMMSV_SC_ABSENCE_ALLOWED time.
    
     Potentially waiting up to IMMSV_SC_ABSENCE_ALLOWED seconds will affect availability. This option must be configurable via IMM and take effect without a restart. It is up to the user to turn on, if node disturbances are planned or expected in the environment due to poor quality links between the nodes.
    
     
  • Gary Lee

    Gary Lee - 2018-10-04
    • summary: osaf: fence nodes that are separated from the main network partition --> amf: delay node failover of nodes that are separated from the main network partition
     
  • Gary Lee

    Gary Lee - 2018-10-17

    Initial version

     

    Last edit: Gary Lee 2018-10-17
  • Gary Lee

    Gary Lee - 2018-10-18

    Revision 2

     

    Last edit: Gary Lee 2018-10-19
  • Gary Lee

    Gary Lee - 2018-10-22

    Rev3

     

    Last edit: Gary Lee 2018-10-23
  • Gary Lee

    Gary Lee - 2018-10-24
    • status: accepted --> review
     
  • Gary Lee

    Gary Lee - 2018-10-25
    • Attachments has changed:

    Diff:

    --- old
    +++ new
    @@ -0,0 +1 @@
    +Delay Node Failover.png (53.0 kB; image/png)
    
     
  • Gary Lee

    Gary Lee - 2018-11-04
    • status: review --> fixed
     
  • Gary Lee

    Gary Lee - 2018-11-14

    Proposed changes to AMF doc. Renumbered 2.2.18 Excessive assignments to 2.2.19. Added 2.2.18 Network partitioning. Added timers to Section 3.3.

     
  • Gary Lee

    Gary Lee - 2019-01-09
    • Component: unknown --> amf
     
