
#2996 rded: allow SC promotion even if consensus service is unavailable

5.19.03
fixed
Gary Lee
None
enhancement
rde
d
major
False
2019-01-23
2019-01-10
Gary Lee
No

If split brain prevention is enabled, rded does not allow promotion of a node to Active unless a lock can be obtained. For some users, it may be preferable to allow promotion even if the consensus service is not available, so that the consensus service does not become a single point of failure.

If a peer SC can be seen by an SC, and that SC has the lower node ID, we can optionally allow node promotion to occur during SC election when the consensus service is not available.

During normal cluster operation, if the consensus service becomes unavailable, but a peer SC can still be seen, then the active SC may remain active.

This optional feature must not be used together with the roaming SC feature.
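
The promotion and remain-active rules described above can be sketched roughly as follows. This is an illustrative Python model, not rded or amfd code: `ScView`, `check_config`, `may_promote`, and `may_remain_active` are hypothetical names. It assumes split brain prevention is enabled, reads "that SC has the lower node ID" as referring to the SC being considered for promotion, and assumes that with the feature disabled an SC neither gets promoted nor stays active while the consensus service is unreachable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScView:
    """What one SC knows when deciding its role (hypothetical model)."""
    own_node_id: int
    peer_node_id: Optional[int]   # None when no peer SC can be seen
    consensus_available: bool     # is the consensus service reachable?
    lock_obtained: bool           # was the lock obtained from it?
    relaxed_promotion: bool       # FMS_RELAXED_NODE_PROMOTION=1
    roaming_sc: bool = False      # roaming SC feature in use


def check_config(view: ScView) -> None:
    # The ticket states the two features must not be combined.
    if view.relaxed_promotion and view.roaming_sc:
        raise ValueError("relaxed node promotion cannot be used with roaming SC")


def may_promote(view: ScView) -> bool:
    """Election-time decision: may this SC be promoted to Active?"""
    check_config(view)
    if view.consensus_available:
        # Normal path: promotion requires the lock.
        return view.lock_obtained
    if not view.relaxed_promotion or view.peer_node_id is None:
        return False
    # Relaxed path (assumed reading): the SC with the lower node ID wins.
    return view.own_node_id < view.peer_node_id


def may_remain_active(view: ScView) -> bool:
    """Runtime decision for an SC that is already Active."""
    check_config(view)
    if view.consensus_available:
        return True
    # Consensus service lost: stay active only in relaxed mode with a visible peer.
    return view.relaxed_promotion and view.peer_node_id is not None
```

In this model, two SCs that can see each other but not the consensus service resolve the election by node ID alone, so at most one of them is promoted.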

A parameter will be added to fmd.conf (default value is 0):

export FMS_RELAXED_NODE_PROMOTION=1
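
As a rough illustration of how a process might consume this setting, the sketch below reads the variable from the environment and falls back to the documented default of 0. It is written in Python for brevity and is not the fmd implementation. The helper name `relaxed_node_promotion_enabled` is hypothetical, and treating any value other than 1 (including non-numeric text) as disabled is an assumption.

```python
import os


def relaxed_node_promotion_enabled(environ=None):
    """Return True only when FMS_RELAXED_NODE_PROMOTION is set to 1."""
    env = os.environ if environ is None else environ
    # Default is "0", matching the default stated in the ticket.
    raw = env.get("FMS_RELAXED_NODE_PROMOTION", "0").strip()
    try:
        return int(raw) == 1
    except ValueError:
        # Assumption: unparsable values leave the feature disabled.
        return False
```

Passing a dictionary instead of relying on `os.environ` keeps the helper easy to exercise in isolation, for example `relaxed_node_promotion_enabled({"FMS_RELAXED_NODE_PROMOTION": "1"})`.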

Related

Tickets: #2995
Tickets: #2997
Wiki: ChangeLog-5.19.03
Wiki: NEWS-5.19.03

Discussion

  • Gary Lee

    Gary Lee - 2019-01-21
    • status: accepted --> review
     
  • Gary Lee

    Gary Lee - 2019-01-23

    commit 5ebaa1d4ef5afd86f19adba447460e2b77b8fa9b
    Author: Gary Lee gary.lee@dektech.com.au
    Date: Mon Jan 21 13:45:08 2019 +1100

    rded: add relaxed node promotion feature [#2996]
    
    Allow promotion of node to active at cluster startup, even if the
    consensus service is unavailable, if the peer SC can be seen.
    
    During normal cluster operation, if the consensus service becomes
    unavailable but the peer SC can still be seen, allow the existing
    active SC to remain active.
    
    A new NCSMDS_SVC_ID_RDE_DISCOVERY service ID is exported by rded.
    This is installed as soon as rded is started, unlike
    NCSMDS_SVC_ID_RDE which is only installed when it becomes
    a candidate for election.
    

    commit 8a175064ec967a6a64c1deeef8e94a2c31216069
    Author: Gary Lee gary.lee@dektech.com.au
    Date: Mon Jan 21 12:02:55 2019 +1100

    amfd: allow node to remain active if peer SC can be seen [#2996]
    
    If relaxed node promotion is enabled, allow a SC to remain
    active if the peer SC can be seen, even if access to the
    consensus service is lost.
    

    commit dc5abfa9800310be79dedbac254036daae8826bf
    Author: Gary Lee gary.lee@dektech.com.au
    Date: Mon Jan 21 11:55:02 2019 +1100

    osaf: add support for FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE and
    FMS_RELAXED_NODE_PROMOTION [#2996]
    
    Add FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE option to allow
    active SC to be preferred during a network split. The default
    behavior is to prefer the larger partition to maintain
    existing behaviour.
    
    Add configuration support for FMS_RELAXED_NODE_PROMOTION.
    

    commit b116c69d09377897fd6dd223552bdf75683a3da5
    Author: Gary Lee gary.lee@dektech.com.au
    Date: Mon Jan 21 11:49:17 2019 +1100

    fmd: add configuration parameters [#2996]
    
    Add parameters FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE and
    FMS_RELAXED_NODE_PROMOTION.
    

    commit 1ccaecc7c0308e460f76a7dd94c59f4734ed64fd
    Author: Gary Lee gary.lee@dektech.com.au
    Date: Mon Jan 21 11:39:24 2019 +1100

    osaf: update etcd3 to poll instead of watch [#2996]
    
    The 'watch' command does not return if the etcd server goes down.
    We need to poll the etcd server to properly check we still have
    connectivity to the etcd server.
    
     
  • Gary Lee

    Gary Lee - 2019-01-23
    • status: review --> fixed
     
