
#2795 rde: Select active SC in largest network partition during split-brain

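Milestone: 5.18.04
Status: fixed
Owner: Gary Lee
Labels: None
Type: enhancement
Component: rde
Part: -
Priority: major
Blocker: False
Updated: 2018-04-17
Created: 2018-03-06
Private: No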

In ticket [#64], we introduced a simple split-brain prevention mechanism that guarantees there is at most one active system controller (SC) in the cluster. However, it does not guarantee that the active SC resides in the largest network partition. The attached document describes a way to select the largest partition using what we call passive locks.

1 Attachment

Related

Tickets: #2918
Tickets: #2989
Tickets: #2995
Tickets: #64
Wiki: ChangeLog-5.18.04
Wiki: NEWS-5.18.04

Discussion

  • Gary Lee

    Gary Lee - 2018-03-28
    • status: unassigned --> accepted
    • assigned_to: Gary Lee
     
  • Gary Lee

    Gary Lee - 2018-04-09

    Step 0 needs to be added to the 'Passive Lock Implementation' section, to run before any attempt to obtain a lock:

    0) Check for an existing takeover request. If one exists, wait until it expires (within a reasonable time).

    This is needed in case the old active controller that is being taken over reboots so quickly (after it self-fences, if remote fencing is not available) that it retains or re-obtains the lock before the new controller can acquire it.

     

    Last edit: Gary Lee 2018-04-09
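
    Step 0 can be pictured as a bounded polling loop. This is a minimal Python sketch, not the rded code: `read_request` is a hypothetical accessor for the takeover request key, and the timeout and poll interval are placeholders for whatever "reasonable time" the implementation chooses. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional


def wait_for_takeover_request_to_expire(
    read_request: Callable[[], Optional[str]],
    timeout_s: float,
    poll_interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Step 0: wait while a takeover request is still stored.

    read_request is a hypothetical accessor returning the current takeover
    request, or None once there is none (for example because it expired).
    Returns True when no request remains, False if timeout_s elapsed first.
    """
    deadline = clock() + timeout_s
    while read_request() is not None:
        if clock() >= deadline:
            return False  # Give up; the caller decides what to do next.
        sleep(poll_interval_s)
    return True
```

    Only after a True result would the controller move on to trying to obtain the lock; how a False result is handled (retry, give up, raise an alarm) is left open in this sketch.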
  • Gary Lee

    Gary Lee - 2018-04-17

    Develop:

    commit d242261b4da485768d11e7a558a9ce3b69097a11
    Author: Gary Lee gary.lee@dektech.com.au
    Date: Tue Apr 17 19:58:23 2018 +1000

    osaf: remove timestamp from takeover request [#2795]
    
    * update create() in the plugins to include a timeout parameter
    * remove timestamp from the takeover request and utilise the
      built-in timeout functionality in the KV store
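
    The change above moves expiry out of the stored value and into the KV store itself. A minimal sketch, assuming an in-memory store in place of the real plugin-backed one (the `TtlKeyValueStore` class, its method names and the key name are hypothetical): a key created with a timeout simply disappears once the timeout elapses, so a reader never has to interpret a timestamp written by another node.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlKeyValueStore:
    """In-memory stand-in for a KV store with per-key expiry (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, absolute expiry time or None for "never expires")
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def _purge(self, key: str) -> None:
        """Drop the key if its timeout has elapsed."""
        entry = self._data.get(key)
        if entry is not None and entry[1] is not None and self._clock() >= entry[1]:
            del self._data[key]

    def create(self, key: str, value: str, timeout_s: Optional[float] = None) -> bool:
        """Create key only if it is absent (or expired); False if it exists."""
        self._purge(key)
        if key in self._data:
            return False
        expiry = None if timeout_s is None else self._clock() + timeout_s
        self._data[key] = (value, expiry)
        return True

    def get(self, key: str) -> Optional[str]:
        """Return the live value, or None if missing or expired."""
        self._purge(key)
        entry = self._data.get(key)
        return None if entry is None else entry[0]
```

    In etcd v3, per-key expiry of this kind is normally obtained by attaching the key to a lease.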
    

    commit 6ba118a9832f0c6667b52aeda979632d435b542c
    Author: Gary Lee gary.lee@dektech.com.au
    Date: Tue Apr 17 19:58:23 2018 +1000

    rded: adapt to new Consensus API [#2795]
    
    - add 3 new internal messages:
    
    RDE_MSG_NODE_UP
    RDE_MSG_NODE_DOWN
    RDE_MSG_TAKEOVER_REQUEST_CALLBACK
    
    - subscribe to AMFND service up events to keep track of the number
      of cluster members
    
    - listen for takeover requests in KV store
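
    The node up/down messages let rded keep a count of cluster members, which is what a controller needs in order to judge the size of its partition. A minimal sketch, assuming each message carries the node ID of the AMFND (AMF node director) whose service went up or down; only the message names come from the commit, and the Python names and payload are hypothetical. RDE_MSG_TAKEOVER_REQUEST_CALLBACK is not modelled here.

```python
from enum import Enum, auto
from typing import Set


class RdeMsgType(Enum):
    # Names follow the commit message; the values are arbitrary here.
    RDE_MSG_NODE_UP = auto()
    RDE_MSG_NODE_DOWN = auto()


class ClusterMembership:
    """Tracks which nodes' AMFND service is currently reported as up."""

    def __init__(self) -> None:
        self._members: Set[int] = set()

    def handle(self, msg_type: RdeMsgType, node_id: int) -> None:
        if msg_type is RdeMsgType.RDE_MSG_NODE_UP:
            # A set makes a duplicate "up" event for the same node harmless.
            self._members.add(node_id)
        elif msg_type is RdeMsgType.RDE_MSG_NODE_DOWN:
            # discard() tolerates a "down" for a node never seen as up.
            self._members.discard(node_id)

    @property
    def size(self) -> int:
        """Number of cluster members visible from this controller."""
        return len(self._members)
```

    A set is used instead of a bare counter so that repeated service-up events cannot inflate the member count.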
    

    commit 713f5acff36ee10883dacb9222b9f00c3dee8d10
    Author: Gary Lee gary.lee@dektech.com.au
    Date: Tue Apr 17 19:58:23 2018 +1000

    fmd: adapt to new Consensus API [#2795]
    

    commit 3b8bb17ccc996d77c1dee61b8295e1454090437f
    Author: Gary Lee gary.lee@dektech.com.au
    Date: Tue Apr 17 19:58:23 2018 +1000

    amfd: adapt to new Consensus API [#2795]
    

    commit 1efef47b243a6b6863d40e7d5ac98dd3aea4e996
    Author: Gary Lee gary.lee@dektech.com.au
    Date: Tue Apr 17 19:58:23 2018 +1000

    osaf: add lock takeover request function [#2795]
    
    - add create and set (if previous value matches) functions to KeyValue class
    - add Consensus::MonitorTakeoverRequest() function for use by RDE to answer takeover requests
    - add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, it will
      create a takeover request in the KV store. An existing SC can reject the lock takeover
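
    Together, these functions form a small handshake over the KV store: the controller about to be promoted creates the request, and the current active answers it. The sketch below illustrates this under stated assumptions and is not the Consensus class: the in-memory `KeyValue` stand-in, the key name, the `"<requester> <member count> <state>"` value layout and the accept/reject rule (reject if the current active sees at least as many members, so the larger partition keeps the active role) are all hypothetical. It shows why both new primitives matter: create-only-if-absent allows at most one pending request, and set-if-previous-matches writes an answer only against the exact request that was read.

```python
from typing import Dict, Optional


class KeyValue:
    """Minimal in-memory KV store (hypothetical stand-in for the real one)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def create(self, key: str, value: str) -> bool:
        """Succeeds only if the key does not exist yet."""
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def set_if_prev(self, key: str, value: str, prev: str) -> bool:
        """Set key only if its current value equals prev (compare-and-set)."""
        if self._data.get(key) != prev:
            return False
        self._data[key] = value
        return True

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


TAKEOVER_KEY = "takeover_request"  # hypothetical key name


def create_takeover_request(kv: KeyValue, requester: str, member_count: int) -> bool:
    """Requester side: publish a NEW request before promoting itself."""
    return kv.create(TAKEOVER_KEY, f"{requester} {member_count} NEW")


def answer_takeover_request(kv: KeyValue, own_member_count: int) -> Optional[str]:
    """Current active side: accept or reject a pending NEW request.

    Assumed rule: reject if this controller sees at least as many members.
    """
    request = kv.get(TAKEOVER_KEY)
    if request is None or not request.endswith(" NEW"):
        return None  # Nothing pending to answer.
    requester, count, _ = request.split()
    verdict = "REJECTED" if own_member_count >= int(count) else "ACCEPTED"
    answered = f"{requester} {count} {verdict}"
    # Compare-and-set: do not overwrite a request that changed meanwhile.
    return verdict if kv.set_if_prev(TAKEOVER_KEY, answered, request) else None
```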
    

    commit 867955ca099e9fc7e226adb7b9f465eb9fa680a4
    Author: Gary Lee gary.lee@dektech.com.au
    Date: Tue Apr 17 19:58:23 2018 +1000

    osaf: extend API to include a create key and an enhanced set key function [#2795]
    
    - add create_key function (fails if key already exists)
    - add setkey_match_prev function (set value if previous value matches)
    - add missing quotes
    - add etcd3.plugin
    
     
  • Gary Lee

    Gary Lee - 2018-04-17
    • status: accepted --> fixed
     
