OpenSAF ticket #3263 rde: Cluster is unrecoverable after all nodes split-brain in roaming SC

Minh Hon Chau - 2021-05-14

status: unassigned --> accepted

summary: rde: Cluster is unrecoverable after all node split-brain in roaming SC --> rde: Cluster is unrecoverable after all nodes split-brain in roaming SC
Description has changed:

Diff:

--- old
+++ new
@@ -1,2 +1,2 @@
-In Roaming SC deployment, if split-brain occurs that separate all node apart, in which each partition has one SC, we have all SC becoming active. At rejoin, all SC detects themself as duplicated active to one of other SC, they should all reboot, ideally.
+In Roaming SC deployment, if split-brain occurs that separates all nodes apart, in which each partition has one SC, we have all SCs becoming active. At rejoin, all SCs detect themself as duplicated active to one of other SCs, they should all reboot, ideally.
 However, sometimes the last active SC is not detected as duplicated because all the other SCs already reboot. The last SC does not find any others as active duplicated to itself. As of this result, since the last SC is not healthy throughout the split time, it's causing many errors for other nodes to rejoin again after reboot.
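
The race described above can be illustrated with a minimal sketch. It is a toy model, not OpenSAF code: the function `rejoin` and the SC names are hypothetical, and it assumes each SC runs its duplicate-active check one after another and only sees peers that are still active at that moment.

```python
def rejoin(names):
    """Toy model of the rejoin race (hypothetical, not an OpenSAF API).

    names: SCs in the order they run their duplicate-active check.
    Returns (rebooted, still_active).
    """
    active = set(names)  # after the split, every SC is active
    rebooted = []
    for name in names:
        others = active - {name}  # peers this SC still sees as active
        if others:
            # Duplicate active detected: this SC reboots.
            active.discard(name)
            rebooted.append(name)
        # Otherwise no duplicate is visible and the SC keeps running.
    return rebooted, active


rebooted, survivors = rejoin(["SC-1", "SC-2", "SC-3"])
print(rebooted, survivors)
```

With three SCs, the first two reboot and the last one finds no active peer left, so it stays active even though it was part of the split.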

Minh Hon Chau - 2021-05-26

status: accepted --> fixed

Minh Hon Chau - 2021-05-26

commit 68fde36133a5fd47b667c6971c967a7cf8629b03
Author: Minh Chau <minh.chau@dektech.com.au>
Date: Wed May 26 21:05:12 2021 +1000

rde: Use broadcast for peer info message [#3263]

commit ca0cb78a03a2eb3cfa3519b4c5d9af0905f325a5
Author: Minh Chau <minh.chau@dektech.com.au>
Date: Wed May 26 21:05:12 2021 +1000

rde: Add timeout waiting for peer info [#3263]

Gary Lee - 2021-09-14

commit bbe47278c2499bc738bf0c2dc8cc4ebbbb9a026d
Author: Minh Chau <minh.chau@dektech.com.au>
Date: Tue Jul 13 18:00:41 2021 +1000

rde: Add timeout of waiting for peer info [#3263]

This ticket revisits the wait for peer info and fixes the problem of out-of-order peer_up and peer info in commit d1593b03b3c9bec292b14dde65264c261760bf46
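
As a rough illustration of what a timeout on peer info combined with tolerance for out-of-order events can look like, here is a minimal sketch. It is not taken from the commits above: the class `PeerTracker`, its method names, and the state strings are all hypothetical, and the injectable `clock` parameter exists only to make the example testable.

```python
import time


class PeerTracker:
    """Hypothetical tracker; names are illustrative, not OpenSAF APIs."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.up_since = {}  # peer -> time its peer_up event was seen
        self.info = {}      # peer -> peer info payload

    def on_peer_up(self, peer):
        # Record the first peer_up only, so the deadline is not extended.
        self.up_since.setdefault(peer, self.clock())

    def on_peer_info(self, peer, payload):
        # Assumption: peer info is accepted even if it precedes peer_up.
        self.info[peer] = payload

    def state(self, peer):
        if peer in self.info:
            return "known"
        started = self.up_since.get(peer)
        if started is None:
            return "unseen"
        if self.clock() - started >= self.timeout_s:
            return "timed_out"
        return "waiting"


now = [0.0]  # fake clock so the example runs instantly
tracker = PeerTracker(timeout_s=2.0, clock=lambda: now[0])
tracker.on_peer_info("SC-2", {"role": "active"})  # info arrives first
tracker.on_peer_up("SC-2")
tracker.on_peer_up("SC-3")                        # info never arrives
now[0] = 2.5
print(tracker.state("SC-2"), tracker.state("SC-3"))
```

In this sketch a peer whose info never arrives ends up in a distinct `timed_out` state instead of being waited on indefinitely; what the caller does with that state is left open.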

Gary Lee - 2021-09-14

status: fixed --> assigned

Gary Lee - 2021-09-14

status: assigned --> fixed

Milestone: 5.21.06 --> 5.21.09
