From: A.R K. <a.r...@gm...> - 2011-03-25 22:26:34
|
Hi Allan, On Fri, Mar 25, 2011 at 11:21 AM, Stephens, Allan < all...@wi...> wrote: > Hi Karthick: > > > > Thanks for the update on the patch. Based on the feedback you’ve provided > I’m concluding the patch is sane and have pushed it to the TIPC 1.7 git > repository. (Peter Butler from PerfTech says the he will do some additional > testing with that patch when he returns from vacation in a week or so; I’m > assuming that he won’t turn up any new issues, but if he does we’ll deal > with them then.) > > > > I’ll look into your report of a rapid bearer disable/enable sequence > causing problems to see if it can be replicated. As you’ve already stated, > this isn’t really a major problem if it truly exists, but it’s still > something that we would want to fix (in TIPC 2.0, if not in TIPC 1.7.7). > > I now think that this issue ought to be fixed in 1.7.7-rc4. The split brain in TIPC on a bearer disable and enable is extremely easy to reproduce. It was first reproduced by a customer using our product that relies heavily on tipc. But he isn't doing the bearer disable/enable step now for various reasons in his script. However I just had about 1 hr at my disposal and wanted to try my hand at reproducing the split brain in TIPC. Turns out that it is extremely easy to reproduce. Have 2 nodes. with say eth0. established. Then set the link tolerance time to say 10 seconds (just for the heck as it isn't required really) Then disable the bearer on one side: tipc-config -bd=eth:eth0 The node on which you disable the bearer would get into RESET_UNKNOWN, lose contact with the peer as expected. But the peer node would have hit a TIMEOUT_EVT on the continuation interval (500 milliseconds is the MAX apparently), and the link state in the peer node becomes WORKING_UNKNOWN. So far so good. Now within the link tolerance abort limit/time, enable the bearer back. tipc-config -be=eth:eth0 Now the node, resets a RESET_MSG on enable bearer, the peer in WORKING_UNKNOWN state receives the RESET_MSG in link_state_event (reset while probing case), resets the link (loses contact with the bearer disabled node), changes state to RESET_RESET, and sends an ACTIVATE_MSG to the node that enabled the bearer. The bearer enabled node then transitions from RESET_UNKNOWN to WORKING_WORKING, establishes contact and sends the STATE_MSG to activate the peer node in RESET_RESET state. Apparently when the peer node lost contact in WORKING_UNKNOWN state above, it set the: n_ptr->block_setup = WAIT_PEER_DOWN | WAIT_NAMES_GONE ; to prevent reestablishing contact with the peer till cleanup, schedules the kernel tasklet to clear off the WAIT_NAMES_GONE bitmask in node_name_purge_complete. Now with the n_ptr->block_setup at WAIT_PEER_DOWN state, the peer in RESET_RESET link state now receives a STATE_MSG from the bearer enabled node that got activated back. However this message is _discarded_ leading to a split brain with the bearer enabled node establishing contact with the peer but the peer apparently always remaining in lost contact or reset_reset state for the link. tipc-config -n after bearer disable and enable on NODE0 shows NODE0 in established contact whilst NODE1 is still in lost contact: [root@NODE0 <ro...@sb...>]# tipc-config -n Neighbors: <1.1.1>: up [root@NODE1 <ro...@sb...>]# tipc-config -n Neighbors: <1.1.2>: down Apparently, this code in tipc_recv_msg results in the STATE_MSG from the bearer enabled node that went to WORKING_WORKING from RESET_UNKNOWN being DROPPED and hence resulting in NODE1 not activating the link from RESET_RESET state to WORKING_WORKING. /* Verify that communication with node is currently allowed */ if ((n_ptr->block_setup & WAIT_PEER_DOWN) && (msg_user(msg) == LINK_PROTOCOL) && (msg_type(msg) == RESET_MSG || msg_type(msg) == ACTIVATE_MSG) && !msg_redundant_link(msg)) n_ptr->block_setup &= ~WAIT_PEER_DOWN; if (n_ptr->block_setup) { -> ends up being DROPPED for STATE_MSG tipc_node_unlock(n_ptr); goto cont; } If I do the following change to clear the WAIT_PEER_DOWN mask for STATE_MSG above, the split brain is resolved and NODE1 reestablishes contact after the reset: In short, tipc seems to behave as expected on a bearer disable/enable without resulting in a split brain :) /* Verify that communication with node is currently allowed */ if ((n_ptr->block_setup & WAIT_PEER_DOWN) && (msg_user(msg) == LINK_PROTOCOL) && (msg_type(msg) == RESET_MSG || msg_type(msg) == ACTIVATE_MSG || msg_type(msg) == STATE_MSG) && !msg_redundant_link(msg)) n_ptr->block_setup &= ~WAIT_PEER_DOWN; if (n_ptr->block_setup) { tipc_node_unlock(n_ptr); goto cont; } Now I don't know if the above fix is safe to be integrated into TIPC and would cause regression for other cases. And you are the best person to take a judgement call for the above fix which resolves the split brain thats very easy to reproduce :) Regards, -Karthick *From:* A.R Karthick [mailto:a.r...@gm...] > *Sent:* Wednesday, March 23, 2011 6:25 PM > > *To:* Stephens, Allan > *Subject:* Re: Proposed TIPC 1.7.7-rc4 patch to correct bug in earlier > "Ignore broadcast acknowledgements ..." patch > > > > Hi Allan, > > On Tue, Mar 22, 2011 at 10:23 AM, A.R Karthick <a.r...@gm...> > wrote: > > > > I meant the latest git. It doesn't apply clean even with that patch. > Already have that since it was released. > > > And I believe that is the cause of the panic in the first place :) I have > already asked our customer to revert my fail-safe patch which had worked > fine for him. And anyway he is testing your patch as I speak as I provided > him the one that applied clean. We will know in a day. > > > The patch seems to work. Atleast in 1 node he reported that he didn't see > the panic. So its good. However he reported another issue which isn't > serious for now as there is a workaround but it appears to expose a bug in > 1.7.7-rc4 or TIPC discovery which is resulting in split brain when a bearer > is disabled and immediately reenabled on 1 NODE (within the tipc link > tolerance time) > > I don't have much information on this but in short this was the state he > had in the logs: (the logs are confusing because of the timestamps which are > 1 hr. off and were also bounced). > > (2 machines connected via cross-over: sbx103 and sbx31 have ~1 hr gap > approximately): > > From sbx103: > > *Mar 23 14:23:27 sbx103 setupNP.sh: Failed to program NP, nptool returned > 255, rebooting the server to recover from error* > > *Mar 23 15:25:32* sbx103 kernel: imklog 3.18.6, log source = /proc/kmsg > started. > > Mar 23 15:25:32 sbx103 rsyslogd: [origin software="rsyslogd" > swVersion="3.18.6" x-pid="3573" x-info="http://www.rsyslog.com"] restart > > > > Mar 23 15:25:59 sbx103 kernel: [ 65.972770] TIPC: Activated (version > 1.7.7-rc4 compiled Mar 18 2011 13:34:25) > > : > > *Mar 23 15:26:00 sbx103 kernel: [ 68.909315] TIPC: Established link > <1.1.2:bond0-1.1.1:bond0> on network plane A* > > > > Mar 23 15:41:42 sbx103 kernel: [ 1580.418058] TIPC: Disabling bearer > <eth:bond0> > > Mar 23 15:41:42 sbx103 kernel: [ 1580.418064] TIPC: Lost link > <1.1.2:bond0-1.1.1:bond0> on network plane A > > Mar 23 15:41:42 sbx103 kernel: [ 1580.418148] TIPC: Lost contact with > <1.1.1> > > Mar 23 15:41:42 sbx103 kernel: [ 1580.426061] TIPC: Enabled bearer > <eth:bond0>, discovery domain <1.1.0>, priority 10 > > Mar 23 15:41:42 sbx103 kernel: [ 1580.898375] TIPC: Established link > <1.1.2:bond0-1.1.1:bond0> on network plane A > > Mar 23 15:41:47 sbx103 kernel: [ 1588.012302] drbd0: user_isp( 0 -> 1 ) > > Mar 23 15:41:47 sbx103 kernel: [ 1588.176513] SBX NRS module loaded. > > Mar 23 15:41:47 sbx103 kernel: [ 1588.180711] SBX TCP Redundancy module > loaded > > Mar 23 15:41:48 sbx103 kernel: [ 1588.428553] TIPC: Resetting link > <1.1.2:bond0-1.1.1:bond0>, requested by peer while probing > > Mar 23 15:41:48 sbx103 kernel: [ 1588.428553] TIPC: Lost link > <1.1.2:bond0-1.1.1:bond0> on network plane A > > Mar 23 15:41:48 sbx103 kernel: [ 1588.432555] TIPC: Lost contact with > <1.1.1> > > > > : > > Mar 23 15:41:48 sbx103 kernel: [ 1588.428553] TIPC: Resetting link > <1.1.2:bond0-1.1.1:bond0>, requested by peer while probing > > : > > Mar 23 15:58:02 sbx103 /opt/sonus/sbx/scripts/loadConfig.pl: Taking backup > in /opt/sonus/sbx/.configRestoreStaging/beforeRestore(CDB)... > > : > > Mar 23 15:59:58 sbx103 /opt/sonus/sbx/scripts/loadConfig.pl: Starting > Active... > > Mar 23 15:59:59 sbx103 kernel: [ 6016.485441] TIPC: Disabling bearer > <eth:bond0> > > Mar 23 15:59:59 sbx103 kernel: [ 6016.491483] TIPC: Enabled bearer > <eth:bond0>, discovery domain <1.1.0>, priority 10 > > Mar 23 15:59:59 sbx103 kernel: [ 6016.892589] TIPC: Established link > <1.1.2:bond0-1.1.1:bond0> on network plane A > > > > > > On SBX31: > > > > Mar 23 10:53:44 sbx31 kernel: [ 0.000000] Initializing cgroup subsys > cpuset > > : > > : > > Mar 23 14:18:54 sbx31 kernel: [54840.044003] SBX NRS module unloaded. > > Mar 23 14:18:54 sbx31 kernel: [54840.106331] SBX TCP Redundancy module > unloaded > > Mar 23 14:20:00 sbx31 kernel: [54921.563407] drbd0: peer_isp( 0 -> 1 ) > > Mar 23 14:23:32 sbx31 kernel: [55189.890178] drbd0: peer( Primary -> > Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) > peer_isp( 1 -> 0 ) > > Mar 23 14:23:32 sbx31 kernel: [55189.894180] drbd0: asender terminated > > Mar 23 14:23:32 sbx31 kernel: [55189.976223] drbd0: Terminating asender > thread > > Mar 23 14:23:32 sbx31 kernel: [55189.985769] drbd0: Connection closed > > Mar 23 14:23:32 sbx31 kernel: [55189.985774] drbd0: conn( NetworkFailure -> > Unconnected ) > > Mar 23 14:23:32 sbx31 kernel: [55189.986325] drbd0: receiver terminated > > Mar 23 14:23:32 sbx31 kernel: [55189.986325] drbd0: Restarting receiver > thread > > Mar 23 14:23:32 sbx31 kernel: [55189.989776] drbd0: receiver (re)started > > Mar 23 14:23:32 sbx31 kernel: [55190.472951] drbd0: conn( Unconnected -> > WFConnection ) > > *Mar 23 14:23:32 sbx31 kernel: [55190.998516] TIPC: Resetting link > <1.1.1:bond0-1.1.2:bond0>, peer not responding* > > Mar 23 14:23:32 sbx31 kernel: [55190.998625] TIPC: Lost link > <1.1.1:bond0-1.1.2:bond0> on network plane A > > Mar 23 14:23:32 sbx31 kernel: [55190.998625] TIPC: Lost contact with > <1.1.2> > > : > > : > > *Mar 23 14:26:02 sbx31 kernel: [55382.915309] TIPC: Established link > <1.1.1:bond0-1.1.2:bond0> on network plane A* > > : > > : > > Mar 23 14:41:44 sbx31 kernel: [56909.655509] TIPC: Resetting link > <1.1.1:bond0-1.1.2:bond0>, requested by peer while probing > > : > > : > > Mar 23 14:41:50 sbx31 kernel: [56919.533777] TIPC: Established link > <1.1.1:bond0-1.1.2:bond0> on network plane A > > : > > : > > Mar 23 15:00:01 sbx31 kernel: [61684.311304] TIPC: Resetting link > <1.1.1:bond0-1.1.2:bond0>, requested by peer while probing > > > > > > [ro...@sb... openclovis]# tipc-config -n > > Neighbors: > > <1.1.1>: up > > > > [ro...@sb... openclovis]# tipc-config -n > > Neighbors: > > <1.1.2>: down > > It looks as if the tipc link RESET_MSG on bearer disable and ACTIVATE_MSG > on bearer enable went out of order from sbx103 to sbx31. > > Since it isn't a blocker for us now as neither me or the customer has the > bandwidth to reproduce this and try out patches as of now, he is presently > not doing the bearer disable/enable step (which should avoid this issue) > that he apparently had introduced to work around a bcast congestion bug in > 1.7.6 which is resolved in 1.7.7-rc4 but ends up showing a split brain now > :) > > Regards, > -Karthick > > > > *From:* A.R Karthick [mailto:a.r...@gm...] > *Sent:* Tuesday, March 22, 2011 12:41 PM > *To:* Stephens, Allan > *Subject:* Re: Proposed TIPC 1.7.7-rc4 patch to correct bug in earlier > "Ignore broadcast acknowledgements ..." patch > > > > Hi Allan, > Your patch doesn't apply clean on top of 1.7.7-rc4 (latest git). You seem > to have some local changes which causes the discrepancy. > Anyway I have applied it manually for now. Find the one that applies clean > attached. > > Regards, > -Karthick > > On Mon, Mar 21, 2011 at 2:09 PM, Stephens, Allan < > all...@wi...> wrote: > > A.R. Karthick has recently identified an apparent problem with the recently > published TIPC 1.7 patch "Ignore broadcast acknowledgements that are > out-of-range". This email contains a supplementary patch that is designed to > correct the bug introduced by that patch. > > Any TIPC 1.7.7-rc4 user who has incorporated the defective patch and wishes > to test this corrective patch can apply it to their system directly; there > is no need to remove the previous patch. Feedback on this new patch -- > either positive or negative -- is greatly appreciated. Once I've got some > assurance that the corrective patch really does what it's supposed to, and > doesn't introduce any new issues, I'll make it generally available in the > TIPC 1.7 git repository at SourceForge. > > Regards, > Al > > ............... > > This patch revises the checks that were added to TIPC's broadcast link by > the > patch "Ignore broadcast acknowledgements that are out-of-range" to > prevent a failed logic assertion from triggering a kernel BUG report. > > This revision ensures that any unsent messages in the broadcast link's > transmit queue are properly purged when contact with TIPC's only > neighboring node is lost, thereby restoring the clean up capability > that existed before the new broadcast link checks were added. The checks > are now performed only if TIPC is still in contact with the acknowledging > node. Conversely, if contact with the acknowledging node has been lost > the checks are unnecessary and the broadcast link determines which > messages in the transmit queue need to be cleaned up, based on whether > or not the unsent messages (if any) can still be delivered. > > Signed-off-by: Allan Stephens <all...@wi...> > --- > net/tipc/tipc_bcast.c | 47 > +++++++++++++++++++++++++---------------------- > 1 files changed, 25 insertions(+), 22 deletions(-) > > diff --git a/net/tipc/tipc_bcast.c b/net/tipc/tipc_bcast.c > index c07fc4c..c53dcfb 100644 > --- a/net/tipc/tipc_bcast.c > +++ b/net/tipc/tipc_bcast.c > @@ -218,32 +218,37 @@ void tipc_bclink_acknowledge(struct tipc_node *n_ptr, > u32 acked) > > spin_lock_bh(&bc_lock); > > - /* > - * Invalid number indicates that all queued messages are > acknowledged > - * (used when contact with specified node has been lost) > - */ > - > - if (acked == INVALID_LINK_SEQ) > - acked = bcl->fsm_msg_cnt; > - > - /* > - * Bail out if no unacknowledged messages in queue > - * (either the node is acknowledging messages it has previously > ack'd > - * or the ack value is invalid and should be ignored) > - */ > + /* Bail out if tx queue is empty (no clean up is required) */ > > crs = bcl->first_out; > if (!crs) > goto exit; > > - /* > - * Validate sequence number to ensure it corresponds to a message > - * that has been sent and not yet acknowledged > - */ > + /* Determine which messages need to be acknowledged */ > > - if (less(acked, buf_seqno(crs)) || less(bcl->fsm_msg_cnt, acked) || > - less_eq(acked, n_ptr->bclink.acked)) { > - goto exit; > + if (acked == INVALID_LINK_SEQ) { > + > + /* > + * Contact with specified node has been lost, so need to > + * acknowledge sent messages only (if other nodes still > exist) > + * or both sent and unsent messages (otherwise) > + */ > + > + if (bclink->bcast_nodes.count) > + acked = bcl->fsm_msg_cnt; > + else > + acked = bcl->next_out_no; > + } else { > + > + /* > + * Bail out if specified sequence number does not > correspond > + * to a message that has been sent and not yet acknowledged > + */ > + > + if (less(acked, buf_seqno(crs)) || > + less(bcl->fsm_msg_cnt, acked) || > + less_eq(acked, n_ptr->bclink.acked)) > + goto exit; > } > > /* Skip over packets that node has previously acknowledged */ > @@ -259,8 +264,6 @@ void tipc_bclink_acknowledge(struct tipc_node *n_ptr, > u32 acked) > > if (crs != bcl->next_out) > bcbuf_decr_acks(crs); > - else if (bclink->bcast_nodes.count) > - break; > else { > bcbuf_set_acks(crs, 0); > bcl->next_out = next; > -- > 1.7.1 > > > > > -- > Software is like sex: it's better when it's free. > -Linus Torvalds > > A.R.Karthick > http://github.com/karthick18 > > > > > -- > Software is like sex: it's better when it's free. > -Linus Torvalds > > A.R.Karthick > http://github.com/karthick18 > > > > > -- > Software is like sex: it's better when it's free. > -Linus Torvalds > > A.R.Karthick > http://github.com/karthick18 > > > > > -- > Software is like sex: it's better when it's free. > -Linus Torvalds > > A.R.Karthick > http://github.com/karthick18 > -- Software is like sex: it's better when it's free. -Linus Torvalds A.R.Karthick http://github.com/karthick18 |