Re: [javagroups-users] Merging after network split
From: <eri...@hc...> - 2007-02-15 22:03:54
Bela,

I guess your fix was for the *sending* of MERGE responses. The problem I see
is about *receiving* the MERGE request (at member C). The method
handleDataReceived() in UNICAST discards a received MERGE request if the
sender is in the previous_members list and the message does not have seqno 1,
which it didn't, so the request did not travel up the stack!

Looks like a problem to me!

Eric

On Thu, 15 February 2007 12:09 pm, Bela Ban wrote:
> I fixed this some time ago by sending down an Event.ENABLE_UNICASTS_TO
> on a merge (in CoordGmsImpl.handleMergeRequest()). This means that the
> members should actually be able to see each other and not discard their
> unicast MERGE messages.
>
> I'll look at JGRP-357 in 2.5, but can't promise it will be fixed in 2.5.
> At first glance this might be more than a simple bug fix.
>
> eri...@hc... wrote:
>> Hi,
>>
>> We have noticed the following in one of our test scenarios:
>>
>> Initially we have a JGroups group of 3 members {A, B, C}, each on a
>> different server. Then the network got split and 2 subgroups formed:
>> {A, B} and {C}. A and C are coordinators, each of its own group. This
>> is ok. Then the network is connected again and the MERGE protocol (in
>> our case MERGE3) initiates a merge (MERGE_REQ). The following now
>> happens:
>>
>> - A sends a MERGE_REQ (with seqno > 1 in our case) to C.
>> - In UNICAST (at member C), the previous_members list contains member A,
>>   so the MERGE_REQ is discarded by UNICAST and A starts retransmitting!
>>   See the following code snippet from UNICAST:
>>
>>   if(previous_members.contains(sender)) {
>>       // we don't want to see messages from departed members
>>       if(seqno > DEFAULT_FIRST_SEQNO) {
>>           if(trace)
>>               log.trace("discarding message " + seqno +
>>                         " from previous member " + sender);
>>           return false; // don't ack this message so the sender keeps resending it !
>>       }
>>
>> - This does not stop until retransmission ends (which never happens in
>>   2.4) or (I think) a new VIEW is installed. In our case (we have a fix
>>   for the retransmission) retransmission stops after 15 minutes and
>>   clears the connection, and then a MERGE_REQ is sent again (due to the
>>   MERGE3 trigger) from A to C with seqno 1.
>> - C does NOT discard this MERGE_REQ (because it has seqno 1!) and the
>>   MERGE is initiated and completed.
>>
>> So the situation at the end is OK, but only because of our fix in
>> UNICAST to stop retransmitting after X minutes and clear the connection
>> to that member. Otherwise, retransmission would not have stopped and
>> more and more MERGE_REQ messages would be retransmitted.
>>
>> We have the following questions:
>>
>> * Is this behaviour as expected?
>> * Is it ok to discard *all* messages in UNICAST if the sender is in
>>   the previous_members list? Seems to me that a MERGE_REQ should not be
>>   discarded.
>> * We opened a Jira issue for the retransmission issue,
>>   http://jira.jboss.com/jira/browse/JGRP-357 , will this be fixed in
>>   2.4.1 or not until 2.5?
>>
>> Thanks and kind regards,
>>
>> Eric.
>
> --
> Bela Ban
> Lead JGroups / JBoss Clustering team
> JBoss - a division of Red Hat
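
For anyone reading this thread later, the receive-side filter discussed above can be modelled in a few lines. This is only a sketch of the behaviour as described in the thread, not the JGroups source: the class UnicastDiscardSketch, the method memberLeft() and the use of plain String addresses are hypothetical, and removing the sender from the previous-members set once a seqno 1 message arrives is an assumption that is not shown in the quoted snippet.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the receive-side check discussed in this thread.
// Not the real UNICAST class; names other than DEFAULT_FIRST_SEQNO are invented.
final class UnicastDiscardSketch {
    static final long DEFAULT_FIRST_SEQNO = 1;

    // Members that left the view (e.g. after a network split).
    private final Set<String> previousMembers = new HashSet<String>();

    // Hypothetical helper: record that a member is no longer in our view.
    void memberLeft(String member) {
        previousMembers.add(member);
    }

    // Returns true if the message would be accepted and passed up the stack,
    // false if it is silently dropped (no ack, so the sender keeps resending).
    boolean handleDataReceived(String sender, long seqno) {
        if (previousMembers.contains(sender)) {
            if (seqno > DEFAULT_FIRST_SEQNO) {
                return false; // dropped: old connection state of a departed member
            }
            // Assumption: a message with the first seqno starts a fresh
            // connection, so the sender is no longer treated as departed.
            previousMembers.remove(sender);
        }
        return true;
    }

    public static void main(String[] args) {
        UnicastDiscardSketch c = new UnicastDiscardSketch();
        c.memberLeft("A");                                // split: C no longer sees A
        System.out.println(c.handleDataReceived("A", 5)); // MERGE_REQ with seqno > 1: false
        System.out.println(c.handleDataReceived("A", 1)); // MERGE_REQ with seqno 1: true
    }
}
```

Under these assumptions the first MERGE_REQ (seqno 5) is dropped and the second one (seqno 1, sent after A has reset its connection) gets through, which matches the sequence of events reported in the original message.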