Re: [javagroups-users] Merge problem
From: Victor <dre...@ma...> - 2009-02-26 20:04:10
Bela, sorry, I have one more question:

c) After being shunned, the channel is automatically closed. Is this by design? In that situation the AUTO_RECONNECT option does not work (the channel cannot reconnect after being closed), so I need to write my own reconnection logic: creating a new JChannel instance and calling connect().

Victor N

Victor wrote:
> Bela,
>
> things seem to be much better than I thought :)
>
> Let me continue:
>
> 5) I waited until the view on node A was updated (node A has a greater
> FD timeout). Now node A was shunned.
> Strange thing here: I did not see a "view changed" event on node A at
> this moment; it was just shunned and I saw the following:
>
> Feb 26, 2009 7:30:37 PM org.jgroups.protocols.pbcast.GMS castViewChangeWithDest
> WARNING: 192.168.1.1:30001 failed to collect all ACKs (1) for view [192.168.1.1:30001|2] [192.168.1.2:30001] after 2000ms, missing ACKs from [192.168.1.2:30001] (received=[192.168.1.1:30001]), local_addr=192.168.1.1:30001
>
> 6) After a delay, a new Channel was connected on node A (my code works
> this way). At first, a singleton cluster was created on node A (node A
> saw only itself; node B saw only itself).
>
> 7) After about 1-1.5 minutes the views were finally merged: nodes A and
> B see each other now and send messages to each other!
>
> So it seems everything works, but some questions still interest me:
> a) Why did node A not receive a "view changed" event in step 5?
> b) Why did node A not see node B instantly in step 6?
>
> Victor N
>
>
> Victor wrote:
>> Hello Bela,
>>
>> I did a test to check how the system reacts to a network disconnection.
>> I have a typical TCP config with 2 nodes:
>>
>> 1) I started nodes A (192.168.1.1) and B (192.168.1.2) on 2 computers;
>> node A has a greater FD timeout than B.
>>
>> 2) I unplugged my network cable, waited a little, then plugged it in
>> again.
>>
>> 3) The view on node B was updated (A was removed from it); the view on
>> node A was not updated. That is expected, because its FD timeout is
>> greater.
>>
>> 4) After that the nodes cannot merge.
>>
>> Node A says:
>>
>> Feb 26, 2009 6:01:17 PM org.jgroups.protocols.pbcast.CoordGmsImpl$MergeTask run
>> WARNING: Merge aborted. Merge leader did not get MergeData from all subgroup coordinators [192.168.1.2:30001, 192.168.1.1:30001]
>> Feb 26, 2009 6:01:17 PM org.jgroups.protocols.pbcast.CoordGmsImpl handleMergeCancelled
>> WARNING: merge was supposed to be cancelled at merge participant 192.168.1.1:30001 (merge_id=[192.168.1.1:30001|1235667667861]), but it is not since merge ids do not match
>> received msg from 192.168.1.1:30001: hello world
>>
>> and node B says:
>>
>> 26/02/2009 20:01:29| WARN | org.jgroups.protocols.pbcast.NAKACK.handleMessage(): 192.168.1.2:30001] discarded message from non-member 192.168.1.1:30001, my view is [192.168.1.2:30001|2] [192.168.1.2:30001]
>>
>> What can I do here to help my system merge?
>> Why don't the nodes merge or shun? I have shun=true in FD and GMS;
>> auto-reconnection of the Channel is disabled.
>>
>> Thanks,
>> Victor N
>>
>> _______________________________________________
>> javagroups-users mailing list
>> jav...@li...
>> https://lists.sourceforge.net/lists/listinfo/javagroups-users
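
For illustration, the hand-written reconnection logic mentioned in (c) could look roughly like the sketch below. It is not taken from the thread: the class ReconnectSketch, the method connectWithRetry and its parameters are hypothetical names, and the factory is a placeholder for whatever creates a fresh channel (for example, a new JChannel instance followed by connect()). The sketch deliberately avoids the JGroups API so that it compiles on its own.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of JGroups: retries a "create and connect" action.
class ReconnectSketch {

    /**
     * Calls the factory until it succeeds or maxAttempts is exhausted.
     * The factory is expected to build a brand-new channel and connect it,
     * because a channel that was closed after shunning cannot be reused.
     */
    static <T> T connectWithRetry(Callable<T> factory, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // In real code (assumption): create a new JChannel and call connect() here.
                return factory.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // wait before the next attempt
                }
            }
        }
        throw new IllegalStateException(
                "could not reconnect after " + maxAttempts + " attempts", last);
    }
}
```

Such a helper would typically be triggered from whatever callback reports that the channel was shunned or closed, and it should run on its own thread so the callback itself returns quickly.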