From: Emmanuel C. <ma...@fr...> - 2010-03-24 14:41:30
Hi Seby,

Sorry for the late reply, I have been very busy these past days.
This seems to be a JGroups issue that could probably be better answered
by Bela Ban on the JGroups mailing list. I have seen emails on that list
these past days from people having similar problems.
I would recommend that you post an email on the JGroups mailing list
with your JGroups configuration and the messages you see regarding MERGE
failing.

Keep me posted
Emmanuel

> Also, here is the error which I see from the logs:
>
> 2010-03-22 08:31:15,912 DEBUG protocols.pbcast.GMS Merge leader 10.10.10.23:39729 expects 2 responses, so far got 1 responses
> 2010-03-22 08:31:15,913 DEBUG protocols.pbcast.GMS Merge leader 10.10.10.23:39729 waiting 382 msecs for merge responses
> 2010-03-22 08:31:16,313 DEBUG protocols.pbcast.GMS At 10.10.10.23:39729 cancelling merge due to timer timeout (5000 ms)
> 2010-03-22 08:31:16,314 DEBUG protocols.pbcast.GMS cancelling merge (merge_id=[10.10.10.23:39729|1269261071286])
> 2010-03-22 08:31:16,316 DEBUG protocols.pbcast.GMS resumed ViewHandler
> 2010-03-22 08:31:16,317 DEBUG protocols.pbcast.GMS Merge leader 10.10.10.23:39729 expects 2 responses, so far got 0 responses
> 2010-03-22 08:31:16,317 DEBUG protocols.pbcast.GMS Merge leader 10.10.10.23:39729 collected 0 merge response(s) in 5027 ms
> 2010-03-22 08:31:16,318 WARN protocols.pbcast.GMS Merge aborted. Merge leader did not get MergeData from all subgroup coordinators [10.10.10.33:38822, 10.10.10.23:39729]
>
> -----Original Message-----
> From: Francis, Seby
> Sent: Monday, March 22, 2010 1:03 PM
> To: 'Sequoia general mailing list'
> Cc: seq...@li...
> Subject: RE: [Sequoia] Failure detection
>
> Hi Emmanuel,
>
> I've updated my JGroups to the version which you mentioned, but I
> still see the issue with merging the groups. One of the controllers
> lost track after the failure and won't merge. Can you please give me a
> hand to figure out where it goes wrong? I have the debug logs. Shall I
> send the logs as a zip file?
>
> Thanks,
> Seby.
>
> -----Original Message-----
> From: seq...@li... [mailto:seq...@li...] On Behalf Of Emmanuel Cecchet
> Sent: Thursday, March 18, 2010 10:22 PM
> To: Sequoia general mailing list
> Cc: seq...@li...
> Subject: Re: [Sequoia] Failure detection
>
> Hi Seby,
>
> I looked into the mailing list archive and this version of JGroups has a
> number of significant bugs. An issue was filed
> (http://forge.continuent.org/jira/browse/SEQUOIA-1130) and I fixed it
> for Sequoia 4. Just using a drop-in replacement for JGroups core for
> Sequoia 2.10.10 might work. You might have to update the Hedera jars as
> well, but that could work with the old ones too.
>
> Let me know if the upgrade does not work
> Emmanuel
>
>
>> Thanks for your support!!
>>
>> I'm using jgroups-core.jar Version 2.4.2 which came with
>> "sequoia-2.10.10". My Solaris test servers have only a single interface
>> and I'm using the same IP for both group & db/client communications. I
>> ran a test again removing "*STATE_TRANSFER*" and attached the logs. At
>> around 13:36, I took the host1 interface down and opened it around
>> 13:38. After I opened the interface, when I ran "show controllers" on
>> the console, host1 showed both controllers while host2 showed its own
>> name in the member list.
>>
>> Regards,
>>
>> Seby.
>>
>> -----Original Message-----
>> Hi Seby,
>>
>> Welcome to the wonderful world of group communications!
>>
>>> I've tried various FD options and could not get it working when one
>>> of the hosts fails. I can see the message 'A leaving group' on live
>>> controller B when I shut down the interface of A. This is working as
>>> expected and the virtual db is still accessible/writable as
>>> controller B is alive. But when I open the interface on A,
>>> controller A shows (show controllers) that the virtual-db is hosted
>>> by controllers A & B while controller B just shows B. And the data
>>> inserted into the vdb hosted by controller B is NOT being played on
>>> A. This will cause inconsistencies in the data between the
>>> virtual-dbs. Is there a way we can disable the backend if the
>>> network goes down, so that I can recover the db using the backup?
>>
>> There is a problem with your group communication configuration if
>> controllers have different views of the group. That should not happen.
>>
>>> I've also noticed that in some cases, if I take one of the host
>>> interfaces down, both of them think that the other controller
>>> failed. This will also create issues. In my case, I only have two
>>> controllers hosted. Is it possible to ping a network gateway? That
>>> way the controller knows that it is the one which failed and can
>>> disable the backend.
>>
>> The best solution is to use the same interface for group communication
>> and client/database communications. If you use a dedicated network for
>> group communications and this network fails, you will end up with a
>> network partition and this is very bad. If all communications go
>> through the same interface, when it goes down, all communications are
>> down and the controller will not be able to serve stale data.
>>
>> You don't need STATE_TRANSFER as Sequoia has its own state transfer
>> protocol when a new member joins a group. Which version of JGroups are
>> you using? Could you send me the log with the JGroups messages that you
>> see on each controller by activating them in log4j.properties? I would
>> need the initial sequence when you start the cluster and the messages
>> you see when the failure is detected and when the failed controller
>> joins back. There might be a problem with the timeout settings of the
>> different components of the stack.
>>
>> Keep me posted with your findings
>>
>> Emmanuel
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Sequoia mailing list
>> Se...@li...
>> http://forge.continuent.org/mailman/listinfo/sequoia
>>
>
>

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: ma...@fr...
Skype: emmanuel_cecchet
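
For anyone collecting the merge messages before posting to the JGroups list, here is a minimal Python sketch that scans log lines for the GMS merge events quoted in this thread (timer timeouts, collected responses, aborted merges). It is not part of Sequoia or JGroups. The line layout (timestamp, level, category, message) is an assumption inferred only from the lines quoted above, and the names summarize_merges and SAMPLE are hypothetical.

```python
import re

# Assumed layout: "<date> <time>,<ms> <LEVEL> <category> <message>".
# This is inferred from the GMS lines quoted in this thread only; a
# different log4j pattern will simply not match and be skipped.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) +([A-Z]+) +(\S+) +(.*)$"
)
COLLECTED_RE = re.compile(r"collected (\d+) merge response\(s\) in (\d+) ms")
COORDS_RE = re.compile(r"subgroup coordinators \[([^\]]*)\]")

# Three lines copied from the log excerpt in this thread.
SAMPLE = [
    "2010-03-22 08:31:16,313 DEBUG protocols.pbcast.GMS At 10.10.10.23:39729 "
    "cancelling merge due to timer timeout (5000 ms)",
    "2010-03-22 08:31:16,317 DEBUG protocols.pbcast.GMS Merge leader "
    "10.10.10.23:39729 collected 0 merge response(s) in 5027 ms",
    "2010-03-22 08:31:16,318 WARN protocols.pbcast.GMS Merge aborted. Merge "
    "leader did not get MergeData from all subgroup coordinators "
    "[10.10.10.33:38822, 10.10.10.23:39729]",
]


def summarize_merges(lines):
    """Count merge timeouts and aborts in GMS log lines (hypothetical helper)."""
    summary = {"timeouts": 0, "aborted": 0, "collected": [], "coordinators": []}
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Skip anything that is not a pbcast.GMS line in the assumed layout.
        if match is None or not match.group(3).endswith("pbcast.GMS"):
            continue
        message = match.group(4)
        if "cancelling merge due to timer timeout" in message:
            summary["timeouts"] += 1
        collected = COLLECTED_RE.search(message)
        if collected:
            # (number of responses received, elapsed milliseconds)
            summary["collected"].append(
                (int(collected.group(1)), int(collected.group(2)))
            )
        if message.startswith("Merge aborted"):
            summary["aborted"] += 1
            coords = COORDS_RE.search(message)
            if coords:
                summary["coordinators"].append(
                    [c.strip() for c in coords.group(1).split(",")]
                )
    return summary


if __name__ == "__main__":
    print(summarize_merges(SAMPLE))
```

To use it on a real controller log, pass an open file object instead of SAMPLE; lines that do not fit the assumed layout are ignored rather than reported as errors.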