Re: [Sequoiadb-discuss] [Sequoia] Failure detection


Hi Seby,

I looked into the mailing list archive, and this version of JGroups has a 
number of significant bugs. An issue was filed 
(http://forge.continuent.org/jira/browse/SEQUOIA-1130) and I fixed it 
for Sequoia 4. Simply using a drop-in replacement for the JGroups core jar 
in Sequoia 2.10.10 might work. You might have to update the Hedera jars as 
well, but it could work with the old ones too.

Let me know if the upgrade does not work.
Emmanuel

> Thanks for your support!!
>
> I’m using jgroups-core.jar version 2.4.2, which came with 
> “sequoia-2.10.10”. My Solaris test servers have only a single interface, 
> and I’m using the same IP for both group and db/client communications. I 
> ran the test again after removing “STATE_TRANSFER” and attached the logs. 
> At around 13:36, I took the host1 interface down, and I brought it back 
> up at around 13:38. After I brought the interface back up, I ran show 
> controllers on the console: host1 showed both controllers, while host2 
> showed only its own name in the member list.
>
> Regards,
>
> Seby.
>
> -----Original Message-----
> Hi Seby,
>
> Welcome to the wonderful world of group communications!
>
> > I've tried various FD options and could not get it working when one 
> > of the hosts fails. I can see the message 'A leaving group' on live 
> > controller B when I shut down the interface of A. This is working as 
> > expected, and the virtual db is still accessible/writable as 
> > controller B is alive. But when I bring the interface on A back up, 
> > controller A shows (show controllers) that the virtual-db is hosted by 
> > controllers A & B, while controller B just shows B. And the data 
> > inserted into the vdb hosted by controller B is NOT being replayed on A. 
> > This will cause inconsistencies in the data between the virtual-dbs. 
> > Is there a way we can disable the backend if the network goes down, 
> > so that I can recover the db using the backup?
>
> >
>
> There is a problem with your group communication configuration if 
> controllers have different views of the group. That should not happen.
>
> > I've also noticed that in some cases, if I take one of the host 
> > interfaces down, both controllers think that the other one failed. 
> > This will also create issues. In my case, I only have two controllers 
> > hosted. Is it possible to ping a network gateway? That way the 
> > controller would know that it is the one that failed and could disable 
> > the backend.
>
> >
>
> The best solution is to use the same interface for group communication 
> and client/database communications. If you use a dedicated network for 
> group communications and this network fails, you will end up with a 
> network partition and this is very bad. If all communications go 
> through the same interface, when it goes down, all communications are 
> down and the controller will not be able to serve stale data.
>
> You don't need STATE_TRANSFER, as Sequoia has its own state transfer 
> protocol for when a new member joins a group. Which version of JGroups 
> are you using? Could you send me the log with the JGroups messages that 
> you see on each controller, after activating them in log4j.properties? I 
> would need the initial sequence when you start the cluster, and the 
> messages you see when the failure is detected and when the failed 
> controller joins back. There might be a problem with the timeout 
> settings of the different components of the stack.
>
> Keep me posted on your findings.
>
> Emmanuel
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Sequoia mailing list
> Se...@li...
> http://forge.continuent.org/mailman/listinfo/sequoia

-- 
Emmanuel Cecchet
FTO @ Frog Thinker 
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: ma...@fr...
Skype: emmanuel_cecchet
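
For the JGroups log requested earlier in the thread, a log4j.properties fragment along these lines might be a reasonable starting point. This is only a sketch: it assumes log4j 1.x property syntax and that the JGroups log categories follow its Java package name (org.jgroups). The appender name JGroupsConsole is hypothetical; if the log4j.properties shipped with your Sequoia install already defines appenders or a JGroups entry, adjust those instead.

```properties
# Hypothetical appender name; reuse an appender already defined in your file if one exists.
log4j.appender.JGroupsConsole=org.apache.log4j.ConsoleAppender
log4j.appender.JGroupsConsole.layout=org.apache.log4j.PatternLayout
log4j.appender.JGroupsConsole.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} %m%n

# Raise all JGroups categories to DEBUG and route them to that appender.
log4j.logger.org.jgroups=DEBUG, JGroupsConsole
# Do not also forward these messages to the root logger's appenders.
log4j.additivity.org.jgroups=false
```

Capturing the output on each controller from cluster startup through the interface going down and coming back up should cover the initial sequence, the failure detection, and the rejoin.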