From: Emmanuel C. <ma...@fr...> - 2010-03-16 13:48:09
Hi Francis,

When a group communication network failure happens, recovery is automatic if no write was pending at the time the failure was detected. If a write was pending during the failure, there is no way to know whether the other controller really performed that write properly, so it is considered failed. You then have to start the controller recovery sequence (you can automate this with a script). The procedure is described in section '8.1 Recover from a controller node failure' of the Sequoia 2.10 management guide (the PDF can be found in the doc directory).

Hope this helps
Emmanuel

> Thank you Emmanuel! I've enabled FD, FD_SOCK and VERIFY_SUSPECT in sequencer.xml, and I can see that the INSERT statements are now working.
>
> I do have another question. After the interface came back up, I see that the backend rejoins the group, but it never replays the INSERT statements that occurred while it was down. Is that supposed to happen automatically? Is there a way to automate it?
>
> Thanks,
> Seby.
>
> -----Original Message-----
> From: seq...@li... [mailto:seq...@li...] On Behalf Of Emmanuel Cecchet
> Sent: Saturday, March 13, 2010 12:51 PM
> To: Sequoia general mailing list
> Subject: Re: [Sequoia] Failure detection
>
> Hi Seby,
>
>> I set up Sequoia in my lab with two controllers on two Solaris hosts. Each controller has one Postgres backend attached. The databases run on the same Solaris servers. The controllers use group communication (JGroups) to sync updates/writes.
>>
>> For a failure test, I shut down the interface on one of the hosts, but the other controller/host never detected this and my INSERT statements started failing. There were no errors I could see in the controller log. As soon as I brought the interface back up, I could see the requests being played and the data getting inserted into the db.
>>
>> Could you please let me know how JGroups detects network failures?
>>
> This depends on your JGroups configuration.
> If you are using a TCP-based failure detector, the detection will depend
> on your operating system TCP settings. Otherwise you should be able to
> set the timeout in your gossip server or UDP-based failure detector.
>
> Hope this helps
> Emmanuel
>
>
--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: ma...@fr...
Skype: emmanuel_cecchet
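
As a rough illustration of the timeout point in the thread above, the sketch below estimates how long a heartbeat-based failure detector might take to report a dead controller. It assumes the JGroups 2.x-style FD attributes `timeout` and `max_tries` and the VERIFY_SUSPECT attribute `timeout`. The class name, function name, and all numeric values are hypothetical and are not taken from the thread, so check the attribute names and defaults against the JGroups version shipped with your Sequoia release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureDetectionSettings:
    """Hypothetical container for the relevant timeouts (illustrative values)."""
    fd_timeout_ms: int = 2500              # assumed FD "timeout": wait per heartbeat attempt
    fd_max_tries: int = 5                  # assumed FD "max_tries": missed heartbeats before suspecting
    verify_suspect_timeout_ms: int = 1500  # assumed VERIFY_SUSPECT "timeout": double-check window


def approx_detection_delay_ms(settings: FailureDetectionSettings) -> int:
    """Approximate upper bound on the time before a silent member is reported as failed.

    Simplified model: the member is suspected after max_tries heartbeats go
    unanswered, then VERIFY_SUSPECT waits once more before confirming.
    """
    suspect_after_ms = settings.fd_timeout_ms * settings.fd_max_tries
    return suspect_after_ms + settings.verify_suspect_timeout_ms


if __name__ == "__main__":
    delay = approx_detection_delay_ms(FailureDetectionSettings())
    print(f"approximate detection delay: {delay} ms")
```

Under this simplified model, lowering the heartbeat timeout or the number of tries shortens the window during which writes can hang, at the cost of more false suspicions on a slow network. A TCP-based detector does not follow this arithmetic, since, as noted above, its behaviour depends on the operating system's TCP settings.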