Re: [Sequoiadb-discuss] [Sequoia] Failure detection

Hi Francis,

When a group communication network failure happens and no write was 
pending at the time the failure was detected, recovery is automatic. 
If a write was pending during the failure, there is no way to know 
whether the other controller actually performed that write, and it is 
considered failed. You then have to run the controller recovery 
sequence (you can script it to automate it). The procedure is 
described in section '8.1 Recover from a controller node failure' of the 
Sequoia 2.10 management guide (the PDF can be found in the doc directory).
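
As a rough illustration of the rule above (only a sketch, not Sequoia code: the class, enum and method names are made up for this example), the outcome after a group communication failure depends solely on whether a write was pending:

```java
public class RecoveryDecision {

    /** Hypothetical outcomes, named for this example only. */
    enum Recovery { AUTOMATIC, MANUAL_RECOVERY_SEQUENCE }

    /**
     * Hypothetical helper, not part of the Sequoia API.
     * With no pending write, the controllers can resynchronize on their own.
     * With at least one pending write, nobody can tell whether the other
     * controller applied it, so the manual recovery procedure is needed.
     */
    static Recovery afterGroupFailure(int pendingWrites) {
        if (pendingWrites < 0) {
            throw new IllegalArgumentException("pendingWrites must be >= 0");
        }
        return pendingWrites == 0
                ? Recovery.AUTOMATIC
                : Recovery.MANUAL_RECOVERY_SEQUENCE;
    }

    public static void main(String[] args) {
        System.out.println(afterGroupFailure(0));
        System.out.println(afterGroupFailure(2));
    }
}
```

In your test the INSERTs were in flight when the interface went down, which would put you in the second case.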

Hope this helps
Emmanuel

> Thank you Emmanuel! I've enabled FD, FD_SOCK and VERIFY_SUSPECT in sequencer.xml, and I can see that the insert statements now work.
>
> I do have another question. After the interface came back up, I see that the backend rejoins the group, but it never replays the insert statements that occurred while it was down. Is that supposed to happen automatically? Is there a way to automate it?
>
> Thanks,
> Seby.
>
> -----Original Message-----
> From: seq...@li... [mailto:seq...@li...] On Behalf Of Emmanuel Cecchet
> Sent: Saturday, March 13, 2010 12:51 PM
> To: Sequoia general mailing list
> Subject: Re: [Sequoia] Failure detection
>
> Hi Seby,
>   
>>         I set up Sequoia in my lab with two controllers on two Solaris hosts. Each controller has one Postgres backend attached. The databases run on these same Solaris servers. The controllers use group communication (JGroups) to sync updates/writes. 
>>  
>>         For a failure test, I shut down the interface on one of the hosts, but the other controller/host never detected this, and my INSERT statements started failing. There were no errors that I could see in the controller log. As soon as I bring the interface back up, I can see the requests being replayed and the data getting inserted into the db.
>>  
>>         Could you please let me know how JGroups detects network failures?
>>     
> This depends on your JGroups configuration.
> If you are using a TCP-based failure detector, the detection will depend 
> on your operating system TCP settings. Otherwise you should be able to 
> set the timeout in your gossip server or UDP-based failure detector.
>
> Hope this helps
> Emmanuel
>
>   

-- 
Emmanuel Cecchet
FTO @ Frog Thinker 
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: ma...@fr...
Skype: emmanuel_cecchet