From: Emmanuel C. <ma...@fr...> - 2010-03-18 12:38:44
Hi Seby,

Welcome to the wonderful world of group communications!

> I've tried various FD options and could not get it working when one of
> the hosts fails. I can see the message 'A leaving group' on live
> controller B when I shut down the interface of A. This works as expected
> and the virtual db is still accessible/writable as controller B is
> alive. But when I bring the interface on A back up, controller A shows
> (show controllers) that the virtual db is hosted by controllers A & B,
> while controller B just shows B. And the data inserted into the vdb
> hosted by controller B is NOT being replayed on A. This will cause
> inconsistencies in the data between the virtual dbs. Is there a way we
> can disable the backend if the network goes down, so that I can recover
> the db using the backup?

There is a problem with your group communication configuration if
controllers have different views of the group. That should not happen.

> I've also noticed that in some cases, if I take one of the host
> interfaces down, both of them think that the other controller failed.
> This will also create issues. In my case, I only have two controllers
> hosted. Is it possible to ping a network gateway? That way the
> controller knows that it is the one which failed and can disable the
> backend.

The best solution is to use the same interface for group communication
and client/database communications. If you use a dedicated network for
group communications and this network fails, you will end up with a
network partition, and this is very bad. If all communications go through
the same interface, when it goes down, all communications are down and
the controller will not be able to serve stale data.
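As a sketch of what this advice means for the stack quoted below:
assuming 10.0.0.5 is a hypothetical address that clients and databases
already use to reach host A, the JGroups transport would be bound to that
same address rather than to a dedicated interconnect:

    <!-- Sketch only: 10.0.0.5 stands in for the interface the clients
         and databases already use on host A (other UDP attributes as in
         the stack below). If this NIC goes down, group communication
         fails together with client/database traffic, so an isolated
         controller cannot keep serving stale data. -->
    <UDP bind_addr="10.0.0.5"
         mcast_port="45566"
         mcast_addr="228.8.8.9"
         ip_ttl="2"/>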
> I've attached my config xml file which I'm using for the above failure
> test. It would be nice if the failed controller could rejoin the group
> automatically or disable its backend on its own.
>
> Thanks,
> Seby.
>
> ----------------------------------------------------------------------------------------------------------------
> Start: sequencer.xml file from hostA. The only diff in hostB is the bind_addr.
> ----------------------------------------------------------------------------------------------------------------
> <config>
>   <UDP bind_addr="A"
>        mcast_port="45566"
>        mcast_addr="228.8.8.9"
>        tos="16"
>        ucast_recv_buf_size="20000000"
>        ucast_send_buf_size="640000"
>        mcast_recv_buf_size="25000000"
>        mcast_send_buf_size="640000"
>        loopback="false"
>        discard_incompatible_packets="true"
>        max_bundle_size="64000"
>        max_bundle_timeout="30"
>        use_incoming_packet_handler="true"
>        use_outgoing_packet_handler="false"
>        ip_ttl="2"
>        down_thread="false" up_thread="false"
>        enable_bundling="true"/>
>   <PING timeout="2000"
>         down_thread="false" up_thread="false" num_initial_members="3"/>
>   <MERGE2 max_interval="10000"
>           down_thread="false" up_thread="false" min_interval="5000"/>
>   <FD_SOCK down_thread="false" up_thread="false"/>
>   <FD timeout="2500" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
>   <VERIFY_SUSPECT timeout="1500" down_thread="false"/>
>   <pbcast.NAKACK max_xmit_size="60000"
>                  use_mcast_xmit="false" gc_lag="0"
>                  retransmit_timeout="100,200,300,600,1200,2400,4800"
>                  down_thread="false" up_thread="false"
>                  discard_delivered_msgs="true"/>
>   <UNICAST timeout="300,600,1200,2400,3600"
>            down_thread="false" up_thread="false"/>
>   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
>                  down_thread="false" up_thread="false"
>                  max_bytes="400000"/>
>   <VIEW_SYNC avg_send_interval="60000" down_thread="false" up_thread="false"/>
>   <pbcast.GMS print_local_addr="true" join_timeout="3000"
>               down_thread="false" up_thread="false"
>               join_retry_timeout="2000" shun="true" handle_concurrent_startup="true"/>
>   <SEQUENCER down_thread="false" up_thread="false"/>
>   <FC max_credits="2000000" down_thread="false" up_thread="false"
>       min_threshold="0.10"/>
>   <pbcast.STATE_TRANSFER down_thread="false" up_thread="false"/>
> </config>
> ----------------------------------------------------------------------------------------------------------------
> End: sequencer.xml
> ----------------------------------------------------------------------------------------------------------------

You don't need STATE_TRANSFER, as Sequoia has its own state transfer
protocol that runs when a new member joins a group. Which version of
JGroups are you using? Could you send me the logs with the JGroups
messages that you see on each controller, by activating them in
log4j.properties?
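A minimal sketch of the log4j.properties entries that would turn these on
(assuming log4j 1.x; CONSOLE is a hypothetical appender name, use
whichever appender your log4j.properties already defines):

    # Raise JGroups protocol logging to DEBUG and route it to an
    # existing appender (CONSOLE here is a placeholder name).
    log4j.logger.org.jgroups=DEBUG, CONSOLE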
I would need the initial sequence when you start the cluster and the
messages you see when the failure is detected and when the failed
controller joins back. There might be a problem with the timeout settings
of the different components of the stack.

Keep me posted with your findings,
Emmanuel

> -----Original Message-----
> From: seq...@li... [mailto:seq...@li...] On Behalf Of Emmanuel Cecchet
> Sent: Tuesday, March 16, 2010 9:21 AM
> To: Sequoia general mailing list
> Cc: seq...@li...
> Subject: Re: [Sequoia] Failure detection
>
> Hi Francis,
>
> When a group communication network failure happens, if no write was
> pending at the time the failure was detected, the recovery is automatic.
> If a write was pending during the failure, there is no way to know
> whether the other controller really performed that write properly, so it
> is considered failed. You then have to start the controller recovery
> sequence (you can automate it with a script). The procedure is described
> in section '8.1 Recover from a controller node failure' of the Sequoia
> 2.10 management guide (the pdf can be found in the doc directory).
>
> Hope this helps
> Emmanuel
>
>> Thank you Emmanuel! I've enabled FD, FD_SOCK & VERIFY_SUSPECT in
>> sequencer.xml and I can see that the insert statements are now working.
>>
>> I do have another question. After the interface came back up, I saw
>> that the backend rejoined the group, but it never replayed the insert
>> statements that occurred while it was down. Is that supposed to happen
>> automatically? Is there a way to automate it?
>>
>> Thanks,
>> Seby.
>>
>> -----Original Message-----
>> From: seq...@li... [mailto:seq...@li...] On Behalf Of Emmanuel Cecchet
>> Sent: Saturday, March 13, 2010 12:51 PM
>> To: Sequoia general mailing list
>> Subject: Re: [Sequoia] Failure detection
>>
>> Hi Seby,
>>
>>> I set up Sequoia in my lab with two controllers on two Solaris hosts.
>>> Each controller has one postgres backend attached. These dbs are on
>>> the Solaris servers themselves. The controllers use group
>>> communication (JGroups) to sync updates/writes.
>>>
>>> For a failure test, I shut down the interface on one of the hosts, but
>>> the other controller/host never detected this and my INSERT statements
>>> started failing. There were no errors I could see in the controller
>>> log. As soon as I brought the interface back up, I could see the
>>> requests being replayed and the data getting inserted into the db.
>>>
>>> Could you please let me know how JGroups detects network failures?
>>
>> This depends on your JGroups configuration.
>> If you are using a TCP based failure detector, the detection will
>> depend on your operating system's TCP settings. Otherwise you should be
>> able to set the timeout in your gossip server or UDP-based failure
>> detector.
>>
>> Hope this helps
>> Emmanuel

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: ma...@fr...
Skype: emmanuel_cecchet
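A rough reading of the failure-detection fragment used in the
sequencer.xml above (semantics as described in the JGroups 2.x protocol
documentation; the arithmetic is approximate):

    <!-- FD_SOCK: each member holds a TCP socket to its neighbour, so a
         clean socket close (process death) is noticed almost at once. -->
    <FD_SOCK down_thread="false" up_thread="false"/>
    <!-- FD: heartbeat detector for peers that stop answering without
         closing their sockets (e.g. a downed interface). Suspicion is
         raised after max_tries missed heartbeats, i.e. roughly
         timeout x max_tries = 2500 ms x 5 = 12.5 s here. -->
    <FD timeout="2500" max_tries="5" shun="true"
        down_thread="false" up_thread="false"/>
    <!-- VERIFY_SUSPECT: re-checks a suspected member for up to another
         1.5 s before it is actually excluded from the group view. -->
    <VERIFY_SUSPECT timeout="1500" down_thread="false"/>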