[javagroups-users] Crash detection takes too long and one more question
From: Dima G. <di...@ma...> - 2004-07-28 11:34:03
Dear list members,

I am creating a failover application with the following stack:

UDP(bind_addr={0};mcast_addr={1};mcast_port={2};ip_ttl=32;\
mcast_send_buf_size=150000;mcast_recv_buf_size=80000):\
PING(timeout=1250;num_initial_members=2):\
FD_SOCK:\
FD(timeout=1000;max_tries=2):\
VERIFY_SUSPECT(timeout=500):\
pbcast.NAKACK(gc_lag=50;retransmit_timeout=300):\
UNICAST(timeout=1000):\
pbcast.STABLE(desired_avg_gossip=5000):\
FRAG(frag_size=4096;down_thread=false;up_thread=false):\
pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;\
shun=false;print_local_addr=true):\
pbcast.STATE_TRANSFER

The network is 100 Mb switched, and multicast is supported.

First question: It works fine most of the time. A crashed member is usually detected after about 2 seconds, but sometimes detection takes up to 50 seconds. How can this happen?

Second question: I am using DistributedHashtable to store info across the network. When I call "remove" to delete an entry from the table, the calling thread hangs there forever and never returns (deadlock). To work around the problem I call _remove (the local remove) on each member, which is ugly. Is this a known issue?

Thanks in advance for your kind help.
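As a rough sanity check on the first question, the sketch below reads the FD and VERIFY_SUSPECT settings from the stack string and adds them up. It assumes a simplified timing model that is not taken from the JGroups source: FD suspects a member after roughly timeout x max_tries ms without a heartbeat reply, and VERIFY_SUSPECT then waits up to its own timeout before confirming the suspicion. The class and method names (DetectionEstimate, parseProps, roughDetectionMillis) are hypothetical helpers, not part of JGroups.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DetectionEstimate {

    /** Parses "NAME(k=v;k=v)" into a key/value map; a bare "NAME" yields an empty map. */
    static Map<String, String> parseProps(String protocolSpec) {
        Map<String, String> props = new LinkedHashMap<>();
        int open = protocolSpec.indexOf('(');
        int close = protocolSpec.lastIndexOf(')');
        if (open < 0 || close < open) {
            return props; // no property list, e.g. "FD_SOCK"
        }
        for (String pair : protocolSpec.substring(open + 1, close).split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                props.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return props;
    }

    /**
     * Assumed model (a simplification, not verified against JGroups):
     * detection time ~= FD.timeout * FD.max_tries + VERIFY_SUSPECT.timeout.
     */
    static long roughDetectionMillis(String fdSpec, String verifySpec) {
        Map<String, String> fd = parseProps(fdSpec);
        Map<String, String> verify = parseProps(verifySpec);
        long fdTimeout = Long.parseLong(fd.get("timeout"));
        long maxTries = Long.parseLong(fd.get("max_tries"));
        long verifyTimeout = Long.parseLong(verify.get("timeout"));
        return fdTimeout * maxTries + verifyTimeout;
    }

    public static void main(String[] args) {
        long ms = roughDetectionMillis(
                "FD(timeout=1000;max_tries=2)",
                "VERIFY_SUSPECT(timeout=500)");
        System.out.println("Rough expected detection time: " + ms + " ms");
    }
}
```

Under that assumed model the configured values give a figure in the low seconds, the same order as the usual case described above. It does not account for the occasional 50-second delays, which suggests those come from something other than the configured FD and VERIFY_SUSPECT timeouts.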