Re: [javagroups-users] Problems with multicast on many machines
From: Jan B. <js...@gm...> - 2007-08-30 10:51:02
Thanks Bela!

That helped a lot. We are upgrading from 2.2.7, so it was time to look at the stack too ;)

Jan S

Bela Ban wrote:
> I see you use CAUSAL, which we don't support (as a matter of fact, it
> was removed in 2.5). Also, you're missing max_bytes in STABLE. On top
> of that, there is no flow control in your stack: flow control is
> essential if you're sending lots of messages!
> I recommend using the udp.xml that ships with 2.4 and modifying it
> slightly, e.g. num_initial_members in PING etc.
>
> Your description is quite vague, but the above tips might solve your
> problem.
>
> Jan Berg wrote:
>> Hi all,
>>
>> I have some problems using JGroups multicast on a 16-machine setup.
>>
>> During a stress test of the system most messages come through, but
>> sometimes it seems like multicast messages do not get through for some
>> time. When I fall back to sending unicast to the members not receiving
>> the multicast messages, they get them immediately.
>>
>> After two minutes the multicast messages come in a burst
>> (Is this a timeout somewhere in JGroups? Or is it my router?)
>> or sometimes I even seem to lose the messages.
>>
>> I have tried using different versions of JGroups (2.4.0, 2.4.1sp4
>> and 2.5) and also different JDKs (1.5 and 1.6), and I get the best
>> results using 2.4.0 and JDK 1.6...
>>
>> Here's my stack for 2.4:
>> UDP(mcast_addr=" + mcast_addr + ";" +
>> "mcast_port=" + mcast_port + ";" +
>> "ip_ttl=32;bind_addr=" + bind_addr + ";" +
>> "mcast_send_buf_size=450000;" +
>> "mcast_recv_buf_size=3600000):" +
>> "PING(timeout=2000;num_initial_members=10):" +
>> "MERGE2(min_interval=5000;max_interval=10000):" +
>> "FD:" +
>> "VERIFY_SUSPECT(timeout=1500):" +
>> "pbcast.NAKACK(max_xmit_size=8192;gc_lag=50;" +
>> "retransmit_timeout=600,1200,2400,4800):" +
>> "UNICAST(timeout=600,1200,2400,4800):" +
>> "pbcast.STABLE(desired_avg_gossip=20000):" +
>> "FRAG(frag_size=8192;down_thread=false;up_thread=false):CAUSAL:" +
>> "pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;" +
>> "shun=false;print_local_addr=false)" +
>> ":pbcast.STATE_TRANSFER";
>>
>> Any ideas for what could be wrong here?
>
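
For anyone reading this thread later, here is a rough sketch of what the three changes suggested in the reply (drop CAUSAL, add max_bytes to STABLE, add flow control) might look like when applied to the 2.4 stack string from the original post. It only builds the property string and does not touch any JGroups classes. The class name StackSketch, the method buildStack, the numeric values for max_bytes, max_credits and min_threshold, and the position of FC in the stack are all assumptions made for illustration, not tuned recommendations; the udp.xml that ships with your JGroups version remains the authoritative starting point.

```java
// Hypothetical helper: builds a 2.4-style protocol stack string.
// Not part of JGroups; names and numeric values are illustrative assumptions.
final class StackSketch {

    static String buildStack(String mcastAddr, int mcastPort, String bindAddr) {
        return "UDP(mcast_addr=" + mcastAddr + ";" +
               "mcast_port=" + mcastPort + ";" +
               "ip_ttl=32;bind_addr=" + bindAddr + ";" +
               "mcast_send_buf_size=450000;" +
               "mcast_recv_buf_size=3600000):" +
               "PING(timeout=2000;num_initial_members=10):" +
               "MERGE2(min_interval=5000;max_interval=10000):" +
               "FD:" +
               "VERIFY_SUSPECT(timeout=1500):" +
               "pbcast.NAKACK(max_xmit_size=8192;gc_lag=50;" +
               "retransmit_timeout=600,1200,2400,4800):" +
               "UNICAST(timeout=600,1200,2400,4800):" +
               // max_bytes added, as the reply suggests (value is an assumption)
               "pbcast.STABLE(desired_avg_gossip=20000;max_bytes=400000):" +
               // CAUSAL removed: the reply says it is unsupported and gone in 2.5
               "FRAG(frag_size=8192;down_thread=false;up_thread=false):" +
               "pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;" +
               "shun=false;print_local_addr=false):" +
               // Flow control added (values and placement are assumptions)
               "FC(max_credits=2000000;min_threshold=0.10):" +
               "pbcast.STATE_TRANSFER";
    }

    public static void main(String[] args) {
        // Example addresses are placeholders, not values from the thread
        System.out.println(buildStack("228.8.8.8", 45566, "192.168.0.1"));
    }
}
```

The resulting string would be passed to the channel in the same way as the original one. Whether FC sits best above or below GMS and FRAG is worth checking against the shipped udp.xml rather than taking from this sketch.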