It's hard for me to tell what's going on without either being able to
reproduce this or having access to logs. Being able to reproduce it
would be much better than logs.
Your props look a bit odd. I would correct the following (and I
suggest using XML configs rather than string-based ones):
* UDP.loopback should be true; otherwise members don't receive their own
suspicion messages. This might be the root problem...
* UDP.thread_naming_pattern cannot be "jgroups", but must be either
"C", "L", or "CL"
* GMS.view_ack_collection_timeout=10000: unusually high timeout
* GMS.join_timeout=11000: why such a high value?
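The corrections above can be applied mechanically to an old-style property string. Below is a minimal Java sketch of that, under some stated assumptions. `PropsFixer`, `Protocol` and `fix()` are hypothetical names, not JGroups API. The parser assumes the simple `NAME(key=value;...)` syntax used in the configuration quoted below. `CL` is picked only as one of the three allowed naming patterns. No replacement values were given for the GMS timeouts, so the sketch only flags them rather than guessing new numbers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of JGroups: patches an old-style property string. */
final class PropsFixer {

    /** One protocol spec such as "UDP(a=1;b=2)", parsed into an ordered parameter map. */
    static final class Protocol {
        final String name;
        final Map<String, String> params = new LinkedHashMap<>();

        Protocol(String spec) {
            int open = spec.indexOf('(');
            name = open < 0 ? spec : spec.substring(0, open);
            if (open >= 0) {
                String body = spec.substring(open + 1, spec.lastIndexOf(')'));
                for (String kv : body.split(";")) {
                    if (kv.isEmpty()) continue;               // e.g. "FD_SOCK()" has no params
                    int eq = kv.indexOf('=');
                    if (eq < 0) params.put(kv, "");           // assumption: bare key = empty value
                    else params.put(kv.substring(0, eq), kv.substring(eq + 1));
                }
            }
        }

        /** Serializes back to "NAME(k1=v1;k2=v2)"; LinkedHashMap keeps the original order. */
        String format() {
            List<String> kvs = new ArrayList<>();
            params.forEach((k, v) -> kvs.add(k + "=" + v));
            return name + "(" + String.join(";", kvs) + ")";
        }
    }

    /** Splits the stack on ':' separators that are outside parentheses. */
    static List<String> splitProtocols(String props) {
        List<String> specs = new ArrayList<>();
        int depth = 0, start = 0;
        for (int i = 0; i < props.length(); i++) {
            char c = props.charAt(i);
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ':' && depth == 0) {
                specs.add(props.substring(start, i));
                start = i + 1;
            }
        }
        specs.add(props.substring(start));
        return specs;
    }

    /** Applies the two UDP fixes and adds a note for each flagged GMS timeout. */
    static String fix(String props, List<String> notes) {
        List<String> out = new ArrayList<>();
        for (String spec : splitProtocols(props)) {
            Protocol p = new Protocol(spec.trim());
            if (p.name.equals("UDP")) {
                p.params.put("loopback", "true");             // so members get their own (suspicion) messages
                p.params.put("thread_naming_pattern", "CL");  // must be "C", "L" or "CL"
            } else if (p.name.equals("pbcast.GMS")) {
                for (String key : new String[] {"view_ack_collection_timeout", "join_timeout"}) {
                    if (p.params.containsKey(key)) {
                        notes.add("pbcast.GMS." + key + "=" + p.params.get(key) + ": review, flagged as high");
                    }
                }
            }
            out.add(p.format());
        }
        return String.join(":", out);
    }

    public static void main(String[] args) {
        String props = "UDP(mcast_port=58315;loopback=false;thread_naming_pattern=jgroups):"
                + "FD_SOCK():pbcast.GMS(view_ack_collection_timeout=10000;join_timeout=11000)";
        List<String> notes = new ArrayList<>();
        System.out.println(fix(props, notes));
        notes.forEach(System.out::println);
    }
}
```

Running `main` prints the patched stack, with `loopback=true` and `thread_naming_pattern=CL` in UDP, followed by one note per GMS timeout. The rest of the stack is passed through unchanged. This only patches the string in place. Moving the stack to an XML config file is a separate step.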
If you could reproduce this and let me know how to do it (even if it
takes hours), that would be useful!
david.forget@... wrote:
> Hi, we have started testing JGroups 2.8GA and ran into the following: the
> system is stable for hours, then suddenly the coordinator is excluded
> from the cluster by the other nodes and cannot rejoin normally.
> We have been able to reproduce it several times, always with the same
> behavior: nodes that have been excluded are not able to rejoin the cluster.
>
> Note: This is running under VMware. We looked at the system and it is
> stable, with low CPU and memory usage. Using JGroups 2.4 (with different
> JGroups properties), the system stays stable for weeks.
>
>
> JGroups Version: 2.8GA
>
> We start all JVMs with:
> -Djava.net.preferIPv4Stack=true
> -Djgroups.bind_addr=10.4.72.XX
>
> JGroups Properties:
> UDP(bind_addr=10.4.72.XX;enable_diagnostics=false;mcast_addr=228.8.8.8;mcast_port=58315;ip_mcast=true;discard_incompatible_packets=true;loopback=false;enable_bundling=false;thread_naming_pattern=jgroups):PING(timeout=6000;num_initial_members=3;break_on_coord_rsp=false;num_ping_requests=2):MERGE2(max_interval=30000;min_interval=10000):FD_SOCK():FD_ALL(interval=5000;timeout=16000):VERIFY_SUSPECT(timeout=2000;num_msgs=2):BARRIER():pbcast.NAKACK(use_stats_for_retransmission=false;exponential_backoff=150;use_mcast_xmit=true;discard_delivered_msgs=true):UNICAST():pbcast.STABLE(stability_delay=2000;desired_avg_gossip=30000;max_bytes=100000):pbcast.GMS(view_ack_collection_timeout=10000;join_timeout=11000;print_local_addr=true;view_bundling=true):FRAG2(frag_size=48000):pbcast.STATE_TRANSFER()
>
>
--
Bela Ban
Lead JGroups / Clustering Team
JBoss