[jgroups-users] JGRP-1755 fixes Unicast 3 but what about TCP?
From: Ron G. <gon...@gm...> - 2014-06-19 20:12:38
We see the UNICAST3 fix for JGRP-1755 on GitHub, but we don't have UNICAST3
configured in our 3.3.0 setup; we are using pbcast.GMS and pbcast.NAKACK over
TCP.
Even so, our cluster member cannot rejoin, and we ultimately end up with the
same member joined 3 or 4 times (IPs are redacted, but the address for node C
is listed multiple times).
We see hundreds of these messages in the logs:
INFO [.jgroups.MuxRpcDispatcherMgr] suspect:1d8a6d91-2833-7238-22e0-72856922c78a
WARN [org.jgroups.protocols.TCP] nodeA:7600: no physical address for 1d8a6d91-2833-7238-22e0-72856922c78a, dropping message
WARN [org.jgroups.protocols.TCP] nodeA:7600: logical address cache didn't contain all physical address, sending up a discovery request
Then we see the same node re-added to the view multiple times:
view [nodeA:7600|] after 5000ms, missing ACKs from [nodeA:7600, nodeB:7600, nodeC:7600, nodeC:7600, nodeC:7600, nodeC:7600]
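To check how often a member is duplicated across many such log lines, something like the following could be used. This is only a minimal sketch: `duplicate_members` is a hypothetical helper, not part of JGroups, and it assumes the "missing ACKs from [...]" message sits on a single line in the actual log file.

```python
import re
from collections import Counter


def duplicate_members(log_line):
    """Return {member: count} for members listed more than once
    in the 'missing ACKs from [...]' part of a log line."""
    match = re.search(r"missing ACKs from \[([^\]]*)\]", log_line)
    if match is None:
        return {}
    # Split the bracketed list on commas and drop surrounding whitespace
    members = [m.strip() for m in match.group(1).split(",") if m.strip()]
    counts = Counter(members)
    return {member: n for member, n in counts.items() if n > 1}


# The view message from above, joined onto one line
line = ("view [nodeA:7600|] after 5000ms, missing ACKs from "
        "[nodeA:7600, nodeB:7600, nodeC:7600, nodeC:7600, nodeC:7600, nodeC:7600]")
print(duplicate_members(line))  # prints {'nodeC:7600': 4}
```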
Any suggestions?
Here is our config:
<config>
    <TCP
        recv_buf_size="20000000"
        send_buf_size="640000"
        loopback="true"
        max_bundle_size="64000"
        max_bundle_timeout="30"
        bind_port="${cluster.bind.port}"
        use_send_queues="true"
        sock_conn_timeout="300"
        thread_pool.enabled="true"
        thread_pool.min_threads="4"
        thread_pool.max_threads="16"
        thread_pool.keep_alive_time="8000"
        thread_pool.queue_enabled="false"
        thread_pool.queue_max_size="100"
        thread_pool.rejection_policy="run"
        oob_thread_pool.enabled="true"
        oob_thread_pool.min_threads="4"
        oob_thread_pool.max_threads="16"
        oob_thread_pool.keep_alive_time="8000"
        oob_thread_pool.queue_enabled="false"
        oob_thread_pool.queue_max_size="100"
        oob_thread_pool.rejection_policy="run"
        ${BIND_ADDRESS_DIRECTIVE}/>
    <TCPPING timeout="3000"
        initial_hosts="${cluster.tcp.discovery.initial.hosts}"
        port_range="0"
        num_initial_members="2"/>
    <MERGE2 max_interval="100000" min_interval="20000"/>
    <FD_SOCK start_port="${cluster.failure.detection.bind.port}"
        ${BIND_ADDRESS_DIRECTIVE}/>
    <FD timeout="10000" max_tries="5"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <BARRIER/>
    <pbcast.NAKACK
        use_mcast_xmit="false"
        retransmit_timeout="300,600,1200,2400,4800"
        discard_delivered_msgs="false"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
        max_bytes="400000"/>
    ${ENCRYPT_TAG}
    <AUTH auth_class="org.jgroups.auth.MD5Token"
        auth_value="${cluster.auth.pwd}" token_hash="MD5"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
        view_bundling="true" view_ack_collection_timeout="5000"/>
    <FRAG2 frag_size="60000"/>
    <pbcast.STATE_TRANSFER/>
</config>
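As a sanity check that UNICAST3 really is absent from the stack, the protocol names can be pulled out of the config text. This is a rough sketch, not a JGroups tool: `protocol_names` is a hypothetical helper, and it uses a regular expression rather than an XML parser because the unsubstituted `${...}` placeholders keep the file from being well-formed XML. The excerpt below is an abbreviated copy of the config above.

```python
import re


def protocol_names(config_text):
    """List element names in order of appearance, skipping the <config> root."""
    # Match the name right after each opening "<"; closing tags ("</...") do not match
    names = re.findall(r"<([A-Za-z_][\w.]*)", config_text)
    return [n for n in names if n != "config"]


# Abbreviated excerpt of the config above, placeholders left unsubstituted
excerpt = """
<config>
<TCP bind_port="${cluster.bind.port}"
${BIND_ADDRESS_DIRECTIVE}/>
<pbcast.NAKACK use_mcast_xmit="false"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"/>
</config>
"""

names = protocol_names(excerpt)
print(names)                # prints ['TCP', 'pbcast.NAKACK', 'pbcast.GMS']
print("UNICAST3" in names)  # prints False
```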