[jgroups-users] JGRP-1755 fixes Unicast 3 but what about TCP?

We see the fix for JGRP-1755 in UNICAST3 on GitHub, but we are using 
pbcast.GMS and pbcast.NAKACK; we don't have UNICAST3 configured in our 
3.3.0 setup. We are using TCP.

Yet our cluster member cannot rejoin, and we ultimately end up with the same 
member joined 3 or 4 times (IPs redacted, but we see the IP for node C listed 
multiple times).

We see hundreds of these messages in the logs:
INFO  [.jgroups.MuxRpcDispatcherMgr] suspect:1d8a6d91-2833-7238-22e0-72856922c78a
WARN  [org.jgroups.protocols.TCP] nodeA:7600: no physical address for 1d8a6d91-2833-7238-22e0-72856922c78a, dropping message
WARN  [org.jgroups.protocols.TCP] nodeA:7600: logical address cache didn't contain all physical address, sending up a discovery request

Then we see the same node re-added to the view multiple times:
view [nodeA:7600|] after 5000ms, missing ACKs from [nodeA:7600, nodeB:7600, nodeC:7600, nodeC:7600, nodeC:7600, nodeC:7600]
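
To quantify the duplicates across many such log lines, something like the following could be used. This is only a minimal sketch: the class `DuplicateMembers`, the method `countMembers`, and the regular expression are hypothetical helpers written for this post, not part of JGroups, and they assume the log line has the "missing ACKs from [...]" shape shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JGroups: counts how often each member
// appears in the bracketed list of a "missing ACKs from [...]" log line.
class DuplicateMembers {

    // Captures the text between the brackets that follow "missing ACKs from".
    private static final Pattern ACK_LIST =
            Pattern.compile("missing ACKs from \\[([^\\]]*)\\]");

    static Map<String, Integer> countMembers(String logLine) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = ACK_LIST.matcher(logLine);
        if (!m.find()) {
            return counts; // no member list on this line
        }
        for (String member : m.group(1).split(",")) {
            String name = member.trim();
            if (!name.isEmpty()) {
                counts.merge(name, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "view [nodeA:7600|] after 5000ms, missing ACKs from "
                + "[nodeA:7600, nodeB:7600, nodeC:7600, nodeC:7600, nodeC:7600, nodeC:7600]";
        // Report only members that are listed more than once.
        countMembers(line).forEach((member, n) -> {
            if (n > 1) {
                System.out.println(member + " appears " + n + " times");
            }
        });
    }
}
```

Run against the line above, this reports only nodeC:7600 as a repeated entry, which matches what we see in the view.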

Any suggestions?

Here is our config:

<config>
     <TCP
             recv_buf_size="20000000"
             send_buf_size="640000"
             loopback="true"
             max_bundle_size="64000"
             max_bundle_timeout="30"
             bind_port="${cluster.bind.port}"
             use_send_queues="true"
             sock_conn_timeout="300"

             thread_pool.enabled="true"
             thread_pool.min_threads="4"
             thread_pool.max_threads="16"
             thread_pool.keep_alive_time="8000"
             thread_pool.queue_enabled="false"
             thread_pool.queue_max_size="100"
             thread_pool.rejection_policy="run"

             oob_thread_pool.enabled="true"
             oob_thread_pool.min_threads="4"
             oob_thread_pool.max_threads="16"
             oob_thread_pool.keep_alive_time="8000"
             oob_thread_pool.queue_enabled="false"
             oob_thread_pool.queue_max_size="100"
             oob_thread_pool.rejection_policy="run"
             ${BIND_ADDRESS_DIRECTIVE}/>
     <TCPPING timeout="3000"
              initial_hosts="${cluster.tcp.discovery.initial.hosts}"
              port_range="0"
              num_initial_members="2"/>
     <MERGE2 max_interval="100000" min_interval="20000"/>
     <FD_SOCK start_port="${cluster.failure.detection.bind.port}"
              ${BIND_ADDRESS_DIRECTIVE}/>
     <FD timeout="10000" max_tries="5"/>
     <VERIFY_SUSPECT timeout="1500"/>
     <BARRIER/>
     <pbcast.NAKACK
                    use_mcast_xmit="false"
                    retransmit_timeout="300,600,1200,2400,4800"
                    discard_delivered_msgs="false"/>
     <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                    max_bytes="400000"/>
     ${ENCRYPT_TAG}
     <AUTH auth_class="org.jgroups.auth.MD5Token"
           auth_value="${cluster.auth.pwd}" token_hash="MD5"/>
     <pbcast.GMS print_local_addr="true" join_timeout="3000"
                 view_bundling="true" view_ack_collection_timeout="5000"/>
     <FRAG2 frag_size="60000"/>
     <pbcast.STATE_TRANSFER/>
</config>