[jgroups-users] NAKACK producing continuous warn messages


Hi,

In one of our production clusters, the following message is logged
continuously:

2014-06-20 09:32:55 WARN [tid=] [OOB-1,Agent,X] org.jgroups.protocols.pbcast.NAKACK - (requester=Y, local_addr=X) message X::1 not found in retransmission table of X:
2014-06-20 09:32:55 WARN [tid=] [OOB-1,Agent,X] org.jgroups.protocols.pbcast.NAKACK - (requester=Y, local_addr=X) message X::2 not found in retransmission table of X:

It seems that member Y keeps re-requesting messages from member X, but
member X cannot find its own messages in its retransmission table. According
to the logs, this warning was logged around 3000 times within one minute,
and messages up to sequence number 23349 were requested within that minute.
As a result, the hard disk fills up, and to recover we had to restart all
the nodes.

Is there a way to recover from this problem without restarting the nodes,
and what could be the root cause of the issue?

In the production system we use JGroups 2.6.5.GA. Is there a way to change
the logging frequency in this version?
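
To illustrate what reducing the log volume could look like, here is a
minimal sketch. It assumes the application logs through log4j 1.x with an
XML configuration, which this post does not state. It raises the threshold
of the logger named in the warning, so it changes the level rather than the
frequency, and it only hides the symptom without addressing the cause.

```xml
<!-- Assumption: log4j 1.x XML configuration; this element goes inside <log4j:configuration>. -->
<!-- The logger name is taken from the WARN lines quoted in this message. -->
<logger name="org.jgroups.protocols.pbcast.NAKACK">
    <!-- Only ERROR and above are written, so the retransmission WARN lines are suppressed. -->
    <level value="ERROR"/>
</logger>
```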

Below is the config file we are using.

<config>
    <UDP
         mcast_addr="${jgroups.udp.mcast_addr:228.10.10.10}"
         mcast_port="${jgroups.udp.mcast_port:45588}"
         tos="8"
         ucast_recv_buf_size="20000000"
         ucast_send_buf_size="640000"
         mcast_recv_buf_size="80000"
         mcast_send_buf_size="150000"
         loopback="false"
         discard_incompatible_packets="true"
         ip_ttl="${jgroups.udp.ip_ttl:2}"
         thread_naming_pattern="cl"

         thread_pool.enabled="true"
         thread_pool.min_threads="2"
         thread_pool.max_threads="8"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="1000"
         thread_pool.rejection_policy="Run"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="Run"/>

    <PING timeout="2000"
            num_initial_members="3"/>
    <MERGE2 max_interval="10000"
            min_interval="5000"/>
    <FD_SOCK/>
    <FD timeout="1000" max_tries="5"   shun="false"/>
    <VERIFY_SUSPECT timeout="1500"  />
    <BARRIER />
    <pbcast.NAKACK gc_lag="50"
                   retransmit_timeout="300,600,1200,2400,4800"
                   />
    <UNICAST timeout="5000"/>
    <pbcast.STABLE desired_avg_gossip="20000"/>
    <VIEW_SYNC avg_send_interval="60000"   />
    <pbcast.GMS print_local_addr="false" join_timeout="5000"
                shun="false"
                view_bundling="true"/>
    <FC max_credits="500000"
                    min_threshold="0.20"/>
    <FRAG2 frag_size="4096"  />

    <pbcast.STATE_TRANSFER  />

</config>

Any help in recovering from this problem would be really appreciated.

Thanks
Aloka

--
View this message in context: http://jgroups.1086181.n5.nabble.com/NAKACK-producing-continuous-warn-messages-tp10269.html
Sent from the JGroups - General mailing list archive at Nabble.com.