Re: [jgroups-dev] Buffer starvation or something else?
From: Bela B. <be...@ya...> - 2014-02-03 14:21:30
Hi Jack,

I've never seen a "no buffers available" error message. If this is a temporary condition it should not cause problems, because JGroups retransmits messages until they've been delivered everywhere (or received by the destination member); it only becomes an issue if the error condition persists.

I ran 2 UUPerf instances on the same (Linux) box and wasn't able to reproduce this. How many instances did you run? Were they on the same box?

I suggest trying out the following things:
- Run under Linux, not FreeBSD, to see if this changes anything. If you can reproduce it under Linux, post the exact steps / environment (e.g. how many nodes) needed to reproduce it
- Reduce your buffer sizes in the UDP protocol (two sketches for narrowing this down are appended below the quoted config)

On 30/01/14 23:28, Jack wrote:
> I hope I'm using the correct forum for this question.
>
> I originally found this problem with Infinispan's InitialStateTransfer,
> but the problem is easily reproduced with UUPerf.
>
> When I run UUPerf with the default 4.5M data size and 1 msg, I see the
> following error mixed randomly into the log:
>
> 14:16:27.607 ERROR (Invoker-1) [org.jgroups.protocols.TP.down(TP.java:1344)] JGRP000029: jfrost-multi-8014:failed sending message to jfrost-multi2-58507 (30079 bytes):
> java.lang.Exception: dest=/192.168.10.193:58055 (30082 bytes),
> headers: FRAG2: [id=1, frag_id=64, num_frags=151], UNICAST3: DATA, seqno=67, conn_id=1, UDP: [channel_name=uuperf]
>
> I tracked it down with the debugger and found that the root cause is an
> IOException with the message "No buffers available", returned by
> MulticastSocket.send(). I've seen other postings saying that "No buffers
> available" can be a bit of a red herring as to the exact problem (a full
> ARP table under Linux, for example), so I'm worried I'm chasing the
> wrong problem.
>
> I can post trace logs if they will help. From the logs it also appears
> that the problem gets progressively worse, as buffer problems tend to
> do. So it could be that once the error state is entered, buffers don't
> get returned in a timely manner.
>
> At first I thought the problem was specifically the frag_size, as the
> error seemed to occur at the frag_size boundary: for frag_size=20K I
> would see errors with size 20082. But I was told by someone else that
> the size was normal (data + overhead).
>
> I've tried adjusting the UDP buffers and max_bundle_size, the FRAG2
> frag_size, and the UFC/MFC max_credits and threshold, but I can't seem
> to find a magic combination of values that alleviates the problem.
>
> I note below that my test machines are VMware-based, but I've also run
> the test on hardware-based FreeBSD and the results are the same.
>
> My sysadmin did some checking and it doesn't look like the system buffer
> pool gets anywhere near depletion. The systems aren't really doing
> anything else besides my test.
>
> Any insights or suggestions are welcome.
>
> /Jack
>
> ------------------------------------------------------------------------
>
> I'm running:
>
> 2 nodes running FreeBSD, hosted under VMware.
> Nodes are on the same LAN segment, no routers involved.
> jgroups 3.4.1 (the problem shows up with 3.3.1 as well, but the
> error message is slightly different; more explicit, actually)
> openjdk 1.7.0_25
>
> This is one of the last configurations I tried. With this config there
> seems to be a sensitivity to logging: with UDP set to DEBUG it only
> takes one 4.5M RPC to show the error; with TRACE it may not show up
> without raising the RPC count.
>
> <config xmlns="urn:org:jgroups"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.4.xsd">
>     <UDP
>          mcast_addr="${jgroups.udp.mcast_addr:228.6.7.8}"
>          mcast_port="${jgroups.udp.mcast_port:46655}"
>          tos="8"
>          ucast_recv_buf_size="32m"
>          ucast_send_buf_size="32m"
>          mcast_recv_buf_size="32m"
>          mcast_send_buf_size="32m"
>          loopback="true"
>          max_bundle_size="21k"
>          ip_ttl="${jgroups.udp.ip_ttl:2}"
>          enable_diagnostics="false"
>          bundler_type="old"
>
>          thread_naming_pattern="pl"
>
>          thread_pool.enabled="true"
>          thread_pool.min_threads="2"
>          thread_pool.max_threads="30"
>          thread_pool.keep_alive_time="60000"
>          thread_pool.queue_enabled="true"
>          thread_pool.queue_max_size="100"
>          thread_pool.rejection_policy="Discard"
>
>          oob_thread_pool.enabled="true"
>          oob_thread_pool.min_threads="2"
>          oob_thread_pool.max_threads="30"
>          oob_thread_pool.keep_alive_time="60000"
>          oob_thread_pool.queue_enabled="false"
>          oob_thread_pool.queue_max_size="100"
>          oob_thread_pool.rejection_policy="Discard"
>
>          internal_thread_pool.enabled="true"
>          internal_thread_pool.min_threads="1"
>          internal_thread_pool.max_threads="10"
>          internal_thread_pool.keep_alive_time="60000"
>          internal_thread_pool.queue_enabled="true"
>          internal_thread_pool.queue_max_size="100"
>          internal_thread_pool.rejection_policy="Discard"
>          />
>
>     <PING timeout="3000" num_initial_members="3"/>
>     <MERGE2 max_interval="30000" min_interval="10000"/>
>
>     <FD_SOCK/>
>     <FD_ALL timeout="15000" interval="3000"/>
>     <VERIFY_SUSPECT timeout="1500"/>
>
>     <pbcast.NAKACK2
>          xmit_interval="1000"
>          xmit_table_num_rows="100"
>          xmit_table_msgs_per_row="10000"
>          xmit_table_max_compaction_time="10000"
>          max_msg_batch_size="100"/>
>     <UNICAST3
>          xmit_interval="500"
>          xmit_table_num_rows="20"
>          xmit_table_msgs_per_row="10000"
>          xmit_table_max_compaction_time="10000"
>          max_msg_batch_size="100"
>          conn_expiry_timeout="0"/>
>
>     <pbcast.STABLE stability_delay="500" desired_avg_gossip="5000" max_bytes="1m"/>
>     <pbcast.GMS print_local_addr="false" join_timeout="3000" view_bundling="true"/>
>     <tom.TOA/> <!-- the TOA is only needed for total order transactions -->
>
>     <UFC max_credits="2m" min_threshold="0.40" />
>     <MFC max_credits="2m" min_threshold="0.40" />
>     <FRAG2 frag_size="20k" />
>     <RSVP timeout="60000" resend_interval="500" ack_on_delivery="false" />
> </config>

--
Bela Ban, JGroups lead (http://www.jgroups.org)
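
Two sketches that might help narrow this down. First, the 32m buffer requests in the UDP config above may not be what the kernel actually grants: FreeBSD caps socket buffers via the kern.ipc.maxsockbuf sysctl, Linux via net.core.rmem_max / net.core.wmem_max, and the JVM silently accepts the clamped value. A minimal, untested sketch (plain java.net, no JGroups involved) to print the effective sizes:

    import java.net.MulticastSocket;

    // Prints what the OS actually grants for a 32 MB buffer request.
    // setSend/ReceiveBufferSize is only a hint; the getters return the
    // value the kernel really applied after any sysctl-imposed cap.
    public class BufferCheck {
        public static void main(String[] args) throws Exception {
            try (MulticastSocket sock = new MulticastSocket()) {
                sock.setSendBufferSize(32 * 1024 * 1024);
                sock.setReceiveBufferSize(32 * 1024 * 1024);
                System.out.println("effective send buf: " + sock.getSendBufferSize());
                System.out.println("effective recv buf: " + sock.getReceiveBufferSize());
            }
        }
    }

If the printed values come back far below 32 MB, the stack is effectively running with much smaller buffers than the config suggests.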
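Second, to take JGroups out of the loop entirely: the sizes Jack reports (20082 bytes with frag_size=20k, 30082 in the log above) look like the fragment payload plus roughly 82 bytes of FRAG2/UNICAST3/UDP headers, consistent with the "data + overhead" explanation. So a standalone sender pushing datagrams of that size at the same group/port should exercise the same kernel path. A hypothetical, untested probe (the group, port and TTL are taken from the config above) that loops until MulticastSocket.send() fails:

    import java.io.IOException;
    import java.net.DatagramPacket;
    import java.net.InetAddress;
    import java.net.MulticastSocket;

    // Sends fragment-sized datagrams in a tight loop to see whether
    // "No buffers available" can be provoked without JGroups running.
    public class EnobufsProbe {
        public static void main(String[] args) throws Exception {
            InetAddress group = InetAddress.getByName("228.6.7.8"); // mcast_addr default above
            byte[] payload = new byte[20082]; // approx. one FRAG2 fragment incl. headers
            try (MulticastSocket sock = new MulticastSocket()) {
                sock.setTimeToLive(2); // matches ip_ttl in the config
                for (int i = 0; i < 100000; i++) {
                    try {
                        sock.send(new DatagramPacket(payload, payload.length, group, 46655));
                    } catch (IOException e) {
                        System.err.println("send failed at packet " + i + ": " + e.getMessage());
                        break;
                    }
                }
            }
        }
    }

If this reproduces the error on the FreeBSD hosts but not on Linux, the problem sits below JGroups (in the NIC driver queue or the socket layer), which would also explain why tuning FRAG2 and the credit protocols doesn't help.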