Re: [jgroups-dev] Buffer starvation or something else?
From: Bela B. <be...@ya...> - 2014-02-03 14:21:30
Hi Jack,

I've never seen a "no buffers available" error message. If this is a temporary condition it should not cause problems, because JGroups retransmits messages until they've been delivered everywhere (or received by the destination member); it only becomes an issue if the error condition persists.

I ran 2 UUPerf instances on the same (Linux) box and wasn't able to reproduce this. How many instances did you run? Were they on the same box?

I suggest trying out the following things:
- Run under Linux, not FreeBSD, to see if this changes anything. If you can reproduce it under Linux, post the exact steps / environment (e.g. how many nodes) needed to reproduce it
- Reduce your buffer sizes in the UDP protocol (two sketches for narrowing this down are appended below the quoted config)

On 30/01/14 23:28, Jack wrote:
> I hope I'm using the correct forum for this question.
>
> I originally found this problem with Infinispan's InitialStateTransfer,
> but the problem is easily reproduced with UUPerf.
>
> When I run UUPerf with the default 4.5M data size and 1 msg, I see the
> following error mixed randomly into the log:
>
> 14:16:27.607 ERROR (Invoker-1) [org.jgroups.protocols.TP.down(TP.java:1344)] JGRP000029: jfrost-multi-8014:failed sending message to jfrost-multi2-58507 (30079 bytes):
> java.lang.Exception: dest=/192.168.10.193:58055 (30082 bytes),
> headers: FRAG2: [id=1, frag_id=64, num_frags=151], UNICAST3: DATA, seqno=67, conn_id=1, UDP: [channel_name=uuperf]
>
> I tracked it down with the debugger and found that the root cause is an
> IOException with the message "No buffers available", returned by
> MulticastSocket.send(). I've seen other postings saying that "No buffers
> available" can be a bit of a red herring as to the exact problem (a full
> ARP table under Linux, for example), so I'm worried I'm chasing the
> wrong problem.
>
> I can post trace logs if they will help. From the logs it also appears
> that the problem gets progressively worse, as buffer problems tend to
> do. So it could be that once the error state is entered, buffers don't
> get returned in a timely manner.
>
> At first I thought the problem was specifically the frag_size, as the
> error seemed to occur at the frag_size boundary: for frag_size=20K I
> would see errors with size 20082. But I was told by someone else that
> the size was normal (data + overhead).
>
> I've tried adjusting the UDP buffers and max_bundle_size, the FRAG2
> frag_size, and the UFC/MFC max_credits and threshold, but I can't seem
> to find a magic combination of values that alleviates the problem.
>
> I note below that my test machines are VMware-based, but I've also run
> the test on hardware-based FreeBSD and the results are the same.
>
> My sysadmin did some checking and it doesn't look like the system buffer
> pool gets anywhere near depletion. The systems aren't really doing
> anything else besides my test.
>
> Any insights or suggestions are welcome.
>
> /Jack
>
> ------------------------------------------------------------------------
>
> I'm running:
>
> 2 nodes running FreeBSD, hosted under VMware.
> Nodes are on the same LAN segment, no routers involved.
> jgroups 3.4.1 (the problem shows up with 3.3.1 as well, but the
> error message is slightly different; more explicit, actually)
> openjdk 1.7.0_25
>
> This is one of the last configurations I tried. With this config there
> seems to be a sensitivity to logging: with UDP set to DEBUG it only
> takes one 4.5M RPC to show the error; with TRACE it may not show up
> without raising the RPC count.
>
> <config xmlns="urn:org:jgroups"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.4.xsd">
>     <UDP
>          mcast_addr="${jgroups.udp.mcast_addr:228.6.7.8}"
>          mcast_port="${jgroups.udp.mcast_port:46655}"
>          tos="8"
>          ucast_recv_buf_size="32m"
>          ucast_send_buf_size="32m"
>          mcast_recv_buf_size="32m"
>          mcast_send_buf_size="32m"
>          loopback="true"
>          max_bundle_size="21k"
>          ip_ttl="${jgroups.udp.ip_ttl:2}"
>          enable_diagnostics="false"
>          bundler_type="old"
>
>          thread_naming_pattern="pl"
>
>          thread_pool.enabled="true"
>          thread_pool.min_threads="2"
>          thread_pool.max_threads="30"
>          thread_pool.keep_alive_time="60000"
>          thread_pool.queue_enabled="true"
>          thread_pool.queue_max_size="100"
>          thread_pool.rejection_policy="Discard"
>
>          oob_thread_pool.enabled="true"
>          oob_thread_pool.min_threads="2"
>          oob_thread_pool.max_threads="30"
>          oob_thread_pool.keep_alive_time="60000"
>          oob_thread_pool.queue_enabled="false"
>          oob_thread_pool.queue_max_size="100"
>          oob_thread_pool.rejection_policy="Discard"
>
>          internal_thread_pool.enabled="true"
>          internal_thread_pool.min_threads="1"
>          internal_thread_pool.max_threads="10"
>          internal_thread_pool.keep_alive_time="60000"
>          internal_thread_pool.queue_enabled="true"
>          internal_thread_pool.queue_max_size="100"
>          internal_thread_pool.rejection_policy="Discard"
>          />
>
>     <PING timeout="3000" num_initial_members="3"/>
>     <MERGE2 max_interval="30000" min_interval="10000"/>
>
>     <FD_SOCK/>
>     <FD_ALL timeout="15000" interval="3000"/>
>     <VERIFY_SUSPECT timeout="1500"/>
>
>     <pbcast.NAKACK2
>          xmit_interval="1000"
>          xmit_table_num_rows="100"
>          xmit_table_msgs_per_row="10000"
>          xmit_table_max_compaction_time="10000"
>          max_msg_batch_size="100"/>
>     <UNICAST3
>          xmit_interval="500"
>          xmit_table_num_rows="20"
>          xmit_table_msgs_per_row="10000"
>          xmit_table_max_compaction_time="10000"
>          max_msg_batch_size="100"
>          conn_expiry_timeout="0"/>
>
>     <pbcast.STABLE stability_delay="500" desired_avg_gossip="5000" max_bytes="1m"/>
>     <pbcast.GMS print_local_addr="false" join_timeout="3000" view_bundling="true"/>
>     <tom.TOA/> <!-- the TOA is only needed for total order transactions -->
>
>     <UFC max_credits="2m" min_threshold="0.40" />
>     <MFC max_credits="2m" min_threshold="0.40" />
>     <FRAG2 frag_size="20k" />
>     <RSVP timeout="60000" resend_interval="500" ack_on_delivery="false" />
> </config>

--
Bela Ban, JGroups lead (http://www.jgroups.org)
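
Two sketches that might help narrow this down. First, the 32m buffer requests in the UDP config above may not be what the kernel actually grants: FreeBSD caps socket buffers via the kern.ipc.maxsockbuf sysctl, Linux via net.core.rmem_max / net.core.wmem_max, and the JVM silently accepts the clamped value. A minimal, untested sketch (plain java.net, no JGroups involved) to print the effective sizes:

    import java.net.MulticastSocket;

    // Prints what the OS actually grants for a 32 MB buffer request.
    // setSend/ReceiveBufferSize is only a hint; the getters return the
    // value the kernel really applied after any sysctl-imposed cap.
    public class BufferCheck {
        public static void main(String[] args) throws Exception {
            try (MulticastSocket sock = new MulticastSocket()) {
                sock.setSendBufferSize(32 * 1024 * 1024);
                sock.setReceiveBufferSize(32 * 1024 * 1024);
                System.out.println("effective send buf: " + sock.getSendBufferSize());
                System.out.println("effective recv buf: " + sock.getReceiveBufferSize());
            }
        }
    }

If the printed values come back far below 32 MB, the stack is effectively running with much smaller buffers than the config suggests.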
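Second, to take JGroups out of the loop entirely: the sizes Jack reports (20082 bytes with frag_size=20k, 30082 in the log above) look like the fragment payload plus roughly 82 bytes of FRAG2/UNICAST3/UDP headers, consistent with the "data + overhead" explanation. So a standalone sender pushing datagrams of that size at the same group/port should exercise the same kernel path. A hypothetical, untested probe (the group, port and TTL are taken from the config above) that loops until MulticastSocket.send() fails:

    import java.io.IOException;
    import java.net.DatagramPacket;
    import java.net.InetAddress;
    import java.net.MulticastSocket;

    // Sends fragment-sized datagrams in a tight loop to see whether
    // "No buffers available" can be provoked without JGroups running.
    public class EnobufsProbe {
        public static void main(String[] args) throws Exception {
            InetAddress group = InetAddress.getByName("228.6.7.8"); // mcast_addr default above
            byte[] payload = new byte[20082]; // approx. one FRAG2 fragment incl. headers
            try (MulticastSocket sock = new MulticastSocket()) {
                sock.setTimeToLive(2); // matches ip_ttl in the config
                for (int i = 0; i < 100000; i++) {
                    try {
                        sock.send(new DatagramPacket(payload, payload.length, group, 46655));
                    } catch (IOException e) {
                        System.err.println("send failed at packet " + i + ": " + e.getMessage());
                        break;
                    }
                }
            }
        }
    }

If this reproduces the error on the FreeBSD hosts but not on Linux, the problem sits below JGroups (in the NIC driver queue or the socket layer), which would also explain why tuning FRAG2 and the credit protocols doesn't help.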