Re: [jgroups-users] Odd RPC behavior
From: Bela B. <be...@ya...> - 2014-06-10 07:01:38
Hi Jim,

when a unicast message gets dropped, it will get retransmitted within 2 * UNICAST3.xmit_interval ms, as UNICAST3 uses positive acks. For multicast messages, however, NAKACK2 uses *negative acks*, i.e. the receivers ask the senders for missing messages. If you multicast messages A1 and A2, but A2 is dropped, nobody will know that A actually did send message #2 until A sends A3, or until STABLE kicks in and does message reconciliation. The reason for using negative acks for multicasts is to prevent ack flooding; I used to have a SMACK protocol in JGroups some time ago, but for large clusters it generated too many acks.

Now, to solve your problem, you could add RSVP [1] to the stack and mark some messages/RPCs as RSVP. Ideally, this would be done after a *batch of work*, as RSVP is costly, especially in large clusters. See [1] for details.

Alternatively, reduce STABLE.desired_avg_gossip, but this will cause constant traffic from all members to the coordinator, which I don't think is a good idea.

[1] http://www.jgroups.org/manual/html/user-channel.html#RsvpSection
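To make that concrete, here is a minimal sketch (untested; RSVP's position in the stack and the attribute values are only examples, and disp/"refresh" below stand in for your own MuxRpcDispatcher and method). RSVP should sit above the reliable protocols (NAKACK2/UNICAST3), so in your config you could add it e.g. after pbcast.STABLE:

    <RSVP timeout="10000" resend_interval="2000" />

Then flag the important RPCs as RSVP:

    import org.jgroups.Message;
    import org.jgroups.blocks.MethodCall;
    import org.jgroups.blocks.RequestOptions;
    import org.jgroups.blocks.ResponseMode;

    // RSVP-flagged call: the send blocks until all current members have
    // ack'ed delivery of the underlying message, or the RSVP timeout expires
    RequestOptions opts = new RequestOptions(ResponseMode.GET_ALL, 1000)
            .setFlags(Message.Flag.RSVP);

    // disp is assumed to be your (Mux)RpcDispatcher; "refresh" is a
    // placeholder for your actual no-arg method
    disp.callRemoteMethodsWithFuture(null,
            new MethodCall("refresh", null, null), opts);

With this, a dropped multicast is retransmitted as part of the call itself, instead of waiting for the next message or for STABLE.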
On 06/06/14 03:35, Jim Thomas wrote:
> I'm using a muxed RPC on Android with JGroups 3.4.4, presently with two
> nodes. I'm doing a 30 second periodic callRemoteMethodsWithFuture(null
> ...) from node 1 and occasionally the call does not go through on node
> 2 until the next (of the same) call is sent. So what I see is:
>
> T    N1             N2
> 0    rpc1 fc1       rpc1
> 30   rpc2           nothing received
> 60   rpc3 fc2,fc3   rpc2 rpc3 (receive one call right after the other)
> 90   rpc4 fc4       rpc4
>
> The future callbacks always show success=true and suspected=false. On
> the call options I set the timeout to 1000 (1 sec, right?) but I don't
> get any timeout behavior as far as I can tell.
>
> The channels are carrying frequent unreliable traffic and infrequent RPC
> traffic, but the RPC calls of other methods seem to be going through
> reliably.
>
> I was getting similar behavior of missed calls on the remote node when I
> was using callRemoteMethods with GET_NONE.
>
> This is over wifi, so I can see that maybe a message could be lost, but
> this seems more frequent than I'd expect. In any case, I would expect
> the message to be resent long before the next RPC call.
>
> I do have RPC calls back and forth, but I thought I had avoided deadlock.
> It seems to me that if this were the case, I'd see the same problem on
> the local as well as the remote node, and it would happen most of the
> time. I'd also expect it not to happen here, since this is the first
> message in the chain of activity.
>
> Here is my config:
>
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xmlns="urn:org:jgroups"
>         xsi:schemaLocation="urn:org:jgroups
>                             http://www.jgroups.org/schema/JGroups-3.3.xsd">
>
>   <UDP
>       enable_diagnostics="true"
>       ip_mcast="true"
>       ip_ttl="${jgroups.udp.ip_ttl:8}"
>       loopback="true"
>       max_bundle_size="1400"
>       max_bundle_timeout="5"
>       mcast_port="${jgroups.udp.mcast_port:45588}"
>       mcast_recv_buf_size="200K"
>       mcast_send_buf_size="200K"
>       oob_thread_pool.enabled="true"
>       oob_thread_pool.keep_alive_time="5000"
>       oob_thread_pool.max_threads="8"
>       oob_thread_pool.min_threads="1"
>       oob_thread_pool.queue_enabled="false"
>       oob_thread_pool.queue_max_size="100"
>       oob_thread_pool.rejection_policy="discard"
>       thread_naming_pattern="cl"
>       thread_pool.enabled="true"
>       thread_pool.keep_alive_time="5000"
>       thread_pool.max_threads="8"
>       thread_pool.min_threads="2"
>       thread_pool.queue_enabled="true"
>       thread_pool.queue_max_size="10000"
>       thread_pool.rejection_policy="discard"
>       timer.keep_alive_time="3000"
>       timer.max_threads="10"
>       timer.min_threads="4"
>       timer.queue_max_size="500"
>       timer_type="new3"
>       tos="8"
>       ucast_recv_buf_size="200K"
>       ucast_send_buf_size="200K" />
>
>   <PING />
>
>   <MERGE2
>       max_interval="30000"
>       min_interval="10000" />
>
>   <FD_SOCK />
>
>   <FD_ALL />
>
>   <VERIFY_SUSPECT timeout="1500" />
>
>   <BARRIER />
>
>   <pbcast.NAKACK2
>       discard_delivered_msgs="true"
>       max_msg_batch_size="500"
>       use_mcast_xmit="false"
>       xmit_interval="500"
>       xmit_table_max_compaction_time="30000"
>       xmit_table_msgs_per_row="2000"
>       xmit_table_num_rows="100" />
>
>   <UNICAST3
>       conn_expiry_timeout="0"
>       max_msg_batch_size="500"
>       xmit_interval="500"
>       xmit_table_max_compaction_time="60000"
>       xmit_table_msgs_per_row="2000"
>       xmit_table_num_rows="100" />
>
>   <pbcast.STABLE
>       desired_avg_gossip="50000"
>       max_bytes="4M"
>       stability_delay="1000" />
>
>   <pbcast.GMS
>       join_timeout="3000"
>       print_local_addr="true"
>       view_bundling="true" />
>
>   <FRAG frag_size="1000" />
>
>   <pbcast.STATE_TRANSFER />
>
>   <CENTRAL_LOCK num_backups="2" />
>
> </config>
>
> Any ideas?
>
> Thanks,
>
> JT

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)