JGroups / Discussion / Developers: RetransmitTable has huge messages (20 of them - each of ~40 MB)!

sageroger - 2016-12-13

We are running into a problem with JGroups. After the application has run for a couple of days, it runs out of Java heap memory. The heap dumps show large objects of type "array of org.jgroups.Message", each holding more than 40 MB. All of these objects are held by org.jgroups.util.RetransmitTable. What could be causing this? We are on 3.6.11.Final. Does anyone know how to fix this problem?

- Bela Ban - 2016-12-13
  
  RetransmitTable is not used in 3.6.11; I suspect you're using an
  outdated configuration (NAKACK instead of NAKACK2).
  
  If you want someone to look into this, you need to post
  - Configuration
  - Stack trace / thread dump
  - How to reproduce if possible
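
For the thread dump item in the list above, one option that needs no external tooling is to capture the dump from inside the running JVM. The following is a minimal sketch using only the standard `java.lang.management` API; the class name `ThreadDumper` and the method `dump()` are hypothetical names chosen for illustration, not part of JGroups.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not part of JGroups): builds a text dump of all live threads.
final class ThreadDumper {

    /** Returns one block per live thread: name, id, state, then the full stack trace. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request lock details the JVM says it can provide.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" id=").append(info.getThreadId())
              .append(" state=").append(info.getThreadState())
              .append('\n');
            // Iterate the frames directly; ThreadInfo.toString() abbreviates deep stacks.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Calling `ThreadDumper.dump()` from a diagnostic hook while memory is climbing, and posting the result together with the configuration, would cover two of the three items requested. Running `jstack <pid>` against the process is the usual external alternative.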
  
  --
  Bela Ban, JGroups lead (http://www.jgroups.org)
  

sageroger - 2016-12-13

Bela,
Thanks for the response.

Here is the configuration:

<config clusterName="MyTest">
<UDP mcast_addr="228.8.8.8" mcast_port="8888" ip_ttl="64" ip_mcast="true" mcast_send_buf_size="150000" mcast_recv_buf_size="80000" ucast_send_buf_size="150000" ucast_recv_buf_size="80000" loopback="false"/>
<PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false"/>
<MERGE2 min_interval="10000" max_interval="20000"/>
<FD shun="true" up_thread="true" down_thread="true"/>
<VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false"/>
<pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800" up_thread="false" down_thread="false"/>
<pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false"/>
<UNICAST timeout="600,1200,2400" down_thread="false"/>
<FRAG frag_size="8192" down_thread="false" up_thread="false"/>
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true"/>
<pbcast.STATE_TRANSFER up_thread="false" down_thread="false"/>
</config>

We do not have a stack trace, but in the heap analysis we can see which data structures are holding the memory. This appears to occur in heavy traffic situations.

Another question: we do not need FIFO semantics in our case. Can we disable FIFO (which would probably eliminate the need for retransmits)? If so, how can we do that?


Bela Ban - 2016-12-13

The config you posted is most definitely NOT a 3.6.x config, as attributes like shun or up_thread were eliminated decades ago!


sageroger - 2016-12-14

Hmm, the reason we are using an old config file is that we just upgraded from an ancient version to the latest version, and we thought the new version would work fine with the old config. Where can we find a good template for the new config file? Do you have any comments on my previous question about disabling FIFO? Thanks!

- Bela Ban - 2016-12-15
  
  On 14/12/16 21:37, sageroger wrote:
  
  > Hmm.. the reason we are using an old config file is we just upgraded
  > from an ancient version to the latest version. But thought the new
  > version works fine with the old config.
  
  No, it doesn't. With the config you posted, a 3.6.11 system will not
  even start! So either you don't use the config you posted, or you
  don't use 3.6.11...
  
  > Where can we find a good template for the new config file?
  
  Look in the ./conf dir of the src code for examples:
  https://github.com/belaban/JGroups/tree/3.6/conf
  
  > Do you have any comments on my previous question about disabling FIFO? Thanks!
  
  What do you mean by disabling FIFO? No retransmission of lost messages?
  No ordering guarantees?
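
One way to check the "either you don't use the config you posted, or you don't use 3.6.11" hypothesis is to ask the running JVM which jar actually supplies the JGroups classes, since an older jar earlier on the classpath would explain both the accepted config and the RetransmitTable instances in the heap dump. The sketch below uses only standard reflection APIs; `ClassOrigin` and `locate` are hypothetical names, and the two default class names are taken from this thread rather than from any guarantee about what a given JGroups release contains.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of JGroups): reports where a class was loaded from.
final class ClassOrigin {

    /** Returns the code source location of the named class, or a short note if unavailable. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK usually report no code source.
            if (src == null || src.getLocation() == null) {
                return "(no code source, likely a JDK class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Defaults are the class names mentioned in this thread; pass others as arguments.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.jgroups.JChannel", "org.jgroups.util.RetransmitTable"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

This has to run with the same classpath as the application to be meaningful. Inside an application server the relevant class loader may differ from the one used here, so treat the result as a hint rather than proof.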
  

sageroger - 2016-12-15

Thanks for the conf link. By not needing FIFO I meant we need retransmission of lost messages but no ordering guarantees.

- Bela Ban - 2016-12-16
  
  You could send your messages as OOB messages then, which are unordered
  but lossless.
  I suggest finding out the root cause first, though, as this might also
  affect OOB messages.
  
