
RetransmitTable has huge messages (20 of them - each of ~40 MB)!

Forum: Developers
Creator: sageroger
Created: 2016-12-13
Updated: 2016-12-16
  • sageroger

    sageroger - 2016-12-13

    We are running into a problem with JGroups. After the application has run for a couple of days, it runs out of Java heap memory. In the heap dumps we see large objects of type "array of org.jgroups.Message" holding >40 MB each. All of these objects are held by org.jgroups.util.RetransmitTable. What could be causing this? We are on 3.6.11.Final. Does anyone know how to fix it?

     
    • Bela Ban

      Bela Ban - 2016-12-13

      RetransmitTable is not used in 3.6.11; I suspect you're using an
      outdated configuration (NAKACK instead of NAKACK2).

      If you want someone to look into this, you need to post
      - Configuration
      - Stack trace / thread dump
      - How to reproduce if possible

      RetransmitTable has huge messages (20 of them - each of ~40 MB)!
      https://sourceforge.net/p/javagroups/discussion/18796/thread/18d7d019/?limit=25#c25c

      --
      Bela Ban, JGroups lead (http://www.jgroups.org)

       
  • sageroger

    sageroger - 2016-12-13

    Bela,
    Thanks for the response.

    Here is the configuration:

    <config clusterName="MyTest">
        <UDP mcast_addr="228.8.8.8" mcast_port="8888" ip_ttl="64" ip_mcast="true" mcast_send_buf_size="150000" mcast_recv_buf_size="80000" ucast_send_buf_size="150000" ucast_recv_buf_size="80000" loopback="false"/>
        <PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false"/>
        <MERGE2 min_interval="10000" max_interval="20000"/>
        <FD shun="true" up_thread="true" down_thread="true"/>
        <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false"/>
        <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800" up_thread="false" down_thread="false"/>
        <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false"/>
        <UNICAST timeout="600,1200,2400" down_thread="false"/>
        <FRAG frag_size="8192" down_thread="false" up_thread="false"/>
        <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true"/>
        <pbcast.STATE_TRANSFER up_thread="false" down_thread="false"/>
    </config>

    We do not have a stack trace, but in the heap analysis we can see which data structures are holding the memory. This appears to occur under heavy traffic.

    Another question: we do not need FIFO semantics in our case. Can we disable FIFO (which would probably eliminate the need for retransmits)? If so, how?

     
  • Bela Ban

    Bela Ban - 2016-12-13

    The config you posted is most definitely NOT a 3.6.x config, as attributes like shun or up_thread were eliminated decades ago!

     
  • sageroger

    sageroger - 2016-12-14

    Hmm.. the reason we are using an old config file is that we just upgraded from an ancient version to the latest version, and we thought the new version would work fine with the old config. Where can we find a good template for the new config file? Do you have any comments on my previous question about disabling FIFO? Thanks!

     
    • Bela Ban

      Bela Ban - 2016-12-15

      On 14/12/16 21:37, sageroger wrote:

      > Hmm.. the reason we are using an old config file is we just upgraded
      > from an ancient version to the latest version. But thought the new
      > version works fine with the old config.

      No, it doesn't. With the config you posted, a 3.6.11 system will not
      even start! So either you don't use the config you posted, or you
      don't use 3.6.11...

      > Where can we find a good template for the new config file?

      Look in the ./conf dir of the src code for examples:
      https://github.com/belaban/JGroups/tree/3.6/conf
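
For readers migrating a config like the one posted earlier in this thread, the following is a rough sketch of how its protocol list might map onto 3.6-era protocol names. It is loosely modeled on the udp.xml in the conf directory linked above and is an assumption-laden starting point, not a tested or tuned stack. The values for mcast_addr, mcast_port, ip_ttl, the merge intervals, desired_avg_gossip, frag_size and join_timeout are simply carried over from the old config. The removed attributes (up_thread, down_thread, shun) are dropped. The shipped udp.xml sets many more attributes and includes further protocols, such as flow control, so treat that file as authoritative.

```xml
<config xmlns="urn:org:jgroups">
    <!-- Transport: values carried over from the old config, not tuned -->
    <UDP mcast_addr="228.8.8.8" mcast_port="8888" ip_ttl="64"/>
    <PING/>
    <!-- MERGE3 in place of MERGE2 -->
    <MERGE3 min_interval="10000" max_interval="20000"/>
    <!-- FD_SOCK and FD_ALL in place of FD -->
    <FD_SOCK/>
    <FD_ALL/>
    <VERIFY_SUSPECT timeout="1500"/>
    <!-- NAKACK2 in place of NAKACK, UNICAST3 in place of UNICAST -->
    <pbcast.NAKACK2/>
    <UNICAST3/>
    <pbcast.STABLE desired_avg_gossip="20000"/>
    <pbcast.GMS join_timeout="5000" print_local_addr="true"/>
    <!-- FRAG2 in place of FRAG -->
    <FRAG2 frag_size="8192"/>
    <pbcast.STATE_TRANSFER/>
</config>
```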

      > Do you have any comments on my previous question about disabling FIFO? Thanks!

      What do you mean by disabling FIFO? No retransmission of lost messages?
      No ordering guarantees?


       
  • sageroger

    sageroger - 2016-12-15

    Thanks for the conf link. By not needing FIFO I meant we need retransmission of lost messages but no ordering guarantees.

     
    • Bela Ban

      Bela Ban - 2016-12-16

      You could send your messages as OOB messages then, which are unordered
      but lossless.
      I suggest finding the root cause first, though, as this might also
      affect OOB messages.
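
To make "unordered but lossless" concrete, here is a toy model of a single receiver tracking sequence numbers from a single sender. It is not JGroups code: the class `DeliveryModel`, the nested `Receiver`, and all method names are hypothetical, and the model ignores the sender side entirely. In JGroups 3.6 itself, the flag is set per message with `msg.setFlag(Message.Flag.OOB)` before sending.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of one receiver's view of one sender; hypothetical, not JGroups code. */
final class DeliveryModel {

    static final class Receiver {
        private final boolean ordered;                  // true = FIFO delivery, false = OOB-like
        private final Set<Long> received = new HashSet<>();
        private final TreeMap<Long, String> held = new TreeMap<>(); // waiting behind a gap
        private long next = 1;                          // next seqno FIFO delivery expects
        private long highest = 0;                       // highest seqno seen so far
        final List<String> delivered = new ArrayList<>();

        Receiver(boolean ordered) { this.ordered = ordered; }

        void receive(long seqno, String payload) {
            if (!received.add(seqno)) return;           // duplicate: drop
            highest = Math.max(highest, seqno);
            if (!ordered) {                             // unordered: hand over immediately
                delivered.add(payload);
                return;
            }
            held.put(seqno, payload);                   // ordered: deliver only contiguous seqnos
            while (held.containsKey(next)) {
                delivered.add(held.remove(next));
                next++;
            }
        }

        /** Seqnos below the highest seen that never arrived; these would be re-requested. */
        List<Long> missing() {
            List<Long> gaps = new ArrayList<>();
            for (long s = 1; s < highest; s++) {
                if (!received.contains(s)) gaps.add(s);
            }
            return gaps;
        }

        int heldCount() { return held.size(); }
    }

    public static void main(String[] args) {
        for (boolean ordered : new boolean[] {true, false}) {
            String mode = ordered ? "FIFO" : "OOB-like";
            Receiver r = new Receiver(ordered);
            r.receive(1, "m1");
            r.receive(3, "m3");                         // m2 was lost on the wire
            r.receive(4, "m4");
            System.out.println(mode + " before retransmit: delivered=" + r.delivered
                    + " held=" + r.heldCount() + " missing=" + r.missing());
            r.receive(2, "m2");                         // retransmission fills the gap
            System.out.println(mode + " after retransmit: delivered=" + r.delivered
                    + " held=" + r.heldCount() + " missing=" + r.missing());
        }
    }
}
```

In both modes the gap at seqno 2 is detected and would be re-requested, so nothing is lost. The difference is only in delivery: the FIFO receiver holds m3 and m4 until m2 arrives, while the OOB-like receiver passes them up at once and delivers m2 last.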


       
