
High memory usage?

2009-02-21
2013-04-11
  • Stephen Corona

    Stephen Corona - 2009-02-21

    I have a cluster of scribe servers forwarding to a central scribe server which writes the messages to disk.

    The scribe instances in the cluster are each using about 10MB of RAM, which is great.

    However, the scribe instance on the central server is using about 500MB of memory. It's only been running for about a day and has logged roughly 4 gigs worth of data. Should scribe be using this much memory or is it leaking somewhere? It just seems awfully high compared to the instances in the cluster.

    For what it's worth, I'm using Scribe 2.0, Boost 1.37 and the Thrift trunk from 2/16. The machine is x86_64. All machines are using the same exact build of scribe.

    My config:

    port=1463
    max_msg_per_second=200000
    check_interval=3
    new_thread_per_category=true

    <store>
    category=data
    type=file
    file_path=/logs/scribe/data
    rotate_period=daily
    rotate_hour=0
    rotate_minute=0
    fs_type=std
    add_newlines=1
    create_symlink=true
    </store>

     
    • Anthony Giardullo

      How many machines are in the cluster that forwards messages to the central server?  And can you also show me the config you are using in the cluster?

      Thanks,
      Anthony

       
    • Stephen Corona

      Stephen Corona - 2009-02-23

      Central Server - ps aux | grep scribe
      someuser     28557  2.9 19.9 1078836 412048 ?      Sl   Feb22  24:53 /usr/local/bin/scribed /vm/scribed/scribed.conf

      I bounced scribed yesterday and it grew back to about 410MB within the first 20 minutes of running.

      Cluster Server - ps aux | grep scribe
      someuser     20958  0.0  0.0  37988  5268 ?        Sl   Feb20   3:21 /vm/scribed-2.0/bin/scribed /vm/scribed-2.0/etc/scribed.conf

      There are currently 9 servers in the cluster. Each server receives 100-200 messages/second. The messages are fairly long, about 200-300 bytes each.

      -Config File-

      port=1463
      max_msg_per_second=200000
      check_interval=3

      <store>
      category=data
      type=buffer
      buffer_send_rate=1
      retry_interval=30
      retry_interval_range=10

      <primary>
      type=network
      remote_host=central_server
      remote_port=1463
      </primary>

      <secondary>
      type=file
      file_path=/tmp
      max_size=10000000
      </secondary>
      </store>

       
    • Anthony Giardullo

      Scribe should easily be able to handle 9 connections with a total of 1000-2000 messages per second.  I am guessing that most of this memory usage is coming from buffering inside Thrift.  I have a patch to free the excess memory held in Thrift’s TConnection buffers that I have been meaning to submit to Thrift.  I will work on getting this patch cleaned up and submitted.

      Also, I would recommend reducing the max_size in the buffer store’s secondary store if you are concerned with memory usage. Your buffer store is currently configured to write 10MB files to /tmp when the central server is busy.  The buffer store will then send all ~10MB to the central server at once.  The central server then needs to allocate space to hold this data.  Thrift is probably allocating ~2x10MB for each of the 9 connections, which is most of the memory usage you are seeing (and this is what the patch I have fixes).
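The estimate above can be checked with some quick arithmetic. This is only a back-of-envelope sketch: the 2x factor and the 9 connections come from the paragraph above, the function name is made up for illustration, and the smaller 1,000,000-byte max_size is a hypothetical value, not a recommendation.

```python
def thrift_buffer_estimate(connections, max_size_bytes, copies=2):
    """Rough upper bound on memory held in per-connection buffers.

    Assumes (per the guess above) that each connection ends up holding
    about `copies` buffers, each as large as the biggest batch sent.
    """
    return connections * copies * max_size_bytes


# Current cluster config: 9 connections, max_size=10000000
current = thrift_buffer_estimate(9, 10_000_000)

# Hypothetical smaller max_size, chosen only for illustration
smaller = thrift_buffer_estimate(9, 1_000_000)

print(f"max_size=10000000: ~{current / 1e6:.0f} MB")
print(f"max_size=1000000:  ~{smaller / 1e6:.0f} MB")
```

With the current settings this comes to roughly 180 MB, so the buffers alone would account for a large share of the ~400MB reported by ps, and the estimate shrinks in proportion to max_size.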

      Also, if you use the scribe_ctrl script to monitor Scribe counters, you will probably see that some messages are occasionally ‘denied for queue size’.  This is because you are sometimes sending 10MB of messages while the queue size on the central Scribe server is only 5MB.  This causes the Scribe nodes in your cluster to buffer to the secondary store and try again later.  (No messages are lost; you just won’t get as good throughput.)  You can improve this by reducing max_size as mentioned above and/or setting max_queue_size on the central Scribe server to a larger value.
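To make the queue-size mismatch concrete, here is a deliberately simplified model of the check. It is an assumption for illustration, not Scribe's actual code: the function name and the `queued_bytes` parameter are hypothetical, and the real server may account for queued data differently. The 5MB and 10MB figures are the ones quoted above.

```python
def fits_in_queue(batch_bytes, max_queue_size=5_000_000, queued_bytes=0):
    """Return True if a batch would be accepted under a simple size cap.

    Simplified model: a batch is denied when it would push the total
    queued bytes past max_queue_size.
    """
    return queued_bytes + batch_bytes <= max_queue_size


# A full 10MB secondary-store file sent at once never fits a 5MB queue.
print(fits_in_queue(10_000_000))  # False

# Either fix helps: a smaller batch (hypothetical 1MB max_size) ...
print(fits_in_queue(1_000_000))  # True

# ... or a larger queue on the central server (hypothetical 20MB).
print(fits_in_queue(10_000_000, max_queue_size=20_000_000))  # True
```

Under this model the denial is purely a size mismatch, which is why the cluster nodes fall back to the secondary store and retry rather than losing messages.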

      -Anthony

       
    • Stephen Corona

      Stephen Corona - 2009-02-24

      Hey Anthony,

      I pushed a new config file with max_size=500000 + use_conn_pool=yes to the cluster and bounced scribe on all of the boxes. Memory usage dropped from 400MB to 7MB on the central server and is holding steady :-)

      Additionally, CPU usage has dropped on the central server from a steady 15-20% to 2-5%

      Can you post the thrift JIRA # when you submit the patch?

      Looks good! Thanks.

       
    • Anthony Giardullo

      Great.  I'll let you know when I post the patch.  Hopefully soon.

      Also, use_conn_pool won't help you right now because you only have 1 category defined in your cluster.  So the central server will just have 1 connection for each machine in the cluster.  But if you were to add multiple categories that were configured to send to the same central server, then connection pooling would be useful.
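A small sketch of the connection counts described above. It assumes that, without pooling, each category's network store opens its own connection, and that all categories send to the same central host and port; the function name is hypothetical.

```python
def central_connections(machines, categories_per_machine, use_conn_pool):
    """Connections the central server sees from the whole cluster.

    Assumption: one connection per category per machine without pooling,
    one shared connection per machine with pooling.
    """
    per_machine = 1 if use_conn_pool else categories_per_machine
    return machines * per_machine


# Today: 9 machines, 1 category. Pooling changes nothing.
print(central_connections(9, 1, use_conn_pool=False))  # 9
print(central_connections(9, 1, use_conn_pool=True))   # 9

# With a second category, pooling halves the connection count.
print(central_connections(9, 2, use_conn_pool=False))  # 18
print(central_connections(9, 2, use_conn_pool=True))   # 9
```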

      -Anthony

       
    • Stephen Corona

      Stephen Corona - 2009-02-24

      I'm going to be adding another high volume category in a couple of days (about 1k-3k messages/second) so I figured I would add in the connection pooling setting for it.

      Throughout the day, scribe climbed up to roughly 60MB of memory usage on the central server but it's been holding pretty steady. That's much better than 400MB :-) I can probably get it down lower if I tweak the config options a bit more but I'm happy with 60MB.

      Out of curiosity, is the preferred method of communication via forums or mailing list?

      Thanks,

      Steve Corona

       
    • Anthony Giardullo

      I like the forums so far.  It seems like they have attracted more responses/discussion than the mailing lists.  But either is fine.

      -Anthony

       
    • Anthony Giardullo

      Steve,

      Here are a couple Thrift patches you might be interested in.

      This is the patch we use at Facebook to limit the amount of memory used by Scribe:
      https://issues.apache.org/jira/browse/THRIFT-357

      Also, here is another patch which was just merged that also reduces the amount of memory used.  I have not tried this out with Scribe though.
      http://issues.apache.org/jira/browse/THRIFT-265

      Let me know how things go.

      Thanks,
      Anthony

       
    • Stephen Corona

      Stephen Corona - 2009-03-05

      Hey Anthony,

      Thanks for the update. Scribe has been behaving itself pretty well using only ~6MB of memory on the central logging server. When I get some time I will rebuild thrift and scribe to keep up with the recent changes (IIRC, I am only using Scribe 2.01 on one server)

      Oh, BTW, are you going to commit the scribe enhancements you submitted to the bug tracker to the trunk? It would be nice to get the extra functionality without having to patch myself :-)

      Steve Corona

       
    • Anthony Giardullo

      Yes, I am planning on committing those patches.  I am posting them as patches first to give people the opportunity to voice any concerns before I commit.

       
