I have a cluster of scribe servers forwarding to a central scribe server which writes the messages to disk.
The scribe instances in the cluster are using about 10MB of RAM per server, which is great.
However, the scribe instance on the central server is using about 500MB of memory. It's only been running for about a day and has logged roughly 4 gigs worth of data. Should scribe be using this much memory or is it leaking somewhere? It just seems awfully high compared to the instances in the cluster.
For what it's worth, I'm using Scribe 2.0, Boost 1.37, and the Thrift trunk from 2/16. The machine is x86_64. All machines are using the exact same build of Scribe.
My config:
port=1463
max_msg_per_second=200000
check_interval=3
new_thread_per_category=true
<store>
category=data
type=file
file_path=/logs/scribe/data
rotate_period=daily
rotate_hour=0
rotate_minute=0
fs_type=std
add_newlines=1
create_symlink=true
</store>
How many machines are in the cluster that forwards messages to the central server? And can you also show me the config you are using in the cluster?
Thanks,
Anthony
Central Server - ps aux | grep scribe
someuser 28557 2.9 19.9 1078836 412048 ? Sl Feb22 24:53 /usr/local/bin/scribed /vm/scribed/scribed.conf
I bounced scribed yesterday and it grew back to about 410MB within the first 20 minutes of running.
Cluster Server - ps aux | grep scribe
someuser 20958 0.0 0.0 37988 5268 ? Sl Feb20 3:21 /vm/scribed-2.0/bin/scribed /vm/scribed-2.0/etc/scribed.conf
There are currently 9 servers in the cluster. Each server receives 100-200 messages/second. The messages are fairly long, about 200-300 bytes each.
-Config File-
port=1463
max_msg_per_second=200000
check_interval=3
<store>
category=data
type=buffer
buffer_send_rate=1
retry_interval=30
retry_interval_range=10
<primary>
type=network
remote_host=central_server
remote_port=1463
</primary>
<secondary>
type=file
file_path=/tmp
max_size=10000000
</secondary>
</store>
Scribe should easily be able to handle 9 connections with a total of 1000-2000 messages per second. I am guessing that most of this memory usage is coming from buffering inside Thrift. I have a patch that frees the excess memory held in Thrift's TConnection buffers, which I have been meaning to submit to Thrift. I will work on getting it cleaned up and submitted.
Also, I would recommend reducing max_size in the buffer store's secondary store if you are concerned about memory usage. Your buffer store is currently configured to write 10MB files to /tmp when the central server is busy. The buffer store will then send all ~10MB to the central server at once, and the central server needs to allocate space to hold that data. Thrift is probably allocating ~2x10MB for each of the 9 connections, which accounts for most of the memory usage you are seeing (and is what my patch fixes).
Also, if you use the scribe_ctrl script to monitor Scribe counters, you will probably see that messages are occasionally 'denied for queue size'. This happens because you are sometimes sending 10MB of messages while the queue size on the central scribe server is only 5MB. That causes the scribe nodes in your cluster to buffer to the secondary store and retry later. (No messages are lost, but throughput suffers.) You can improve this by reducing max_size as mentioned above and/or by setting max_queue_size on the central Scribe server to a larger value.
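To make the two suggestions concrete, here is a sketch of what the changes could look like (the specific values are illustrative, not tuned recommendations):

```
# On each cluster node: have the buffer store write smaller files,
# so each flush to the central server is ~1MB instead of ~10MB
<secondary>
type=file
file_path=/tmp
max_size=1000000
</secondary>

# In the central server's global config: raise the per-category queue
# limit above the default 5MB so a full buffer flush isn't denied
max_queue_size=20000000
```

Either change alone should reduce the 'denied for queue size' errors; doing both gives you headroom on both sides.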
-Anthony
Hey Anthony,
I pushed a new config file with max_size=500000 + use_conn_pool=yes to the cluster and bounced scribe on all of the boxes. Memory usage dropped from 400MB to 7MB on the central server and is holding steady :-)
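For reference, a sketch of what the updated cluster-side store looks like with those two changes (other settings unchanged from the config posted earlier; I'm assuming use_conn_pool goes on the network store, i.e. the primary):

```
<store>
category=data
type=buffer
buffer_send_rate=1
retry_interval=30
retry_interval_range=10
<primary>
type=network
remote_host=central_server
remote_port=1463
use_conn_pool=yes
</primary>
<secondary>
type=file
file_path=/tmp
max_size=500000
</secondary>
</store>
```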
Additionally, CPU usage on the central server has dropped from a steady 15-20% to 2-5%.
Can you post the thrift JIRA # when you submit the patch?
Looks good! Thanks.
Great. I'll let you know when I post the patch. Hopefully soon.
Also, use_conn_pool won't help you right now because you only have 1 category defined in your cluster. So the central server will just have 1 connection for each machine in the cluster. But if you were to add multiple categories that were configured to send to the same central server, then connection pooling would be useful.
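To illustrate, connection pooling starts to pay off with a cluster config along these lines (the second category name is hypothetical): both network stores point at the same central server, so with use_conn_pool=yes they can share a single connection per machine instead of opening one each.

```
<store>
category=data
type=network
remote_host=central_server
remote_port=1463
use_conn_pool=yes
</store>

<store>
category=clicks
type=network
remote_host=central_server
remote_port=1463
use_conn_pool=yes
</store>
```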
-Anthony
I'm going to be adding another high volume category in a couple of days (about 1k-3k messages/second) so I figured I would add in the connection pooling setting for it.
Throughout the day, scribe climbed to roughly 60MB of memory usage on the central server, but it's been holding pretty steady. That's much better than 400MB :-) I could probably get it lower if I tweak the config options a bit more, but I'm happy with 60MB.
Out of curiosity, is the preferred method of communication the forums or the mailing list?
Thanks,
Steve Corona
I like the forums so far. It seems like they have attracted more responses/discussion than the mailing lists. But either is fine.
-Anthony
Steve,
Here are a couple Thrift patches you might be interested in.
This is the patch we use at Facebook to limit the amount of memory used by Scribe:
https://issues.apache.org/jira/browse/THRIFT-357
Also, here is another patch which was just merged that also reduces the amount of memory used. I have not tried this out with Scribe though.
http://issues.apache.org/jira/browse/THRIFT-265
Let me know how things go.
Thanks,
Anthony
Hey Anthony,
Thanks for the update. Scribe has been behaving itself pretty well, using only ~6MB of memory on the central logging server. When I get some time I will rebuild Thrift and Scribe to keep up with the recent changes (IIRC, I am only using Scribe 2.01 on one server).
Oh, BTW, are you going to commit the Scribe enhancements you submitted to the bug tracker to the trunk? It would be nice to get the extra functionality without having to patch it in myself :-)
Steve Corona
Yes, I am planning on committing those patches. I am posting them as patches first to give people the opportunity to voice any concerns before I commit.