From: Kenneth W. <hap...@gm...> - 2010-06-03 21:45:47
Jason,

I started spreading the hosts out using the wrapper and it's been running fine for a week now. Thanks!

I've got another identical server that I want to balance the load across anyway, for failover. It's currently running just nventory; I'd like both servers to handle both services. You mentioned master-slave: are there any known caveats to using master-master MySQL replication and pointing the etch-server Ruby app on each server at its local MySQL instance?

On Thu, May 27, 2010 at 11:14 PM, Jason Heiss <jh...@ap...> wrote:
> Do you randomize the timing of etch running on your clients? I.e., when
> cron kicks off etch every 10 minutes, do all 200 clients try to contact the
> etch server simultaneously? You mentioned that you get a load spike when
> the cron kicks off, which makes it sound like you aren't randomizing.
>
> We randomize our clients using the etch_cron_wrapper script that is
> included with the etch distribution. It seeds the random number generator
> with the client's hostname, so that each client runs at a consistent time,
> but the overall client load is randomized over time. We only run etch once
> an hour, so you'd have to adjust the calculation in the script.
>
> In our larger datacenter we have over 1200 clients hitting two etch
> servers. Each runs 20 unicorn processes. Our hardware is older than yours
> (a total of 6 cores between the two servers). Load average sits around 1.5
> on the two servers, with average CPU utilization at about 35%. So your 200
> clients running every 10 minutes should be about equivalent to our 1200
> running hourly, which means your one box should still be able to handle
> things, although it would be fairly busy.
>
> (Well, that assumes your configuration repository is roughly the same size
> as ours. We have etch managing 191 files, based on running
> `find . -name config.xml | wc -l` in our repo.)
>
> So if you aren't currently randomizing your client timing, I'd encourage you
> to try that.
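The hostname-seeded randomization Jason describes can be sketched in Ruby as follows. This is not the code of etch_cron_wrapper or of the randomizer script; `splay_seconds` and `local_splay` are hypothetical names, and the 600-second default assumes a cron job that fires every 10 minutes. A stable checksum (CRC32) of the hostname seeds a private random number generator, so each host gets the same offset on every run while the fleet as a whole is spread across the interval.

```ruby
require 'socket'
require 'zlib'

# Hypothetical helper, not etch_cron_wrapper's actual code: returns a
# per-host delay in the range 0...max_sleep seconds (max_sleep must be > 0).
def splay_seconds(hostname, max_sleep)
  # Seed a private RNG with a stable hash of the hostname. CRC32 is used
  # because String#hash is randomized per Ruby process and would give a
  # different delay on every run.
  seed = Zlib.crc32(hostname)
  Random.new(seed).rand(max_sleep)
end

# Delay for this machine; the default assumes cron fires every 10 minutes.
def local_splay(max_sleep = 600)
  splay_seconds(Socket.gethostname, max_sleep)
end
```

A wrapper run from cron would call `sleep(local_splay)` and then start `etch`; with an hourly schedule like Jason's, `local_splay(3600)` spreads the clients across the whole hour instead.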
> Even just spreading things out over a few minutes might help.
> If that's not possible, or you're already doing it, then it might be time to
> get another server. In our case we run mysqld on the beefier server and
> point Rails on both servers to that mysqld instance. We also have mysqld
> running on the other server in slave mode, getting data updates via MySQL
> replication in case we ever need it.
>
> Jason
>
> PS: As an alternative to modifying etch_cron_wrapper, you can get a more
> generic version of the script that allows you to specify the max sleep
> interval on the command line here:
> https://sourceforge.net/apps/trac/tpkg/browser/trunk/pkgs/randomizer/randomizer
>
> On May 27, 2010, at 11:31 AM, Kenneth Williams wrote:
>
> I switched from 90 unicorn processes to 60 yesterday and this error went
> away, but the latency between a config change and that change getting out to
> all hosts went up to a max of around 5 hours. Each host hits the etch server
> every 10 minutes; at that 10-minute mark the 5-minute load average goes up to
> around 10. This is on a dual quad-core Xeon.
>
> Do you have any recommendations on how many processes to run, or some
> additional details on how one might go about setting this up for a robust
> production environment? Currently I have an Apache proxy as the entry point
> to just one server running etch server with three unicorn processes: one
> with 8 workers serving as my "gui", the other two with 30 workers each
> serving connections from the client machines. I switched from SQLite to
> MySQL, which does not seem to be the bottleneck anymore, and have the etch
> config files stored on a four-disk 15k SAS RAID 10. Do I need another
> server, or two? I only have ~200 client machines connecting to it.
>
> Thanks!!
>
>
> On Thu, May 27, 2010 at 9:05 AM, Jason Heiss <jh...@ap...> wrote:
>
>> I haven't seen this before, but it looks like you're probably hitting
>> Ruby's HTTP client default "read_timeout" of 60 seconds. I.e.
>> reading the response from the etch server took more than 60 seconds, so
>> the client gave up. How's the load on your etch server(s)? How many unicorn
>> (or whatever Ruby app server you use) processes are you running? What XML
>> parser are you using on the server, REXML or LibXML?
>>
>> You could bump up read_timeout to a larger value, but that may just mask
>> the problem. If you want to try adjusting read_timeout, around line 141 in
>> etchclient.rb there's a line that looks like:
>>
>>   http = Net::HTTP.new(@filesuri.host, @filesuri.port)
>>
>> Right below that you can add:
>>
>>   http.read_timeout = 120
>>
>> Jason
>>
>> On May 26, 2010, at 3:15 PM, Kenneth Williams wrote:
>>
>> I recently started seeing servers timing out when connecting to etch. The
>> error I am seeing in the logs on the client is new to me:
>>
>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect?
>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata
>> execution expired
>> /usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread': Connection reset by peer (Errno::ECONNRESET)
>>     from /usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
>>     from /usr/lib/ruby/1.8/timeout.rb:62:in `timeout'
>>     from /usr/lib/ruby/1.8/timeout.rb:93:in `timeout'
>>     from /usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
>>     from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
>>     from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
>>     from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line'
>>     from /usr/lib/ruby/1.8/net/http.rb:2009:in `read_new'
>>     from /usr/lib/ruby/1.8/net/http.rb:1050:in `request'
>>     from /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/lib/etchclient.rb:438:in `process_until_done'
>>     from /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/bin/etch:99
>>     from /usr/bin/etch:19:in `load'
>>     from /usr/bin/etch:19
>>
>> Has anyone seen this error before, or does anyone know what may be causing
>> it or what I can look for? Thanks!
>>
>> --
>> Kenneth Williams
>>
>> _______________________________________________
>> etch-users mailing list
>> etc...@li...
>> https://lists.sourceforge.net/lists/listinfo/etch-users
>
>
> --
> Kenneth Williams

--
Kenneth Williams
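Jason's suggested workaround, raising `read_timeout` right after the `Net::HTTP.new` call in etchclient.rb, can be illustrated outside etch with a standalone sketch. The helper names `etch_http` and `etch_get`, the 30-second `open_timeout`, and the example URL are assumptions for illustration; only the `read_timeout = 120` value comes from the thread.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper, not etch's own code: build a Net::HTTP client for the
# etch server with explicit timeouts instead of relying on the 60-second
# read_timeout default. Creating the object does not open a connection.
def etch_http(uri, read_timeout: 120, open_timeout: 30)
  http = Net::HTTP.new(uri.host, uri.port)
  http.read_timeout = read_timeout # max seconds to wait on each read from the socket
  http.open_timeout = open_timeout # max seconds to wait for the connection to open
  http
end

# Hypothetical wrapper: GET a URL and report a timeout or connection reset
# instead of dying with a backtrace. Returns the response, or nil on failure.
def etch_get(uri, **timeouts)
  etch_http(uri, **timeouts).request(Net::HTTP::Get.new(uri.request_uri))
rescue Timeout::Error, Errno::ECONNRESET => e
  # Net::ReadTimeout and Net::OpenTimeout are subclasses of Timeout::Error.
  warn "etch server did not answer in time: #{e.class}: #{e.message}"
  nil
end
```

For example, `etch_get(URI('http://etch.example.com/'))` would wait up to 120 seconds per read before giving up. As Jason notes, a longer timeout may only mask an overloaded server, so it complements spreading client runs out rather than replacing it.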