| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 2008 |     |     |     |     |     |     |     |     |     |     | 3   | 1   |
| 2009 | 1   |     |     | 1   |     |     |     |     | 1   | 9   | 10  | 5   |
| 2010 | 10  |     |     |     | 4   | 2   |     | 5   | 2   | 2   |     | 1   |
| 2011 | 1   |     |     | 1   | 29  | 9   | 4   | 8   |     |     |     |     |
| 2012 |     | 1   | 13  | 1   |     |     |     |     |     |     | 1   |     |
| 2014 |     | 1   |     |     |     |     |     |     |     | 1   |     |     |
| 2021 |     |     |     |     |     |     |     |     |     |     |     | 1   |
From: Darren D. <Dar...@eh...> - 2010-09-13 22:31:32
|
Suppose my node belongs to all three groups (group1, group2, and group3). What happens if I have the following attribute filtering in my config.xml? Will myscript1.script be picked, or myscript3.script, or all three? If all three get picked, will the resulting content be the combined output of the three scripts, or just the result of the last script that runs, which is myscript3 in this case?

<config>
  <file>
    <warning_file/>
    <!-- By default files will be owned by root, but you can change that.
         If this user or group doesn't exist, UID 0 and GID 0 will be used as a fallback.
    <owner>demouser</owner>
    <group>demogroup</group>
    -->
    <perms>400</perms>
    <source>
      <script group="group1">myscript1.script</script>
      <script group="group2">myscript2.script</script>
      <script group="group3">myscript3.script</script>
    </source>
  </file>
</config> |
From: Jason H. <jh...@ap...> - 2010-08-26 23:11:59
|
Sorry, wrong mailing list. Whoops. Jason On Aug 26, 2010, at 11:45 AM, Jason Heiss wrote: > All of the tpkg wiki pages on SourceForge have been updated to reflect the move to /opt/tpkg. Let me know if you find anything else out of date in that regard. > > Thanks, > > Jason > |
From: Jason H. <jh...@ap...> - 2010-08-26 18:45:40
|
All of the tpkg wiki pages on SourceForge have been updated to reflect the move to /opt/tpkg. Let me know if you find anything else out of date in that regard. Thanks, Jason |
From: Jason H. <jh...@ap...> - 2010-08-12 08:55:50
|
Looks like this was a bug related to the local option and relative paths that I fixed in subversion back in April. I'd been meaning to make a new release for a while but there was an unrelated unit test failure that I couldn't quite figure out. I finally spent the time to sort that out (just some race condition junk in the test suite). New release announcement coming in a sec. Jason On Aug 11, 2010, at 3:05 PM, Chris Nolan wrote: > Hey Folks, > > I am playing with etch and downloaded the latest release from sourceforge. The local mode demo is failing with the following error: > > config.xml for /tmp/etchdemo/link does not exist > > I am running > > ./etch --generate-all --interactive --local ../etchserver-demo > > from the client directory. The config.xml does in fact exist in the source directory: > > cnolan$ ls -la ../etchserver-demo/source/tmp/etchdemo/link/config.xml > -rw-r--r--@ 1 cnolan staff 145 Aug 11 14:45 ../etchserver-demo/source/tmp/etchdemo/link/config.xml > > I am running this on a Mac running 10.6.4. > |
From: Jason H. <jh...@ap...> - 2010-08-12 08:55:47
|
Bug fixes: the --local option now handles relative paths properly.

Enhancements: the server has been updated to use Rails 2.3.8 (the latest as of this writing) and to use the will_paginate gem instead of mislav-will_paginate (same gem, just an updated name). A bit of UI cleanup on the dashboard. Improved stack trace error messages to assist users in debugging problems with their configuration.

Jason |
From: Chris N. <Chr...@eh...> - 2010-08-11 22:40:01
|
Hey Folks, I am playing with etch and downloaded the latest release from sourceforge. The local mode demo is failing with the following error: config.xml for /tmp/etchdemo/link does not exist I am running ./etch --generate-all --interactive --local ../etchserver-demo from the client directory. The config.xml does in fact exist in the source directory: cnolan$ ls -la ../etchserver-demo/source/tmp/etchdemo/link/config.xml -rw-r--r--@ 1 cnolan staff 145 Aug 11 14:45 ../etchserver-demo/source/tmp/etchdemo/link/config.xml I am running this on a Mac running 10.6.4. Thanks, Chris |
From: Jason H. <jh...@ap...> - 2010-06-04 02:14:59
|
master-master works fine. We used to do it here but ran out of disk space a while back and turned it off and then never got around to turning it back on. More specifically, we have four etch servers total, two in each data center. We had master-master between a primary server in the two data centers and then master-slave between the primary and secondary within the datacenter. We ran in that configuration for probably a year. We turned off the master-master replication due to disk space issues but still run the master-slave replication within each data center, so the two data centers are islands with regards to the database. It's mostly just status data so it doesn't really matter that they aren't aware of each other. Jason On Jun 3, 2010, at 2:45 PM, Kenneth Williams wrote: > Jason, > I started spreading the hosts out using the wrapper and it's been running fine for a week now. Thanks! > > I've got another identical server that I want to balance between anyway for fail over. It's currently doing just nventory, i'd like both servers to handle both services.... you mentioned master-slave, are there any known caveats to master-master mysql replication and pointing the etch-server ruby app to mysql locally on each? > > On Thu, May 27, 2010 at 11:14 PM, Jason Heiss <jh...@ap...> wrote: > Do you randomize the timing of etch running on your clients? I.e. when cron kicks off etch every 10 minutes do all 200 clients try to contact the etch server simultaneously? You mentioned that you get a load spike when the cron kicks off, which makes it sound like you aren't randomizing. > > We randomize our clients using the etch_cron_wrapper script that is included with the etch distribution. It seeds the random number generator with the client's hostname, so that each client runs at a consistent time, but the overall client load is randomized over time. We only run etch once an hour so you'd have to adjust the calculation in the script. > > In our larger datacenter we have over 1200 clients hitting two etch servers. Each runs 20 unicorn processes. Our hardware is older than yours (total of 6 cores between the two servers). Load average sits around 1.5 on the two servers, average CPU utilization at about 35%. So your 200 clients running every 10 minutes should be about equivalent to our 1200 running hourly. Which means your one box should still be able to handle things, although it would be fairly busy. > > (Well, that assumes your configuration repository is roughly the same size as ours. We have etch managing 191 files based on running `find . -name config.xml | wc -l` in our repo.) > > So if you aren't currently randomizing your client timing I'd encourage you to try that. Even just spreading things out over a few minutes might help. If that's not possible or you're already doing it then it might be time to get another server. In our case we run mysqld on the beefier server and point Rails on both servers to that mysqld instance. We also have mysqld running on the other server in slave mode getting data updates via mysql replication in case we ever need it. 
> > Jason > > PS: As an alternative to modifying etch_cron_wrapper you can get a more generic version of the script that allow you to specify the max sleep interval on the command line here: https://sourceforge.net/apps/trac/tpkg/browser/trunk/pkgs/randomizer/randomizer > > On May 27, 2010, at 11:31 AM, Kenneth Williams wrote: > >> I switched from 90 unicorn processes to 60 yesterday and this error went away, but the latency between a config change and that change getting out to all hosts went up to a max of around 5 hours. Each host hits etch server every 10 minutes, at that 10 minute mark 5m load goes up to around 10 -- this is on a dual xeon quad core. >> >> Do you have any recoemndations on how many processes or some additional details on how one might go about setting this up for a robust production environment? Currently I have Apache proxy as the entry point to just one server running etch server with three unicorn processes, once with 8 workers serving as my "gui", the other two with 30 workers each serving connections from the client machines. I switched from sqlite to mysql, which does not seem to be the bottle neck anymore, and have the etch config files stored on a four disk sas 15k raid 10. Do I need another server, or two? I only have ~200 client machines connecting to it. >> >> Thanks!! >> >> >> On Thu, May 27, 2010 at 9:05 AM, Jason Heiss <jh...@ap...> wrote: >> I haven't seen this before, but it looks like you're probably hitting Ruby's HTTP client default "read_timeout" of 60 seconds. I.e. reading the response from the etch server took more than 60 seconds so the client gave up. How's the load on your etch server(s)? How many unicorn (or whatever Ruby app server you use) processes are you running? What XML parser are you using on the server, REXML or LibXML? >> >> You could bump up read_timeout to a larger value, but that may just mask the problem. If you want to try adjusting read_timeout, around line 141 in etchclient.rb there's a line that looks like: >> >> http = Net::HTTP.new(@filesuri.host, @filesuri.port) >> >> Right below that you can add: >> >> http.read_timeout = 120 >> >> Jason >> >> On May 26, 2010, at 3:15 PM, Kenneth Williams wrote: >> >>> I recently started seeing servers timing out when connecting to etch. The error I am seeing in the logs on the client is new to me: >>> >>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect? >>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata >>> execution expired >>> /usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread': Connection reset by peer (Errno::ECONNRESET) >>> from /usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill' >>> from /usr/lib/ruby/1.8/timeout.rb:62:in `timeout' >>> from /usr/lib/ruby/1.8/timeout.rb:93:in `timeout' >>> from /usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill' >>> from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil' >>> from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline' >>> from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line' >>> from /usr/lib/ruby/1.8/net/http.rb:2009:in `read_new' >>> from /usr/lib/ruby/1.8/net/http.rb:1050:in `request' >>> from /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/lib/etchclient.rb:438:in `process_until_done' >>> from /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/bin/etch:99 >>> from /usr/bin/etch:19:in `load' >>> from /usr/bin/etch:19 >>> >>> Has anyone seen this error before, or know what may be causing it, or some things I can look for? Thanks! 
>>> >>> >>> -- >>> Kenneth Williams >>> ------------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> etch-users mailing list >>> etc...@li... >>> https://lists.sourceforge.net/lists/listinfo/etch-users >> >> >> >> >> -- >> Kenneth Williams > > > > > -- > Kenneth Williams |
From: Kenneth W. <hap...@gm...> - 2010-06-03 21:45:47
|
Jason, I started spreading the hosts out using the wrapper and it's been running fine for a week now. Thanks! I've got another identical server that I want to balance between anyway for fail over. It's currently doing just nventory, i'd like both servers to handle both services.... you mentioned master-slave, are there any known caveats to master-master mysql replication and pointing the etch-server ruby app to mysql locally on each? On Thu, May 27, 2010 at 11:14 PM, Jason Heiss <jh...@ap...> wrote: > Do you randomize the timing of etch running on your clients? I.e. when > cron kicks off etch every 10 minutes do all 200 clients try to contact the > etch server simultaneously? You mentioned that you get a load spike when > the cron kicks off, which makes it sound like you aren't randomizing. > > We randomize our clients using the etch_cron_wrapper script that is > included with the etch distribution. It seeds the random number generator > with the client's hostname, so that each client runs at a consistent time, > but the overall client load is randomized over time. We only run etch once > an hour so you'd have to adjust the calculation in the script. > > In our larger datacenter we have over 1200 clients hitting two etch > servers. Each runs 20 unicorn processes. Our hardware is older than yours > (total of 6 cores between the two servers). Load average sits around 1.5 on > the two servers, average CPU utilization at about 35%. So your 200 clients > running every 10 minutes should be about equivalent to our 1200 running > hourly. Which means your one box should still be able to handle things, > although it would be fairly busy. > > (Well, that assumes your configuration repository is roughly the same size > as ours. We have etch managing 191 files based on running `find . -name > config.xml | wc -l` in our repo.) > > So if you aren't currently randomizing your client timing I'd encourage you > to try that. Even just spreading things out over a few minutes might help. > If that's not possible or you're already doing it then it might be time to > get another server. In our case we run mysqld on the beefier server and > point Rails on both servers to that mysqld instance. We also have mysqld > running on the other server in slave mode getting data updates via mysql > replication in case we ever need it. > > Jason > > PS: As an alternative to modifying etch_cron_wrapper you can get a more > generic version of the script that allow you to specify the max sleep > interval on the command line here: > https://sourceforge.net/apps/trac/tpkg/browser/trunk/pkgs/randomizer/randomizer > > On May 27, 2010, at 11:31 AM, Kenneth Williams wrote: > > I switched from 90 unicorn processes to 60 yesterday and this error went > away, but the latency between a config change and that change getting out to > all hosts went up to a max of around 5 hours. Each host hits etch server > every 10 minutes, at that 10 minute mark 5m load goes up to around 10 -- > this is on a dual xeon quad core. > > Do you have any recoemndations on how many processes or some additional > details on how one might go about setting this up for a robust production > environment? Currently I have Apache proxy as the entry point to just one > server running etch server with three unicorn processes, once with 8 workers > serving as my "gui", the other two with 30 workers each serving connections > from the client machines. 
I switched from sqlite to mysql, which does not > seem to be the bottle neck anymore, and have the etch config files stored on > a four disk sas 15k raid 10. Do I need another server, or two? I only have > ~200 client machines connecting to it. > > Thanks!! > > > On Thu, May 27, 2010 at 9:05 AM, Jason Heiss <jh...@ap...> wrote: > >> I haven't seen this before, but it looks like you're probably hitting >> Ruby's HTTP client default "read_timeout" of 60 seconds. I.e. reading the >> response from the etch server took more than 60 seconds so the client gave >> up. How's the load on your etch server(s)? How many unicorn (or whatever >> Ruby app server you use) processes are you running? What XML parser are you >> using on the server, REXML or LibXML? >> >> You could bump up read_timeout to a larger value, but that may just mask >> the problem. If you want to try adjusting read_timeout, around line 141 in >> etchclient.rb there's a line that looks like: >> >> http = Net::HTTP.new(@filesuri.host, @filesuri.port) >> >> Right below that you can add: >> >> http.read_timeout = 120 >> >> Jason >> >> On May 26, 2010, at 3:15 PM, Kenneth Williams wrote: >> >> I recently started seeing servers timing out when connecting to etch. The >> error I am seeing in the logs on the client is new to me: >> >> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; >> discarding old can_connect? >> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; >> discarding old metadata >> execution expired >> /usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread': Connection reset by >> peer (Errno::ECONNRESET) >> from /usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill' >> from /usr/lib/ruby/1.8/timeout.rb:62:in `timeout' >> from /usr/lib/ruby/1.8/timeout.rb:93:in `timeout' >> from /usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill' >> from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil' >> from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline' >> from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line' >> from /usr/lib/ruby/1.8/net/http.rb:2009:in `read_new' >> from /usr/lib/ruby/1.8/net/http.rb:1050:in `request' >> from >> /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/lib/etchclient.rb:438:in >> `process_until_done' >> from /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/bin/etch:99 >> from /usr/bin/etch:19:in `load' >> from /usr/bin/etch:19* >> >> * >> Has anyone seen this error before, or know what may be causing it, or some >> things I can look for? Thanks! >> >> ** >> -- >> Kenneth Williams >> >> ------------------------------------------------------------------------------ >> >> _______________________________________________ >> etch-users mailing list >> etc...@li... >> https://lists.sourceforge.net/lists/listinfo/etch-users >> >> >> > > > -- > Kenneth Williams > > > -- Kenneth Williams |
From: Jason H. <jh...@ap...> - 2010-05-28 06:14:56
|
Do you randomize the timing of etch running on your clients? I.e. when cron kicks off etch every 10 minutes do all 200 clients try to contact the etch server simultaneously? You mentioned that you get a load spike when the cron kicks off, which makes it sound like you aren't randomizing. We randomize our clients using the etch_cron_wrapper script that is included with the etch distribution. It seeds the random number generator with the client's hostname, so that each client runs at a consistent time, but the overall client load is randomized over time. We only run etch once an hour so you'd have to adjust the calculation in the script. In our larger datacenter we have over 1200 clients hitting two etch servers. Each runs 20 unicorn processes. Our hardware is older than yours (total of 6 cores between the two servers). Load average sits around 1.5 on the two servers, average CPU utilization at about 35%. So your 200 clients running every 10 minutes should be about equivalent to our 1200 running hourly. Which means your one box should still be able to handle things, although it would be fairly busy. (Well, that assumes your configuration repository is roughly the same size as ours. We have etch managing 191 files based on running `find . -name config.xml | wc -l` in our repo.) So if you aren't currently randomizing your client timing I'd encourage you to try that. Even just spreading things out over a few minutes might help. If that's not possible or you're already doing it then it might be time to get another server. In our case we run mysqld on the beefier server and point Rails on both servers to that mysqld instance. We also have mysqld running on the other server in slave mode getting data updates via mysql replication in case we ever need it. Jason PS: As an alternative to modifying etch_cron_wrapper you can get a more generic version of the script that allow you to specify the max sleep interval on the command line here: https://sourceforge.net/apps/trac/tpkg/browser/trunk/pkgs/randomizer/randomizer On May 27, 2010, at 11:31 AM, Kenneth Williams wrote: > I switched from 90 unicorn processes to 60 yesterday and this error went away, but the latency between a config change and that change getting out to all hosts went up to a max of around 5 hours. Each host hits etch server every 10 minutes, at that 10 minute mark 5m load goes up to around 10 -- this is on a dual xeon quad core. > > Do you have any recoemndations on how many processes or some additional details on how one might go about setting this up for a robust production environment? Currently I have Apache proxy as the entry point to just one server running etch server with three unicorn processes, once with 8 workers serving as my "gui", the other two with 30 workers each serving connections from the client machines. I switched from sqlite to mysql, which does not seem to be the bottle neck anymore, and have the etch config files stored on a four disk sas 15k raid 10. Do I need another server, or two? I only have ~200 client machines connecting to it. > > Thanks!! > > > On Thu, May 27, 2010 at 9:05 AM, Jason Heiss <jh...@ap...> wrote: > I haven't seen this before, but it looks like you're probably hitting Ruby's HTTP client default "read_timeout" of 60 seconds. I.e. reading the response from the etch server took more than 60 seconds so the client gave up. How's the load on your etch server(s)? How many unicorn (or whatever Ruby app server you use) processes are you running? 
What XML parser are you using on the server, REXML or LibXML? > > You could bump up read_timeout to a larger value, but that may just mask the problem. If you want to try adjusting read_timeout, around line 141 in etchclient.rb there's a line that looks like: > > http = Net::HTTP.new(@filesuri.host, @filesuri.port) > > Right below that you can add: > > http.read_timeout = 120 > > Jason > > On May 26, 2010, at 3:15 PM, Kenneth Williams wrote: > >> I recently started seeing servers timing out when connecting to etch. The error I am seeing in the logs on the client is new to me: >> >> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect? >> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata >> execution expired >> /usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread': Connection reset by peer (Errno::ECONNRESET) >> from /usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill' >> from /usr/lib/ruby/1.8/timeout.rb:62:in `timeout' >> from /usr/lib/ruby/1.8/timeout.rb:93:in `timeout' >> from /usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill' >> from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil' >> from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline' >> from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line' >> from /usr/lib/ruby/1.8/net/http.rb:2009:in `read_new' >> from /usr/lib/ruby/1.8/net/http.rb:1050:in `request' >> from /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/lib/etchclient.rb:438:in `process_until_done' >> from /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/bin/etch:99 >> from /usr/bin/etch:19:in `load' >> from /usr/bin/etch:19 >> >> Has anyone seen this error before, or know what may be causing it, or some things I can look for? Thanks! >> >> >> -- >> Kenneth Williams >> ------------------------------------------------------------------------------ >> >> _______________________________________________ >> etch-users mailing list >> etc...@li... >> https://lists.sourceforge.net/lists/listinfo/etch-users > > > > > -- > Kenneth Williams |
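The etch_cron_wrapper script shipped with etch is the authoritative implementation of the randomization Jason describes above. As a rough illustration only, here is a minimal Ruby sketch of the same idea under stated assumptions: the maximum sleep, the etch binary path, and the server URL are all placeholders to adapt to your environment.

```ruby
#!/usr/bin/env ruby
# Hypothetical sketch of hostname-seeded client randomization; see
# etch_cron_wrapper in the etch distribution for the real script.
require 'socket'
require 'zlib'

MAX_SLEEP = 600  # seconds; assumed value, keep it under your cron interval

# Seed the RNG from the hostname so each client always sleeps the same amount,
# while the fleet as a whole is spread out over MAX_SLEEP seconds.
srand(Zlib.crc32(Socket.gethostname))
sleep(rand(MAX_SLEEP))

# Assumed invocation; substitute your actual etch command line.
exec('/usr/bin/etch', '--generate-all', '--server', 'http://etch:8080/')
```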
From: Kenneth W. <hap...@gm...> - 2010-05-27 18:31:54
|
I switched from 90 unicorn processes to 60 yesterday and this error went away, but the latency between a config change and that change getting out to all hosts went up to a max of around 5 hours. Each host hits etch server every 10 minutes, at that 10 minute mark 5m load goes up to around 10 -- this is on a dual xeon quad core. Do you have any recoemndations on how many processes or some additional details on how one might go about setting this up for a robust production environment? Currently I have Apache proxy as the entry point to just one server running etch server with three unicorn processes, once with 8 workers serving as my "gui", the other two with 30 workers each serving connections from the client machines. I switched from sqlite to mysql, which does not seem to be the bottle neck anymore, and have the etch config files stored on a four disk sas 15k raid 10. Do I need another server, or two? I only have ~200 client machines connecting to it. Thanks!! On Thu, May 27, 2010 at 9:05 AM, Jason Heiss <jh...@ap...> wrote: > I haven't seen this before, but it looks like you're probably hitting > Ruby's HTTP client default "read_timeout" of 60 seconds. I.e. reading the > response from the etch server took more than 60 seconds so the client gave > up. How's the load on your etch server(s)? How many unicorn (or whatever > Ruby app server you use) processes are you running? What XML parser are you > using on the server, REXML or LibXML? > > You could bump up read_timeout to a larger value, but that may just mask > the problem. If you want to try adjusting read_timeout, around line 141 in > etchclient.rb there's a line that looks like: > > http = Net::HTTP.new(@filesuri.host, @filesuri.port) > > Right below that you can add: > > http.read_timeout = 120 > > Jason > > On May 26, 2010, at 3:15 PM, Kenneth Williams wrote: > > I recently started seeing servers timing out when connecting to etch. The > error I am seeing in the logs on the client is new to me: > > /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; > discarding old can_connect? > /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; > discarding old metadata > execution expired > /usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread': Connection reset by > peer (Errno::ECONNRESET) > from /usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill' > from /usr/lib/ruby/1.8/timeout.rb:62:in `timeout' > from /usr/lib/ruby/1.8/timeout.rb:93:in `timeout' > from /usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill' > from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil' > from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline' > from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line' > from /usr/lib/ruby/1.8/net/http.rb:2009:in `read_new' > from /usr/lib/ruby/1.8/net/http.rb:1050:in `request' > from > /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/lib/etchclient.rb:438:in > `process_until_done' > from /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/bin/etch:99 > from /usr/bin/etch:19:in `load' > from /usr/bin/etch:19* > > * > Has anyone seen this error before, or know what may be causing it, or some > things I can look for? Thanks! > > ** > -- > Kenneth Williams > > ------------------------------------------------------------------------------ > > _______________________________________________ > etch-users mailing list > etc...@li... > https://lists.sourceforge.net/lists/listinfo/etch-users > > > -- Kenneth Williams |
From: Jason H. <jh...@ap...> - 2010-05-27 16:55:54
|
I haven't seen this before, but it looks like you're probably hitting Ruby's HTTP client default "read_timeout" of 60 seconds. I.e. reading the response from the etch server took more than 60 seconds so the client gave up. How's the load on your etch server(s)? How many unicorn (or whatever Ruby app server you use) processes are you running? What XML parser are you using on the server, REXML or LibXML? You could bump up read_timeout to a larger value, but that may just mask the problem. If you want to try adjusting read_timeout, around line 141 in etchclient.rb there's a line that looks like: http = Net::HTTP.new(@filesuri.host, @filesuri.port) Right below that you can add: http.read_timeout = 120 Jason On May 26, 2010, at 3:15 PM, Kenneth Williams wrote: > I recently started seeing servers timing out when connecting to etch. The error I am seeing in the logs on the client is new to me: > > /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect? > /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata > execution expired > /usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread': Connection reset by peer (Errno::ECONNRESET) > from /usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill' > from /usr/lib/ruby/1.8/timeout.rb:62:in `timeout' > from /usr/lib/ruby/1.8/timeout.rb:93:in `timeout' > from /usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill' > from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil' > from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline' > from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line' > from /usr/lib/ruby/1.8/net/http.rb:2009:in `read_new' > from /usr/lib/ruby/1.8/net/http.rb:1050:in `request' > from /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/lib/etchclient.rb:438:in `process_until_done' > from /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/bin/etch:99 > from /usr/bin/etch:19:in `load' > from /usr/bin/etch:19 > > Has anyone seen this error before, or know what may be causing it, or some things I can look for? Thanks! > > > -- > Kenneth Williams > ------------------------------------------------------------------------------ > > _______________________________________________ > etch-users mailing list > etc...@li... > https://lists.sourceforge.net/lists/listinfo/etch-users |
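Jason's suggestion amounts to a two-line change around the HTTP connection setup in etchclient.rb. Below is a self-contained sketch of the same pattern; the server URL is a placeholder, and 120 seconds simply follows the example value in the message above.

```ruby
require 'net/http'
require 'uri'

# In etchclient.rb the URI comes from @filesuri; shown here as a local
# variable so the snippet runs on its own.
filesuri = URI.parse('http://etch:8080/')  # assumed etch server URL

http = Net::HTTP.new(filesuri.host, filesuri.port)
# The added line: wait up to 120 seconds for the server's response instead of
# Net::HTTP's default read_timeout of 60 seconds.
http.read_timeout = 120

response = http.get('/')  # any request now gets the longer read timeout
puts response.code
```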
From: Kenneth W. <hap...@gm...> - 2010-05-26 22:15:08
|
I recently started seeing servers timing out when connecting to etch. The error I am seeing in the logs on the client is new to me: /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect? /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata execution expired /usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread': Connection reset by peer (Errno::ECONNRESET) from /usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill' from /usr/lib/ruby/1.8/timeout.rb:62:in `timeout' from /usr/lib/ruby/1.8/timeout.rb:93:in `timeout' from /usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill' from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil' from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline' from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line' from /usr/lib/ruby/1.8/net/http.rb:2009:in `read_new' from /usr/lib/ruby/1.8/net/http.rb:1050:in `request' from /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/lib/etchclient.rb:438:in `process_until_done' from /usr/lib/ruby/gems/1.8/gems/etch-3.15.2/bin/etch:99 from /usr/bin/etch:19:in `load' from /usr/bin/etch:19* * Has anyone seen this error before, or know what may be causing it, or some things I can look for? Thanks! ** -- Kenneth Williams |
From: Jason H. <jh...@ap...> - 2010-01-21 05:24:25
|
Some post-release testing of the 3.15.0 release turned up a few problems with the client Rakefile which prevented it from building client packages. Those have been fixed and the latest version is now 3.15.2. Jason |
From: Jason H. <jh...@ap...> - 2010-01-21 01:10:39
|
The most significant change in this release is that the client now stores history logs as individual files rather than in RCS. RCS resulted in a very cumbersome UI for viewing the history logs, making it unlikely the history log would be used. Individual files allow easy viewing and inspection using standard Unix tools like ls, diff, grep, etc. History logs in the old RCS format are converted automatically to the new format.

The client now adds any exception message to the message that is reported to the etch server. Previously we just reported a non-zero status with no message, which is difficult to debug. We were printing the exception message to stderr, but most users run etch in a cron job that sends stderr to /dev/null. Change based on feedback from Kenneth Williams.

Added a rake task for cleaning old entries out of the database, per a suggestion from Kenneth Williams. Also added a link in the web UI to delete a client.

I removed the auto-update cron jobs from the RPM and Solaris packages. They were a bit of a holdover from the way we run things in our environment, but it would be very unusual for publicly released software to operate that way. We'll leave it up to end users to auto-update etch appropriately for their environment.

I'm also slightly changing the version numbering in this release to major.minor.patch; previously I was using major.minor. I'm trying to make etch available via various packaging systems to simplify installation (gems, MacPorts, etc.), and some of those expect the major.minor.patch format. Thus the jump from 3.13 to 3.15.0.

Jason |
From: Jason H. <jh...@ap...> - 2010-01-21 00:09:20
|
On Nov 14, 2009, at 10:56 AM, Jason Heiss wrote: > And I've used that flexibility to make a port for MacPorts. Running 'rake macport' produces a Portfile suitable for MacPorts. It has been submitted for inclusion in the official MacPorts tree. Hopefully soon installing etch on a Mac will be as simple as "port install etch". I'll send out an email when I get feedback on the MacPorts submission. I forgot to send out email, but the MacPorts submission was accepted a while ago. Jason |
From: Jason H. <jh...@ap...> - 2010-01-08 04:05:24
|
Sounds like SQLite isn't playing well with multiple unicorn processes. I've only used SQLite for development, but did a bit of reading about using SQLite in production. The basic recommendation seems to be to increase the timeout setting in the production section of database.yml. Only one process can have the database file open for writing at one time, any other process trying to open it for writing has to wait. The default is 5000 ms (5s). You might try cranking it up to 15000 or 20000 (15-20s) and see if that helps. Folks generally seem to think SQLite can handle a fair bit of traffic, but if bumping the timeout up doesn't work you might consider switching to MySQL or the like. We use MySQL. Obviously not quite a trivial to set up as SQLite, but it has worked pretty flawlessly for us. Jason On Jan 6, 2010, at 11:30 AM, Kenneth Williams wrote: > I'm seeing a ton of "SQLite3::BusyException" errors followed by a 500 internal server error in the logs. Nothing else that stands out though, unless I'm missing something. Would you like to see the trace output that follows this error? Also, I'm curious if sqlite is a good option long term? I've never used it before, usually sticking to MySQL or Oracle instead. Thanks again for your help on this. > > /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect? > /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata > SQLite3::BusyException: database is locked: UPDATE "facts" SET "value" = '1235105', "updated_at" = '2010-01-06 01:50:03' WHERE "id" = 2336 > 500 "Internal Server Error" > Error submitting results: > <html xmlns="http://www.w3.org/1999/xhtml"> > <head> > <title>Action Controller: Exception caught</title> > <style> > body { background-color: #fff; color: #333; } > > body, p, ol, ul, td { > font-family: verdana, arial, helvetica, sans-serif; > font-size: 13px; > line-height: 18px; > } > > pre { > background-color: #eee; > padding: 10px; > font-size: 11px; > } > > a { color: #000; } > a:visited { color: #666; } > a:hover { color: #fff; background-color:#000; } > </style> > </head> > <body> > > <h1> > ActiveRecord::StatementInvalid > > in ResultsController#create > > </h1> > <pre>SQLite3::BusyException: database is locked: UPDATE "clients" SET "updated_at" = '2010-01-06 01:50:03' WHERE "id" = 35</pre> > > > On Tue, Jan 5, 2010 at 5:55 PM, Jason Heiss <jh...@ap...> wrote: > Ah, right, sorry. If the client's status is good (0) and nothing needed to be changed then there will also be 0 results. > > Jason > > On Jan 5, 2010, at 5:38 PM, Kenneth Williams wrote: > >> I was under the impression that sometimes 0 results was okay, if for instance, if there where no files that needed to be updated? Like this host for example has a status of 0, even though it received 0 results on the last run: >> >> Client: web073 >> >> Name: web073 >> Status: 0 >> >> Time Hours Ago # of Results Total Message Size >> View 2010-01-06 01:30:36 UTC 0 0 0 >> View 2010-01-06 00:30:36 UTC 1 2 2754 >> View 2010-01-05 23:30:36 UTC 2 0 0 >> >> Cron is logging on 10 hosts, I'll check back on it after it's had time to gather some logs. Thanks for the help! >> >> On Tue, Jan 5, 2010 at 5:19 PM, Jason Heiss <jh...@ap...> wrote: >> Yeah, so that confirms a connection error. Since the client couldn't connect to the server it did not receive any configuration for any files, and thus submitted 0 results, just an overall status message indicating the failure. 
Hopefully capturing the output from the cron job will be informative. >> >> Jason >> >> On Jan 5, 2010, at 4:20 PM, Kenneth Williams wrote: >> >>> Yeah the status value is 1 for the broken clients. What's really strange is the most recent entries in the time line links are empty or successful, no failures. For example, here's the most recent one... >>> >>> Client: web077 >>> >>> Name: web077 >>> Status: 1 >>> >>> Time Hours Ago # of Results Total Message Size >>> View 2010-01-05 23:41:44 UTC 0 0 0 >>> View 2010-01-05 22:41:44 UTC 1 0 0 >>> View 2010-01-05 21:41:44 UTC 2 1 1226 >>> View 2010-01-05 20:41:44 UTC 3 2 3294 >>> View 2010-01-05 19:41:44 UTC 4 0 0 >>> View 2010-01-05 18:41:44 UTC 5 0 0 >>> >>> If I click the view link for hours 0 or 1 I get "We could not find any results in the system for that search.", the only time something shows up is when I put a change on my etch server, like with hours 2 or 3, and those look fine: >>> >>> Results >>> View all these combined >>> Client File Time Success Message Size >>> View web077 /etc/httpd/conf.d/vhosts.conf 2010-01-05 21:20:50 UTC true 1226 >>> >>> I'll change my crontab to log output instead of /dev/null and see what I get. >>> >>> Thanks for the additional info about how yours is setup, it's helpful to know I've got this mostly setup the way it should be ;) >>> >>> On Mon, Jan 4, 2010 at 1:37 PM, Jason Heiss <jh...@ap...> wrote: >>> What status value do the broken clients report? 1? >>> >>> If you're looking at a broken client in the web UI, is the "Message" field empty? If so, click the "24 hrs" timeline link, then the "0" hours ago "View" link, do any of the files show "false" in the "Success" column? >>> >>> The client will return a non-zero status to the server if it encounters any form of Ruby exception while processing. This would be failure to connect to the etch server or some error processing the configuration data sent by the server. In your case some sort of error connecting to the server seems most likely, although interestingly those clients are able to connect to report their results. Looking over the code, it seems like currently the message associated with any sort of connection error is printed to stderr, but not sent to the server. In which case you'd have "broken" clients with a status of 1 but no message. Is your cron job sending stdout/stderr to /dev/null? You might try letting a few clients email that to you or dump it to a file to see if you can catch the error. >>> >>> I'll modify the client code to add the exception message to the message sent to the server. >>> >>> FWIW, we run unicorn with 20 workers in our production environment. Behind nginx, although as you indicated the front-end web proxy doesn't seem to make a difference. >>> >>> I concur that the warning from facter is likely unrelated. >>> >>> Jason >>> >>> On Dec 30, 2009, at 2:39 PM, Kenneth Williams wrote: >>> >>>> Hi all! >>>> >>>> I've started moving out of my test environment and beginning to move to production use. As part of that I've gone from using unicorn with one worker to testing four workers and an Apache proxy. Everything seems to work, and scales better when deploying to more hosts as you'd expect, but the etch dashboard reports hosts as broken using this setup. I've tested it in various combinations, using just unicorn without apache and multiple workers directly, and with apache using multiple masters with only one worker. The only setup I can get working without hosts being listed as broken is one master with one worker. 
Unfortunately, and as you could probably guess, it takes an eternity to push changes using only one worker once you throw in more than just a couple hosts... Apache as a proxy does not seem to make a difference, accessing unicorn through it's own port, or through the Apache proxy has no noticeable change in the number of broken hosts. In the end I'd like Apache to proxy to multiple unicorn masters on different hosts, but right now I'd settle for being able to have more than one worker running ;) >>>> >>>> The list of "broken" hosts steadily increases over the day at around the ten minute interval when etch client kicks off from cron. It starts off with just a few in a pool of 40 hosts listed as broken and goes up from there by one or two hosts every ten minutes. It seems to stop around 25 +/- 3 "broken" hosts, and the hosts will alternate at the ten minute interval. If I put a change in my etch source directory it does get pushed out to the hosts, even the ones listed as broken, and if I log into a broken host and run etch manually it runs fine, except for two warnings. When running etch client manually it removes the host from the broken list, only to add it back in later. I've always ignored the warning because it did not seem to have any impact under the previous test setup. It seemed to have cropped up when I upgraded from 3.11 to the ruby gem 3.13 version. There are two hosts still running the 3.11 client that don't produce this warning, but they're also subject to being listed as broken along with the others. Just in case its important, the warning is: >>>> >>>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect? >>>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata >>>> >>>> I don't think this is related to my problem though.The etch client command I'm running that produces this is: >>>> >>>> /usr/bin/etch --generate-all --server http://etch:8080/ >>>> >>>> Otherwise there are no errors produced by the etch client. Port 8080 is running through the Apache proxy, behind it is currently only one unicorn master with 20 workers. I'm running etch client version 3.13 on the nodes, and on the server I'm running 3.11. Please let me know if you need any additional details, any help is truly appreciated.Thanks!! >>>> >>>> -- >>>> Kenneth Williams >>>> ------------------------------------------------------------------------------ >>>> This SF.Net email is sponsored by the Verizon Developer Community >>>> Take advantage of Verizon's best-in-class app development support >>>> A streamlined, 14 day to market process makes app distribution fast and easy >>>> Join now and get one step closer to millions of Verizon customers >>>> http://p.sf.net/sfu/verizon-dev2dev _______________________________________________ >>>> etch-users mailing list >>>> etc...@li... >>>> https://lists.sourceforge.net/lists/listinfo/etch-users >>> >>> >>> >>> >>> -- >>> Kenneth Williams >> >> >> >> >> -- >> Kenneth Williams <www.krw.info> >> No man's life, liberty, or property are safe while the legislature is in session. - Mark Twain > > > > > -- > Kenneth Williams <www.krw.info> > No man's life, liberty, or property are safe while the legislature is in session. - Mark Twain |
From: Kenneth W. <hap...@gm...> - 2010-01-06 19:30:17
|
I'm seeing a ton of "SQLite3::BusyException" errors followed by a 500 internal server error in the logs. Nothing else that stands out though, unless I'm missing something. Would you like to see the trace output that follows this error? Also, I'm curious if sqlite is a good option long term? I've never used it before, usually sticking to MySQL or Oracle instead. Thanks again for your help on this. /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect? /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata SQLite3::BusyException: database is locked: UPDATE "facts" SET "value" = '1235105', "updated_at" = '2010-01-06 01:50:03' WHERE "id" = 2336 500 "Internal Server Error" Error submitting results: <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Action Controller: Exception caught</title> <style> body { background-color: #fff; color: #333; } body, p, ol, ul, td { font-family: verdana, arial, helvetica, sans-serif; font-size: 13px; line-height: 18px; } pre { background-color: #eee; padding: 10px; font-size: 11px; } a { color: #000; } a:visited { color: #666; } a:hover { color: #fff; background-color:#000; } </style> </head> <body> <h1> ActiveRecord::StatementInvalid in ResultsController#create </h1> <pre>SQLite3::BusyException: database is locked: UPDATE "clients" SET "updated_at" = '2010-01-06 01:50:03' WHERE "id" = 35</pre> On Tue, Jan 5, 2010 at 5:55 PM, Jason Heiss <jh...@ap...> wrote: > Ah, right, sorry. If the client's status is good (0) and nothing needed to > be changed then there will also be 0 results. > > Jason > > On Jan 5, 2010, at 5:38 PM, Kenneth Williams wrote: > > I was under the impression that sometimes 0 results was okay, if for > instance, if there where no files that needed to be updated? Like this host > for example has a status of 0, even though it received 0 results on the last > run: > > Client: web073 > > Name: web073 > Status: 0 > > Time Hours Ago # of Results Total Message Size > View 2010-01-06 01:30:36 UTC 0 0 0 > View 2010-01-06 00:30:36 UTC 1 2 2754 > View 2010-01-05 23:30:36 UTC 2 0 0 > > Cron is logging on 10 hosts, I'll check back on it after it's had time to > gather some logs. Thanks for the help! > > On Tue, Jan 5, 2010 at 5:19 PM, Jason Heiss <jh...@ap...> wrote: > >> Yeah, so that confirms a connection error. Since the client couldn't >> connect to the server it did not receive any configuration for any files, >> and thus submitted 0 results, just an overall status message indicating the >> failure. Hopefully capturing the output from the cron job will be >> informative. >> >> Jason >> >> On Jan 5, 2010, at 4:20 PM, Kenneth Williams wrote: >> >> Yeah the status value is 1 for the broken clients. What's really strange >> is the most recent entries in the time line links are empty or successful, >> no failures. For example, here's the most recent one... 
>> >> Client: web077 >> >> Name: web077 >> Status: 1 >> >> Time Hours Ago # of Results Total Message Size >> View 2010-01-05 23:41:44 UTC 0 0 0 >> View 2010-01-05 22:41:44 UTC 1 0 0 >> View 2010-01-05 21:41:44 UTC 2 1 1226 >> View 2010-01-05 20:41:44 UTC 3 2 3294 >> View 2010-01-05 19:41:44 UTC 4 0 0 >> View 2010-01-05 18:41:44 UTC 5 0 0 >> >> If I click the view link for hours 0 or 1 I get "We could not find any >> results in the system for that search.", the only time something shows up is >> when I put a change on my etch server, like with hours 2 or 3, and those >> look fine: >> >> Results >> View all these combined >> Client File Time Success Message Size >> View web077 /etc/httpd/conf.d/vhosts.conf 2010-01-05 21:20:50 >> UTC true 1226 >> >> I'll change my crontab to log output instead of /dev/null and see what I >> get. >> >> Thanks for the additional info about how yours is setup, it's helpful to >> know I've got this mostly setup the way it should be ;) >> >> On Mon, Jan 4, 2010 at 1:37 PM, Jason Heiss <jh...@ap...> wrote: >> >>> What status value do the broken clients report? 1? >>> >>> If you're looking at a broken client in the web UI, is the "Message" >>> field empty? If so, click the "24 hrs" timeline link, then the "0" hours >>> ago "View" link, do any of the files show "false" in the "Success" column? >>> >>> The client will return a non-zero status to the server if it encounters >>> any form of Ruby exception while processing. This would be failure to >>> connect to the etch server or some error processing the configuration data >>> sent by the server. In your case some sort of error connecting to the >>> server seems most likely, although interestingly those clients are able to >>> connect to report their results. Looking over the code, it seems like >>> currently the message associated with any sort of connection error is >>> printed to stderr, but not sent to the server. In which case you'd have >>> "broken" clients with a status of 1 but no message. Is your cron job >>> sending stdout/stderr to /dev/null? You might try letting a few clients >>> email that to you or dump it to a file to see if you can catch the error. >>> >>> I'll modify the client code to add the exception message to the message >>> sent to the server. >>> >>> FWIW, we run unicorn with 20 workers in our production environment. >>> Behind nginx, although as you indicated the front-end web proxy doesn't >>> seem to make a difference. >>> >>> I concur that the warning from facter is likely unrelated. >>> >>> Jason >>> >>> On Dec 30, 2009, at 2:39 PM, Kenneth Williams wrote: >>> >>> Hi all! >>> >>> I've started moving out of my test environment and beginning to move to >>> production use. As part of that I've gone from using unicorn with one worker >>> to testing four workers and an Apache proxy. Everything seems to work, and >>> scales better when deploying to more hosts as you'd expect, but the etch >>> dashboard reports hosts as broken using this setup. I've tested it in >>> various combinations, using just unicorn without apache and multiple workers >>> directly, and with apache using multiple masters with only one worker. The >>> only setup I can get working without hosts being listed as broken is one >>> master with one worker. Unfortunately, and as you could probably guess, it >>> takes an eternity to push changes using only one worker once you throw in >>> more than just a couple hosts... 
Apache as a proxy does not seem to make a >>> difference, accessing unicorn through it's own port, or through the Apache >>> proxy has no noticeable change in the number of broken hosts. In the end I'd >>> like Apache to proxy to multiple unicorn masters on different hosts, but >>> right now I'd settle for being able to have more than one worker running ;) >>> >>> The list of "broken" hosts steadily increases over the day at around the >>> ten minute interval when etch client kicks off from cron. It starts off with >>> just a few in a pool of 40 hosts listed as broken and goes up from there by >>> one or two hosts every ten minutes. It seems to stop around 25 +/- 3 >>> "broken" hosts, and the hosts will alternate at the ten minute interval. If >>> I put a change in my etch source directory it does get pushed out to the >>> hosts, even the ones listed as broken, and if I log into a broken host and >>> run etch manually it runs fine, except for two warnings. When running etch >>> client manually it removes the host from the broken list, only to add it >>> back in later. I've always ignored the warning because it did not seem to >>> have any impact under the previous test setup. It seemed to have cropped up >>> when I upgraded from 3.11 to the ruby gem 3.13 version. There are two hosts >>> still running the 3.11 client that don't produce this warning, but they're >>> also subject to being listed as broken along with the others. Just in case >>> its important, the warning is: >>> >>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; >>> discarding old can_connect? >>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; >>> discarding old metadata >>> >>> I don't think this is related to my problem though.The etch client >>> command I'm running that produces this is: >>> >>> /usr/bin/etch --generate-all --server http://etch:8080/ >>> >>> Otherwise there are no errors produced by the etch client. Port 8080 is >>> running through the Apache proxy, behind it is currently only one unicorn >>> master with 20 workers. I'm running etch client version 3.13 on the nodes, >>> and on the server I'm running 3.11. Please let me know if you need any >>> additional details, any help is truly appreciated.Thanks!! >>> >>> -- >>> Kenneth Williams >>> >>> ------------------------------------------------------------------------------ >>> This SF.Net email is sponsored by the Verizon Developer Community >>> Take advantage of Verizon's best-in-class app development support >>> A streamlined, 14 day to market process makes app distribution fast and >>> easy >>> Join now and get one step closer to millions of Verizon customers >>> http://p.sf.net/sfu/verizon-dev2dev_______________________________________________ >>> etch-users mailing list >>> etc...@li... >>> https://lists.sourceforge.net/lists/listinfo/etch-users >>> >>> >>> >> >> >> -- >> Kenneth Williams >> >> >> > > > -- > Kenneth Williams <www.krw.info> > No man's life, liberty, or property are safe while the legislature is in > session. - Mark Twain > > > -- Kenneth Williams <www.krw.info> No man's life, liberty, or property are safe while the legislature is in session. - Mark Twain |
From: Kenneth W. <hap...@gm...> - 2010-01-06 02:03:42
|
I was under the impression that sometimes 0 results was okay, if for instance, if there where no files that needed to be updated? Like this host for example has a status of 0, even though it received 0 results on the last run: Client: web073 Name: web073 Status: 0 Time Hours Ago # of Results Total Message Size View 2010-01-06 01:30:36 UTC 0 0 0 View 2010-01-06 00:30:36 UTC 1 2 2754 View 2010-01-05 23:30:36 UTC 2 0 0 Cron is logging on 10 hosts, I'll check back on it after it's had time to gather some logs. Thanks for the help! On Tue, Jan 5, 2010 at 5:19 PM, Jason Heiss <jh...@ap...> wrote: > Yeah, so that confirms a connection error. Since the client couldn't > connect to the server it did not receive any configuration for any files, > and thus submitted 0 results, just an overall status message indicating the > failure. Hopefully capturing the output from the cron job will be > informative. > > Jason > > On Jan 5, 2010, at 4:20 PM, Kenneth Williams wrote: > > Yeah the status value is 1 for the broken clients. What's really strange is > the most recent entries in the time line links are empty or successful, no > failures. For example, here's the most recent one... > > Client: web077 > > Name: web077 > Status: 1 > > Time Hours Ago # of Results Total Message Size > View 2010-01-05 23:41:44 UTC 0 0 0 > View 2010-01-05 22:41:44 UTC 1 0 0 > View 2010-01-05 21:41:44 UTC 2 1 1226 > View 2010-01-05 20:41:44 UTC 3 2 3294 > View 2010-01-05 19:41:44 UTC 4 0 0 > View 2010-01-05 18:41:44 UTC 5 0 0 > > If I click the view link for hours 0 or 1 I get "We could not find any > results in the system for that search.", the only time something shows up is > when I put a change on my etch server, like with hours 2 or 3, and those > look fine: > > Results > View all these combined > Client File Time Success Message Size > View web077 /etc/httpd/conf.d/vhosts.conf 2010-01-05 21:20:50 > UTC true 1226 > > I'll change my crontab to log output instead of /dev/null and see what I > get. > > Thanks for the additional info about how yours is setup, it's helpful to > know I've got this mostly setup the way it should be ;) > > On Mon, Jan 4, 2010 at 1:37 PM, Jason Heiss <jh...@ap...> wrote: > >> What status value do the broken clients report? 1? >> >> If you're looking at a broken client in the web UI, is the "Message" field >> empty? If so, click the "24 hrs" timeline link, then the "0" hours ago >> "View" link, do any of the files show "false" in the "Success" column? >> >> The client will return a non-zero status to the server if it encounters >> any form of Ruby exception while processing. This would be failure to >> connect to the etch server or some error processing the configuration data >> sent by the server. In your case some sort of error connecting to the >> server seems most likely, although interestingly those clients are able to >> connect to report their results. Looking over the code, it seems like >> currently the message associated with any sort of connection error is >> printed to stderr, but not sent to the server. In which case you'd have >> "broken" clients with a status of 1 but no message. Is your cron job >> sending stdout/stderr to /dev/null? You might try letting a few clients >> email that to you or dump it to a file to see if you can catch the error. >> >> I'll modify the client code to add the exception message to the message >> sent to the server. >> >> FWIW, we run unicorn with 20 workers in our production environment. 
>> Behind nginx, although as you indicated the front-end web proxy doesn't >> seem to make a difference. >> >> I concur that the warning from facter is likely unrelated. >> >> Jason >> >> On Dec 30, 2009, at 2:39 PM, Kenneth Williams wrote: >> >> Hi all! >> >> I've started moving out of my test environment and beginning to move to >> production use. As part of that I've gone from using unicorn with one worker >> to testing four workers and an Apache proxy. Everything seems to work, and >> scales better when deploying to more hosts as you'd expect, but the etch >> dashboard reports hosts as broken using this setup. I've tested it in >> various combinations, using just unicorn without apache and multiple workers >> directly, and with apache using multiple masters with only one worker. The >> only setup I can get working without hosts being listed as broken is one >> master with one worker. Unfortunately, and as you could probably guess, it >> takes an eternity to push changes using only one worker once you throw in >> more than just a couple hosts... Apache as a proxy does not seem to make a >> difference, accessing unicorn through it's own port, or through the Apache >> proxy has no noticeable change in the number of broken hosts. In the end I'd >> like Apache to proxy to multiple unicorn masters on different hosts, but >> right now I'd settle for being able to have more than one worker running ;) >> >> The list of "broken" hosts steadily increases over the day at around the >> ten minute interval when etch client kicks off from cron. It starts off with >> just a few in a pool of 40 hosts listed as broken and goes up from there by >> one or two hosts every ten minutes. It seems to stop around 25 +/- 3 >> "broken" hosts, and the hosts will alternate at the ten minute interval. If >> I put a change in my etch source directory it does get pushed out to the >> hosts, even the ones listed as broken, and if I log into a broken host and >> run etch manually it runs fine, except for two warnings. When running etch >> client manually it removes the host from the broken list, only to add it >> back in later. I've always ignored the warning because it did not seem to >> have any impact under the previous test setup. It seemed to have cropped up >> when I upgraded from 3.11 to the ruby gem 3.13 version. There are two hosts >> still running the 3.11 client that don't produce this warning, but they're >> also subject to being listed as broken along with the others. Just in case >> its important, the warning is: >> >> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; >> discarding old can_connect? >> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; >> discarding old metadata >> >> I don't think this is related to my problem though.The etch client command >> I'm running that produces this is: >> >> /usr/bin/etch --generate-all --server http://etch:8080/ >> >> Otherwise there are no errors produced by the etch client. Port 8080 is >> running through the Apache proxy, behind it is currently only one unicorn >> master with 20 workers. I'm running etch client version 3.13 on the nodes, >> and on the server I'm running 3.11. Please let me know if you need any >> additional details, any help is truly appreciated.Thanks!! 
>> >> -- >> Kenneth Williams >> >> ------------------------------------------------------------------------------ >> This SF.Net email is sponsored by the Verizon Developer Community >> Take advantage of Verizon's best-in-class app development support >> A streamlined, 14 day to market process makes app distribution fast and >> easy >> Join now and get one step closer to millions of Verizon customers >> http://p.sf.net/sfu/verizon-dev2dev_______________________________________________ >> etch-users mailing list >> etc...@li... >> https://lists.sourceforge.net/lists/listinfo/etch-users >> >> >> > > > -- > Kenneth Williams > > > -- Kenneth Williams <www.krw.info> No man's life, liberty, or property are safe while the legislature is in session. - Mark Twain |
From: Jason H. <jh...@ap...> - 2010-01-06 01:56:11
|
Ah, right, sorry. If the client's status is good (0) and nothing needed to be changed then there will also be 0 results. Jason On Jan 5, 2010, at 5:38 PM, Kenneth Williams wrote: > I was under the impression that sometimes 0 results was okay, if for > instance, if there where no files that needed to be updated? Like > this host for example has a status of 0, even though it received 0 > results on the last run: > > Client: web073 > > Name: web073 > Status: 0 > > Time Hours Ago # of Results Total Message Size > View 2010-01-06 01:30:36 UTC 0 0 0 > View 2010-01-06 00:30:36 UTC 1 2 2754 > View 2010-01-05 23:30:36 UTC 2 0 0 > > Cron is logging on 10 hosts, I'll check back on it after it's had > time to gather some logs. Thanks for the help! > > On Tue, Jan 5, 2010 at 5:19 PM, Jason Heiss <jh...@ap...> wrote: > Yeah, so that confirms a connection error. Since the client > couldn't connect to the server it did not receive any configuration > for any files, and thus submitted 0 results, just an overall status > message indicating the failure. Hopefully capturing the output from > the cron job will be informative. > > Jason > > On Jan 5, 2010, at 4:20 PM, Kenneth Williams wrote: > >> Yeah the status value is 1 for the broken clients. What's really >> strange is the most recent entries in the time line links are empty >> or successful, no failures. For example, here's the most recent >> one... >> >> Client: web077 >> >> Name: web077 >> Status: 1 >> >> Time Hours Ago # of Results Total Message Size >> View 2010-01-05 23:41:44 UTC 0 0 0 >> View 2010-01-05 22:41:44 UTC 1 0 0 >> View 2010-01-05 21:41:44 UTC 2 1 1226 >> View 2010-01-05 20:41:44 UTC 3 2 3294 >> View 2010-01-05 19:41:44 UTC 4 0 0 >> View 2010-01-05 18:41:44 UTC 5 0 0 >> >> If I click the view link for hours 0 or 1 I get "We could not find >> any results in the system for that search.", the only time >> something shows up is when I put a change on my etch server, like >> with hours 2 or 3, and those look fine: >> >> Results >> View all these combined >> Client File Time Success Message Size >> View web077 /etc/httpd/conf.d/vhosts.conf 2010-01-05 >> 21:20:50 UTC true 1226 >> >> I'll change my crontab to log output instead of /dev/null and see >> what I get. >> >> Thanks for the additional info about how yours is setup, it's >> helpful to know I've got this mostly setup the way it should be ;) >> >> On Mon, Jan 4, 2010 at 1:37 PM, Jason Heiss <jh...@ap...> wrote: >> What status value do the broken clients report? 1? >> >> If you're looking at a broken client in the web UI, is the >> "Message" field empty? If so, click the "24 hrs" timeline link, >> then the "0" hours ago "View" link, do any of the files show >> "false" in the "Success" column? >> >> The client will return a non-zero status to the server if it >> encounters any form of Ruby exception while processing. This would >> be failure to connect to the etch server or some error processing >> the configuration data sent by the server. In your case some sort >> of error connecting to the server seems most likely, although >> interestingly those clients are able to connect to report their >> results. Looking over the code, it seems like currently the >> message associated with any sort of connection error is printed to >> stderr, but not sent to the server. In which case you'd have >> "broken" clients with a status of 1 but no message. Is your cron >> job sending stdout/stderr to /dev/null? 
You might try letting a >> few clients email that to you or dump it to a file to see if you >> can catch the error. >> >> I'll modify the client code to add the exception message to the >> message sent to the server. >> >> FWIW, we run unicorn with 20 workers in our production >> environment. Behind nginx, although as you indicated the front-end >> web proxy doesn't seem to make a difference. >> >> I concur that the warning from facter is likely unrelated. >> >> Jason >> >> On Dec 30, 2009, at 2:39 PM, Kenneth Williams wrote: >> >>> Hi all! >>> >>> I've started moving out of my test environment and beginning to >>> move to production use. As part of that I've gone from using >>> unicorn with one worker to testing four workers and an Apache >>> proxy. Everything seems to work, and scales better when deploying >>> to more hosts as you'd expect, but the etch dashboard reports >>> hosts as broken using this setup. I've tested it in various >>> combinations, using just unicorn without apache and multiple >>> workers directly, and with apache using multiple masters with only >>> one worker. The only setup I can get working without hosts being >>> listed as broken is one master with one worker. Unfortunately, and >>> as you could probably guess, it takes an eternity to push changes >>> using only one worker once you throw in more than just a couple >>> hosts... Apache as a proxy does not seem to make a difference, >>> accessing unicorn through it's own port, or through the Apache >>> proxy has no noticeable change in the number of broken hosts. In >>> the end I'd like Apache to proxy to multiple unicorn masters on >>> different hosts, but right now I'd settle for being able to have >>> more than one worker running ;) >>> >>> The list of "broken" hosts steadily increases over the day at >>> around the ten minute interval when etch client kicks off from >>> cron. It starts off with just a few in a pool of 40 hosts listed >>> as broken and goes up from there by one or two hosts every ten >>> minutes. It seems to stop around 25 +/- 3 "broken" hosts, and the >>> hosts will alternate at the ten minute interval. If I put a change >>> in my etch source directory it does get pushed out to the hosts, >>> even the ones listed as broken, and if I log into a broken host >>> and run etch manually it runs fine, except for two warnings. When >>> running etch client manually it removes the host from the broken >>> list, only to add it back in later. I've always ignored the >>> warning because it did not seem to have any impact under the >>> previous test setup. It seemed to have cropped up when I upgraded >>> from 3.11 to the ruby gem 3.13 version. There are two hosts still >>> running the 3.11 client that don't produce this warning, but >>> they're also subject to being listed as broken along with the >>> others. Just in case its important, the warning is: >>> >>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method >>> redefined; discarding old can_connect? >>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method >>> redefined; discarding old metadata >>> >>> I don't think this is related to my problem though.The etch client >>> command I'm running that produces this is: >>> >>> /usr/bin/etch --generate-all --server http://etch:8080/ >>> >>> Otherwise there are no errors produced by the etch client. Port >>> 8080 is running through the Apache proxy, behind it is currently >>> only one unicorn master with 20 workers. 
I'm running etch client >>> version 3.13 on the nodes, and on the server I'm running 3.11. >>> Please let me know if you need any additional details, any help is >>> truly appreciated.Thanks!! >>> >>> -- >>> Kenneth Williams >>> ------------------------------------------------------------------------------ >>> This SF.Net email is sponsored by the Verizon Developer Community >>> Take advantage of Verizon's best-in-class app development support >>> A streamlined, 14 day to market process makes app distribution >>> fast and easy >>> Join now and get one step closer to millions of Verizon customers >>> http://p.sf.net/sfu/verizon-dev2dev >>> _______________________________________________ >>> etch-users mailing list >>> etc...@li... >>> https://lists.sourceforge.net/lists/listinfo/etch-users >> >> >> >> >> -- >> Kenneth Williams > > > > > -- > Kenneth Williams <www.krw.info> > No man's life, liberty, or property are safe while the legislature > is in session. - Mark Twain |
From: Jason H. <jh...@ap...> - 2010-01-06 01:19:51
|
Yeah, so that confirms a connection error. Since the client couldn't connect to the server it did not receive any configuration for any files, and thus submitted 0 results, just an overall status message indicating the failure. Hopefully capturing the output from the cron job will be informative. Jason On Jan 5, 2010, at 4:20 PM, Kenneth Williams wrote: > Yeah the status value is 1 for the broken clients. What's really > strange is the most recent entries in the time line links are empty > or successful, no failures. For example, here's the most recent one... > > Client: web077 > > Name: web077 > Status: 1 > > Time Hours Ago # of Results Total Message Size > View 2010-01-05 23:41:44 UTC 0 0 0 > View 2010-01-05 22:41:44 UTC 1 0 0 > View 2010-01-05 21:41:44 UTC 2 1 1226 > View 2010-01-05 20:41:44 UTC 3 2 3294 > View 2010-01-05 19:41:44 UTC 4 0 0 > View 2010-01-05 18:41:44 UTC 5 0 0 > > If I click the view link for hours 0 or 1 I get "We could not find > any results in the system for that search.", the only time something > shows up is when I put a change on my etch server, like with hours 2 > or 3, and those look fine: > > Results > View all these combined > Client File Time Success Message Size > View web077 /etc/httpd/conf.d/vhosts.conf 2010-01-05 > 21:20:50 UTC true 1226 > > I'll change my crontab to log output instead of /dev/null and see > what I get. > > Thanks for the additional info about how yours is setup, it's > helpful to know I've got this mostly setup the way it should be ;) > > On Mon, Jan 4, 2010 at 1:37 PM, Jason Heiss <jh...@ap...> wrote: > What status value do the broken clients report? 1? > > If you're looking at a broken client in the web UI, is the "Message" > field empty? If so, click the "24 hrs" timeline link, then the "0" > hours ago "View" link, do any of the files show "false" in the > "Success" column? > > The client will return a non-zero status to the server if it > encounters any form of Ruby exception while processing. This would > be failure to connect to the etch server or some error processing > the configuration data sent by the server. In your case some sort > of error connecting to the server seems most likely, although > interestingly those clients are able to connect to report their > results. Looking over the code, it seems like currently the message > associated with any sort of connection error is printed to stderr, > but not sent to the server. In which case you'd have "broken" > clients with a status of 1 but no message. Is your cron job sending > stdout/stderr to /dev/null? You might try letting a few clients > email that to you or dump it to a file to see if you can catch the > error. > > I'll modify the client code to add the exception message to the > message sent to the server. > > FWIW, we run unicorn with 20 workers in our production environment. > Behind nginx, although as you indicated the front-end web proxy > doesn't seem to make a difference. > > I concur that the warning from facter is likely unrelated. > > Jason > > On Dec 30, 2009, at 2:39 PM, Kenneth Williams wrote: > >> Hi all! >> >> I've started moving out of my test environment and beginning to >> move to production use. As part of that I've gone from using >> unicorn with one worker to testing four workers and an Apache >> proxy. Everything seems to work, and scales better when deploying >> to more hosts as you'd expect, but the etch dashboard reports hosts >> as broken using this setup. 
I've tested it in various combinations, >> using just unicorn without apache and multiple workers directly, >> and with apache using multiple masters with only one worker. The >> only setup I can get working without hosts being listed as broken >> is one master with one worker. Unfortunately, and as you could >> probably guess, it takes an eternity to push changes using only one >> worker once you throw in more than just a couple hosts... Apache as >> a proxy does not seem to make a difference, accessing unicorn >> through it's own port, or through the Apache proxy has no >> noticeable change in the number of broken hosts. In the end I'd >> like Apache to proxy to multiple unicorn masters on different >> hosts, but right now I'd settle for being able to have more than >> one worker running ;) >> >> The list of "broken" hosts steadily increases over the day at >> around the ten minute interval when etch client kicks off from >> cron. It starts off with just a few in a pool of 40 hosts listed as >> broken and goes up from there by one or two hosts every ten >> minutes. It seems to stop around 25 +/- 3 "broken" hosts, and the >> hosts will alternate at the ten minute interval. If I put a change >> in my etch source directory it does get pushed out to the hosts, >> even the ones listed as broken, and if I log into a broken host and >> run etch manually it runs fine, except for two warnings. When >> running etch client manually it removes the host from the broken >> list, only to add it back in later. I've always ignored the warning >> because it did not seem to have any impact under the previous test >> setup. It seemed to have cropped up when I upgraded from 3.11 to >> the ruby gem 3.13 version. There are two hosts still running the >> 3.11 client that don't produce this warning, but they're also >> subject to being listed as broken along with the others. Just in >> case its important, the warning is: >> >> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method >> redefined; discarding old can_connect? >> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method >> redefined; discarding old metadata >> >> I don't think this is related to my problem though.The etch client >> command I'm running that produces this is: >> >> /usr/bin/etch --generate-all --server http://etch:8080/ >> >> Otherwise there are no errors produced by the etch client. Port >> 8080 is running through the Apache proxy, behind it is currently >> only one unicorn master with 20 workers. I'm running etch client >> version 3.13 on the nodes, and on the server I'm running 3.11. >> Please let me know if you need any additional details, any help is >> truly appreciated.Thanks!! >> >> -- >> Kenneth Williams >> ------------------------------------------------------------------------------ >> This SF.Net email is sponsored by the Verizon Developer Community >> Take advantage of Verizon's best-in-class app development support >> A streamlined, 14 day to market process makes app distribution fast >> and easy >> Join now and get one step closer to millions of Verizon customers >> http://p.sf.net/sfu/verizon-dev2dev >> _______________________________________________ >> etch-users mailing list >> etc...@li... >> https://lists.sourceforge.net/lists/listinfo/etch-users > > > > > -- > Kenneth Williams |
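(A minimal crontab sketch of the change being discussed here, for reference: the ten-minute interval and the etch invocation come from the thread itself, but the log file path is only a placeholder assumption.)

  # before: client output discarded, so the exception behind a status of 1 is never seen
  */10 * * * * /usr/bin/etch --generate-all --server http://etch:8080/ > /dev/null 2>&1
  # after: append stdout/stderr to a log so connection errors from cron runs can be captured
  */10 * * * * /usr/bin/etch --generate-all --server http://etch:8080/ >> /var/log/etch-client.log 2>&1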
From: Kenneth W. <hap...@gm...> - 2010-01-06 00:48:28
|
Yeah the status value is 1 for the broken clients. What's really strange is the most recent entries in the time line links are empty or successful, no failures. For example, here's the most recent one... Client: web077 Name: web077 Status: 1 Time Hours Ago # of Results Total Message Size View 2010-01-05 23:41:44 UTC 0 0 0 View 2010-01-05 22:41:44 UTC 1 0 0 View 2010-01-05 21:41:44 UTC 2 1 1226 View 2010-01-05 20:41:44 UTC 3 2 3294 View 2010-01-05 19:41:44 UTC 4 0 0 View 2010-01-05 18:41:44 UTC 5 0 0 If I click the view link for hours 0 or 1 I get "We could not find any results in the system for that search.", the only time something shows up is when I put a change on my etch server, like with hours 2 or 3, and those look fine: Results View all these combined Client File Time Success Message Size View web077 /etc/httpd/conf.d/vhosts.conf 2010-01-05 21:20:50 UTC true 1226 I'll change my crontab to log output instead of /dev/null and see what I get. Thanks for the additional info about how yours is setup, it's helpful to know I've got this mostly setup the way it should be ;) On Mon, Jan 4, 2010 at 1:37 PM, Jason Heiss <jh...@ap...> wrote: > What status value do the broken clients report? 1? > > If you're looking at a broken client in the web UI, is the "Message" field > empty? If so, click the "24 hrs" timeline link, then the "0" hours ago > "View" link, do any of the files show "false" in the "Success" column? > > The client will return a non-zero status to the server if it encounters any > form of Ruby exception while processing. This would be failure to connect > to the etch server or some error processing the configuration data sent by > the server. In your case some sort of error connecting to the server seems > most likely, although interestingly those clients are able to connect to > report their results. Looking over the code, it seems like currently the > message associated with any sort of connection error is printed to stderr, > but not sent to the server. In which case you'd have "broken" clients with > a status of 1 but no message. Is your cron job sending stdout/stderr to > /dev/null? You might try letting a few clients email that to you or dump it > to a file to see if you can catch the error. > > I'll modify the client code to add the exception message to the message > sent to the server. > > FWIW, we run unicorn with 20 workers in our production environment. Behind > nginx, although as you indicated the front-end web proxy doesn't seem to > make a difference. > > I concur that the warning from facter is likely unrelated. > > Jason > > On Dec 30, 2009, at 2:39 PM, Kenneth Williams wrote: > > Hi all! > > I've started moving out of my test environment and beginning to move to > production use. As part of that I've gone from using unicorn with one worker > to testing four workers and an Apache proxy. Everything seems to work, and > scales better when deploying to more hosts as you'd expect, but the etch > dashboard reports hosts as broken using this setup. I've tested it in > various combinations, using just unicorn without apache and multiple workers > directly, and with apache using multiple masters with only one worker. The > only setup I can get working without hosts being listed as broken is one > master with one worker. Unfortunately, and as you could probably guess, it > takes an eternity to push changes using only one worker once you throw in > more than just a couple hosts... 
Apache as a proxy does not seem to make a > difference, accessing unicorn through it's own port, or through the Apache > proxy has no noticeable change in the number of broken hosts. In the end I'd > like Apache to proxy to multiple unicorn masters on different hosts, but > right now I'd settle for being able to have more than one worker running ;) > > The list of "broken" hosts steadily increases over the day at around the > ten minute interval when etch client kicks off from cron. It starts off with > just a few in a pool of 40 hosts listed as broken and goes up from there by > one or two hosts every ten minutes. It seems to stop around 25 +/- 3 > "broken" hosts, and the hosts will alternate at the ten minute interval. If > I put a change in my etch source directory it does get pushed out to the > hosts, even the ones listed as broken, and if I log into a broken host and > run etch manually it runs fine, except for two warnings. When running etch > client manually it removes the host from the broken list, only to add it > back in later. I've always ignored the warning because it did not seem to > have any impact under the previous test setup. It seemed to have cropped up > when I upgraded from 3.11 to the ruby gem 3.13 version. There are two hosts > still running the 3.11 client that don't produce this warning, but they're > also subject to being listed as broken along with the others. Just in case > its important, the warning is: > > /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; > discarding old can_connect? > /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; > discarding old metadata > > I don't think this is related to my problem though.The etch client command > I'm running that produces this is: > > /usr/bin/etch --generate-all --server http://etch:8080/ > > Otherwise there are no errors produced by the etch client. Port 8080 is > running through the Apache proxy, behind it is currently only one unicorn > master with 20 workers. I'm running etch client version 3.13 on the nodes, > and on the server I'm running 3.11. Please let me know if you need any > additional details, any help is truly appreciated.Thanks!! > > -- > Kenneth Williams > > ------------------------------------------------------------------------------ > This SF.Net email is sponsored by the Verizon Developer Community > Take advantage of Verizon's best-in-class app development support > A streamlined, 14 day to market process makes app distribution fast and > easy > Join now and get one step closer to millions of Verizon customers > http://p.sf.net/sfu/verizon-dev2dev_______________________________________________ > etch-users mailing list > etc...@li... > https://lists.sourceforge.net/lists/listinfo/etch-users > > > -- Kenneth Williams |
From: Jason H. <jh...@ap...> - 2010-01-04 21:38:19
|
What status value do the broken clients report? 1? If you're looking at a broken client in the web UI, is the "Message" field empty? If so, click the "24 hrs" timeline link, then the "0" hours ago "View" link, do any of the files show "false" in the "Success" column? The client will return a non-zero status to the server if it encounters any form of Ruby exception while processing. This would be failure to connect to the etch server or some error processing the configuration data sent by the server. In your case some sort of error connecting to the server seems most likely, although interestingly those clients are able to connect to report their results. Looking over the code, it seems like currently the message associated with any sort of connection error is printed to stderr, but not sent to the server. In which case you'd have "broken" clients with a status of 1 but no message. Is your cron job sending stdout/ stderr to /dev/null? You might try letting a few clients email that to you or dump it to a file to see if you can catch the error. I'll modify the client code to add the exception message to the message sent to the server. FWIW, we run unicorn with 20 workers in our production environment. Behind nginx, although as you indicated the front-end web proxy doesn't seem to make a difference. I concur that the warning from facter is likely unrelated. Jason On Dec 30, 2009, at 2:39 PM, Kenneth Williams wrote: > Hi all! > > I've started moving out of my test environment and beginning to move > to production use. As part of that I've gone from using unicorn with > one worker to testing four workers and an Apache proxy. Everything > seems to work, and scales better when deploying to more hosts as > you'd expect, but the etch dashboard reports hosts as broken using > this setup. I've tested it in various combinations, using just > unicorn without apache and multiple workers directly, and with > apache using multiple masters with only one worker. The only setup I > can get working without hosts being listed as broken is one master > with one worker. Unfortunately, and as you could probably guess, it > takes an eternity to push changes using only one worker once you > throw in more than just a couple hosts... Apache as a proxy does not > seem to make a difference, accessing unicorn through it's own port, > or through the Apache proxy has no noticeable change in the number > of broken hosts. In the end I'd like Apache to proxy to multiple > unicorn masters on different hosts, but right now I'd settle for > being able to have more than one worker running ;) > > The list of "broken" hosts steadily increases over the day at around > the ten minute interval when etch client kicks off from cron. It > starts off with just a few in a pool of 40 hosts listed as broken > and goes up from there by one or two hosts every ten minutes. It > seems to stop around 25 +/- 3 "broken" hosts, and the hosts will > alternate at the ten minute interval. If I put a change in my etch > source directory it does get pushed out to the hosts, even the ones > listed as broken, and if I log into a broken host and run etch > manually it runs fine, except for two warnings. When running etch > client manually it removes the host from the broken list, only to > add it back in later. I've always ignored the warning because it did > not seem to have any impact under the previous test setup. It seemed > to have cropped up when I upgraded from 3.11 to the ruby gem 3.13 > version. 
There are two hosts still running the 3.11 client that > don't produce this warning, but they're also subject to being listed > as broken along with the others. Just in case its important, the > warning is: > > /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method > redefined; discarding old can_connect? > /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method > redefined; discarding old metadata > > I don't think this is related to my problem though.The etch client > command I'm running that produces this is: > > /usr/bin/etch --generate-all --server http://etch:8080/ > > Otherwise there are no errors produced by the etch client. Port 8080 > is running through the Apache proxy, behind it is currently only one > unicorn master with 20 workers. I'm running etch client version 3.13 > on the nodes, and on the server I'm running 3.11. Please let me know > if you need any additional details, any help is truly > appreciated.Thanks!! > > -- > Kenneth Williams > ------------------------------------------------------------------------------ > This SF.Net email is sponsored by the Verizon Developer Community > Take advantage of Verizon's best-in-class app development support > A streamlined, 14 day to market process makes app distribution fast > and easy > Join now and get one step closer to millions of Verizon customers > http://p.sf.net/sfu/verizon-dev2dev > _______________________________________________ > etch-users mailing list > etc...@li... > https://lists.sourceforge.net/lists/listinfo/etch-users |
From: Kenneth W. <hap...@gm...> - 2009-12-30 22:39:40
|
Hi all!

I've started moving out of my test environment and beginning to move to production use. As part of that I've gone from using unicorn with one worker to testing four workers and an Apache proxy. Everything seems to work, and scales better when deploying to more hosts as you'd expect, but the etch dashboard reports hosts as broken using this setup. I've tested it in various combinations, using just unicorn without Apache and multiple workers directly, and with Apache using multiple masters with only one worker. The only setup I can get working without hosts being listed as broken is one master with one worker. Unfortunately, and as you could probably guess, it takes an eternity to push changes using only one worker once you throw in more than just a couple hosts... Apache as a proxy does not seem to make a difference: accessing unicorn through its own port or through the Apache proxy shows no noticeable change in the number of broken hosts. In the end I'd like Apache to proxy to multiple unicorn masters on different hosts, but right now I'd settle for being able to have more than one worker running ;)

The list of "broken" hosts steadily increases over the day at around the ten minute interval when etch client kicks off from cron. It starts off with just a few in a pool of 40 hosts listed as broken and goes up from there by one or two hosts every ten minutes. It seems to stop around 25 +/- 3 "broken" hosts, and the hosts will alternate at the ten minute interval. If I put a change in my etch source directory it does get pushed out to the hosts, even the ones listed as broken, and if I log into a broken host and run etch manually it runs fine, except for two warnings. When running etch client manually it removes the host from the broken list, only to add it back in later. I've always ignored the warning because it did not seem to have any impact under the previous test setup. It seemed to have cropped up when I upgraded from 3.11 to the ruby gem 3.13 version. There are two hosts still running the 3.11 client that don't produce this warning, but they're also subject to being listed as broken along with the others. Just in case it's important, the warning is:

/usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect?
/usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata

I don't think this is related to my problem, though. The etch client command I'm running that produces this is:

/usr/bin/etch --generate-all --server http://etch:8080/

Otherwise there are no errors produced by the etch client. Port 8080 is running through the Apache proxy; behind it is currently only one unicorn master with 20 workers. I'm running etch client version 3.13 on the nodes, and on the server I'm running 3.11. Please let me know if you need any additional details; any help is truly appreciated. Thanks!!

--
Kenneth Williams
|
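(For readers unfamiliar with the setup described above, a rough sketch of running the Rails-based etch server under unicorn with multiple workers behind a front-end proxy; the paths, port numbers, and worker count are illustrative assumptions, not details taken from the thread.)

  # start a unicorn master for the etch server with several workers (placeholder paths/ports)
  cd /path/to/etchserver
  unicorn_rails -D -E production -p 8081 -c config/unicorn.conf.rb
  # config/unicorn.conf.rb holds the worker count, e.g. a line such as: worker_processes 4
  # a front-end proxy (Apache or nginx) listening on port 8080 then forwards to 127.0.0.1:8081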
From: Kenneth W. <hap...@gm...> - 2009-12-10 23:49:41
|
Cool, very nice. Thank you! This will work fine for now. I noticed the differences between the two as a result of some issues I was having with most of my linux machines not actually returning anything on the gethostbyid() call (well, it returns a string of seven zeros). As a temporary work around for now I'm taking the motherboard serial number from dmidecode and echoing it into /etc/hostid. This gets around an issue I was having with nventory not allowing a machine to register itself because another machine already existed with the same id. Etch does not seem to care, so I run it in post ks install and it does the dmidecode hack, then puts nventory into cron. So far this has not given me any issues aside from the stale etch entries, but I've only actually moved 30 or so machines to nventory so far. On Thu, Dec 10, 2009 at 3:15 PM, Jason Heiss <jh...@ap...> wrote: > Hmm, good question. We have a cron job here that deletes any clients that > haven't updated in 30 days directly via the database. So entries with > temporary hostnames and the like fall off the end eventually. But that's a > bit of an ugly hack to be recommending to folks. Among other things we have > to keep a database username and password lying around so the cron job can > authenticate. > > nVentory, for example, goes to a great deal of trouble to try to tie a > client entry to a physical box by using values that are unique to the > system's hardware (UUID from the motherboard or MAC addresses, for example). > Thus even if a client's name changes nVentory is able to keep updating the > same entry in the database. But that's a lot of complexity to add to etch. > > Would you be satisfied with just a simple way to manually or automatically > delete these stale entries with temporary host names? > > I've gone ahead and added a few options for deleting clients. These will > all be in the next release, including a link to delete a client when viewing > it via the web UI and some additions to the REST API to make scripting > deletions easier. However, the easiest one for you to incorporate before > the next release is a rake task. To add this to your current server: > > - Grab the following file and save it as lib/tasks/etch.rake within your > server directory: > http://etch.svn.sourceforge.net/viewvc/etch/trunk/server/lib/tasks/etch.rake > - Change to the server directory and run "rake etch:dbclean[24]", which > will remove any clients which haven't checked in within the last 24 hours. > You can adjust the number of hours as desired. > > Hope that helps. > > Jason > > On Dec 9, 2009, at 3:05 PM, Kenneth Williams wrote: > > Is there a recommended way to remove hosts? > > When I PXE boot a host I'm running etch immediately in the post install. > After I assign a role to the host it's host name and ip address change. So I > end up with these stale entries in etch from the first run. > > Is there a better way for me to handle this, or should I just delete those > entries? How are others handling this? Thanks! > > -- Kenneth Williams |
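(A one-line sketch of the workaround described above; the message doesn't say which dmidecode string keyword is used, so baseboard-serial-number here is an assumption.)

  # kickstart %post hack: seed /etc/hostid from the motherboard serial number
  dmidecode -s baseboard-serial-number > /etc/hostid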
From: Jason H. <jh...@ap...> - 2009-12-10 23:15:31
|
Hmm, good question. We have a cron job here that deletes any clients that haven't updated in 30 days directly via the database. So entries with temporary hostnames and the like fall off the end eventually. But that's a bit of an ugly hack to be recommending to folks. Among other things we have to keep a database username and password lying around so the cron job can authenticate. nVentory, for example, goes to a great deal of trouble to try to tie a client entry to a physical box by using values that are unique to the system's hardware (UUID from the motherboard or MAC addresses, for example). Thus even if a client's name changes nVentory is able to keep updating the same entry in the database. But that's a lot of complexity to add to etch. Would you be satisfied with just a simple way to manually or automatically delete these stale entries with temporary host names? I've gone ahead and added a few options for deleting clients. These will all be in the next release, including a link to delete a client when viewing it via the web UI and some additions to the REST API to make scripting deletions easier. However, the easiest one for you to incorporate before the next release is a rake task. To add this to your current server: - Grab the following file and save it as lib/tasks/etch.rake within your server directory: http://etch.svn.sourceforge.net/viewvc/etch/trunk/server/lib/tasks/etch.rake - Change to the server directory and run "rake etch:dbclean[24]", which will remove any clients which haven't checked in within the last 24 hours. You can adjust the number of hours as desired. Hope that helps. Jason On Dec 9, 2009, at 3:05 PM, Kenneth Williams wrote: > Is there a recommended way to remove hosts? > > When I PXE boot a host I'm running etch immediately in the post > install. After I assign a role to the host it's host name and ip > address change. So I end up with these stale entries in etch from > the first run. > > Is there a better way for me to handle this, or should I just delete > those entries? How are others handling this? Thanks! |
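(Condensed into commands, the steps above look roughly like this; the server directory path is a placeholder, and fetching the raw etch.rake file from the viewvc URL is left as described in the message.)

  cd /path/to/etchserver          # your etch server directory (placeholder)
  # save the etch.rake linked above as lib/tasks/etch.rake, then:
  rake etch:dbclean[24]           # delete clients that haven't checked in within 24 hours
  rake "etch:dbclean[720]"        # e.g. roughly 30 days; quote the brackets if your shell expands them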