From: Jason H. <jh...@ap...> - 2010-01-06 01:56:11
Ah, right, sorry. If the client's status is good (0) and nothing needed to be changed, then there will also be 0 results.

Jason

On Jan 5, 2010, at 5:38 PM, Kenneth Williams wrote:

> I was under the impression that sometimes 0 results was okay if, for instance, there were no files that needed to be updated? This host, for example, has a status of 0, even though it received 0 results on the last run:
>
> Client: web073
>
> Name: web073
> Status: 0
>
>        Time                      Hours Ago   # of Results   Total Message Size
> View   2010-01-06 01:30:36 UTC   0           0              0
> View   2010-01-06 00:30:36 UTC   1           2              2754
> View   2010-01-05 23:30:36 UTC   2           0              0
>
> Cron is logging on 10 hosts; I'll check back on it after it's had time to gather some logs. Thanks for the help!
>
> On Tue, Jan 5, 2010 at 5:19 PM, Jason Heiss <jh...@ap...> wrote:
> Yeah, so that confirms a connection error. Since the client couldn't connect to the server, it did not receive any configuration for any files and thus submitted 0 results, just an overall status message indicating the failure. Hopefully capturing the output from the cron job will be informative.
>
> Jason
>
> On Jan 5, 2010, at 4:20 PM, Kenneth Williams wrote:
>
>> Yeah, the status value is 1 for the broken clients. What's really strange is that the most recent entries in the timeline links are empty or successful, no failures. For example, here's the most recent one...
>>
>> Client: web077
>>
>> Name: web077
>> Status: 1
>>
>>        Time                      Hours Ago   # of Results   Total Message Size
>> View   2010-01-05 23:41:44 UTC   0           0              0
>> View   2010-01-05 22:41:44 UTC   1           0              0
>> View   2010-01-05 21:41:44 UTC   2           1              1226
>> View   2010-01-05 20:41:44 UTC   3           2              3294
>> View   2010-01-05 19:41:44 UTC   4           0              0
>> View   2010-01-05 18:41:44 UTC   5           0              0
>>
>> If I click the View link for hours 0 or 1, I get "We could not find any results in the system for that search." The only time something shows up is when I put a change on my etch server, like with hours 2 or 3, and those look fine:
>>
>> Results
>> View all these combined
>>        Client   File                            Time                      Success   Message Size
>> View   web077   /etc/httpd/conf.d/vhosts.conf   2010-01-05 21:20:50 UTC   true      1226
>>
>> I'll change my crontab to log output instead of /dev/null and see what I get.
>>
>> Thanks for the additional info about how yours is set up; it's helpful to know I've got this mostly set up the way it should be ;)
>>
>> On Mon, Jan 4, 2010 at 1:37 PM, Jason Heiss <jh...@ap...> wrote:
>> What status value do the broken clients report? 1?
>>
>> If you're looking at a broken client in the web UI, is the "Message" field empty? If so, click the "24 hrs" timeline link, then the "0" hours ago "View" link; do any of the files show "false" in the "Success" column?
>>
>> The client will return a non-zero status to the server if it encounters any form of Ruby exception while processing. This would be a failure to connect to the etch server or some error processing the configuration data sent by the server. In your case some sort of error connecting to the server seems most likely, although interestingly those clients are able to connect to report their results. Looking over the code, it seems like currently the message associated with any sort of connection error is printed to stderr but not sent to the server, in which case you'd have "broken" clients with a status of 1 but no message. Is your cron job sending stdout/stderr to /dev/null? You might try letting a few clients email that to you or dump it to a file to see if you can catch the error.
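(For reference, a sketch of what capturing that output could look like, assuming the client is driven from an /etc/cron.d entry; the ten-minute schedule, log path, and MAILTO address are illustrative rather than taken from this thread, while the etch command line is the one quoted later in the thread.)

    # /etc/cron.d/etch (illustrative): run the client every 10 minutes and keep its output
    # Either let cron mail the output to someone...
    # MAILTO=you@example.com
    # */10 * * * * root /usr/bin/etch --generate-all --server http://etch:8080/
    # ...or append stdout/stderr to a log file instead of /dev/null:
    */10 * * * * root /usr/bin/etch --generate-all --server http://etch:8080/ >> /var/log/etch.log 2>&1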
>> I'll modify the client code to add the exception message to the message sent to the server.
>>
>> FWIW, we run unicorn with 20 workers in our production environment, behind nginx, although as you indicated the front-end web proxy doesn't seem to make a difference.
>>
>> I concur that the warning from facter is likely unrelated.
>>
>> Jason
>>
>> On Dec 30, 2009, at 2:39 PM, Kenneth Williams wrote:
>>
>>> Hi all!
>>>
>>> I've started moving out of my test environment and am beginning to move to production use. As part of that I've gone from using unicorn with one worker to testing four workers and an Apache proxy. Everything seems to work, and it scales better when deploying to more hosts, as you'd expect, but the etch dashboard reports hosts as broken with this setup. I've tested it in various combinations: unicorn alone with multiple workers, accessed directly, and Apache in front of multiple masters with only one worker each. The only setup I can get working without hosts being listed as broken is one master with one worker. Unfortunately, and as you could probably guess, it takes an eternity to push changes using only one worker once you throw in more than just a couple of hosts... Apache as a proxy does not seem to make a difference; accessing unicorn through its own port or through the Apache proxy produces no noticeable change in the number of broken hosts. In the end I'd like Apache to proxy to multiple unicorn masters on different hosts, but right now I'd settle for being able to have more than one worker running ;)
>>>
>>> The list of "broken" hosts steadily increases over the day at around the ten-minute interval when the etch client kicks off from cron. It starts off with just a few hosts in a pool of 40 listed as broken and goes up from there by one or two hosts every ten minutes. It seems to stop around 25 +/- 3 "broken" hosts, and the hosts will alternate at the ten-minute interval. If I put a change in my etch source directory it does get pushed out to the hosts, even the ones listed as broken, and if I log into a broken host and run etch manually it runs fine, except for two warnings. Running the etch client manually removes the host from the broken list, only for it to be added back later. I've always ignored the warnings because they did not seem to have any impact under the previous test setup. They seem to have cropped up when I upgraded from 3.11 to the ruby gem 3.13 version. There are two hosts still running the 3.11 client that don't produce these warnings, but they're also subject to being listed as broken along with the others. Just in case it's important, the warnings are:
>>>
>>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect?
>>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata
>>>
>>> I don't think this is related to my problem, though. The etch client command I'm running that produces this is:
>>>
>>> /usr/bin/etch --generate-all --server http://etch:8080/
>>>
>>> Otherwise there are no errors produced by the etch client. Port 8080 is running through the Apache proxy; behind it is currently only one unicorn master with 20 workers. I'm running etch client version 3.13 on the nodes, and on the server I'm running 3.11. Please let me know if you need any additional details; any help is truly appreciated. Thanks!!
>>>
>>> --
>>> Kenneth Williams
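(For reference, a minimal sketch of a unicorn configuration for the kind of multi-worker setup described in this thread. The worker count, socket path, timeout, and log locations are illustrative and are not taken from either environment discussed above.)

    # config/unicorn.rb (illustrative): one unicorn master forking multiple workers
    worker_processes 20                     # number of worker processes the master forks
    listen "/var/run/etch/unicorn.sock"     # or a TCP port for the front-end proxy to forward to
    timeout 60                              # master kills workers stuck on a request longer than this
    preload_app true                        # load the app once in the master, then fork workers
    pid "/var/run/etch/unicorn.pid"
    stdout_path "/var/log/unicorn/etch.stdout.log"
    stderr_path "/var/log/unicorn/etch.stderr.log"

(Running multiple masters behind the front-end proxy, as discussed above, would just repeat a config like this on each host, with Apache or nginx balancing across them.)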
>>> _______________________________________________
>>> etch-users mailing list
>>> etc...@li...
>>> https://lists.sourceforge.net/lists/listinfo/etch-users
>>
>>
>> --
>> Kenneth Williams
>
>
> --
> Kenneth Williams <www.krw.info>
> No man's life, liberty, or property are safe while the legislature is in session. - Mark Twain