From: Kenneth W. <hap...@gm...> - 2010-01-06 00:48:28
Yeah the status value is 1 for the broken clients. What's really strange is
the most recent entries in the timeline links are empty or successful, no
failures. For example, here's the most recent one...

Client: web077
Name: web077
Status: 1

      Time                     Hours Ago  # of Results  Total Message Size
View  2010-01-05 23:41:44 UTC  0          0             0
View  2010-01-05 22:41:44 UTC  1          0             0
View  2010-01-05 21:41:44 UTC  2          1             1226
View  2010-01-05 20:41:44 UTC  3          2             3294
View  2010-01-05 19:41:44 UTC  4          0             0
View  2010-01-05 18:41:44 UTC  5          0             0

If I click the view link for hours 0 or 1 I get "We could not find any
results in the system for that search." The only time something shows up is
when I put a change on my etch server, like with hours 2 or 3, and those look
fine:

Results
View all these combined

      Client  File                           Time                     Success  Message Size
View  web077  /etc/httpd/conf.d/vhosts.conf  2010-01-05 21:20:50 UTC  true     1226

I'll change my crontab to log output instead of /dev/null and see what I get.

Thanks for the additional info about how yours is set up, it's helpful to
know I've got this mostly set up the way it should be ;)

On Mon, Jan 4, 2010 at 1:37 PM, Jason Heiss <jh...@ap...> wrote:

> What status value do the broken clients report? 1?
>
> If you're looking at a broken client in the web UI, is the "Message" field
> empty? If so, click the "24 hrs" timeline link, then the "0" hours ago
> "View" link, do any of the files show "false" in the "Success" column?
>
> The client will return a non-zero status to the server if it encounters any
> form of Ruby exception while processing. This would be failure to connect
> to the etch server or some error processing the configuration data sent by
> the server. In your case some sort of error connecting to the server seems
> most likely, although interestingly those clients are able to connect to
> report their results. Looking over the code, it seems like currently the
> message associated with any sort of connection error is printed to stderr,
> but not sent to the server. In which case you'd have "broken" clients with
> a status of 1 but no message. Is your cron job sending stdout/stderr to
> /dev/null? You might try letting a few clients email that to you or dump it
> to a file to see if you can catch the error.
>
> I'll modify the client code to add the exception message to the message
> sent to the server.
>
> FWIW, we run unicorn with 20 workers in our production environment. Behind
> nginx, although as you indicated the front-end web proxy doesn't seem to
> make a difference.
>
> I concur that the warning from facter is likely unrelated.
>
> Jason
>
> On Dec 30, 2009, at 2:39 PM, Kenneth Williams wrote:
>
> Hi all!
>
> I've started moving out of my test environment and beginning to move to
> production use. As part of that I've gone from using unicorn with one worker
> to testing four workers and an Apache proxy. Everything seems to work, and
> scales better when deploying to more hosts as you'd expect, but the etch
> dashboard reports hosts as broken using this setup. I've tested it in
> various combinations, using just unicorn without apache and multiple workers
> directly, and with apache using multiple masters with only one worker. The
> only setup I can get working without hosts being listed as broken is one
> master with one worker. Unfortunately, and as you could probably guess, it
> takes an eternity to push changes using only one worker once you throw in
> more than just a couple hosts... Apache as a proxy does not seem to make a
> difference, accessing unicorn through its own port, or through the Apache
> proxy has no noticeable change in the number of broken hosts. In the end I'd
> like Apache to proxy to multiple unicorn masters on different hosts, but
> right now I'd settle for being able to have more than one worker running ;)
>
> The list of "broken" hosts steadily increases over the day at around the
> ten minute interval when etch client kicks off from cron. It starts off with
> just a few in a pool of 40 hosts listed as broken and goes up from there by
> one or two hosts every ten minutes. It seems to stop around 25 +/- 3
> "broken" hosts, and the hosts will alternate at the ten minute interval. If
> I put a change in my etch source directory it does get pushed out to the
> hosts, even the ones listed as broken, and if I log into a broken host and
> run etch manually it runs fine, except for two warnings. When running etch
> client manually it removes the host from the broken list, only to add it
> back in later. I've always ignored the warning because it did not seem to
> have any impact under the previous test setup. It seemed to have cropped up
> when I upgraded from 3.11 to the ruby gem 3.13 version. There are two hosts
> still running the 3.11 client that don't produce this warning, but they're
> also subject to being listed as broken along with the others. Just in case
> it's important, the warning is:
>
> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect?
> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata
>
> I don't think this is related to my problem though. The etch client command
> I'm running that produces this is:
>
> /usr/bin/etch --generate-all --server http://etch:8080/
>
> Otherwise there are no errors produced by the etch client. Port 8080 is
> running through the Apache proxy, behind it is currently only one unicorn
> master with 20 workers. I'm running etch client version 3.13 on the nodes,
> and on the server I'm running 3.11. Please let me know if you need any
> additional details, any help is truly appreciated. Thanks!!
>
> --
> Kenneth Williams
>
> _______________________________________________
> etch-users mailing list
> etc...@li...
> https://lists.sourceforge.net/lists/listinfo/etch-users

--
Kenneth Williams
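
The crontab change mentioned in the reply above (logging the client's output instead of sending it to /dev/null) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `run_logged` function name, the log file path, and the idea of calling it from a wrapper script are hypothetical and not part of etch. Only the `/usr/bin/etch --generate-all --server http://etch:8080/` command comes from the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of etch): run a command, append its
# stdout and stderr to a log file, and record the exit status so a
# non-zero status from a cron run can be matched to its error output.

run_logged() {
    logfile=$1
    shift
    status=0
    {
        # Header line with a UTC timestamp and the command being run.
        printf '=== %s: %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$*"
        # stderr is merged into stdout so both end up in the log.
        "$@" 2>&1 || status=$?
        printf '=== exit status: %s\n' "$status"
    } >> "$logfile"
    return "$status"
}

# Example use from a wrapper script called by cron (log path is an assumption):
#   run_logged /var/log/etch-client.log \
#       /usr/bin/etch --generate-all --server http://etch:8080/
```

Recording the exit status next to the captured stderr should make it easier to line up a "broken" entry in the dashboard with whatever exception the client printed during that run.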