From: Jason H. <jh...@ap...> - 2010-01-06 01:19:51
Yeah, so that confirms a connection error. Since the client couldn't connect to the server, it did not receive any configuration for any files, and thus submitted 0 results, just an overall status message indicating the failure. Hopefully capturing the output from the cron job will be informative.

Jason

On Jan 5, 2010, at 4:20 PM, Kenneth Williams wrote:

> Yeah, the status value is 1 for the broken clients. What's really strange is that the most recent entries in the timeline links are empty or successful, no failures. For example, here's the most recent one...
>
> Client: web077
>
> Name: web077
> Status: 1
>
>        Time                      Hours Ago   # of Results   Total Message Size
> View   2010-01-05 23:41:44 UTC   0           0              0
> View   2010-01-05 22:41:44 UTC   1           0              0
> View   2010-01-05 21:41:44 UTC   2           1              1226
> View   2010-01-05 20:41:44 UTC   3           2              3294
> View   2010-01-05 19:41:44 UTC   4           0              0
> View   2010-01-05 18:41:44 UTC   5           0              0
>
> If I click the "View" link for hours 0 or 1 I get "We could not find any results in the system for that search."; the only time something shows up is when I put a change on my etch server, like with hours 2 or 3, and those look fine:
>
> Results
> View all these combined
>
>        Client   File                            Time                      Success   Message Size
> View   web077   /etc/httpd/conf.d/vhosts.conf   2010-01-05 21:20:50 UTC   true      1226
>
> I'll change my crontab to log output instead of sending it to /dev/null and see what I get.
>
> Thanks for the additional info about how yours is set up; it's helpful to know I've got this mostly set up the way it should be ;)
>
> On Mon, Jan 4, 2010 at 1:37 PM, Jason Heiss <jh...@ap...> wrote:
>
> What status value do the broken clients report? 1?
>
> If you're looking at a broken client in the web UI, is the "Message" field empty? If so, click the "24 hrs" timeline link, then the "0" hours ago "View" link; do any of the files show "false" in the "Success" column?
>
> The client will return a non-zero status to the server if it encounters any form of Ruby exception while processing.
> This would be a failure to connect to the etch server or some error processing the configuration data sent by the server. In your case some sort of error connecting to the server seems most likely, although interestingly those clients are able to connect to report their results. Looking over the code, it seems that currently the message associated with any sort of connection error is printed to stderr, but not sent to the server. In that case you'd have "broken" clients with a status of 1 but no message. Is your cron job sending stdout/stderr to /dev/null? You might try letting a few clients email that output to you or dump it to a file to see if you can catch the error.
>
> I'll modify the client code to add the exception message to the message sent to the server.
>
> FWIW, we run unicorn with 20 workers in our production environment, behind nginx, although as you indicated the front-end web proxy doesn't seem to make a difference.
>
> I concur that the warning from facter is likely unrelated.
>
> Jason
>
> On Dec 30, 2009, at 2:39 PM, Kenneth Williams wrote:
>
>> Hi all!
>>
>> I've started moving out of my test environment and into production use. As part of that I've gone from using unicorn with one worker to testing four workers and an Apache proxy. Everything seems to work, and it scales better when deploying to more hosts, as you'd expect, but the etch dashboard reports hosts as broken with this setup. I've tested it in various combinations: using just unicorn without Apache and multiple workers directly, and with Apache using multiple masters with only one worker. The only setup I can get working without hosts being listed as broken is one master with one worker. Unfortunately, and as you could probably guess, it takes an eternity to push changes using only one worker once you throw in more than just a couple hosts...
>> Apache as a proxy does not seem to make a difference; accessing unicorn through its own port or through the Apache proxy makes no noticeable change in the number of broken hosts. In the end I'd like Apache to proxy to multiple unicorn masters on different hosts, but right now I'd settle for being able to have more than one worker running ;)
>>
>> The list of "broken" hosts steadily increases over the day at around the ten-minute interval when the etch client kicks off from cron. It starts off with just a few hosts in a pool of 40 listed as broken and goes up from there by one or two hosts every ten minutes. It seems to stop at around 25 +/- 3 "broken" hosts, and the hosts will alternate at the ten-minute interval. If I put a change in my etch source directory it does get pushed out to the hosts, even the ones listed as broken, and if I log into a broken host and run etch manually it runs fine, except for two warnings. Running the etch client manually removes the host from the broken list, only for it to be added back later. I've always ignored the warnings because they did not seem to have any impact under the previous test setup. They seem to have cropped up when I upgraded from 3.11 to the ruby gem 3.13 version. There are two hosts still running the 3.11 client that don't produce these warnings, but they're also subject to being listed as broken along with the others. Just in case it's important, the warnings are:
>>
>>   /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect?
>>   /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata
>>
>> I don't think this is related to my problem, though. The etch client command I'm running that produces this is:
>>
>>   /usr/bin/etch --generate-all --server http://etch:8080/
>>
>> Otherwise there are no errors produced by the etch client.
>> Port 8080 is running through the Apache proxy; behind it is currently only one unicorn master with 20 workers. I'm running etch client version 3.13 on the nodes, and on the server I'm running 3.11. Please let me know if you need any additional details; any help is truly appreciated. Thanks!!
>>
>> --
>> Kenneth Williams
>> _______________________________________________
>> etch-users mailing list
>> etc...@li...
>> https://lists.sourceforge.net/lists/listinfo/etch-users
>
> --
> Kenneth Williams
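Jason's diagnosis in this thread turns on how the client reports failures: any Ruby exception makes it send status 1 to the server, but the message from a connection error only goes to stderr, so the dashboard shows a "broken" host with an empty message. The change he proposes (adding the exception message to what is sent to the server) can be illustrated with a minimal Ruby sketch. Everything in it is hypothetical: `run_client`, the `fetch_config` callable, and the `[status, message]` pair are names invented for illustration, not etch's actual client code or API.

```ruby
require "stringio"

# Hypothetical sketch, not etch's actual client code: run one client pass
# and return the [status, message] pair that would be reported to the server.
#   fetch_config - hypothetical callable standing in for "contact the etch
#                  server and fetch the file configuration"; it may raise.
#   err          - where the error text is echoed (stderr by default).
def run_client(fetch_config, err = $stderr)
  files = fetch_config.call
  # Success path: status 0 plus a short summary (message format is made up).
  [0, "processed #{files.size} file(s)"]
rescue StandardError => e
  text = "#{e.class}: #{e.message}"
  # Behaviour described in the thread: the error is printed to stderr...
  err.puts(text)
  # ...and the proposed change also sends it along with the non-zero status,
  # so a "broken" host no longer shows up with an empty message.
  [1, text]
end

if __FILE__ == $PROGRAM_NAME
  log = StringIO.new
  status, message = run_client(-> { raise Errno::ECONNREFUSED, "etch:8080" }, log)
  puts status   # 1
  puts message  # Errno::ECONNREFUSED: Connection refused - etch:8080
end
```

The sketch rescues `StandardError` rather than `Exception` so that signals and `exit` still behave normally; whether the real client draws that line the same way is an assumption, since the thread only says "any form of Ruby exception".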