Re: [etch-users] problems with "broken" hosts when moving to production use

Yeah, so that confirms a connection error.  Since the client couldn't  
connect to the server it did not receive any configuration for any  
files, and thus submitted 0 results, just an overall status message  
indicating the failure.  Hopefully capturing the output from the cron  
job will be informative.
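
For capturing that output, something like the wrapper below might help. It is only a sketch: the function name run_and_log and the log path are made up for illustration, and the etch command line is the one quoted further down in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of etch): run a command, append its stdout
# and stderr to a log file along with a UTC timestamp and the exit status,
# and return that exit status to the caller.
run_and_log() {
    log_file=$1
    shift
    status=0
    {
        printf '=== %s: %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$*"
        # 2>&1 sends stderr to the same place as stdout, i.e. the log file.
        "$@" 2>&1 || status=$?
        printf '=== exit status: %s\n' "$status"
    } >>"$log_file"
    return "$status"
}

# Example call (the log path is an assumption):
# run_and_log /var/log/etch-client.log /usr/bin/etch --generate-all --server http://etch:8080/
```

Called from cron in place of the bare etch command, this should leave whatever the client prints to stderr on a failed run in the log, next to the exit status.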

Jason

On Jan 5, 2010, at 4:20 PM, Kenneth Williams wrote:

> Yeah, the status value is 1 for the broken clients. What's really  
> strange is that the most recent entries in the timeline links are  
> empty or successful, no failures. For example, here's the most recent one...
>
> Client: web077
>
> Name: web077
> Status: 1
>
>          Time                      Hours Ago   # of Results   Total Message Size
> View     2010-01-05 23:41:44 UTC   0           0              0
> View     2010-01-05 22:41:44 UTC   1           0              0
> View     2010-01-05 21:41:44 UTC   2           1              1226
> View     2010-01-05 20:41:44 UTC   3           2              3294
> View     2010-01-05 19:41:44 UTC   4           0              0
> View     2010-01-05 18:41:44 UTC   5           0              0
>
> If I click the view link for hours 0 or 1 I get "We could not find  
> any results in the system for that search." The only time something  
> shows up is when I put a change on my etch server, like with hours 2  
> or 3, and those look fine:
>
> Results
> View all these combined
>          Client   File                            Time                      Success   Message Size
> View     web077   /etc/httpd/conf.d/vhosts.conf   2010-01-05 21:20:50 UTC   true      1226
>
> I'll change my crontab to log output to a file instead of sending it  
> to /dev/null and see what I get.
>
> Thanks for the additional info about how yours is set up, it's  
> helpful to know I've got this mostly set up the way it should be ;)
>
> On Mon, Jan 4, 2010 at 1:37 PM, Jason Heiss <jh...@ap...> wrote:
> What status value do the broken clients report?  1?
>
> If you're looking at a broken client in the web UI, is the "Message"  
> field empty?  If so, click the "24 hrs" timeline link, then the "0"  
> hours ago "View" link.  Do any of the files show "false" in the  
> "Success" column?
>
> The client will return a non-zero status to the server if it  
> encounters any form of Ruby exception while processing.  This would  
> be failure to connect to the etch server or some error processing  
> the configuration data sent by the server.  In your case some sort  
> of error connecting to the server seems most likely, although  
> interestingly those clients are able to connect to report their  
> results.  Looking over the code, it seems like currently the message  
> associated with any sort of connection error is printed to stderr,  
> but not sent to the server.  In which case you'd have "broken"  
> clients with a status of 1 but no message.  Is your cron job sending  
> stdout/stderr to /dev/null?  You might try letting a few clients  
> email that to you or dump it to a file to see if you can catch the  
> error.
>
> I'll modify the client code to add the exception message to the  
> message sent to the server.
>
> FWIW, we run unicorn with 20 workers in our production environment.   
> Behind nginx, although as you indicated the front-end web proxy  
> doesn't seem to make a difference.
>
> I concur that the warning from facter is likely unrelated.
>
> Jason
>
> On Dec 30, 2009, at 2:39 PM, Kenneth Williams wrote:
>
>> Hi all!
>>
>> I've started moving out of my test environment and beginning to  
>> move to production use. As part of that I've gone from using  
>> unicorn with one worker to testing four workers and an Apache  
>> proxy. Everything seems to work, and scales better when deploying  
>> to more hosts as you'd expect, but the etch dashboard reports hosts  
>> as broken using this setup. I've tested it in various combinations,  
>> using just unicorn without apache and multiple workers directly,  
>> and with apache using multiple masters with only one worker. The  
>> only setup I can get working without hosts being listed as broken  
>> is one master with one worker. Unfortunately, and as you could  
>> probably guess, it takes an eternity to push changes using only one  
>> worker once you throw in more than just a couple hosts... Apache as  
>> a proxy does not seem to make a difference: accessing unicorn  
>> through its own port or through the Apache proxy makes no  
>> noticeable change in the number of broken hosts. In the end I'd  
>> like Apache to proxy to multiple unicorn masters on different  
>> hosts, but right now I'd settle for being able to have more than  
>> one worker running ;)
>>
>> The list of "broken" hosts steadily increases over the day at  
>> around the ten minute interval when etch client kicks off from  
>> cron. It starts off with just a few in a pool of 40 hosts listed as  
>> broken and goes up from there by one or two hosts every ten  
>> minutes. It seems to stop around 25 +/- 3 "broken" hosts, and the  
>> hosts will alternate at the ten minute interval. If I put a change  
>> in my etch source directory it does get pushed out to the hosts,  
>> even the ones listed as broken, and if I log into a broken host and  
>> run etch manually it runs fine, except for two warnings. When  
>> running etch client manually it removes the host from the broken  
>> list, only to add it back in later. I've always ignored the warning  
>> because it did not seem to have any impact under the previous test  
>> setup. It seemed to have cropped up when I upgraded from 3.11 to  
>> the ruby gem 3.13 version. There are two hosts still running the  
>> 3.11 client that don't produce this warning, but they're also  
>> subject to being listed as broken along with the others. Just in  
>> case it's important, the warning is:
>>
>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method  
>> redefined; discarding old can_connect?
>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method  
>> redefined; discarding old metadata
>>
>> I don't think this is related to my problem though. The etch client  
>> command I'm running that produces this is:
>>
>> /usr/bin/etch --generate-all --server http://etch:8080/
>>
>> Otherwise there are no errors produced by the etch client. Port  
>> 8080 is running through the Apache proxy, behind it is currently  
>> only one unicorn master with 20 workers. I'm running etch client  
>> version 3.13 on the nodes, and on the server I'm running 3.11.  
>> Please let me know if you need any additional details, any help is  
>> truly appreciated. Thanks!!
>>
>> -- 
>> Kenneth Williams
>
> -- 
> Kenneth Williams