From: Jason H. <jh...@ap...> - 2010-01-06 01:56:11
Ah, right, sorry. If the client's status is good (0) and nothing needed to be changed, then there will also be 0 results.

Jason

On Jan 5, 2010, at 5:38 PM, Kenneth Williams wrote:

> I was under the impression that sometimes 0 results was okay if, for instance, there were no files that needed to be updated? This host, for example, has a status of 0, even though it received 0 results on the last run:
>
> Client: web073
>
> Name: web073
> Status: 0
>
>        Time                      Hours Ago   # of Results   Total Message Size
> View   2010-01-06 01:30:36 UTC   0           0              0
> View   2010-01-06 00:30:36 UTC   1           2              2754
> View   2010-01-05 23:30:36 UTC   2           0              0
>
> Cron is logging on 10 hosts; I'll check back on it after it's had time to gather some logs. Thanks for the help!
>
> On Tue, Jan 5, 2010 at 5:19 PM, Jason Heiss <jh...@ap...> wrote:
> Yeah, so that confirms a connection error. Since the client couldn't connect to the server, it did not receive any configuration for any files and thus submitted 0 results, just an overall status message indicating the failure. Hopefully capturing the output from the cron job will be informative.
>
> Jason
>
> On Jan 5, 2010, at 4:20 PM, Kenneth Williams wrote:
>
>> Yeah, the status value is 1 for the broken clients. What's really strange is that the most recent entries in the timeline links are empty or successful, no failures. For example, here's the most recent one...
>>
>> Client: web077
>>
>> Name: web077
>> Status: 1
>>
>>        Time                      Hours Ago   # of Results   Total Message Size
>> View   2010-01-05 23:41:44 UTC   0           0              0
>> View   2010-01-05 22:41:44 UTC   1           0              0
>> View   2010-01-05 21:41:44 UTC   2           1              1226
>> View   2010-01-05 20:41:44 UTC   3           2              3294
>> View   2010-01-05 19:41:44 UTC   4           0              0
>> View   2010-01-05 18:41:44 UTC   5           0              0
>>
>> If I click the View link for hours 0 or 1, I get "We could not find any results in the system for that search." The only time something shows up is when I put a change on my etch server, like with hours 2 or 3, and those look fine:
>>
>> Results
>> View all these combined
>>        Client   File                            Time                      Success   Message Size
>> View   web077   /etc/httpd/conf.d/vhosts.conf   2010-01-05 21:20:50 UTC   true      1226
>>
>> I'll change my crontab to log output instead of /dev/null and see what I get.
>>
>> Thanks for the additional info about how yours is set up; it's helpful to know I've got this mostly set up the way it should be ;)
>>
>> On Mon, Jan 4, 2010 at 1:37 PM, Jason Heiss <jh...@ap...> wrote:
>> What status value do the broken clients report? 1?
>>
>> If you're looking at a broken client in the web UI, is the "Message" field empty? If so, click the "24 hrs" timeline link, then the "0" hours ago "View" link; do any of the files show "false" in the "Success" column?
>>
>> The client will return a non-zero status to the server if it encounters any form of Ruby exception while processing. This would be a failure to connect to the etch server or some error processing the configuration data sent by the server. In your case some sort of error connecting to the server seems most likely, although interestingly those clients are able to connect to report their results. Looking over the code, it seems like currently the message associated with any sort of connection error is printed to stderr but not sent to the server, in which case you'd have "broken" clients with a status of 1 but no message. Is your cron job sending stdout/stderr to /dev/null? You might try letting a few clients email that to you or dump it to a file to see if you can catch the error.
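(For reference, a sketch of what capturing that output could look like, assuming the client is driven from an /etc/cron.d entry; the ten-minute schedule, log path, and MAILTO address are illustrative rather than taken from this thread, while the etch command line is the one quoted later in the thread.)

    # /etc/cron.d/etch (illustrative): run the client every 10 minutes and keep its output
    # Either let cron mail the output to someone...
    # MAILTO=you@example.com
    # */10 * * * * root /usr/bin/etch --generate-all --server http://etch:8080/
    # ...or append stdout/stderr to a log file instead of /dev/null:
    */10 * * * * root /usr/bin/etch --generate-all --server http://etch:8080/ >> /var/log/etch.log 2>&1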
>> I'll modify the client code to add the exception message to the message sent to the server.
>>
>> FWIW, we run unicorn with 20 workers in our production environment, behind nginx, although as you indicated the front-end web proxy doesn't seem to make a difference.
>>
>> I concur that the warning from facter is likely unrelated.
>>
>> Jason
>>
>> On Dec 30, 2009, at 2:39 PM, Kenneth Williams wrote:
>>
>>> Hi all!
>>>
>>> I've started moving out of my test environment and am beginning to move to production use. As part of that I've gone from using unicorn with one worker to testing four workers and an Apache proxy. Everything seems to work, and it scales better when deploying to more hosts, as you'd expect, but the etch dashboard reports hosts as broken with this setup. I've tested it in various combinations: unicorn alone with multiple workers, accessed directly, and Apache in front of multiple masters with only one worker each. The only setup I can get working without hosts being listed as broken is one master with one worker. Unfortunately, and as you could probably guess, it takes an eternity to push changes using only one worker once you throw in more than just a couple of hosts... Apache as a proxy does not seem to make a difference; accessing unicorn through its own port or through the Apache proxy produces no noticeable change in the number of broken hosts. In the end I'd like Apache to proxy to multiple unicorn masters on different hosts, but right now I'd settle for being able to have more than one worker running ;)
>>>
>>> The list of "broken" hosts steadily increases over the day at around the ten-minute interval when the etch client kicks off from cron. It starts off with just a few hosts in a pool of 40 listed as broken and goes up from there by one or two hosts every ten minutes. It seems to stop around 25 +/- 3 "broken" hosts, and the hosts will alternate at the ten-minute interval. If I put a change in my etch source directory it does get pushed out to the hosts, even the ones listed as broken, and if I log into a broken host and run etch manually it runs fine, except for two warnings. Running the etch client manually removes the host from the broken list, only for it to be added back later. I've always ignored the warnings because they did not seem to have any impact under the previous test setup. They seem to have cropped up when I upgraded from 3.11 to the ruby gem 3.13 version. There are two hosts still running the 3.11 client that don't produce these warnings, but they're also subject to being listed as broken along with the others. Just in case it's important, the warnings are:
>>>
>>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:8: warning: method redefined; discarding old can_connect?
>>> /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:16: warning: method redefined; discarding old metadata
>>>
>>> I don't think this is related to my problem, though. The etch client command I'm running that produces this is:
>>>
>>> /usr/bin/etch --generate-all --server http://etch:8080/
>>>
>>> Otherwise there are no errors produced by the etch client. Port 8080 is running through the Apache proxy; behind it is currently only one unicorn master with 20 workers. I'm running etch client version 3.13 on the nodes, and on the server I'm running 3.11. Please let me know if you need any additional details; any help is truly appreciated. Thanks!!
>>>
>>> --
>>> Kenneth Williams
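(For reference, a minimal sketch of a unicorn configuration for the kind of multi-worker setup described in this thread. The worker count, socket path, timeout, and log locations are illustrative and are not taken from either environment discussed above.)

    # config/unicorn.rb (illustrative): one unicorn master forking multiple workers
    worker_processes 20                     # number of worker processes the master forks
    listen "/var/run/etch/unicorn.sock"     # or a TCP port for the front-end proxy to forward to
    timeout 60                              # master kills workers stuck on a request longer than this
    preload_app true                        # load the app once in the master, then fork workers
    pid "/var/run/etch/unicorn.pid"
    stdout_path "/var/log/unicorn/etch.stdout.log"
    stderr_path "/var/log/unicorn/etch.stderr.log"

(Running multiple masters behind the front-end proxy, as discussed above, would just repeat a config like this on each host, with Apache or nginx balancing across them.)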
>>> _______________________________________________
>>> etch-users mailing list
>>> etc...@li...
>>> https://lists.sourceforge.net/lists/listinfo/etch-users
>>
>>
>> --
>> Kenneth Williams
>
>
> --
> Kenneth Williams <www.krw.info>
> No man's life, liberty, or property are safe while the legislature is in session. - Mark Twain