[Keepalived-devel] Re: Keepalived, UML and TONS of mail
From: Alexandre C. <Ale...@wa...> - 2003-02-18 01:26:14
Hi Diego,

Sorry for the delay :/ I am quite busy currently...

> I have recently setup a virtual cluster for my own testing purposes.
> It's comprised of 2 directors and 2 realservers.
>
> The idea is to have active-passive failover on the directors, so that
> service coming from the realservers is not interrupted.
>
> LVS and healthchecks would ensure that only the available realservers
> are used when requests come in.
>
> Although I can get it to work, I'm trying to use TCP_CHECK as the
> healthcheck mechanism.

Ok, nice.

> I'm exporting two virtual services: http and ssh, and my intention is to
> test the availability of each service by doing the TCP_CHECK to each
> corresponding port.
>
> However, I get tons of e-mail notifying me that "Realserver xxxx:yy
> DOWN" and shortly thereafter "Realserver xxxx:yy UP"...this goes on and
> on...

Yes, this is because the final services (ssh, http) are flapping. This can
happen when delay_loop is too short and the final services get flooded by
healthchecks... but the 30s in your conf sounds good... Strange...

> I have attached my configuration - maybe you can tell me which of the
> timeouts I have misconfigured, since I'm sure this is abnormal behavior.

The most important one is delay_loop: it drives the healthcheck frequency...
30 sounds good... a TCP connection timeout of 10 sounds good too... Hmm, there
seems to be a problem with your listeners... it may be that the servers are
overloaded...

> On a separate note: congratulations on a great product!! This is coming
> in very handy in planning for the three clusters we need to implement!

Thanks :)

> I'll be sure and forward you the details of the implementation so you
> can use it in a "case studies"-type section in the website!

Any documentation is very welcome... This is a part of the website that
needs to be expanded :) So if you can write something, feel free, I will
publish it on the website.

> Also, is there any documentation that describes the timeouts, and how
> they work in relation to each other? This I think is important, since
> the existing docs (that I've seen) don't cover this.

Yes... there are only two things to know: delay_loop is the interval at which
the healthcheckers are launched, and connection_timeout is the timeout after
which a service is considered failed (it drives the healthchecker's decision
to remove the realserver).

Best regards,
Alexandre
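
To make the relation between the two timeouts concrete, here is a rough Python sketch of a TCP healthcheck loop. It is not keepalived's code, only an illustration under assumptions: the names tcp_check and monitor are hypothetical, the timeout argument stands in for the connection timeout described above, and the "UP"/"DOWN" events approximate the moments when a notification mail would be sent.

```python
import socket
import time


def tcp_check(host, port, connect_timeout):
    """Return True if a TCP connection succeeds within connect_timeout seconds.

    Hypothetical stand-in for a TCP_CHECK-style probe: the service counts as
    failed when the connection is refused or does not complete in time.
    """
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


def monitor(check, delay_loop, rounds, sleep=time.sleep):
    """Run check() `rounds` times, waiting delay_loop seconds between runs.

    Returns the list of state changes ("UP" or "DOWN"). Each change is
    roughly where one notification would be generated, so a service that
    alternates between passing and failing produces one event per check.
    """
    events = []
    state = None
    for i in range(rounds):
        new_state = "UP" if check() else "DOWN"
        if new_state != state:
            events.append(new_state)
            state = new_state
        if i < rounds - 1:
            sleep(delay_loop)
    return events
```

With the values discussed in this thread, a call would look like monitor(lambda: tcp_check("192.0.2.10", 80, 10), delay_loop=30, rounds=5), where the address is a placeholder. The sketch shows why a service that intermittently refuses or delays connections produces a DOWN/UP mail on nearly every loop, whatever the value of delay_loop.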