Re: [Keepalived-devel] KL v1.1.11 on RH ES v4 U2: Simple 2 node Apache Farm


On Wed 22 Feb 2006 10:26:49 GMT, Shaun McCullagh
<sha...@xb...> wrote:
<snip>
> Initially things looked very good, but both PCs
> crashed every few weeks.
<snip>
> As there is no kernel panic I'm not sure how to go
> about investigating this, any suggestions would be most welcome.

It sounds to me like the machines are running into some sort of RAM
famine *or* a packet reflection issue. Unfortunately, these problems can
be quite hard to diagnose, and there's definitely no silver-bullet
approach either.

Are the crash timings predictable, i.e. do they happen within a window
of time on a specific day, after a certain number of days of uptime, or
at a specific time of the month?
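To answer that, it can help to line up the approximate crash times (for example, the last log entry before each reboot, or the output of `last reboot`) and look at the gaps between them and where they fall in the calendar. Here is a minimal Python sketch under that assumption; `crash_patterns` is a hypothetical helper and the dates are made up for illustration.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

def crash_patterns(crashes):
    """Summarize approximate crash times (datetime objects) to spot a schedule."""
    crashes = sorted(crashes)
    # Days between consecutive crashes, rounded to 0.1 day.
    gaps = [round((b - a).total_seconds() / 86400, 1)
            for a, b in zip(crashes, crashes[1:])]
    return {
        "gaps_days": gaps,
        "weekdays": Counter(DAY_NAMES[c.weekday()] for c in crashes),
        "days_of_month": Counter(c.day for c in crashes),
        "hours": Counter(c.hour for c in crashes),
    }

# Hypothetical crash times, for illustration only.
summary = crash_patterns([
    datetime(2006, 1, 2, 3, 15),
    datetime(2006, 1, 16, 3, 40),
    datetime(2006, 1, 30, 3, 5),
])
print(summary)
```

A steady gap in days would hint at something slowly leaking; a recurring weekday or hour would point towards a scheduled job running at that time.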

Can you install the sysstat package (which provides "sar") and then
post-process the collected data to see what your systems are doing at,
and just before, the time of the failure?
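As a rough sketch of that post-processing, the data can be exported in a machine-readable form with sysstat's `sadf -d` and then filtered down to the samples taken shortly before a crash. The parser below *assumes* a single activity per export, a header line starting with `# `, semicolon-separated fields and a `timestamp` column in `YYYY-MM-DD HH:MM:SS` form; check that against the sysstat version you actually have. `load_sadf`, `before_crash` and the sample values are hypothetical.

```python
import csv
from datetime import datetime, timedelta

def load_sadf(text):
    """Parse assumed `sadf -d` style output (one activity) into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    # ASSUMPTION: first line is "# col1;col2;..." naming the columns.
    header = lines[0].lstrip("#").strip().split(";")
    rows = []
    for record in csv.reader(lines[1:], delimiter=";"):
        row = dict(zip(header, record))
        # ASSUMPTION: "YYYY-MM-DD HH:MM:SS", possibly followed by " UTC".
        stamp = row["timestamp"].replace(" UTC", "").strip()
        row["timestamp"] = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        rows.append(row)
    return rows

def before_crash(rows, crash_time, minutes=60):
    """Return the samples taken in the `minutes` leading up to `crash_time`."""
    start = crash_time - timedelta(minutes=minutes)
    return [row for row in rows if start <= row["timestamp"] <= crash_time]

# Made-up sample in the assumed format (memory statistics, as from `sar -r`).
sample = """# hostname;interval;timestamp;kbmemfree;kbmemused
web1;600;2006-02-22 09:40:01;81234;942342
web1;600;2006-02-22 09:50:01;40112;983464
web1;600;2006-02-22 10:00:01;2210;1021366
"""

window = before_crash(load_sadf(sample), datetime(2006, 2, 22, 10, 5), minutes=20)
for row in window:
    print(row["timestamp"], row["kbmemfree"])
```

Free memory steadily draining towards zero in the samples before each crash would support the RAM famine theory.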

Also: as you have a master/backup system which is using LVS-DR (direct
routing), you're effectively running two "localnode" servers, i.e. each
director is also a real server. Are your machines getting trapped by
reflecting packets back and forth to one another?
This can happen when server A gets a request for the VIP and forwards it
to server B. Server B then forwards it back to server A, rather than
handing it to an application, because the "backup" LVS on server B
catches the packet before the application does. When the packet
reaches server A again, there's already an entry in the LVS table for
that connection pointing to server B, so the packet goes back to server
B, where the same thing happens. Repeat to fade...
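The loop described above can be illustrated with a toy Python model. It is a simplification, not how IPVS is implemented: each director keeps a connection table, an `intercepts` flag says whether its LVS virtual service catches packets for the VIP, and, to show the worst case, every new connection is scheduled to the peer. The class, the function names and the addresses are hypothetical.

```python
class Director:
    """Toy LVS director that is also a "localnode" real server."""

    def __init__(self, name, peer, intercepts=True):
        self.name = name
        self.peer = peer
        self.intercepts = intercepts  # is a virtual service for the VIP loaded?
        self.conn_table = {}          # connection -> real server it was sent to

    def handle(self, conn):
        """Return the next hop for `conn`, or None if delivered locally."""
        if not self.intercepts:
            return None  # no LVS rule catches it: the local application gets it
        # Existing connections stick to their real server; new ones go to the peer.
        return self.conn_table.setdefault(conn, self.peer)

def trace(directors, conn, first, max_hops=8):
    """Follow one packet between directors, giving up after `max_hops` forwards."""
    path, node = [first], first
    for _ in range(max_hops):
        nxt = directors[node].handle(conn)
        if nxt is None:
            return path, True
        path.append(nxt)
        node = nxt
    return path, False  # still bouncing back and forth

conn = ("192.0.2.10", 40000, "VIP", 80)  # hypothetical client connection

# Both nodes have the virtual service loaded: the packet ping-pongs.
looping = {"A": Director("A", "B"), "B": Director("B", "A")}
path, delivered = trace(looping, conn, "A")
print(" -> ".join(path), "delivered" if delivered else "LOOPING")

# If B's LVS did not catch the forwarded packet, B's application would get it.
fixed = {"A": Director("A", "B"), "B": Director("B", "A", intercepts=False)}
path2, delivered2 = trace(fixed, conn, "A")
print(" -> ".join(path2), "delivered" if delivered2 else "LOOPING")
```

The second run hints at the shape of any fix: the peer has to stop treating packets forwarded by the other director as new virtual-service traffic.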

You can work around this by using the netfilter "MARK" target, and
configuring keepalived's virtual service to match on firewall marks
(fwmarks) instead of a VIP. Have a look at this thread for more details:
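One common way to apply that idea with two directors that are also real servers (which may or may not match exactly what the linked thread proposes) is to set the mark in the mangle table's PREROUTING chain only for VIP traffic that did *not* come from the other director's MAC address (the `mac` match's `--mac-source`, negated, with the `MARK` target's `--set-mark`), and to define the keepalived `virtual_server` by `fwmark` rather than by IP and port. Packets forwarded by the peer then stay unmarked, bypass IPVS and reach the local application. The Python model below sketches that logic only; the MAC addresses, the mark value and all names are hypothetical.

```python
VIP = "VIP"  # hypothetical placeholder for the virtual IP
MAC = {"A": "00:00:5e:00:53:0a", "B": "00:00:5e:00:53:0b"}  # hypothetical MACs
FWMARK = 1   # hypothetical mark the fwmark-based virtual service matches on

def mangle_prerouting(packet, peer_mac):
    """Mimic the MARK rule: mark VIP traffic unless the peer director sent it."""
    if packet["dst"] == VIP and packet["src_mac"] != peer_mac:
        return FWMARK
    return 0

class Director:
    def __init__(self, name, peer):
        self.name, self.peer = name, peer
        self.conn_table = {}  # connection -> chosen real server

    def handle(self, packet):
        """Return the next hop, or None if the local application gets it."""
        if mangle_prerouting(packet, MAC[self.peer]) != FWMARK:
            return None  # unmarked: the fwmark virtual service ignores it
        # Worst-case scheduling: always send new connections to the peer.
        nxt = self.conn_table.setdefault(packet["conn"], self.peer)
        # DR forwards by rewriting the frame's MAC addresses; the IP header is untouched.
        packet["src_mac"] = MAC[self.name]
        return nxt

def trace(directors, packet, first, max_hops=8):
    path, node = [first], first
    for _ in range(max_hops):
        nxt = directors[node].handle(packet)
        if nxt is None:
            return path, True
        path.append(nxt)
        node = nxt
    return path, False

directors = {"A": Director("A", "B"), "B": Director("B", "A")}
packet = {"dst": VIP, "src_mac": "00:00:5e:00:53:99",  # arrives from the router
          "conn": ("192.0.2.10", 40000, VIP, 80)}
path, delivered = trace(directors, packet, "A")
print(" -> ".join(path), "delivered" if delivered else "LOOPING")
```

Matching on the peer's MAC works here because DR leaves the IP header alone, so the source MAC is the only thing that distinguishes a packet forwarded by the other director from one arriving from a client.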

http://marc.theaimsgroup.com/?t=113862542800006&r=1&w=2

You may find something useful there; the OP did. It's not quite what was
intended, but it's a solution nonetheless.

Graeme