From: Heiko Z. <he...@zu...> - 2004-09-09 01:34:55
Darren Spruell wrote:
> We're hoping to set up a high-availability firewall using Devil-Linux
> and are happy to see the heartbeat program included now. ;)
>
> Our setup is to be two identical firewall hosts which share a virtual
> IP address. We want to have a primary and backup firewall host. All
> traffic should flow through the primary, and upon failure, the backup
> takes over the VIP. The VIP should always be resident on the active
> host and the backup releases the VIP when the primary firewall comes
> back online. We've tested this basic operation by stopping and
> starting the heartbeat service on the primary firewall and it seems
> to work correctly.
>
> I think we are running into a problem when one of these hosts reboots,
> though. If we power down the primary firewall, the backup successfully
> takes over the VIP. But when the primary comes back online, it
> apparently cannot regain the VIP because the generation number is
> reset to 1 and the backup host is running a different generation
> number. Here's the log message on the primary firewall when the
> heartbeat service is started after it reboots:
>
> heartbeat: 2004/09/08_16:19:44 WARN: No Previous generation - starting at 1
> heartbeat: 2004/09/08_16:19:44 info: Heartbeat generation: 1
>
> And as it communicates with the backup firewall, the log message on
> the backup:
>
> heartbeat: 2004/09/08_16:23:24 ERROR: should_drop_message: attempted replay attack [viawest-test-fw-a.sento.com]? [gen = 1, curgen = 5]
>
> My understanding is that generation numbers are not preserved across
> reboots and are out of sync if one host is powered off because of
> Devil-Linux's volatile memory FS. I don't know if the normal heartbeat
> operation is to preserve the generation in a file so that it is
> persistent across reboots. I might not even be going down the right
> path here. Any ideas?
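
To make the symptom in those two log excerpts concrete, here is a
minimal sketch of a generation-based replay check. It is only an
illustration and not heartbeat's actual code: the function names, the
file location and the "older generation gets dropped" rule are my
assumptions, based purely on the "[gen = 1, curgen = 5]" message above.

```python
import os
import tempfile


def next_generation(path):
    """Read the last generation from `path`, add one, and write it back.

    Falls back to 1 when the file is missing or unreadable, which mirrors
    the "No Previous generation - starting at 1" warning in the log.
    """
    try:
        with open(path) as f:
            gen = int(f.read().strip()) + 1
    except (FileNotFoundError, ValueError):
        gen = 1
    with open(path, "w") as f:
        f.write(str(gen))
    return gen


def should_drop_message(msg_gen, cur_gen):
    """Assumed rule: drop a message whose generation is older than the
    newest generation already seen from that node."""
    return msg_gen < cur_gen


state_dir = tempfile.mkdtemp()
gen_file = os.path.join(state_dir, "generation")  # hypothetical location

# Persistent storage: five service starts give generations 1..5.
gens = [next_generation(gen_file) for _ in range(5)]

# Volatile storage: the file is gone after a reboot, so counting restarts.
os.remove(gen_file)
gen_after_reboot = next_generation(gen_file)

# The peer still remembers generation 5, so generation 1 looks like a replay.
dropped = should_drop_message(gen_after_reboot, cur_gen=gens[-1])
print(gen_after_reboot, gens[-1], dropped)
```

If that is roughly what happens, the fix would be to keep whatever file
holds the generation on storage that survives a reboot, but I have not
verified where heartbeat stores it.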

I took a quick look at the linux-ha documentation and stumbled across
this setting:

> *auto_failback on*
>
> /Required./ For those familiar with Tru64 Unix, heartbeat acts as if
> in "favored member" mode. The master listed in the haresources file
> holds all the resources until a failover, at which time the slave
> takes over. When /auto_failback/ is set to *on*, once the master comes
> back online, it will take everything back from the slave. When set to
> *off*, this option will prevent the master node from re-acquiring
> cluster resources after a failover. This option is similar to the
> obsolete /nice_failback/ option. If you want to upgrade from a cluster
> which had /nice_failback/ set *off* to this or later versions,
> special considerations apply if you want to avoid requiring a
> flash cut. Please see the FAQ
> <http://linux-ha.org/download/faqnstuff.html> for details on how to
> deal with this situation.

--
Regards
Heiko Zuerker
http://www.devil-linux.org