We have a system of Linux servers on a LAN, with a
redundant pair of L2 LAN switches and a redundant pair
of gateway NAT switches. Each server has two
interfaces, eth0 and eth1, configured with
active/standby channel bonding.
The Linux version is SUSE SLES8, with a kernel level of
2.4.21-138-smp #1 SMP.
I cannot get ARP monitoring to work. miimon works
great. I need to use the ARP monitoring method because
I need to detect and react to gateway switch failure, and
these switches are not adjacent to the servers: the
layer 2 switches are the Ethernet connection points for
the servers and sit between the NAT gateway switches
and the servers.
I use the following in the /etc/modules.conf file:
alias bond0 bonding
options bond0 mode=1 arp_interval=2000 arp_ip_target=172.16.1.250 miimon=0
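As a sanity check on the options above, here is a minimal sketch that reports which link monitor an option string selects. It assumes, per the bonding driver documentation, that a value of 0 disables the corresponding monitor. The function name bond_monitor_mode is hypothetical and is not part of any bonding tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of the bonding tools): report which link
# monitor a bonding option string enables. Assumes a value of 0 means
# "disabled" for both arp_interval and miimon.
bond_monitor_mode() {
    arp=0
    mii=0
    # Split the option string on whitespace and pick out the two settings.
    for kv in $1; do
        case "$kv" in
            arp_interval=*) arp="${kv#arp_interval=}" ;;
            miimon=*)       mii="${kv#miimon=}" ;;
        esac
    done
    if [ "$arp" -gt 0 ] && [ "$mii" -gt 0 ]; then
        echo both
    elif [ "$arp" -gt 0 ]; then
        echo arp
    elif [ "$mii" -gt 0 ]; then
        echo mii
    else
        echo none
    fi
}

# The option string from /etc/modules.conf above.
bond_monitor_mode "mode=1 arp_interval=2000 arp_ip_target=172.16.1.250 miimon=0"
```

With the options shown in this report the function prints "arp", so only ARP monitoring should be active.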
I also have a file called /etc/init.d/rc3.d/S99local
that is executed on bootstrap and performs the
following commands:
/usr/bin/grep bonding /etc/modules.conf
if [[ $? == 0 ]]
then
    MAC=$(/usr/lib/heartbeat/get_hw_addr eth0)
    ifdown eth1
    ifdown eth0
    ifdown bond0
    modprobe -r bonding
    modprobe bonding mode=1 arp_interval=2000 arp_ip_target=172.16.1.250 miimon=0
    ifconfig bond0 172.16.1.20 netmask 255.255.255.0 broadcast 172.16.1.255 up
    ifconfig eth0 hw ether $MAC
    ifconfig eth1 hw ether $MAC
    /sbin/ifenslave bond0 eth0 eth1
fi
The IP of 172.16.1.250 is a virtual address that is
answered by the "active" NAT gateway switch. I see the
ARP requests go out every two seconds to this address
and the ARP responses come back. No problem.
Example of the problem, with eth0 as the active interface:
I fail the active NAT gateway switch. The virtual
address moves to the other switch, and the only way to
reach it is via eth1. The ARP replies no longer return
to the server, but the Linux server never switches to
the other interface (eth1).
This appears to be a bug to me.
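To make the expected behaviour concrete, here is a rough userspace sketch of the failover I would expect ARP monitoring to perform. It is not a fix and it is untested on this system. The target address and slave names come from this report. The status file path, the "Currently Active Slave:" line format, and the miss threshold are assumptions, as is the availability of arping (-c, -w, -I) and of the -c (change active) option in the installed ifenslave. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: a userspace watchdog approximating the failover expected
# from ARP monitoring on an active/standby bond of eth0 and eth1.
TARGET=172.16.1.250
STATUS_FILE=/proc/net/bonding/bond0   # assumption: older drivers may use another path
MAX_MISSES=3                          # assumption: arbitrary threshold

# Print the active slave named in bonding status text read from stdin.
active_slave() {
    sed -n 's/^Currently Active Slave: *//p'
}

# Print the standby slave for a two-slave eth0/eth1 bond.
other_slave() {
    case "$1" in
        eth0) echo eth1 ;;
        eth1) echo eth0 ;;
        *)    return 1 ;;
    esac
}

# Poll the target every two seconds; after MAX_MISSES consecutive
# failures, ask the bonding driver to make the other slave active.
watch_gateway() {
    misses=0
    while :; do
        if arping -c 1 -w 1 -I bond0 "$TARGET" >/dev/null 2>&1; then
            misses=0
        else
            misses=$((misses + 1))
        fi
        if [ "$misses" -ge "$MAX_MISSES" ]; then
            cur=$(active_slave < "$STATUS_FILE")
            next=$(other_slave "$cur") && /sbin/ifenslave -c bond0 "$next"
            misses=0
        fi
        sleep 2
    done
}
```

Running watch_gateway as root would start the loop. The block above only defines the functions and does not start it.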
Logged In: YES
user_id=1095494
my login is "rcolvin"
email is "rcolvin@lucent.com"
Logged In: YES
user_id=1148798
I see the same problem.
I've run many tests with different NICs.
Disappointed.