We have a system of Linux servers on a LAN, with a redundant pair of L2 LAN switches and a redundant pair of gateway NAT switches. Each server has two interfaces, eth0 and eth1, configured for active/standby channel bonding.
The Linux version is SUSE SLES8, with a kernel level of 2.4.21-138-smp #1 SMP.
I cannot get ARP monitoring to work, although miimon works fine. I need the ARP monitoring method because I have to detect and react to gateway switch failure, and those switches are not adjacent to the servers -- the layer 2 switches are the Ethernet connection points for the servers and sit between the NAT gateway switches and the servers.
I use the following in /etc/modules.conf:
alias bond0 bonding
options bond0 mode=1 arp_interval=2000 arp_ip_target=172.16.1.250 miimon=0
I also have a file, /etc/init.d/rc3.d/S99local, that is executed at boot and performs the following commands:
# Re-create the bond only if bonding is configured in /etc/modules.conf
/usr/bin/grep bonding /etc/modules.conf
if [[ $? == 0 ]]
then
    # Remember eth0's MAC so both slaves can share the same address
    MAC=$(/usr/lib/heartbeat/get_hw_addr eth0)
    # Tear everything down and reload the driver with the ARP-monitor options
    ifdown eth1
    ifdown eth0
    ifdown bond0
    modprobe -r bonding
    modprobe bonding mode=1 arp_interval=2000 arp_ip_target=172.16.1.250 miimon=0
    # Bring up the bond, force both slaves to the same MAC, then enslave them
    ifconfig bond0 172.16.1.20 netmask 255.255.255.0 broadcast 172.16.1.255 up
    ifconfig eth0 hw ether $MAC
    ifconfig eth1 hw ether $MAC
    /sbin/ifenslave bond0 eth0 eth1
fi
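As a quick sanity check after a setup like this (a suggested verification only, not part of the configuration above), the bonding driver's status file shows whether the ARP-monitor options were applied and which slave is currently active; the path of that file differs between driver versions, and tcpdump can confirm the probes and replies on the wire:

# Bonding status: older 2.4 drivers expose /proc/net/bond0/info,
# later drivers use /proc/net/bonding/bond0
cat /proc/net/bond0/info 2>/dev/null || cat /proc/net/bonding/bond0
# Watch the ARP probes to the monitored target, and the replies, on the active slave
tcpdump -n -i eth0 arp and host 172.16.1.250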
The IP 172.16.1.250 is a virtual address that is answered by the "active" NAT gateway switch. I see the ARP requests go out to this address every two seconds and the ARP responses come back. No problem.
But when I fail the active switch, the virtual address moves to the other gateway switch, where the only way to reach it is via eth1. The ARP requests now fail to return to the server, yet the Linux server never switches to the other interface (eth1).
This appears to be a bug to me.
Does this sound like a familiar scenario to you? Is this scenario fixed in version 2.4.22?
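One hedged suggestion, not from the original report: later bonding drivers let arp_ip_target take a comma-separated list of addresses, so the monitor does not depend on a single virtual IP; whether the 2.4.21 driver already accepts multiple targets would need checking, and the second address below is purely illustrative:

# Hypothetical variant of the modules.conf line above -- 172.16.1.251 is an
# assumed second target (for example the standby gateway's real address)
options bond0 mode=1 arp_interval=2000 arp_ip_target=172.16.1.250,172.16.1.251 miimon=0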
Thanks for any help you can provide.
Sincerely,
- Richard J. Colvin
Lucent Technologies
Columbus OH
Have you checked that the slaves don't have routes of their own?
route -n should look like
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo
0.0.0.0         10.10.1.1       0.0.0.0         UG        0 0          0 bond0
If you have routes for any slave, drop them from the routing table.
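To make that concrete (a sketch using the made-up addresses from the table above), a stray route that points at a slave instead of the bond could be removed like this:

# List the kernel routing table without name resolution
route -n
# Drop a hypothetical per-slave route for 10.0.0.0/16 on eth0
route del -net 10.0.0.0 netmask 255.255.0.0 dev eth0
# On newer systems the iproute2 equivalent is:
ip route del 10.0.0.0/16 dev eth0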
Hi Richard,
Did you manage to get your problem fixed?
It seems the problem is still there.
We have a similar system, but with Dell PowerEdge 1855 blades running Fedora Core 4 with kernel 2.6.11.
These blades come 10 to a chassis that can hold 2 switches; each switch connects to one of the two NICs on each of the 10 blades, and each switch has an uplink to another, general switch.
I tried to configure a bonding interface on each blade in active-backup mode with ARP monitoring, using the gateway's IP as the monitored target.
This seems to work when I have just one blade - if I turn off the uplink on one switch, the bonding device switches to the other NIC (which goes out through the other switch).
As soon as I add another blade to the equation, the bonding device gets confused and doesn't behave as expected; instead, it seems to detect the other link as being down. So ARP monitoring is unusable in this configuration.
Any hints about why that happens?
Is it a driver issue or a switch/network configuration issue? I don't have any routes associated directly with the NICs.
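One way to separate a driver problem from a switch problem (a suggested check only; the interface names and the 192.168.1.1 gateway address are placeholders) is to compare what the driver reports with what actually arrives on each slave while the second blade is connected:

# What the bonding driver believes about each slave
cat /proc/net/bonding/bond0
# Do the ARP probes for the monitored gateway go out, and do replies
# come back, on each slave? Run these in separate terminals.
tcpdump -e -n -i eth0 arp and host 192.168.1.1
tcpdump -e -n -i eth1 arp and host 192.168.1.1

If the replies are visible on the wire but bonding still marks the slave down, the monitor logic is suspect; if they never arrive, the switch configuration is the more likely cause. One often-mentioned complication is that in active-backup mode both slaves normally carry the same MAC address, which two interconnected switches can learn on different ports.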
Thank you,
Razvan Sultana
Hi Sultana,
I encountered the same problem you describe.
In our environment we use two L2 switches. With only one blade, a bonding interface in active-backup mode with ARP monitoring works fine, but as soon as I connect another blade to the two switches, the bonding device gets confused and the ARP monitoring no longer works.
Have you resolved this problem? Is it an ARP monitoring bug?
Thanks!