We have a system of Linux servers on a LAN, with a redundant pair of L2 LAN switches and a redundant pair of gateway NAT switches. Each server has two interfaces, eth0 and eth1, configured for active/standby channel bonding.
The Linux version is SUSE SLES8, with a kernel level of 2.4.21-138-smp #1 SMP.
I cannot get ARP monitoring to work, although miimon works fine. I need the ARP monitoring method because I have to detect and react to gateway switch failure, and those switches are not adjacent to the servers -- the layer 2 switches are the Ethernet connection points for the servers and sit between the NAT gateway switches and the servers.
I use the following in /etc/modules.conf:
alias bond0 bonding
options bond0 mode=1 arp_interval=2000 arp_ip_target=172.16.1.250 miimon=0
I also have a file, /etc/init.d/rc3.d/S99local, that is executed at boot and performs the following commands:
# Re-create the bond only if bonding is configured in /etc/modules.conf
/usr/bin/grep bonding /etc/modules.conf
if [[ $? == 0 ]]
then
    # Remember eth0's MAC so both slaves can share the same address
    MAC=$(/usr/lib/heartbeat/get_hw_addr eth0)
    # Tear everything down and reload the driver with the ARP-monitor options
    ifdown eth1
    ifdown eth0
    ifdown bond0
    modprobe -r bonding
    modprobe bonding mode=1 arp_interval=2000 arp_ip_target=172.16.1.250 miimon=0
    # Bring up the bond, force both slaves to the same MAC, then enslave them
    ifconfig bond0 172.16.1.20 netmask 255.255.255.0 broadcast 172.16.1.255 up
    ifconfig eth0 hw ether $MAC
    ifconfig eth1 hw ether $MAC
    /sbin/ifenslave bond0 eth0 eth1
fi
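As a quick sanity check after a setup like this (a suggested verification only, not part of the configuration above), the bonding driver's status file shows whether the ARP-monitor options were applied and which slave is currently active; the path of that file differs between driver versions, and tcpdump can confirm the probes and replies on the wire:

# Bonding status: older 2.4 drivers expose /proc/net/bond0/info,
# later drivers use /proc/net/bonding/bond0
cat /proc/net/bond0/info 2>/dev/null || cat /proc/net/bonding/bond0
# Watch the ARP probes to the monitored target, and the replies, on the active slave
tcpdump -n -i eth0 arp and host 172.16.1.250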
The IP 172.16.1.250 is a virtual address that is answered by the "active" NAT gateway switch. I see the ARP requests go out to this address every two seconds and the ARP responses come back. No problem.
But when I fail the active switch, the virtual address moves to the other gateway switch, where the only way to reach it is via eth1. The ARP requests now fail to return to the server, yet the Linux server never switches to the other interface (eth1).
This appears to be a bug to me.
Does this sound like a familiar scenario to you? Is this scenario fixed in version 2.4.22?
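One hedged suggestion, not from the original report: later bonding drivers let arp_ip_target take a comma-separated list of addresses, so the monitor does not depend on a single virtual IP; whether the 2.4.21 driver already accepts multiple targets would need checking, and the second address below is purely illustrative:

# Hypothetical variant of the modules.conf line above -- 172.16.1.251 is an
# assumed second target (for example the standby gateway's real address)
options bond0 mode=1 arp_interval=2000 arp_ip_target=172.16.1.250,172.16.1.251 miimon=0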
Thanks for any help you can provide.
Sincerely,
- Richard J. Colvin
Lucent Technologies
Columbus OH
Have you checked that the slaves don't have routes of their own?
route -n should look like
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo
0.0.0.0         10.10.1.1       0.0.0.0         UG        0 0          0 bond0
If you have routes for any slave, drop them from the routing table.
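To make that concrete (a sketch using the made-up addresses from the table above), a stray route that points at a slave instead of the bond could be removed like this:

# List the kernel routing table without name resolution
route -n
# Drop a hypothetical per-slave route for 10.0.0.0/16 on eth0
route del -net 10.0.0.0 netmask 255.255.0.0 dev eth0
# On newer systems the iproute2 equivalent is:
ip route del 10.0.0.0/16 dev eth0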
Hi Richard,
Did you manage to get your problem fixed?
It seems the problem is still there.
We have a similar system, but with Dell PowerEdge 1855 blades running Fedora Core 4 with kernel 2.6.11.
These blades come 10 to a chassis that can hold 2 switches; each switch connects to one of the two NICs on each of the 10 blades, and each switch has an uplink to another, general switch.
I tried to configure a bonding interface on each blade in active-backup mode with ARP monitoring, using the gateway's IP as the monitored target.
This seems to work when I have just one blade - if I turn off the uplink on one switch, the bonding device switches to the other NIC (which goes out through the other switch).
As soon as I add another blade to the equation, the bonding device gets confused and doesn't behave as expected; instead, it seems to detect the other link as being down. So ARP monitoring is unusable in this configuration.
Any hints about why that happens?
Is it a driver issue or a switch/network configuration issue? I don't have any routes associated directly with the NICs.
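One way to separate a driver problem from a switch problem (a suggested check only; the interface names and the 192.168.1.1 gateway address are placeholders) is to compare what the driver reports with what actually arrives on each slave while the second blade is connected:

# What the bonding driver believes about each slave
cat /proc/net/bonding/bond0
# Do the ARP probes for the monitored gateway go out, and do replies
# come back, on each slave? Run these in separate terminals.
tcpdump -e -n -i eth0 arp and host 192.168.1.1
tcpdump -e -n -i eth1 arp and host 192.168.1.1

If the replies are visible on the wire but bonding still marks the slave down, the monitor logic is suspect; if they never arrive, the switch configuration is the more likely cause. One often-mentioned complication is that in active-backup mode both slaves normally carry the same MAC address, which two interconnected switches can learn on different ports.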
Thank you,
Razvan Sultana
Hi Sultana,
I encountered the same problem you describe.
In our environment we use two L2 switches. With only one blade, a bonding interface in active-backup mode with ARP monitoring works fine, but as soon as I connect another blade to the two switches, the bonding device gets confused and the ARP monitoring no longer works.
Have you resolved this problem? Is it an ARP monitoring bug?
Thanks!