arp monitoring not working in SUSE 8 2.4.21

Help
2004-08-01
2013-06-06
  • richard colvin

    richard colvin - 2004-08-01

    We have a system of Linux servers on a LAN, with a redundant pair of L2 LAN switches and a redundant pair of gateway NAT switches. Each server has two interfaces, eth0 and eth1, configured with active/standby channel bonding.

    The Linux version is SUSE SLES8, with a kernel level of 2.4.21-138-smp #1 SMP.

    I cannot get arp monitoring to work. Miimon works great. I need to use the arp monitoring method because I need to know and react to gateway switch failure, and these switches are not adjacent to the servers -- the layer 2 switches are the ethernet connection points for the servers and are in between the NAT gateway switches and the servers.

    I use the following in the /etc/modules.conf file:

    alias bond0 bonding
    options bond0 mode=1 arp_interval=2000 arp_ip_target=172.16.1.250 miimon=0

    I also have a file called /etc/init.d/rc3.d/S99local that is executed on bootstrap and performs the following commands:

    # Only reconfigure when bonding is set up in modules.conf.
    if /usr/bin/grep -q bonding /etc/modules.conf
    then
            # Save eth0's MAC so that both slaves can be given the same address.
            MAC=$(/usr/lib/heartbeat/get_hw_addr eth0)
            ifdown eth1
            ifdown eth0
            ifdown bond0
            # Reload the driver with ARP monitoring on and MII monitoring off.
            modprobe -r bonding
            modprobe bonding mode=1 arp_interval=2000 arp_ip_target=172.16.1.250 miimon=0
            ifconfig bond0 172.16.1.20 netmask 255.255.255.0 broadcast 172.16.1.255 up
            ifconfig eth0 hw ether "$MAC"
            ifconfig eth1 hw ether "$MAC"
            /sbin/ifenslave bond0 eth0 eth1
    fi

    The IP of 172.16.1.250 is a virtual address that is answered by the "active" NAT gateway switch. I see the arp requests go out every two seconds to this address and get the arp response. No problem.

    But when I fail the active switch, the virtual address moves to the other switch, which can only be reached via eth1. The ARP requests then go unanswered, but the Linux server never switches to the other interface (eth1).
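    One way to confirm whether a failover actually took place is to read the bonding driver's status file before and after failing the switch. The sketch below is only illustrative: the helper name bond_active_slave is made up, and the "Currently Active Slave:" line and the /proc/net/bonding/bond0 path are assumptions that match later bonding driver releases and may differ on a 2.4.21 kernel.

```shell
#!/bin/sh
# Hypothetical helper: reads bonding status text on stdin and prints the
# interface named on the "Currently Active Slave:" line, if there is one.
# The field name is an assumption; older drivers may label it differently.
bond_active_slave() {
    sed -n 's/^Currently Active Slave: *//p'
}

# Usage (path is an assumption, see above):
#   bond_active_slave < /proc/net/bonding/bond0
```

    Running it once before and once after the switch failure shows whether the driver moved from eth0 to eth1.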

    This appears to be a bug to me.

    Does this sound like a familiar scenario to you? Is this scenario fixed in version 2.4.22?

    Thanks for any help you can provide.

    Sincerely,

      - Richard J. Colvin
    Lucent Technologies
    Columbus OH

     
    • Paul Zirnik

      Paul Zirnik - 2004-08-04

      Have you checked that the slaves don't have routes of their own?

      route -n should look like

      Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
      10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
      127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo
      0.0.0.0         10.10.1.1       0.0.0.0         UG        0 0          0 bond0

      If you have routes for any slave, remove them from the routing table.
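      The check above can be sketched as a small filter over the route -n output. This is a rough illustration, not a tested fix: the function name slave_routes is made up, and eth0 and eth1 are assumed to be the slave names, as in the original post.

```shell
#!/bin/sh
# Hypothetical helper: reads `route -n` output on stdin and prints the
# destination, netmask and interface of every route bound to a slave NIC.
# The first argument is a space-separated list of slave names
# (defaults to "eth0 eth1", an assumption taken from the original post).
slave_routes() {
    awk -v slaves="${1:-eth0 eth1}" '
        BEGIN { n = split(slaves, s, " "); for (i = 1; i <= n; i++) want[s[i]] = 1 }
        # Data rows have 8 columns; the last one is the interface name.
        NF == 8 && ($NF in want) { print $1, $3, $NF }
    '
}

# Usage:
#   route -n | slave_routes "eth0 eth1"
# A route it reports could then be removed with, for example:
#   route del -net 172.16.1.0 netmask 255.255.255.0 dev eth1
```

      No output means that no route points directly at a slave interface.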

       
    • Razvan Sultana

      Razvan Sultana - 2005-10-25

      Hi Richard,
      Did you manage to get your problem fixed?
      It seems the problem is still there.

      We have a similar system, but with Dell PowerEdge 1855 blades that run Fedora Core 4 with kernel 2.6.11.
      These blades come 10 to a chassis, which can hold 2 switches. Each switch is connected to one of the two NICs on each of the 10 blades, and each switch has an uplink to another general switch.

      I tried to configure a bonding interface for each blade in active-backup mode and ARP monitoring, with the IP of the gateway being monitored.
      This seems to work when I have just one blade - if I turn off the uplink on one switch, the bonding device switches to the other NIC (that goes out through the other switch).

      As soon as I add another blade to the equation, the bonding device gets confused and doesn't behave as expected. Instead, it seems to detect the other link as being down. So, obviously, ARP monitoring is unusable in this configuration.

      Any hints about why that happens?
      Is it a driver issue or a switch/network configuration issue? I don't have any routes associated directly with the NICs.

      Thank you,

      Razvan Sultana

       
      • honglonglong

        honglonglong - 2006-11-26

        Hi Sultana,
        I encountered the same problem you described.
        In our environment we use two L2 switches. With only one blade, a bonding interface in active-backup mode with ARP monitoring works fine. If I connect another blade to the two switches, the bonding device gets confused and ARP monitoring no longer works.
        Have you resolved this problem? Is it an ARP bug?
        Thanks!

         
