active-backup w/ARP broken at 2.6.18-308.8.1

Dan Ragle
2012-06-01
2013-06-06
  • Dan Ragle
    2012-06-01

    Hi,

    After upgrading from 2.6.18-308.4.1 to 2.6.18-308.8.1, our active-backup bonded interface with ARP monitoring is broken: both devices in the bond are always listed as down. The same configuration worked under 2.6.18-308.4.1. This is a RHEL 5.8 box. Obfuscated data follows. Please let me know if I've missed something, or if there's other information that would help in analyzing this.

    # cat /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    BOOTPROTO=none
    ONBOOT=yes
    IPADDR=1.2.3.8
    NETMASK=255.255.255.224
    USERCTL=no
    BONDING_OPTS="mode=1 arp_interval=250 arp_ip_target=1.2.3.1,1.2.3.2 arp_validate=3 primary=eth0"
    # cat /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    BOOTPROTO=static
    ONBOOT=yes
    HWADDR=01:02:03:04:05:06
    MASTER=bond0
    SLAVE=yes
    IPV6INIT=no
    USERCTL=no
    TYPE=Ethernet
    # cat /etc/sysconfig/network-scripts/ifcfg-eth1
    DEVICE=eth1
    BOOTPROTO=static
    ONBOOT=yes
    HWADDR=01:02:03:04:05:07
    MASTER=bond0
    SLAVE=yes
    IPV6INIT=no
    USERCTL=no
    TYPE=Ethernet
    # cat /etc/modprobe.conf
    alias net-pf-10 off
    alias scsi_hostadapter ata_piix
    alias eth1 e1000e
    alias eth2 tg3
    alias eth0 igb
    alias eth3 igb
    alias bond0 bonding
    alias bond1 bonding
    options bonding max_bonds=2
    # cat /proc/net/bonding/bond0
    Ethernet Channel Bonding Driver: v3.4.0-1 (October 7, 2008)
    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: eth0 (primary_reselect always)
    Currently Active Slave: None
    MII Status: down
    MII Polling Interval (ms): 0
    Up Delay (ms): 0
    Down Delay (ms): 0
    ARP Polling Interval (ms): 250
    ARP IP target/s (n.n.n.n form): 1.2.3.1,1.2.3.2
    Slave Interface: eth0
    MII Status: down
    Speed: 100 Mbps
    Duplex: full
    Link Failure Count: 1
    Permanent HW addr: 01:02:03:04:05:06
    Slave Interface: eth1
    MII Status: down
    Speed: 100 Mbps
    Duplex: full
    Link Failure Count: 1
    Permanent HW addr: 01:02:03:04:05:07
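
    For checking the bond state from a script rather than reading /proc by eye, here is a minimal Python sketch that splits the /proc/net/bonding/bond0 text above into bond-level fields and per-slave fields. It assumes the "Key: value" layout of this v3.4.0-1 driver output, with each slave section starting at a "Slave Interface:" line; parse_bonding_status and down_slaves are hypothetical helper names, not part of any existing tool.

```python
def parse_bonding_status(text):
    """Split /proc/net/bonding/<bond> text into (bond_fields, slaves).

    bond_fields: dict of "Key: value" pairs that appear before the first
                 "Slave Interface:" line.
    slaves:      dict mapping slave name -> dict of that slave's fields.
    Assumes the v3.4.0-style layout; other driver versions may differ.
    """
    bond_fields = {}
    slaves = {}
    current = None  # fields dict of the slave section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank lines and lines without a key
        # Split on the first colon only, so MAC addresses stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is None:
            bond_fields[key] = value
        else:
            current[key] = value
    return bond_fields, slaves


def down_slaves(slaves):
    """Names of slaves whose MII Status is anything other than "up"."""
    return [name for name, fields in slaves.items()
            if fields.get("MII Status") != "up"]
```

    Reading /proc/net/bonding/bond0 and passing its contents through these two functions would, for the output above, show "Currently Active Slave" as None and list both eth0 and eth1 as down.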
    

    Before the upgrade, a capture of the ARP traffic looked like this:

      0.000000 IntelCor_04:05:06 -> Broadcast    ARP Who has 1.2.3.1?  Tell 1.2.3.8
      0.000009 IntelCor_04:05:06 -> Broadcast    ARP Who has 1.2.3.2?  Tell 1.2.3.8
      0.000042 All-HSRP-routers_64 -> IntelCor_04:05:06 ARP 1.2.3.1 is at 09:08:07:06:05:04
      0.000285 Cisco_08:09:0A -> IntelCor_04:05:06 ARP 1.2.3.2 is at 09:08:07:06:05:03
    

    After the upgrade, it looks like this:

      0.000000 IntelCor_04:05:06 -> Broadcast    ARP Who has 67.211.164.1?  Tell 0.0.0.0
      0.000010 IntelCor_04:05:06 -> Broadcast    ARP Who has 67.211.164.2?  Tell 0.0.0.0
      0.249834 IntelCor_04:05:06 -> Broadcast    ARP Who has 67.211.164.1?  Tell 0.0.0.0
      0.249846 IntelCor_04:05:06 -> Broadcast    ARP Who has 67.211.164.2?  Tell 0.0.0.0
    

    etc., with no responses.
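
    The visible difference between the two captures is the sender address: before the upgrade the requests say "Tell 1.2.3.8" (the bond's IPADDR) and get replies, while afterwards they say "Tell 0.0.0.0" and get none. A minimal Python sketch for pulling that out of a longer capture, assuming the tshark-style one-line summaries quoted above; summarize_arp and zero_sender_requests are hypothetical names.

```python
import re

# Matches request summaries such as "... ARP Who has 1.2.3.1?  Tell 1.2.3.8"
REQUEST_RE = re.compile(r"ARP Who has (\S+)\?\s+Tell (\S+)")
# Matches reply summaries such as "... ARP 1.2.3.1 is at 09:08:07:06:05:04"
REPLY_RE = re.compile(r"ARP (\S+) is at (\S+)")


def summarize_arp(lines):
    """Return (requests, replies) from tshark-style ARP summary lines.

    requests: list of (target_ip, sender_ip) tuples for "Who has" lines.
    replies:  list of (ip, mac) tuples for "is at" lines.
    Lines matching neither pattern are ignored.
    """
    requests, replies = [], []
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            requests.append((m.group(1), m.group(2)))
            continue
        m = REPLY_RE.search(line)
        if m:
            replies.append((m.group(1), m.group(2)))
    return requests, replies


def zero_sender_requests(requests):
    """Targets of requests sent with sender IP 0.0.0.0."""
    return [target for target, sender in requests if sender == "0.0.0.0"]
```

    Run over the pre-upgrade lines this finds two requests and two replies with no zero-sender requests; over the post-upgrade lines every request has a 0.0.0.0 sender and the reply list is empty.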

    Thanks in advance for any assistance!

    Dan Ragle

     
  • Dan Ragle
    2012-06-01

    FYI, bond1 is a different pair of NICs on the same box, configured for MII monitoring:

    # cat /etc/sysconfig/network-scripts/ifcfg-bond1
    DEVICE=bond1
    BOOTPROTO=static
    IPADDR=192.168.1.4
    NETMASK=255.255.255.0
    ONBOOT=yes
    IPV6INIT=no
    USERCTL=no
    BONDING_OPTS="mode=1 miimon=100 primary=eth3 updelay=30000"
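
    The two BONDING_OPTS lines differ mainly in link monitoring: bond0 uses ARP monitoring (arp_interval, arp_ip_target, arp_validate) while bond1 uses miimon. A minimal Python sketch that decodes both, assuming the options are a quoted, space-separated key=value list as in the ifcfg files above; parse_bonding_opts and describe are hypothetical helpers, and the numeric decodings follow the kernel's bonding documentation.

```python
# Numeric option values as described in the kernel bonding documentation.
# Named values (e.g. mode=active-backup) are passed through unchanged.
MODES = {"0": "balance-rr", "1": "active-backup", "2": "balance-xor",
         "3": "broadcast", "4": "802.3ad", "5": "balance-tlb",
         "6": "balance-alb"}
ARP_VALIDATE = {"0": "none", "1": "active", "2": "backup", "3": "all"}


def parse_bonding_opts(line):
    """Parse a BONDING_OPTS="..." line from an ifcfg file into a dict."""
    _, _, value = line.partition("=")   # drop the BONDING_OPTS name
    value = value.strip().strip("\"'")  # drop the surrounding quotes
    opts = {}
    for token in value.split():
        key, _, val = token.partition("=")
        opts[key] = val
    return opts


def describe(opts):
    """One-line summary of the bonding mode and link-monitoring method."""
    mode = MODES.get(opts.get("mode", "0"), opts.get("mode"))
    if int(opts.get("arp_interval", "0")) > 0:
        raw = opts.get("arp_validate", "0")
        monitor = "ARP every %sms to %s (arp_validate=%s)" % (
            opts["arp_interval"], opts.get("arp_ip_target", "?"),
            ARP_VALIDATE.get(raw, raw))
    elif int(opts.get("miimon", "0")) > 0:
        monitor = "MII every %sms" % opts["miimon"]
    else:
        monitor = "no link monitoring"
    return "%s, %s" % (mode, monitor)
```

    For bond0 this yields "active-backup, ARP every 250ms to 1.2.3.1,1.2.3.2 (arp_validate=all)", and for bond1 "active-backup, MII every 100ms".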