Bond down because all interfaces are in updelay

maurerhjm
2006-03-04
2013-06-06
  • maurerhjm

    maurerhjm - 2006-03-04

    Hi,

    We are running CentOS 4 (kernel 2.6.9-22.0.2.ELsmp) with two active-backup
    bonding interfaces in an HA environment with two switches.

    Because a switch needs some time to become operational even after it reports
    link up, we had to add the updelay bonding module parameter.

    According to README.bonding, the updelay should not be applied when there
    are no active links:

    "Note that when a bonding interface has no active links, the
    driver will immediately reuse the first link that goes up, even if
    updelay parameter was specified.  If there are slave interfaces
    waiting for the updelay timeout to expire, the interface that first
    went into that state will be immediately reused.  This reduces down
    time of the network if the value of updelay has been overestimated."
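
The README passage describes a simple selection rule. The Python sketch below models only that documented rule. It is not the driver's code, the names Slave and select_active are hypothetical, and it ignores primary= and the MII polling timer. Under this rule, a bond whose slaves all have link up should never be left without an active slave.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    name: str
    state: str = "down"     # "down", "updelay" (link up, waiting) or "up" (usable)
    updelay_since: int = 0  # tick at which the slave entered "updelay"


def select_active(slaves: List[Slave]) -> Optional[Slave]:
    """Pick an active slave following the README.bonding rule (model only)."""
    usable = [s for s in slaves if s.state == "up"]
    if usable:
        return usable[0]
    # No active links: reuse the slave that entered updelay first, at once,
    # instead of waiting for its updelay timer to expire.
    waiting = [s for s in slaves if s.state == "updelay"]
    if not waiting:
        return None
    first = min(waiting, key=lambda s: s.updelay_since)
    first.state = "up"
    return first


if __name__ == "__main__":
    # Both slaves report link up right after "ifup bond0" (as in the log below).
    slaves = [Slave("eth0", "updelay", 1), Slave("eth2", "updelay", 2)]
    active = select_active(slaves)
    print(active.name if active else None)  # eth0
```

Under this model, eth0 would have become active at 08:57:08. The log shows it only became active after the full updelay.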

    If I run

    ifdown bond0
    ifup bond0

    both interfaces stay in the updelay state even though their links are up,
    and no network traffic is possible at all.
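
To confirm this state without relying on traffic tests, you can read the bond's status file. The Python sketch below makes some assumptions. It expects the usual /proc/net/bonding/<bond> layout with "Currently Active Slave:", "Slave Interface:" and "MII Status:" lines, but the exact fields and the status wording vary between driver versions. The function name parse_bond_status is hypothetical.

```python
import os
from typing import Dict, Optional, Tuple


def parse_bond_status(text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (currently active slave, {slave name: MII status}) from status text."""
    active: Optional[str] = None
    slaves: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and lines without a "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            active = None if value == "None" else value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # MII Status lines before the first slave describe the bond itself.
            slaves[current] = value
    return active, slaves


if __name__ == "__main__":
    path = "/proc/net/bonding/bond0"
    if os.path.exists(path):
        with open(path) as f:
            active, slaves = parse_bond_status(f.read())
        print("active slave:", active)
        for name, status in slaves.items():
            print(f"{name}: MII {status}")
```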

    Attached are the logs and two modprobe.conf configurations; both lead to
    the same result.

    Mar  2 08:57:04 cph2 kernel: Ethernet Channel Bonding Driver: v2.6.1 (October 29, 2004)
    Mar  2 08:57:04 cph2 kernel: bonding: MII link monitoring set to 100 ms
    Mar  2 08:57:04 cph2 kernel: ip_tables: (C) 2000-2002 Netfilter core team
    Mar  2 08:57:06 cph2 kernel: bonding: bond0: enslaving eth0 as a backup interface with a down link.
    Mar  2 08:57:06 cph2 kernel: bonding: bond0: enslaving eth2 as a backup interface with a down link.
    Mar  2 08:57:08 cph2 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
    Mar  2 08:57:08 cph2 kernel: tg3: eth0: Flow control is off for TX and off for RX.
    Mar  2 08:57:08 cph2 kernel: bonding: bond0: link status up for interface eth0, enabling it in 300000 ms.
    Mar  2 08:57:08 cph2 kernel: e1000: eth2: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
    Mar  2 08:57:08 cph2 kernel: bonding: bond0: link status up for interface eth2, enabling it in 300000 ms.
    Mar  2 09:02:09 cph2 kernel: bonding: bond0: link status definitely up for interface eth0.
    Mar  2 09:02:09 cph2 kernel: bonding: bond0: making interface eth0 the new active one.
    Mar  2 09:02:09 cph2 kernel: bonding: bond0: link status definitely up for interface eth2.
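
The log timestamps show that the full updelay was honored even though bond0 had no active slave. The small Python snippet below does that arithmetic on timestamps copied from the log. Both lines are from Mar 2, so only the time of day is used. The one extra second is presumably due to syslog's one-second resolution plus MII polling, but that is an assumption.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' syslog time to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


UPDELAY_MS = 300000                     # from "enabling it in 300000 ms"
link_up = to_seconds("08:57:08")        # "link status up for interface eth0"
definitely_up = to_seconds("09:02:09")  # "link status definitely up for interface eth0"

elapsed = definitely_up - link_up
print(f"elapsed {elapsed} s, updelay {UPDELAY_MS // 1000} s")
# elapsed 301 s, updelay 300 s
```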

    install bond0 /sbin/modprobe --ignore-install -o bonding0 bonding miimon=1000 downdelay=1000 updelay=30000 mode=active-backup primary=eth2
    remove bond0 /sbin/modprobe -r --ignore-remove bonding0
    install bond1 /sbin/modprobe --ignore-install -o bonding1 bonding miimon=100 downdelay=1000 updelay=300000 mode=active-backup primary=eth1
    remove bond1 /sbin/modprobe -r --ignore-remove bonding1

    options bonding mode=active-backup miimon=100 max_bonds=2 downdelay=1000 updelay=300000
    alias bond0 bonding
    alias bond1 bonding
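
As I understand README.bonding, updelay only takes effect with MII monitoring (miimon > 0) and is rounded down to a multiple of miimon. The Python sketch below applies that rule to the options in the two configurations. The helper names are hypothetical. All values used here are exact multiples, so no rounding occurs. Note that the log's "MII link monitoring set to 100 ms" and "300000 ms" match the second configuration (and bond1 of the first), not bond0 of the first. So the log was presumably taken with the second configuration.

```python
import shlex
from typing import Dict


def parse_options(line: str) -> Dict[str, str]:
    """Collect key=value tokens from a modprobe.conf install/options line."""
    return dict(tok.split("=", 1) for tok in shlex.split(line) if "=" in tok)


def effective_updelay_ms(opts: Dict[str, str]) -> int:
    """updelay rounded down to a multiple of miimon; 0 without MII monitoring."""
    miimon = int(opts.get("miimon", "0"))
    updelay = int(opts.get("updelay", "0"))
    if miimon == 0:
        return 0  # updelay has no effect when miimon is not set
    return (updelay // miimon) * miimon


if __name__ == "__main__":
    configs = {
        "config 1, bond0": "install bond0 /sbin/modprobe --ignore-install -o bonding0 "
                           "bonding miimon=1000 downdelay=1000 updelay=30000",
        "config 2": "options bonding mode=active-backup miimon=100 max_bonds=2 "
                    "downdelay=1000 updelay=300000",
    }
    for name, line in configs.items():
        print(name, effective_updelay_ms(parse_options(line)))
```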

     
    • Daniel Johnson

      Daniel Johnson - 2007-07-18

      I'm having the same problem with a pair of e1000s (Intel 82546EB) on a stock 2.6.21.5 kernel. In our case the switches need about 2.5 minutes to fully boot, so I've set a 3-minute updelay (180000 ms). That's fine when a switch reboots, but annoying when I'm just booting the server. I'm reading the driver source to see if I can find a bug that explains it, but that may take a while. Does anyone know of a solution or explanation for this?

       
      • Daniel Johnson

        Daniel Johnson - 2007-07-18

        I upgraded to kernel v2.6.22.1 (bonding v3.1.3) so I'd be working with the latest-n-greatest code. The server's problem went away: updelay no longer seems to affect it during boot. However, when I compiled the same kernel on my workstation and set up the same bonding (hey, why not?), I found the same problem again!

        My workaround for now is to load the bond with updelay=500 and then, after enslaving the physical links, run "sleep 2; echo 180000 > /sys/class/net/bond0/bonding/updelay". It works fine so far.
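
The same workaround can be written as a Python sketch under a few assumptions. It must run as root after the slaves have been enslaved with updelay=500, and the bond must expose /sys/class/net/bond0/bonding/updelay as in the command above. The function raise_updelay and its sysfs_root parameter are hypothetical. The parameter exists only so the logic can be tried against a scratch directory.

```python
import time
from pathlib import Path


# Hypothetical helper; sysfs_root exists only so the logic can be tried
# against a scratch directory instead of the real /sys/class/net.
def raise_updelay(bond: str, updelay_ms: int, settle_s: float = 2.0,
                  sysfs_root: Path = Path("/sys/class/net")) -> int:
    """Wait, write a new updelay (ms) for `bond` via sysfs, return the value read back."""
    attr = sysfs_root / bond / "bonding" / "updelay"
    time.sleep(settle_s)                # the "sleep 2" from the workaround
    attr.write_text(f"{updelay_ms}\n")  # like: echo 180000 > /sys/class/net/bond0/bonding/updelay
    return int(attr.read_text().strip())


if __name__ == "__main__":
    # Run as root after the bond was brought up with updelay=500
    # and the physical links have been enslaved.
    if (Path("/sys/class/net") / "bond0" / "bonding" / "updelay").exists():
        print(raise_updelay("bond0", 180000))
```

The initial low updelay lets the first link come up immediately at boot. Raising it afterwards restores the long delay for later switch reboots.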

        This is bug ID 1443005.

         
