Arp Mon Flapping when it should be down

  • hk135

    hk135 - 2012-05-17

    Hi There

    I’m experiencing an issue where I have a bond with 2 cards connected to different switches (switch 1 and 2), which are in turn connected to another switch (switch 3). I am using a target on the third switch for ARP monitoring, to make sure of end-to-end connectivity.

    When everything is plugged in all is well, and when I disconnect switch 1 or 2 from switch 3 the link goes down for a little bit. Unfortunately, the slave card connected to the disconnected switch then keeps flapping between up and down for a few seconds, and sometimes settles in either the up or the down state.

    I reconfigured my bond to have only 1 NIC in it and tried again, but the same thing happened. I ran tcpdump on both the slave and the bond and could not see any ARP traffic from my target (or any at all, as they were the only things plugged into the switch at the time).

    I was wondering if anybody else has seen this, or knows if anything could be fighting with the ARP monitoring to give the up status. I’m using Debian 6 with the backports kernel 3.2.0-0.bpo.2-amd64.

    Thanks in advance for any help

  • hk135

    hk135 - 2012-05-17

    Finally worked this out. It was due to another machine using bonding on the same segment with the same target. It seems that the bonding driver will interpret any ARP for that IP (even another host's ARP request) as a reply.

    As this was a failover cluster and the interference was from the inactive node, I configured my cluster to take the bonding interfaces down when they were not in use, and the problem cleared up.

  • Jay Vosburgh

    Jay Vosburgh - 2012-05-17

    Just as an FYI, the "arp_validate" option is meant to handle this case (that of multiple bonds on a network segment having each other's ARPs fool one another into thinking the path to the arp_ip_target is working).  With arp_validate enabled, only the ARP traffic from the bond itself (and the replies to it) counts for the purpose of determining link state.

    The documentation follows:


            Specifies whether or not ARP probes and replies should be
            validated in the active-backup mode.  This causes the ARP
            monitor to examine the incoming ARP requests and replies, and
            only consider a slave to be up if it is receiving the
            appropriate ARP traffic.

            Possible values are:

            none or 0

                    No validation is performed.  This is the default.

            active or 1

                    Validation is performed only for the active slave.

            backup or 2

                    Validation is performed only for backup slaves.

            all or 3

                    Validation is performed for all slaves.

            For the active slave, the validation checks ARP replies to
            confirm that they were generated by an arp_ip_target.  Since
            backup slaves do not typically receive these replies, the
            validation performed for backup slaves is on the ARP request
            sent out via the active slave.  It is possible that some
            switch or network configurations may result in situations
            wherein the backup slaves do not receive the ARP requests; in
            such a situation, validation of backup slaves must be

            This option is useful in network configurations in which
            multiple bonding hosts are concurrently issuing ARPs to one or
            more targets beyond a common switch.  Should the link between
            the switch and target fail (but not the switch itself), the
            probe traffic generated by the multiple bonding instances will
            fool the standard ARP monitor into considering the links as
            still up.  Use of the arp_validate option can resolve this, as
            the ARP monitor will only consider ARP requests and replies
            associated with its own instance of bonding.

            This option was added in bonding version 3.1.0.
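
    To make the difference concrete, here is a rough toy model in Python of the decision described above. It is not the driver's code: the Arp class, the counts_as_alive function and the addresses are all made up for illustration, the "no validation" branch simply encodes the behaviour reported earlier in this thread (any ARP mentioning the target counts), and the "validate" branch covers only the active-slave check from the documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arp:
    """Hypothetical, simplified ARP packet (illustration only)."""
    op: str         # "request" or "reply"
    sender_ip: str  # who sent the packet
    target_ip: str  # who the packet is addressed to / asking about


def counts_as_alive(pkt, my_ip, arp_ip_target, validate):
    """Toy model: does this received ARP mark the active slave as up?"""
    if not validate:
        # Behaviour as observed in this thread: any ARP involving the
        # target IP is taken as proof that the path works.
        return arp_ip_target in (pkt.sender_ip, pkt.target_ip)
    # Active-slave validation per the quoted documentation: only a reply
    # generated by the arp_ip_target counts. Requiring that it is addressed
    # to our own IP is an assumption added for this sketch.
    return (
        pkt.op == "reply"
        and pkt.sender_ip == arp_ip_target
        and pkt.target_ip == my_ip
    )


MY_IP, OTHER_BOND_IP, TARGET = "10.0.0.1", "10.0.0.2", "10.0.0.254"

# Probe sent by the other cluster node while the path to the target is down.
foreign_probe = Arp("request", OTHER_BOND_IP, TARGET)
# Genuine answer from the target to our own probe.
real_reply = Arp("reply", TARGET, MY_IP)

print(counts_as_alive(foreign_probe, MY_IP, TARGET, validate=False))
print(counts_as_alive(foreign_probe, MY_IP, TARGET, validate=True))
print(counts_as_alive(real_reply, MY_IP, TARGET, validate=True))
```

    In this model the other node's probe keeps the slave "up" when validation is off and is ignored when it is on, while a genuine reply from the target counts either way.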


  • hk135

    hk135 - 2012-05-18

    Thanks for your answer. Unfortunately I was using balance-rr rather than active-backup mode (I should have specified), for which the arp_validate option is not applicable.
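
    For anyone wanting to check this on their own machine, a small sketch of reading the bond's mode from sysfs. It assumes the bonding sysfs files under /sys/class/net/<bond>/bonding/ report values as a name followed by a number (for example "balance-rr 0"); the helper names and the bond name "bond0" are hypothetical, and the applicability rule is only the one stated in the documentation quoted above, so other bonding versions may differ.

```python
from pathlib import Path


def parse_bonding_value(text):
    """Split a sysfs bonding value such as 'balance-rr 0' into (name, number)."""
    name, number = text.split()
    return name, int(number)


def read_bonding_option(bond, option, sysfs_root="/sys/class/net"):
    """Read one bonding option (e.g. 'mode' or 'arp_validate') for a bond.

    Assumes the file exists and uses the 'name number' format.
    """
    path = Path(sysfs_root) / bond / "bonding" / option
    return parse_bonding_value(path.read_text())


def arp_validate_applicable(mode_name):
    """Per the documentation quoted in this thread: active-backup only."""
    return mode_name == "active-backup"


if __name__ == "__main__":
    # "bond0" is a placeholder; substitute the name of your bond.
    mode_name, _ = read_bonding_option("bond0", "mode")
    print(mode_name, arp_validate_applicable(mode_name))
```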

