We have two Cisco 3750 switches connected together to form a single logical switch. Pairs of ports (one on each physical switch) are configured as passive LACP on the switches.
The machines each use two Intel e1000e controllers.
Previously the boxes ran FreeBSD 7 and, when configured for LACP, worked without issue. Pulling ports at random and even connecting them incorrectly caused no issues as long as at least one port was correctly connected.
During the install of RH 5.6 we had to reconnect port 0 to a 'normal' non-LACP switch port to get a network connection without setting up bonding during the install.
Once up I configured LACP bonding over both ports, even though one port was left on a non-LACP configured switch port.
On FreeBSD this functioned as expected. LACP negotiated that only one port was in the LACP bond and ignored the other.
However, with RH a ping shows that the network appears and disappears at regular intervals. This is only fixed by an ifconfig down on the port that is connected to a non-LACP port.
Why does this occur? I would've expected the same behaviour as FreeBSD. Connecting one port incorrectly should not break the network. After all, bonding is done for reliability. Surely the most reliable and logical outcome is to continue to function on the one good port?
Any ideas why this occurs? Let me know if you need any more information.
Thanks.
This precise scenario is a new one to me, but I suspect it may have the same root cause as a similar problem that I've been working on fixing. The bug is a problem in the Linux 802.3ad implementation that sometimes manifests when multiple aggregators are connected simultaneously.
Can you reverse the order in which the slaves are added to the bond? Right now, my guess is that your non-LACP switch port is added to the bond first, and then the other bonding slave becomes the active aggregator.
Anonymous - 2011-04-13
Swapping the interface order in the ifenslave (putting the LACP-configured switch port first) gave a working setup.
Is this bug fixed in a later version of the bonding code than v3.4.0-1?
BTW, is the latest bonding code really over two years old?
The bug is not fixed anywhere, and exists in all versions of bonding. I've been working on a fix over the last couple of weeks, but it's one of those things that's simple in theory but not so simple to implement. The problem, basically, is that bonding uses the same MAC address for all aggregators; trouble happens if the slave that owns that MAC address ends up in an inactive aggregator. By switching the order of enslavement, you get the bond's MAC address from a slave that ends up in the active aggregator, so things work ok.
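As a rough illustration of why the enslavement order matters, here is a minimal Python sketch of the rule described above. It is a toy model, not the bonding driver's code: the interface names `eth0` and `eth1` are hypothetical, the MAC addresses are borrowed from the FreeBSD output later in this thread, and the active aggregator is simply assumed to be the slave whose switch port speaks LACP.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    name: str            # hypothetical interface name
    permanent_mac: str   # the NIC's own MAC address
    on_lacp_port: bool   # True if the attached switch port speaks LACP


def bond_mac(slaves):
    """The bond takes its MAC from the first slave that was enslaved."""
    return slaves[0].permanent_mac


def mac_owner_is_active(slaves):
    """True if the slave that owns the bond MAC is in the active aggregator.

    Assumption for this sketch: the active aggregator is the one whose
    slave has an LACP partner on the switch.
    """
    return slaves[0].on_lacp_port


eth0 = Slave("eth0", "00:1f:29:61:b1:bc", on_lacp_port=False)
eth1 = Slave("eth1", "00:1f:29:61:b1:bd", on_lacp_port=True)

for order in ([eth0, eth1], [eth1, eth0]):
    names = " ".join(s.name for s in order)
    verdict = "ok" if mac_owner_is_active(order) else "trouble"
    print(f"enslave order: {names}  bond MAC: {bond_mac(order)}  {verdict}")
```

With the non-LACP slave first, the bond MAC belongs to a slave outside the active aggregator; with the order swapped it does not, which matches the workaround reported above.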
And, no, the most recent bonding code isn't two years old (the current version has had changes just a few days ago). The base version that Red Hat used is 3.4.0, from 2008. They've added various patches to it (hence the "-1" in the version), but didn't update the date. Even on the current version the date is in 2010; having a date really isn't all that useful, but we're kind of stuck with it. The version numbers are a little bit useful; on a distro kernel, they at least specify which base version of bonding was used, but it's usually necessary to go look at the source code anyway, because the version skews between distros (so a "3.4.0-1" on distro A isn't the same as "3.4.0-1" on distro B), and sometimes patches are added without the version number being changed or patches are taken piecemeal and so on.
Anonymous - 2011-04-15
Thanks for the detailed reply.
I wondered if looking at how the *BSDs handle this might help? Please don't think I'm trying to teach you to suck eggs.
FreeBSD 7.3 on the same hardware with the same network config (port 0 on a non-LACP port and port 1 correctly on an LACP port) uses the MAC address of port 0 for both interfaces, but still works correctly:
# grep ^em /var/run/dmesg.boot
em0: <Intel(R) PRO/1000 Network Connection 6.9.6> port 0xec00-0xec1f mem 0xdefe0000-0xdeffffff,0xdefc0000-0xdefdffff irq 33 at device 0.0 on pci8
em0: Using MSI interrupt
em0:
em0: Ethernet address: 00:1f:29:61:b1:bc
em1: <Intel(R) PRO/1000 Network Connection 6.9.6> port 0xe880-0xe89f mem 0xdef80000-0xdef9ffff,0xdef60000-0xdef7ffff irq 33 at device 0.1 on pci8
em1: Using MSI interrupt
em1:
em1: Ethernet address: 00:1f:29:61:b1:bd
# ifconfig em0
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1f:29:61:b1:bc
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
        lagg: laggdev lagg0
# ifconfig em1
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1f:29:61:b1:bc
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
        lagg: laggdev lagg0
# ifconfig lagg0
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1f:29:61:b1:bc
        inet 146.x.x.70 netmask 0xffffff80 broadcast 146.x.x.127
        media: Ethernet autoselect
        status: active
        laggproto lacp
        laggport: em1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: em0 flags=18<COLLECTING,DISTRIBUTING>
Perhaps the authors of the FreeBSD lagg code could shed some light on how they tackled it?
Thanks again for your help and for all the good work you've done on this. I hope you nail this issue.
Cheers.
The 802.3ad standard (802.1AX nowadays) requires that each aggregator have a unique MAC address. Linux doesn't do this, so when the slaves are split across two aggregators (e.g., if some are connected to one switch and some to another, where the switches are connected) the same MAC address may be sent as the source MAC on both aggregators. This confuses the switch.
Your case is similar; the bond gets its MAC from slave A (in your case, the "non-LACP port" slave). It assigns that MAC to all slaves. The active aggregator ends up being on slave B, using slave A's MAC for the aggregator. LACPDUs are periodically sent on slave A using its permanent MAC address, which is that same MAC (LACPDUs use the interface's permanent MAC, not the MAC used for the aggregator). When the LACPDU goes out, the switch updates its MAC address table and sends traffic destined for that MAC to slave A, where it is dropped (non-control traffic inbound to inactive slaves is dropped to suppress duplicates). Once the active aggregator sends something (on slave B), the switch MAC address table updates again, and voila, everything works again.
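The flapping described above can be sketched with a toy MAC-learning switch in Python. This is only a model under simplifying assumptions: the switch remembers the last port on which it saw each source MAC, the port labels `port-A` and `port-B` are made up, and the host drops everything that arrives on the inactive slave.

```python
class LearningSwitch:
    """Toy switch: remembers the last port each source MAC was seen on."""

    def __init__(self):
        self.table = {}

    def receive(self, port, src_mac):
        # Source learning: the newest sighting of a MAC wins.
        self.table[src_mac] = port

    def port_for(self, dst_mac):
        return self.table.get(dst_mac)


BOND_MAC = "00:1f:29:61:b1:bc"  # slave A's permanent MAC, also used by the bond
PORT_A = "port-A"  # non-LACP switch port, slave A (inactive aggregator)
PORT_B = "port-B"  # LACP switch port, slave B (active aggregator)


def host_accepts(port):
    """Inbound data on the inactive slave is dropped; only slave B delivers."""
    return port == PORT_B


def run(events):
    """Replay (kind, port) transmissions from the host.

    After each one, record whether a frame addressed to the bond MAC
    would currently reach the host. `kind` is only a label for the reader.
    """
    switch = LearningSwitch()
    reachable = []
    for kind, port in events:
        switch.receive(port, BOND_MAC)
        reachable.append(host_accepts(switch.port_for(BOND_MAC)))
    return reachable


timeline = [
    ("data", PORT_B),    # normal traffic from the active aggregator
    ("lacpdu", PORT_A),  # periodic LACPDU on the inactive slave
    ("data", PORT_B),
    ("lacpdu", PORT_A),
]
print(run(timeline))  # [True, False, True, False]
```

Each LACPDU on slave A steals the table entry until the next frame goes out on slave B, which is the appear/disappear pattern seen in the ping.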
You can induce the behavior on a single switch if the ports form into multiple aggregators, either because they're grouped separately on the switch or because they're different speeds.
FreeBSD might be running both aggregators as active simultaneously, it might accept traffic inbound to the non-active aggregator, it might not send LACPDUs on the non-active aggregator, it might be using a separate MAC address under the covers and not showing it in ifconfig, or it might be something else.
The solution for Linux is to have each aggregator select a MAC for itself from one of its slaves.
This is tricky for older network cards, because some drivers do not permit altering the MAC address while the device is up. The recent (last three or four years, I'd say) drivers can all do this, so it's really an issue only for older devices.
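To make the proposed fix concrete, here is a small Python sketch of per-aggregator MAC selection. It is not the actual patch: the thread only says each aggregator should pick a MAC "from one of its slaves", so choosing the first slave in enslavement order is an assumption made here, as are the interface names and aggregator ids.

```python
def assign_aggregator_macs(slaves):
    """Map each aggregator id to the permanent MAC of its first slave.

    `slaves` is a list of (name, permanent_mac, aggregator_id) tuples
    in enslavement order.
    """
    macs = {}
    for _name, mac, agg_id in slaves:
        # Keep the first MAC seen for each aggregator.
        macs.setdefault(agg_id, mac)
    return macs


slaves = [
    ("eth0", "00:1f:29:61:b1:bc", 1),  # non-LACP port: an aggregator of its own
    ("eth1", "00:1f:29:61:b1:bd", 2),  # LACP port: the active aggregator
]
macs = assign_aggregator_macs(slaves)

# Every aggregator now has a distinct MAC, so the switch never sees
# the same source MAC on ports belonging to different aggregators.
assert len(set(macs.values())) == len(macs)
print(macs)
```

Because every slave has a unique permanent MAC and belongs to exactly one aggregator, any per-aggregator choice gives distinct MACs; the hard part mentioned above is changing a slave's MAC while the device is up.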
Redhat 5.6
Ethernet Channel Bonding Driver: v3.4.0-1 (October 7, 2008)
The latest stable FreeBSD lagg code is here:
http://svn.freebsd.org/viewvc/base/stable/8/sys/net/if_lagg.c?view=markup&pathrev=216730