We're implementing a Lustre filesystem on Linux using some IBM x3850s, each populated with three quad-port GigE NICs. That gives us 12 ports that we'd like to aggregate into bonds to gain network performance. We're planning to aggregate three ports at a time with bonding mode 4 (802.3ad) to get three times the throughput of a single port.
Sub Bonds (aggregated for increased throughput using LACP, mode=4, with our Cisco switches configured for LACP):
Bond0 = Ports 0 4 8 on Switch 1
Bond1 = Ports 1 5 9 on Switch 2
Bond2 = Ports 2 6 10 on Switch 1
Bond3 = Ports 3 7 11 on Switch 2
Super Bonds (for simple failover mode=1):
Bond4 = Bond0 and Bond1
Bond5 = Bond2 and Bond3
Before we go down this route and redo all of our current VM-dedicated channel bonds on the two servers, we wanted to confirm whether it is technically possible to "nest" aggregated bonds, or whether doing so creates issues somewhere and we'd be better off skipping nested bonds altogether.
Has this been done before? Does anyone have a process, thoughts on the complexity, or examples?
Cheers and gratitude,
You don't want to do this.
The main reason is that the "super bond" won't fail over until every port in the 802.3ad aggregator of the active "sub bond" has failed. So you could be down from three ports to one, and no failover will occur.
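To make the failover problem concrete, here is a toy Python model. The SubBond and SuperBond classes are hypothetical and are not the kernel's bonding code; the model assumes that an 802.3ad sub bond keeps link as long as any member port is up, and that the mode=1 super bond only changes slaves when the active one loses link entirely:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class SubBond:
    """Hypothetical 802.3ad sub bond: it has link while any member port is up."""
    name: str
    ports_up: List[bool] = field(default_factory=list)

    def carrier(self) -> bool:
        return any(self.ports_up)

    def usable_ports(self) -> int:
        return sum(self.ports_up)


@dataclass(eq=False)
class SuperBond:
    """Hypothetical active-backup (mode=1) bond that fails over only on loss of link."""
    slaves: List[SubBond]
    active: Optional[SubBond] = None

    def update(self) -> Optional[SubBond]:
        # Keep the current active slave for as long as it still has link.
        if self.active is not None and self.active.carrier():
            return self.active
        # Otherwise fail over to the first slave that has link, if any.
        self.active = next((s for s in self.slaves if s.carrier()), None)
        return self.active


if __name__ == "__main__":
    bond0 = SubBond("bond0", [True, True, True])
    bond1 = SubBond("bond1", [True, True, True])
    bond4 = SuperBond([bond0, bond1])
    print(bond4.update().name, bond4.active.usable_ports())  # bond0 3

    # Two of bond0's three ports fail: bond0 still has link, so no failover.
    bond0.ports_up = [False, False, True]
    print(bond4.update().name, bond4.active.usable_ports())  # bond0 1

    # Only when bond0's last port fails does bond4 move to bond1.
    bond0.ports_up = [False, False, False]
    print(bond4.update().name, bond4.active.usable_ports())  # bond1 3
```

The middle step is the problem case: one port carrying all the traffic while three healthy ports sit idle in the backup sub bond.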
Second, 802.3ad provides that level of availability all by itself, so no "super bond" is needed: when all links in an aggregator fail, 802.3ad will select a new aggregator automatically. You can simply put all of the ports from, say, bond0 and bond1 in your example (Bond0 = Ports 0 4 8 on Switch 1, Bond1 = Ports 1 5 9 on Switch 2) into a single 802.3ad bond, and (assuming the switches are set up properly) 802.3ad will group them into two aggregators of three ports each and then select one of those two for active use.
Lastly, and this might not help today: new functionality was recently added to 802.3ad mode to provide other aggregator selection policies for just this sort of use. The new option (ad_select) permits 802.3ad to select a new aggregator any time a port is added, removed, or changes link state, which has the net effect of always keeping the "best" aggregator in use (where "best" means either "most ports" or "highest aggregate bandwidth").
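A rough Python sketch of how the three ad_select policies differ: stable (the default, which reselects only when the active aggregator has no working ports left), bandwidth, and count. The Aggregator class and select_aggregator function are hypothetical, and the model is simplified: it ignores LACP negotiation state, calling select_aggregator after each link change stands in for the driver's reselection triggers, and ties simply go to the first aggregator in the list, whereas the kernel applies its own tie-break rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Aggregator:
    """Hypothetical aggregator: speeds (Mbps) of its member ports that have link."""
    agg_id: int
    port_speeds_up: List[int] = field(default_factory=list)

    def bandwidth(self) -> int:
        return sum(self.port_speeds_up)

    def count(self) -> int:
        return len(self.port_speeds_up)


def select_aggregator(aggs: List[Aggregator], policy: str,
                      current: Optional[Aggregator] = None) -> Optional[Aggregator]:
    """Pick the active aggregator under a simplified ad_select policy."""
    usable = [a for a in aggs if a.count() > 0]
    if not usable:
        return None
    if policy == "stable":
        # Keep the current aggregator while it still has at least one working port.
        if current is not None and current.count() > 0:
            return current
        key = Aggregator.bandwidth
    elif policy == "bandwidth":
        key = Aggregator.bandwidth
    elif policy == "count":
        key = Aggregator.count
    else:
        raise ValueError(f"unknown ad_select policy: {policy}")
    # max() returns the first of several equally good candidates.
    return max(usable, key=key)


if __name__ == "__main__":
    agg1 = Aggregator(1, [1000, 1000, 1000])  # e.g. ports 0 4 8 on Switch 1
    agg2 = Aggregator(2, [1000, 1000, 1000])  # e.g. ports 1 5 9 on Switch 2
    aggs = [agg1, agg2]
    active = select_aggregator(aggs, "stable")
    print(active.agg_id)  # 1

    agg1.port_speeds_up = [1000]  # two of agg1's three ports lose link
    for policy in ("stable", "bandwidth", "count"):
        print(policy, select_aggregator(aggs, policy, current=active).agg_id)
    # stable 1
    # bandwidth 2
    # count 2
```

With stable, the bond stays on the degraded aggregator; with bandwidth or count it moves to the healthy one as soon as a port changes state.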
That new functionality should appear in the next revision of the mainline kernel, 2.6.29. You could get it from the net-next-2.6 git repository now, if you're feeling adventurous.
Oh, and you'll want to read up on the xmit_hash_policy option. Depending on your workload, layer2+3 or layer3+4 hashing may balance the load across the ports better.
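The Python sketch below shows why the hash policy matters. It follows, in simplified form, the layer2, layer2+3 and layer3+4 formulas described in the kernel's bonding documentation of this era; real kernels use only some of the address bytes and have changed these formulas over time, so treat it as an illustration rather than a bit-exact reimplementation. The function names, MAC and IP addresses, and port numbers are all made up.

```python
import ipaddress
from typing import Optional


def mac_to_int(mac: str) -> int:
    """Convert "aa:bb:cc:dd:ee:ff" to an integer."""
    return int(mac.replace(":", ""), 16)


def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 address to an integer."""
    return int(ipaddress.IPv4Address(ip))


def pick_slave(policy: str, slave_count: int,
               src_mac: str, dst_mac: str,
               src_ip: Optional[str] = None, dst_ip: Optional[str] = None,
               src_port: Optional[int] = None, dst_port: Optional[int] = None) -> int:
    """Return the index of the slave a packet would go out on (simplified model)."""
    mac_hash = mac_to_int(src_mac) ^ mac_to_int(dst_mac)
    if policy == "layer2":
        h = mac_hash
    elif policy == "layer2+3":
        h = ((ip_to_int(src_ip) ^ ip_to_int(dst_ip)) & 0xFFFF) ^ mac_hash
    elif policy == "layer3+4":
        h = (src_port ^ dst_port) ^ ((ip_to_int(src_ip) ^ ip_to_int(dst_ip)) & 0xFFFF)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return h % slave_count


if __name__ == "__main__":
    # Hypothetical: packets a server transmits to one client over six TCP
    # connections that differ only in the client's port number.
    server = ("02:00:00:00:00:01", "10.0.0.1")
    client = ("02:00:00:00:00:02", "10.0.0.2")
    for policy in ("layer2", "layer2+3", "layer3+4"):
        used = {pick_slave(policy, 3, server[0], client[0], server[1], client[1],
                           5000, 40000 + i)
                for i in range(6)}
        print(policy, sorted(used))
    # layer2 [0]
    # layer2+3 [0]
    # layer3+4 [0, 1, 2]
```

Under layer2 and layer2+3 every packet between the same two hosts lands on the same port, while layer3+4 can spread separate connections across all three.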
Great information. Does that mean that we should pursue the following configuration instead:
Bond0 = Ports 0 4 8 (Switch 1) + 1 5 9 (Switch 2)
Bond1 = Ports 2 6 10 (Switch 1) + 3 7 11 (Switch 2)
How would the preferred active aggregator get selected? Is that done on the Cisco switch or on our Linux machine?
Cheers and thanks for your earlier reply,
That's pretty much the configuration, yes.
The Linux end will select one of the aggregators to be active, and the switches will go along with it.
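On the Linux side, the aggregator the bonding driver picked is visible in /proc/net/bonding/bond0, which in 802.3ad mode shows an "Active Aggregator Info" section followed by an "Aggregator ID" line for each slave. The small Python parser below is a sketch that assumes that layout; the exact fields vary between kernel versions, and aggregator_map is a hypothetical helper name.

```python
import re
import sys
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

SLAVE_RE = re.compile(r"Slave Interface:\s*(\S+)")
AGG_RE = re.compile(r"Aggregator ID:\s*(\d+)")


def aggregator_map(text: str) -> Tuple[Optional[int], Dict[int, List[str]]]:
    """Return (active aggregator ID, {aggregator ID: [slave names]})."""
    active: Optional[int] = None
    members: Dict[int, List[str]] = defaultdict(list)
    current_slave: Optional[str] = None
    in_active_info = False

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Active Aggregator Info:"):
            # The next "Aggregator ID" line names the active aggregator.
            in_active_info = True
            continue
        slave = SLAVE_RE.match(line)
        if slave:
            # Each per-slave section starts with the interface name.
            current_slave = slave.group(1)
            in_active_info = False
            continue
        agg = AGG_RE.match(line)
        if agg:
            agg_id = int(agg.group(1))
            if in_active_info:
                active = agg_id
                in_active_info = False
            elif current_slave is not None:
                members[agg_id].append(current_slave)
    return active, dict(members)


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/net/bonding/bond0"
    try:
        with open(path) as f:
            status = f.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
    else:
        active, members = aggregator_map(status)
        for agg_id, slaves in sorted(members.items()):
            marker = " (active)" if agg_id == active else ""
            print(f"aggregator {agg_id}{marker}: {' '.join(slaves)}")
```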
If you've got snazzy enough switches, a single aggregator can span multiple switches, permitting all ports to be active all the time (if memory serves, the Cisco 3750 can do this, and probably other models as well).
Awesome, that makes our configuration and admin life simpler. Thank you very much for the outstanding posts; we really appreciate it here in MN. :)
Cheers and many thanks,