Anduras AG - 2008-10-21

Hello!

We have a problem using bonding and VLANs in ALB mode:

Shortly after initiating "iperf" traffic from each client to
every other client, all traffic collapses onto a single link. This
is expected behaviour of the ALB algorithm, but we did not expect
the traffic to stay there!

We would expect the traffic to be rebalanced between the bonding
slaves, but it never happens.

We traced the problem down to the ARP replies that the bonding
interface should receive.
After an ARP request is sent (by the router) via "rlb_arp_xmit()",
the "rlb_choose_channel()" function is called and reserves a
(placeholder) entry in the hash table. According to the comment, this
entry is to be updated later by the corresponding ARP reply.
Shortly afterwards we see an ARP reply packet on the VLAN interface.
But it is never processed by "rlb_arp_recv()", because this function
only receives ARP packets from bond0, and so the entry is never
updated.
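
To illustrate what we believe is happening, here is a minimal userspace
sketch in C. It is not the driver's code: "struct rx_entry",
"model_arp_xmit()" and "model_arp_recv()" are hypothetical names, and the
device-name check is only our assumption about why the reply is ignored.

```c
#include <string.h>

/* Hypothetical model of one receive-balancing table entry.
 * This is NOT the kernel's data structure. */
struct rx_entry {
    unsigned int client_ip; /* peer the ARP request was sent to */
    int resolved;           /* 0 = placeholder, 1 = updated by an ARP reply */
};

/* Models the reservation made when the ARP request is transmitted:
 * the entry exists, but is only a placeholder until a reply arrives. */
static void model_arp_xmit(struct rx_entry *e, unsigned int client_ip)
{
    e->client_ip = client_ip;
    e->resolved = 0;
}

/* Models a receive handler that (by assumption) only sees ARP replies
 * delivered on the bond device itself. Returns 1 if the entry was updated. */
static int model_arp_recv(struct rx_entry *e, const char *rx_dev,
                          unsigned int sender_ip)
{
    if (strcmp(rx_dev, "bond0") != 0)
        return 0; /* reply seen on another device, e.g. bond0.11: ignored */
    if (e->client_ip != sender_ip)
        return 0; /* reply belongs to a different peer */
    e->resolved = 1;
    return 1;
}
```

In this model, a reply from 10.0.1.1 that is delivered on "bond0.11" leaves
the entry as a placeholder, while the same reply delivered on "bond0" would
update it. This matches what we observe on the router.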

Now the question:
Is this an error in the bonding driver, or are we doing something
wrong? Do we need to set some special /proc/sys/... values, or
do we have to configure the driver in some special way?

(And yes, we want ALB, not LACP or anything else...!)

Here are some details about our network:
                                       +----------+
                                  +----+ Client 1 |
                                  |    +----------+
+----------+      +----------+    |    +----------+
|          +------+          +----+ +--+ Client 2 |
|          +------+          +------+  +----------+
|  Router  |      |  Switch  |
|          +------+          +------+  +----------+
|          +------+          +----+ +--+ Client 3 |
+----------+      +----------+    |    +----------+
                                  |    +----------+
                                  +----+ Client 4 |
                                       +----------+

Router:

  eth0
  eth1
  eth2
  eth3
  bond0    (Slaves: eth0, eth1, eth2, eth3)
  bond0.11  (VLAN 11) IP: 10.0.1.250/24
  bond0.12  (VLAN 12) IP: 10.0.2.250/24
  bond0.13  (VLAN 13) IP: 10.0.3.250/24
  bond0.14  (VLAN 14) IP: 10.0.4.250/24

  Routing for each subnet is enabled.

Switch (HP ProCurve):
  4 wires to router (one for each physical interface, eth0-3)
        each VLAN tagged (with 11,12,13,14)
  1 wire to each client, untagged VLANs
 
Client 1:
  Network 10.0.1.1/24 GW: 10.0.1.250/24

Client 2:
  Network 10.0.2.1/24 GW: 10.0.2.250/24

Client 3:
  ...

Software/Hardware:
Linux: 2.6.26.2
  Network driver: e1000e (0.4.1.7-NAPI)
Switch: HP ProCurve 5406zl

Bonding configuration:

Ethernet Channel Bonding Driver: v3.2.5 (March 21, 2008)
Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth11
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 100
Down Delay (ms): 100