Problems with ALB mode and VLANs

Help
Anduras AG
2008-10-21
2013-06-06
  • Anduras AG
    2008-10-21

    Hello!

    We have a problem using bonding in ALB mode together with VLANs:

    Shortly after starting "iperf" traffic from each client to every
    other client, all traffic collapses onto one link. This is
    expected behaviour of the ALB algorithm, but we did not expect
    the traffic to stay there!

    We would expect the traffic to be rebalanced across the bonding
    slaves, but that never happens.

    We traced it to the ARP replies that the bonding interface should
    receive.
    After the router sends an ARP request through "rlb_arp_xmit()",
    the "rlb_choose_channel()" function is called and reserves a
    (placeholder) entry in the hash table. According to the comment, this
    entry will be updated later by the corresponding ARP reply.
    Shortly afterwards we see an ARP reply packet on the VLAN interface.
    But it is never processed by "rlb_arp_recv()", because this function
    only receives ARP packets from bond0, and so the entry is never
    updated.
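
The sequence described above can be modelled with a small toy table. This is only a sketch of the behaviour as the post reports it, not the kernel code: the class and method names, the round-robin slave choice, and the example MAC address are all hypothetical, and the only rule taken from the post is that replies not received on bond0 itself are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyRlbEntry:
    """One receive-load-balancing entry for a peer (hypothetical model)."""
    peer_ip: str
    peer_mac: Optional[str]  # None while the entry is only a placeholder
    slave: str               # slave the peer would be asked to send to


class ToyRlbTable:
    """Toy stand-in for the hash table mentioned in the post."""

    def __init__(self, bond_dev: str, slaves: List[str]) -> None:
        self.bond_dev = bond_dev
        self.slaves = slaves
        self.entries: Dict[str, ToyRlbEntry] = {}
        self._next = 0

    def on_arp_request_sent(self, peer_ip: str) -> ToyRlbEntry:
        """Reserve a placeholder entry when an ARP request goes out.

        Assumption: slaves are picked round-robin; the real driver's
        choice may differ.
        """
        entry = self.entries.get(peer_ip)
        if entry is None:
            slave = self.slaves[self._next % len(self.slaves)]
            self._next += 1
            entry = ToyRlbEntry(peer_ip, None, slave)
            self.entries[peer_ip] = entry
        return entry

    def on_arp_reply(self, rx_dev: str, peer_ip: str, peer_mac: str) -> bool:
        """Complete a placeholder, but only for replies seen on the bond."""
        if rx_dev != self.bond_dev:
            return False  # e.g. reply arrived on bond0.11 and is ignored
        entry = self.entries.get(peer_ip)
        if entry is None:
            return False
        entry.peer_mac = peer_mac
        return True

    def placeholders(self) -> List[str]:
        """Peers whose entry was reserved but never completed."""
        return [ip for ip, e in self.entries.items() if e.peer_mac is None]


if __name__ == "__main__":
    table = ToyRlbTable("bond0", ["eth0", "eth1", "eth2", "eth3"])
    table.on_arp_request_sent("10.0.1.1")
    # The reply shows up on the VLAN device, so the entry stays a placeholder.
    table.on_arp_reply("bond0.11", "10.0.1.1", "00:11:22:33:44:55")
    print(table.placeholders())
```

In this model the entry for 10.0.1.1 would only be completed if the reply were delivered with bond0 as the receiving device, which matches the symptom described in the post.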

    Now the question:
    Is this a bug in the bonding driver, or are we doing something
    wrong? Do we need to set some special /proc/sys/... values or
    configure the driver in some special way?

    (And yes, we want ALB, not LACP or anything else!)

    Here are some details about our network:
    +--------+        +--------+        +----------+
    |        +--------+        +--------+ Client 1 |
    |        |        |        |        +----------+
    |        +--------+        +--------+ Client 2 |
    | Router |        | Switch |        +----------+
    |        +--------+        +--------+ Client 3 |
    |        |        |        |        +----------+
    |        +--------+        +--------+ Client 4 |
    +--------+        +--------+        +----------+

    Router:

      eth0
      eth1
      eth2
      eth3
      bond0    (Slaves: eth0, eth1, eth2, eth3)
      bond0.11  (VLAN 11) IP: 10.0.1.250/24
      bond0.12  (VLAN 12) IP: 10.0.2.250/24
      bond0.13  (VLAN 13) IP: 10.0.3.250/24
      bond0.14  (VLAN 14) IP: 10.0.4.250/24

      Routing for each subnet is enabled.
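
For readers who want to reproduce the router side, the following sketch builds the commands that would create a comparable setup. It is an assumption, not what the original poster ran: it uses current iproute2 syntax, which may not match the tools available on a 2.6.26 system, and the helper name is hypothetical. The function only returns strings; it does not execute anything.

```python
from typing import Dict, List


def bonding_setup_commands(bond: str, slaves: List[str],
                           vlans: Dict[int, str],
                           miimon_ms: int = 100) -> List[str]:
    """Build (not run) iproute2 commands for an ALB bond with VLANs.

    vlans maps a VLAN ID to the address to assign, e.g. {11: "10.0.1.250/24"}.
    """
    cmds = [f"ip link add {bond} type bond mode balance-alb miimon {miimon_ms}"]
    for slave in slaves:
        # A slave has to be down before it can be enslaved.
        cmds.append(f"ip link set {slave} down")
        cmds.append(f"ip link set {slave} master {bond}")
    cmds.append(f"ip link set {bond} up")
    for vid, addr in vlans.items():
        vdev = f"{bond}.{vid}"
        cmds.append(f"ip link add link {bond} name {vdev} type vlan id {vid}")
        cmds.append(f"ip addr add {addr} dev {vdev}")
        cmds.append(f"ip link set {vdev} up")
    # Routing between the subnets needs IPv4 forwarding.
    cmds.append("sysctl -w net.ipv4.ip_forward=1")
    return cmds


if __name__ == "__main__":
    router_vlans = {11: "10.0.1.250/24", 12: "10.0.2.250/24",
                    13: "10.0.3.250/24", 14: "10.0.4.250/24"}
    for cmd in bonding_setup_commands(
            "bond0", ["eth0", "eth1", "eth2", "eth3"], router_vlans):
        print(cmd)
```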

    Switch (HP ProCurve):
      4 wires to router (one for each physical interface, eth0-3)
            each VLAN tagged (with 11,12,13,14)
      1 wire to each client, untagged VLANs
     
    Client 1:
      Network 10.0.1.1/24 GW: 10.0.1.250/24

    Client 2:
      Network 10.0.2.1/24 GW: 10.0.2.250/24

    Client 3:
      ...
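
The addressing plan can be sanity-checked with a few lines of Python. This is a minimal sketch using only the two clients spelled out above; the helper name is hypothetical, and the remaining clients are left out because the post does not list their addresses.

```python
import ipaddress


def same_subnet(host_cidr: str, gateway_cidr: str) -> bool:
    """Return True if host and gateway sit in the same IP network."""
    host = ipaddress.ip_interface(host_cidr)
    gateway = ipaddress.ip_interface(gateway_cidr)
    return host.network == gateway.network


if __name__ == "__main__":
    clients = {
        "Client 1": ("10.0.1.1/24", "10.0.1.250/24"),
        "Client 2": ("10.0.2.1/24", "10.0.2.250/24"),
    }
    for name, (host, gateway) in clients.items():
        print(name, same_subnet(host, gateway))
```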

    Software/Hardware:
    Linux: 2.6.26.2
      Network driver: e1000e (0.4.1.7-NAPI)
    Switch: HP ProCurve 5406zl

    Bonding configuration:

    Ethernet Channel Bonding Driver: v3.2.5 (March 21, 2008)
    Bonding Mode: adaptive load balancing
    Primary Slave: None
    Currently Active Slave: eth11
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 100
    Down Delay (ms): 100