Sowmya - 2007-01-17

Hi,
The following is the bonding setup we are trying:

1. Multiple CPU blades are connected to two redundant smart switches (each smart switch has both Layer 2 and Layer 3 capability).
2. Each smart switch is in turn connected to redundant Layer 2 switches.
3. The Layer 2 switches connect to redundant Layer 3 switches.

On the CPU blades, eth0 and eth1 are bonded, and the bond uses ARP monitoring against the external Layer 2 switches. Multiple VLANs (say 102 to 108) are tagged on top of the bond. When we load the module with arp_ip_target set to the VLAN 102 IP addresses of the L2 switches (192.168.102.2 and 192.168.102.3):

modprobe bonding mode=1 arp_interval=1000 arp_validate=1 arp_ip_target=192.168.102.2,192.168.102.3

a. the bond keeps bouncing back and forth between the slaves.
b. we see ARP requests going out of the active slave, eth0, for each monitored arp_ip_target. This is as expected.
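As a minimal sketch (assuming Python is available on the blade), the hypothetical helper build_bonding_options() below assembles the modprobe option string used above and rejects empty, over-long, malformed or duplicate target lists. The limit of 16 targets and the meanings of mode=1 (active-backup) and arp_validate=1 (validate only on the active slave) follow the bonding driver documentation; the helper itself is illustrative and not part of any bonding tool.

```python
import ipaddress

# Values as documented for the Linux bonding driver.
MODE_ACTIVE_BACKUP = 1   # mode=1
ARP_VALIDATE_ACTIVE = 1  # arp_validate=1: validate ARP on the active slave only
MAX_ARP_TARGETS = 16     # driver limit on arp_ip_target entries


def build_bonding_options(targets, arp_interval_ms=1000,
                          mode=MODE_ACTIVE_BACKUP,
                          arp_validate=ARP_VALIDATE_ACTIVE):
    """Hypothetical helper: return the option string for 'modprobe bonding ...'."""
    if not targets:
        raise ValueError("ARP monitoring needs at least one arp_ip_target")
    if len(targets) > MAX_ARP_TARGETS:
        raise ValueError("at most %d ARP targets are allowed" % MAX_ARP_TARGETS)
    # Validate and normalize each target; a malformed address raises ValueError.
    parsed = [str(ipaddress.IPv4Address(t)) for t in targets]
    # A repeated target silently reduces the number of peers being monitored.
    if len(set(parsed)) != len(parsed):
        raise ValueError("duplicate arp_ip_target: %s" % ",".join(parsed))
    # Multiple targets are passed to the driver as a comma-separated list.
    return ("mode=%d arp_interval=%d arp_validate=%d arp_ip_target=%s"
            % (mode, arp_interval_ms, arp_validate, ",".join(parsed)))


if __name__ == "__main__":
    # Prints: modprobe bonding mode=1 arp_interval=1000 arp_validate=1 arp_ip_target=192.168.102.2,192.168.102.3
    print("modprobe bonding " + build_bonding_options(
        ["192.168.102.2", "192.168.102.3"]))
```

Checking for duplicates up front is one way to catch a target list that monitors fewer peers than intended.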

> tcpdump -e -i eth0

11:57:16.181475 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 192.168.102.2 tell 192.168.102.109
11:57:16.181483 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 192.168.102.3 tell 192.168.102.109

We see unicast ARP replies for the L2 switch IPs on the bond0.102 interface. This is also fine.
> tcpdump -e -i bond0.102

11:58:33.194903 00:00:cd:27:bd:bc (oui Unknown) > 00:03:ba:af:17:43 (oui Unknown), ethertype ARP (0x0806), length 56: arp reply 192.168.102.3 is-at 00:00:cd:27:bd:bc (oui Unknown)
11:58:33.199748 00:00:cd:27:bd:ba (oui Unknown) > 00:03:ba:af:17:43 (oui Unknown), ethertype ARP (0x0806), length 56: arp reply 192.168.102.2 is-at 00:00:cd:27:bd:ba (oui Unknown)

But we also see the following packets on eth0:

> tcpdump -e -i eth0

11:57:19.181281 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp reply 10.88.1.109 is-at 00:03:ba:af:17:43 (oui Unknown)
11:57:19.181286 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp reply 192.168.102.109 is-at 00:03:ba:af:17:43 (oui Unknown)
11:57:19.181290 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp reply 192.168.103.109 is-at 00:03:ba:af:17:43 (oui Unknown)
11:57:19.181292 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp reply 192.168.104.109 is-at 00:03:ba:af:17:43 (oui Unknown)
11:57:19.181295 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp reply 192.168.105.109 is-at 00:03:ba:af:17:43 (oui Unknown)
11:57:19.181298 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp reply 192.168.106.109 is-at 00:03:ba:af:17:43 (oui Unknown)
11:57:19.181301 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp reply 192.168.107.109 is-at 00:03:ba:af:17:43 (oui Unknown)
11:57:19.181304 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp reply 192.168.108.109 is-at 00:03:ba:af:17:43 (oui Unknown)

It looks like broadcast ARP replies are being sent out of the Ethernet interface for each of the bonded VLAN IPs (and for 10.88.1.109). Is this a bug in the bonding module?
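To see at a glance which addresses are being announced, a capture like the one above can be filtered with a small script. This is a rough sketch, assuming the text format that tcpdump -e printed here (one packet per line, wrapped lines re-joined, vendor shown as "(oui Unknown)"); the function names broadcast_replies() and requests() are hypothetical. A broadcast ARP reply in which the sender announces its own address is a gratuitous ARP.

```python
import re

# Matches the ARP part of one 'tcpdump -e' line in the format captured above.
ARP_LINE = re.compile(
    r"^\S+\s(?P<src>[0-9a-f:]{17})\s.*?>\s(?P<dst>Broadcast|[0-9a-f:]{17})"
    r".*?:\sarp\s(?P<op>who-has|reply)\s(?P<ip>\d+\.\d+\.\d+\.\d+)"
)


def broadcast_replies(lines):
    """Return the IPs announced in broadcast ARP replies (gratuitous ARP)."""
    announced = []
    for line in lines:
        m = ARP_LINE.match(line)
        if m and m.group("op") == "reply" and m.group("dst") == "Broadcast":
            announced.append(m.group("ip"))
    return announced


def requests(lines):
    """Return the target IPs of ARP who-has requests."""
    return [m.group("ip") for m in map(ARP_LINE.match, lines)
            if m and m.group("op") == "who-has"]


if __name__ == "__main__":
    capture = [
        "11:57:16.181475 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 192.168.102.2 tell 192.168.102.109",
        "11:57:19.181281 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp reply 10.88.1.109 is-at 00:03:ba:af:17:43 (oui Unknown)",
        "11:57:19.181286 00:03:ba:af:17:43 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: arp reply 192.168.102.109 is-at 00:03:ba:af:17:43 (oui Unknown)",
        "11:58:33.194903 00:00:cd:27:bd:bc (oui Unknown) > 00:03:ba:af:17:43 (oui Unknown), ethertype ARP (0x0806), length 56: arp reply 192.168.102.3 is-at 00:00:cd:27:bd:bc (oui Unknown)",
    ]
    print(broadcast_replies(capture))  # ['10.88.1.109', '192.168.102.109']
    print(requests(capture))           # ['192.168.102.2']
```

Alternatively, tcpdump can do the filtering itself with a pcap expression such as tcpdump -e -i eth0 'arp[6:2] = 2 and ether broadcast' (bytes 6-7 of the ARP header hold the opcode, and 2 means reply).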

We are using Ethernet Channel Bonding Driver v3.1.1 (September 26, 2006). Please note that we have set arp_validate to 1.

FYI, we are using the Ubuntu 2.6.19.1 kernel.

Thanks,
Sowmya