From: Chris Friesen <cfriesen@no...> - 2008-07-25 05:10:15
We've recently run into an interesting chicken-and-egg problem with
bonding, arp monitoring, and DHCP.
We have a blade-based system with a pair of disk server blades, a pair
of network switch blades, and a bunch of app blades.
The disk server blades act as DHCP servers to all the other blades. The
switch blades boot from flash, but then obtain their IP address and
other config info from the server blades via DHCP over a bonded link.
We would like to use something other than simple carrier sense because
the firmware on the switch cards has the nasty habit of bringing up
carrier way before the switches are actually ready to handle traffic.
We've run into the following scenario:
1) server blade is up, switch blades are down
2) switch blades start to boot, carrier comes up (detected on server)
3) switch blades issue DHCP request
4) server blade attempts to reply to the request, but has no active link
because arp monitoring hasn't received a reply yet
5) several hundred ms later, arp monitoring notices we received a packet
(the DHCP request) and brings the link up
6) several hundred ms after that, arp monitoring notices we haven't
received any arp responses, and brings the link down
7) several hundred ms after this, the switch blade issues another DHCP
request (jump to step 4)
There are other sources of packets on the system, and eventually the
timing is such that the DHCP request arrives during the window that the
link is up, and the system comes up.
I've been asked to consider a hack that would send the packet out
any/all (not sure yet) links with carrier if no suitable active link
can be found. I suppose we could also set the DHCP retry interval to be
smaller than the bonding arp interval.
Both of these options seem fairly hackish, so can anyone suggest a
better way to handle the above scenario?
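The race in steps 1-7, and the retry-interval workaround mentioned above, can be explored with a toy model. This is a minimal sketch under stated assumptions, not the bonding driver's actual algorithm: it assumes the ARP monitor marks the slave up for the next interval only if some packet arrived during the interval that just ended, that the DHCP requests are the only received traffic, and that the interval values are hypothetical. The function name `first_answered_request` is illustrative.

```python
"""Toy model of the ARP-monitor / DHCP-retry race.

Simplifying assumptions (a model, not the real bonding driver logic):
- the ARP monitor ticks every arp_interval_ms; at each tick it marks the
  slave up for the next interval iff some packet arrived during the
  interval that just ended, i.e. in [tick - arp_interval_ms, tick);
- the only received traffic is the DHCP request stream, one request
  every retry_ms starting at first_request_ms (no other packets);
- the server can answer a request only if it arrives while the slave is up.
"""
from typing import Optional


def first_answered_request(arp_interval_ms: int, retry_ms: int,
                           first_request_ms: int = 0,
                           max_requests: int = 100) -> Optional[int]:
    """Return the 0-based index of the first request that arrives while the
    slave is up, or None if none does within max_requests attempts."""
    requests = [first_request_ms + i * retry_ms for i in range(max_requests)]
    for i, t in enumerate(requests):
        # Most recent monitor tick at or before t; it decided the state at t.
        tick = (t // arp_interval_ms) * arp_interval_ms
        window_start = tick - arp_interval_ms
        # Slave is up at t iff an earlier request landed in [window_start, tick).
        if any(window_start <= r < tick for r in requests[:i]):
            return i
    return None


# Retry interval above 2x the monitor interval: every request is dropped.
print(first_answered_request(500, 1100))  # None
# Retry interval below the monitor interval: the 4th request (index 3) gets through.
print(first_answered_request(500, 200))   # 3
```

In this model a request can only be answered if the previous one arrived less than two ARP intervals earlier, which is why the step 4 to step 7 loop can repeat indefinitely when no other traffic arrives; a retry interval shorter than the ARP interval always breaks the loop, at the cost of more DHCP traffic.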
From: Nicolas de Pesloüan <nicolas.debian@fr...> - 2008-07-25 21:37:14
Chris Friesen wrote :
> We've recently run into an interesting chicken-and-egg problem with
> bonding, arp monitoring, and DHCP.
> We would like to use something other than simple carrier sense because
> the firmware on the switch cards has the nasty habit of bringing up
> carrier way before the switches are actually ready to handle traffic.
The chicken-and-egg problem definitely comes from the design of your
environment: because of arp monitoring, you are trying to talk to a
switch to do its initial setup over a link whose state cannot be
established until the switch is up...
I suggest you stick to link state (MII) monitoring instead of arp
monitoring.
Why is having link-state-up arrive too early at the server side a
problem?
Using active-backup mode, if you don't force a primary slave, a switch
restart while the other switch is up and running won't cause any
trouble, because the active slave won't change.
Using 802.3ad mode, isn't the bonding driver supposed to hold off using
the link until it has successfully negotiated with the switch at the
other end of the wire?
Using other modes, you can probably tune updelay to a value just below
the time between link-up and the first DHCP request, assuming that the
switch enters the forwarding state quickly after receiving the DHCP
response.
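The updelay suggestion boils down to a small calculation. A minimal sketch, assuming you have measured the delay between carrier coming up on the server and the switch's first DHCP request (the hypothetical input `gap_ms`): the bonding documentation says updelay is only honoured by the MII (miimon) link monitor and should be a multiple of miimon (other values are rounded down), so the hypothetical helper `pick_updelay` returns the largest multiple of miimon strictly below the measured gap.

```python
"""Pick a bonding updelay just below a measured link-up-to-DHCP gap.

gap_ms (hypothetical input): measured time from carrier-up on the server
to the switch's first DHCP request. miimon_ms: the MII polling interval.
"""


def pick_updelay(gap_ms: int, miimon_ms: int) -> int:
    """Return the largest multiple of miimon_ms strictly below gap_ms."""
    if miimon_ms <= 0:
        # updelay is only honoured by the MII monitor, so miimon must be on.
        raise ValueError("miimon_ms must be > 0 for updelay to apply")
    if gap_ms <= 0:
        return 0
    return ((gap_ms - 1) // miimon_ms) * miimon_ms


print(pick_updelay(2000, 100))  # 1900
print(pick_updelay(2050, 100))  # 2000
```

The result would then be passed as the bonding module's updelay option alongside miimon, e.g. miimon=100 updelay=1900 for a measured 2000 ms gap. Staying strictly below the gap leaves the slave enabled just before the first DHCP request arrives.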
Hope this helps.