Hello,
I have a small problem with the bonding driver from Linux 2.6.x and keepalived
(a VRRP client). Keepalived is able to track network-down events, and in such
cases it does not switch to MASTER state.
I have four core switches and two routers in different rooms, each with
two stacked switches. The two routers use keepalived for VRRP, so if one of
them goes down the second one takes all traffic. Today we had a switch
failure. No problem at all: the switches are duplicated and most servers are
connected to both of them, using some kind of link aggregation. But I had
to power off the second switch to replace the failed one (they are
stacked). Unfortunately, when all slaves go down, the Linux bonding device
is still marked as RUNNING. The kernel logged:
tg3: eth1: Link is down.
bonding: bond0: link status definitely down for interface eth1, disabling it
bonding: bond0: making interface eth0 the new active one.
tg3: eth0: Link is down.
bonding: bond0: link status definitely down for interface eth0, disabling it
bonding: bond0: now running without any active interface !
and the keepalived daemon switched to MASTER state because it was no longer
able to receive multicast packets from the master:
Keepalived_vrrp: VRRP_Instance(VLAN_1) Transition to MASTER STATE
Keepalived_vrrp: VRRP_Instance(VLAN_1) Entering MASTER STATE
Keepalived_vrrp: VRRP_Instance(VLAN_1) setting protocol VIPs.
Keepalived_vrrp: VRRP_Instance(VLAN_1) Sending gratuitous ARP on vlan1
Then, when I powered this stack on again, both routers were in MASTER state.
Unfortunately, the router connected to the problematic stack has the lower IP
address, so the previously active router switched to SLAVE state. As a result,
all conntrack and NAT data was no longer available.
So, is it possible to fix the bonding driver to drop the RUNNING flag when all
slaves are down? Also, is it OK that in such a case the kernel produces
thousands of "bonding: Error: found a client with no channel in the
client's hash table" messages?
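
Until the driver behaves that way, a userspace check might serve as a
workaround. Below is a minimal sketch, assuming the bonding status file
/proc/net/bonding/bond0 is available and uses the usual "Slave Interface:" /
"MII Status:" layout. The function names (slaves_with_link, bond_has_link,
main) are made up for illustration and are not part of keepalived or the
kernel.

```python
import sys


def slaves_with_link(proc_text):
    """Return the names of slaves whose 'MII Status' is 'up'.

    proc_text is the content of /proc/net/bonding/<bond>. The first
    'MII Status' line describes the bond itself and is skipped, because
    no 'Slave Interface' line has been seen yet at that point.
    """
    up = []
    current = None  # slave whose section we are currently reading
    for line in proc_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            if value == "up":
                up.append(current)
            current = None  # only the first status line after a slave counts
    return up


def bond_has_link(bond="bond0", proc_dir="/proc/net/bonding"):
    """True if at least one slave of the bond reports link up."""
    with open(f"{proc_dir}/{bond}") as fh:
        return bool(slaves_with_link(fh.read()))


def main(argv):
    """Exit-code style result: 0 if some slave has link, 1 otherwise."""
    bond = argv[1] if len(argv) > 1 else "bond0"
    return 0 if bond_has_link(bond) else 1

# When saved as a script, finish with: sys.exit(main(sys.argv))
```

Such a script could be called from a keepalived vrrp_script / track_script
block (in versions that support it), so that a router whose bond has lost all
slaves does not take over as MASTER. This only works around the symptom; the
RUNNING flag itself stays set.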
Best regards,
			Krzysztof Olędzki