From: Simon Kirby <sim@ne...> - 2005-01-13 19:26:21
Just recently we saw a case where a pair of keepalived LVS boxes was
saturated by a DoS of over 100 Mbps. The boxes began to flap back and
forth because, with the links saturated, too few VRRP packets were
getting through.
This is normally fine -- they will flap back and forth a little bit, the
DoS will subside, and everything will continue (although perhaps with
some existing sessions broken).
However, what happened here is that on the final flap, the DoS
saturated the link enough for the gratuitous ARPs to be dropped. Some
backend servers were left sitting with the other box as the gateway, so
they were trying to reply asymmetrically. Because of the LVS state
tables, all of the traffic was dropped. Also, because the gateway was
still considered "reachable" by the Linux neighbour code, it wasn't
I presume this situation is the reason for sending multiple (duplicate)
gratuitous ARPs. I suppose the only real way to improve it (apart from
upgrading the links to gigabit) is to use a timer to retransmit them a
few times at intervals instead of sending them all at once.
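
To make the timer idea concrete, here is a rough sketch in Python rather
than keepalived's own C. Everything in it is an assumption for
illustration: the names garp_schedule and send_spaced_garps, the
repeats/interval/burst parameters, and the send_garp callback (a
stand-in for whatever actually puts the gratuitous ARP on the wire) are
hypothetical and not part of keepalived. It only shows the scheduling:
the same ARPs are spread over several seconds, so a short burst of loss
is less likely to drop all of them.

```python
import time
from typing import Callable, List


def garp_schedule(repeats: int, interval: float, burst: int = 1) -> List[float]:
    """Offsets (seconds after the VRRP transition) at which to send a GARP.

    There are `repeats` rounds spaced `interval` seconds apart, and each
    round sends `burst` ARPs back to back.
    """
    if repeats < 1 or burst < 1 or interval < 0:
        raise ValueError("repeats and burst must be >= 1, interval >= 0")
    return [n * interval for n in range(repeats) for _ in range(burst)]


def send_spaced_garps(
    send_garp: Callable[[], None],  # hypothetical: sends one gratuitous ARP
    repeats: int = 5,
    interval: float = 1.0,
    burst: int = 1,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Send gratuitous ARPs on the schedule above; return how many were sent."""
    start = clock()
    sent = 0
    for offset in garp_schedule(repeats, interval, burst):
        # Wait until this ARP is due, measured from the transition time,
        # so that slow sends do not accumulate drift.
        delay = start + offset - clock()
        if delay > 0:
            sleep(delay)
        send_garp()
        sent += 1
    return sent
```

This version blocks while it sleeps, which would not be acceptable
inside an event-driven daemon; there, each offset would presumably be
registered as a separate timer event instead. The sleep and clock
arguments exist only so the timing can be tested without waiting.
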
Would this be difficult to implement? Does anybody have any better