From: Kenneth M. <kt...@ri...> - 2010-04-26 19:10:16
On Mon, Apr 26, 2010 at 11:38:34AM -0700, Gary Smith wrote:
> > I have set up two sqlgrey servers load balanced with ipvsadm. Load
> > balancing is working, but I end up with a lot of orphaned ESTABLISHED
> > connections on the real servers. Over a period of 48 hours, I received
> > ~500 requests per real server, and there were ~250 established
> > connections per server.
> >
> > When I bypass ipvsadm and go directly to a single server, I see only
> > a few established connections (each with a corresponding connection
> > on the postfix side).
> >
> > Does anyone else on the list run sqlgrey in an ipvsadm load-balanced
> > scenario? If so, any pointers? Postfix seems to have no complaint
> > about this, but I think by design it simply reconnects when the
> > connection is gone.
>
> This might be helpful for people on the list.
>
> Okay, isolating traffic to a single real-server node in the
> load-balanced cluster still produces the same result. It appears that
> after N seconds postfix hangs up the connection, but sqlgrey never
> notices, probably because of the load balancer, so it is left to the
> OS TCP timeout settings to kill the connection. As a dirty hack, just
> to test, I call $mux->set_timeout in mux_input, and in the mux_timeout
> callback I close the current filehandle ($fh) once processing has
> finished (immediately after the while loop). It's probably the wrong
> thing to do, but the connection is closed after the timeout, and the
> closure is seen by both postfix and the load balancer.
>
> My concern is that sqlgrey isn't reacting gracefully when connections
> are abandoned (that is, closed without any notification ever being
> received). This stands out when something like load balancing is put
> in place (my observation; I could still be wrong here).
>
> It might be useful to put some type of sanity timeout check in place
> for a case like this. If you have a reasonably configured default TCP
> timeout at the OS level, the impact is probably minimal.
>
> I have been using sqlgrey for some years now, and I am migrating it to
> a separate cluster (currently it lives on each MTA, but I am trying to
> change that, as we have a resource need to move it to its own cluster).
>
> Thoughts?

SQLgrey is doing the correct thing in this case: it does not know why
the connection is gone, or even whether it is gone at all. The load
balancer should close its connection to the remote SQLgrey when the
frontend goes away or, depending on how it works, when all connections
from the frontend are closed. That would keep SQLgrey from holding old
connections around until they are reclaimed. Still, timeouts such as
you mention are useful for handling other bits of poorly designed
software.

Cheers,
Ken
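The $mux->set_timeout/mux_timeout hack described above is essentially an
idle-connection reaper: record the last activity per connection, and close
anything idle past a threshold instead of waiting for the OS to reclaim the
socket. A minimal sketch of that bookkeeping in Python (the IdleReaper name
and API are hypothetical, purely for illustration; sqlgrey itself does this
in Perl via IO::Multiplex):

```python
import time


class IdleReaper:
    """Track last-activity time per connection and report which ones
    have been idle longer than the timeout, so the server loop can
    close them explicitly."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # connection id -> last activity timestamp

    def touch(self, conn_id, now=None):
        # Call on every read/write; analogous to resetting the
        # multiplexer timeout in mux_input.
        self.last_seen[conn_id] = time.monotonic() if now is None else now

    def expired(self, now=None):
        # Connections idle past the timeout; analogous to the
        # mux_timeout callback deciding which filehandles to close.
        now = time.monotonic() if now is None else now
        return [c for c, t in self.last_seen.items() if now - t > self.timeout]

    def forget(self, conn_id):
        # Drop the bookkeeping once a connection has been closed.
        self.last_seen.pop(conn_id, None)
```

In a select/multiplex loop, the server would call touch() on each event,
periodically call expired(), and close (then forget()) anything returned,
which is roughly what closing $fh in mux_timeout achieves.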
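On the "OS TCP timeout at the OS level" point: without an application-level
timeout, a half-open connection lingers until TCP keepalive notices the dead
peer, and keepalive is off by default (and roughly two hours when enabled on
Linux). A hedged sketch of shortening that window per socket, assuming
Linux-specific option names (TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT differ
on other platforms):

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=3):
    """Ask the kernel to probe an idle connection so a vanished peer
    is detected in about idle + interval * count seconds instead of
    hours. Option names here are the Linux ones."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

This only detects dead peers; it does not replace an application idle
timeout, since a peer that is alive but silent keeps the connection open
indefinitely either way.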