We are experiencing a problem using the UnboundID LDAP SDK.
Problem
We are using the UnboundID library for our client in a deployment where we have several LDAP servers running in a Kubernetes environment.
In general, our servers are always in a similar state relative to the amount of traffic they receive. In short, when one of them is overloaded, it is because every LDAP server is overloaded.
We are experiencing a problem in our clients because of the way the UnboundID library (version 6.0.4) handles connections.
In particular, we are using the LDAPConnectionPool pool and its processRequestsAsync method, which is inherited from AbstractConnectionPool. The issue is that this method ends with a check that decides whether the connection is still usable based on the result code:
...
if (! ResultCode.isConnectionUsable(result.getResultCode()))
{
  isDefunct = true;
}
...
Based on that decision, the connection is removed from future use if the result code is one of a specific set (e.g. 51). While this can make sense in some situations (when a single LDAP server is behaving incorrectly), it can be quite problematic in others, like ours.
Our servers, which are all busy at the same point in time, all return 51 (busy). The point is that, according to the code above, the connections to the server(s) are closed; but as we keep sending traffic through the client, each connection is reopened, only to receive another 51, be closed, and be reopened again.
This cycle repeats forever (as long as all servers keep returning 51), resulting in a much more problematic situation: the client starts to consume a lot of CPU (because it is continuously closing and reopening connections), and the servers get extra load due to the connection churn.
Solution
We believe that although closing connections can be a good idea in some situations (an isolated problematic LDAP server), it can cause an even bigger problem in others (like the one described).
Therefore, we think it would be nice if LDAPConnectionPool could accept as configuration a list of ResultCodes for which this connection-defunct behavior should apply (by default, if no list is provided, keep the existing behavior, since this is a general-purpose SDK).
I think that the default behavior is a good one. I do agree that if the connection pool is only set up to use a single server, and if that server is overloaded and returning BUSY in response to every request, then there’s not much benefit to dropping and re-establishing the connection. However, if you’ve got a pool that can use multiple servers, and if you have a health check that operates on newly-established connections and won’t try to use a server that’s returning BUSY results, then it’s definitely a good thing to have the connection pool prefer servers that aren’t returning BUSY results.
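To illustrate the multi-server setup described above, here is a hedged sketch of a health check that probes a newly established connection and rejects a server that answers with BUSY, so the pool prefers servers that are not overloaded. The probe (a root DSE search) and the class name are illustrative choices, not something prescribed by the SDK:

```java
import com.unboundid.ldap.sdk.LDAPConnection;
import com.unboundid.ldap.sdk.LDAPConnectionPoolHealthCheck;
import com.unboundid.ldap.sdk.LDAPException;
import com.unboundid.ldap.sdk.LDAPSearchException;
import com.unboundid.ldap.sdk.ResultCode;
import com.unboundid.ldap.sdk.SearchResult;
import com.unboundid.ldap.sdk.SearchScope;

// Sketch: reject newly established connections to servers that
// currently answer with BUSY, so the pool moves on to another server.
public final class RejectBusyServerHealthCheck
     extends LDAPConnectionPoolHealthCheck
{
  @Override
  public void ensureNewConnectionValid(final LDAPConnection connection)
       throws LDAPException
  {
    SearchResult result;
    try
    {
      // Cheap probe: a base-level search against the root DSE,
      // requesting no attributes ("1.1").
      result = connection.search("", SearchScope.BASE,
           "(objectClass=*)", "1.1");
    }
    catch (final LDAPSearchException e)
    {
      result = e.getSearchResult();
    }

    if (result.getResultCode() == ResultCode.BUSY)
    {
      // Throwing here makes the pool treat this connection as invalid.
      throw new LDAPException(ResultCode.BUSY,
           "Server reported BUSY on a new connection; trying another.");
    }
  }
}
```

The health check would be attached with the pool's setHealthCheck method; combined with a server set spanning several servers, this steers new connections away from servers that are returning BUSY.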
Nevertheless, I also think that it could be useful to make it possible to customize the result codes that the LDAP SDK might use to determine whether a connection should still be usable. It’s still going to use the same set of result codes by default, but you can now use the ResultCode.setConnectionNotUsableResultCodes method to define an alternative set if desired. It’ll be included in the next release of the LDAP SDK (which doesn’t yet have a definitive release date but will probably be out by the end of June), but you can get it now if you check out and build the LDAP SDK for yourself.
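As a sketch of how that customization could look, assuming an SDK build that includes ResultCode.setConnectionNotUsableResultCodes: the replacement set below is purely illustrative (it keeps a few clearly fatal codes and deliberately omits BUSY), not the SDK's actual default list.

```java
import com.unboundid.ldap.sdk.ResultCode;

// Sketch: stop BUSY (51) results from marking pooled connections
// as defunct, by replacing the "connection not usable" result codes.
public final class KeepBusyConnections
{
  public static void main(final String... args)
  {
    // By default, a BUSY result is treated as "connection not usable".
    System.out.println("before: " +
         ResultCode.isConnectionUsable(ResultCode.BUSY));

    // Illustrative alternative set that omits BUSY; choose the codes
    // that actually indicate a dead connection in your environment.
    ResultCode.setConnectionNotUsableResultCodes(
         ResultCode.SERVER_DOWN,
         ResultCode.LOCAL_ERROR,
         ResultCode.CONNECT_ERROR,
         ResultCode.TIMEOUT);

    // BUSY results will no longer cause the pool to close and
    // re-establish the connection.
    System.out.println("after: " +
         ResultCode.isConnectionUsable(ResultCode.BUSY));
  }
}
```

Because the setting is static on ResultCode, it applies process-wide, so it should be configured once at startup before any pools are created.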