API losing all connections to LDAP

  • Jorge Munoz

    Jorge Munoz - 2019-05-08

    Hi there,

    I'm having issues with the unboundid-ldapsdk library (version 4.0.9) in Java. It appears that when one of the directory servers is no longer available (completely offline; the server can't be reached via TCP) in the RoundRobinServerSet, the API ‘hangs’. This causes all existing connections in the API's connection pool to be closed due to the maximum connection age or idle timeout. At that point no new connections are established while the LDAP node is offline, and the pool only recovers once the LDAP node is back online. Why doesn't UnboundID go to the next available server in the RoundRobinServerSet instead of preventing any new connections? This will cause issues in production environments that need to bring an LDAP node down for maintenance.

    Example configuration:
    LDAPConnectionOptions ldapConnectionOptions = new LDAPConnectionOptions();
    ldapConnectionOptions.setAllowConcurrentSocketFactoryUse(true);
    ldapConnectionOptions.setConnectTimeoutMillis(180000); // takes an int, not a long

    BindRequest bind = new SimpleBindRequest(bindDn, password);
    ServerSet serverSet = new RoundRobinServerSet(hosts, ports, sslUtil.createSSLSocketFactory(), ldapConnectionOptions, bind, null);

    LDAPConnectionPool connPool = new LDAPConnectionPool(serverSet, bind, initPoolSize, maxPoolSize);
    connPool.setRetryFailedOperationsDueToInvalidConnections(true);
    connPool.setHealthCheck(ldapConnectionPoolHealthCheck); //Implement own health check
    connPool.setMaxConnectionAgeMillis(150000L);
    connPool.setMinDisconnectIntervalMillis(5000L);
    connPool.setCreateIfNecessary(true);
    connPool.setMaxWaitTimeMillis(5000L);
    connPool.setHealthCheckIntervalMillis(60000L);
    connPool.setCheckConnectionAgeOnRelease(true);
    connPool.setMinimumAvailableConnectionGoal(100);

    Notes:

    Health check only does a simple read (self-implemented ensureNewConnectionValid and ensureConnectionValidForContinuedUse).
    Two connection pools are created (bind and api).
    No failover server set is set up.
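
    For reference, a simplified sketch along the lines of what our health check does (illustrative only; the real implementation differs, and the root DSE read here is just an example of a simple read):

    public class SimpleReadHealthCheck extends LDAPConnectionPoolHealthCheck
    {
        @Override
        public void ensureNewConnectionValid(final LDAPConnection connection)
                throws LDAPException
        {
            // Fail the health check if a simple read of the root DSE does not succeed.
            if (connection.getRootDSE() == null)
            {
                throw new LDAPException(ResultCode.UNAVAILABLE,
                        "Unable to read the root DSE.");
            }
        }

        @Override
        public void ensureConnectionValidForContinuedUse(final LDAPConnection connection)
                throws LDAPException
        {
            ensureNewConnectionValid(connection);
        }
    }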

    Any help would be appreciated. Thanks.

     
  • Neil Wilson

    Neil Wilson - 2019-05-08

    The issue is a combination of the way that the round robin server set works and your connection options.

    The round robin server set maintains a circular list of servers, and whenever it needs to establish a connection, it will try to connect to the next server in the list. If that fails, then it will go on to the next server, etc. until it has attempted all of the servers, at which point it will ultimately fail. It only tries one server at a time.

    When trying to establish a connection, if packets that are sent out get no response whatsoever (for example, because the target machine is offline, there's a firewall blocking traffic, or there's some kind of networking problem), then the connection attempt will block until one of two things happens:

    • The LDAP SDK connect timeout is reached. The LDAP SDK's default connect timeout is ten seconds, but it looks like you're overriding that to be 180,000 milliseconds, or three minutes. That seems extremely high. The ten-second limit that the LDAP SDK uses by default is already pretty conservative; if it takes you longer than ten seconds to establish a connection to a server, then you probably don't want to use that server.
    • The operating system finally gives up. This depends on your OS configuration, but it's probably at least three minutes.

    This means that if the next server in the round-robin server set is completely unreachable such that connection attempts go into a black hole, then that connection attempt will block for up to three minutes.

    There are two things you can do to address this:

    • Change your connection options to use a much lower connect timeout, so that you're not waiting as long.
    • Consider using a different server set. If you want to have a longer timeout for a worst-case scenario but still want to be able to establish connections quickly under most normal conditions (and even the case in which one or more servers are completely offline), then you could consider something like the FastestConnectServerSet, which will try to establish connections in parallel to each of the configured servers and will end up using the first one that is established and discarding the others. If you have a lot of servers and don't want to try to connect to all of them for each connection attempt, then you could use a layered approach where you have a FailoverServerSet that contains multiple FastestConnectServerSet instances in it. In this case, the LDAP SDK will try to establish connections in parallel to the servers in the first FastestConnectServerSet, and will only fail over to another one with alternate servers if it can't establish a connection to any of the servers in the first set within the configured timeout period.
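
    For instance, a rough sketch of that layered approach might look like the following (the hostnames, ports, and the socketFactory and connectionOptions variables are placeholders):

    String[] localHosts = { "ldap1.example.com", "ldap2.example.com" };
    int[] localPorts = { 636, 636 };
    String[] remoteHosts = { "ldap3.example.com", "ldap4.example.com" };
    int[] remotePorts = { 636, 636 };

    // Each inner set tries its servers in parallel and keeps the first
    // connection that gets established.
    ServerSet localSet = new FastestConnectServerSet(localHosts, localPorts,
         socketFactory, connectionOptions);
    ServerSet remoteSet = new FastestConnectServerSet(remoteHosts, remotePorts,
         socketFactory, connectionOptions);

    // The failover set only moves on to the remote set if it can't get a
    // connection from any server in the local set.
    ServerSet layeredSet = new FailoverServerSet(localSet, remoteSet);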
     
  • Jorge Munoz

    Jorge Munoz - 2019-05-09

    Thanks Neil for the quick response.

    I tried your first suggestion of reducing the connection timeout and the result was the same. When one of the LDAP nodes goes offline, connections slowly die until there are none left. The API then recovers after the node that was offline comes back online.

    I also tried your second suggestion of using a FastestConnectServerSet and adding two new connection configurations (abandonOnTimeout=true and responseTimeoutMillis=10000). This worked and solved the issues.

    The overall goal should be a pool balanced across the servers that are available. While I understand why the FastestConnectServerSet works, it seems like it just hides the underlying issue, because the down server will never respond as fast as the still-working ones. The cost is an unbalanced pool, though.

    Do I need to include abandonOnTimeout and responseTimeoutMillis in the connection configuration of my initial UnboundID setup that uses the RoundRobinServerSet, to prevent losing all connections?

     

    Last edit: Jorge Munoz 2019-05-09
  • Neil Wilson

    Neil Wilson - 2019-05-09

    If a connection is already established, then the connect timeout won't do anything for it; that only applies when you're trying to create a new connection, and the connect timeout only covers the process of actually attempting to establish the connection and not anything you do on the connection after that point. For that, there are a couple of other things that can come into play.

    The first is the response timeout, which controls how long the LDAP SDK will wait for a response of some kind to a request that you send. You've already mentioned this, but it's the best defense that you have against getting blocked for too long when a server has stopped responding or becomes unreachable. It should be set high enough that you won't prematurely interrupt a legitimate operation, but short enough that you won't be blocked for too long if a problem does arise.

    The abandonOnTimeout option indicates whether the LDAP SDK should send an abandon request to the server if a timeout occurs. This is a way to indicate that the client is no longer interested in getting a response to the associated operation, but it really doesn't matter to the client one way or another. If you have a reasonable timeout set for your environment, then it's honestly probably better not to use it, because if your connection happens to be blocked from sending data (which will likely only be the case if you've tried to send so many requests over the same unresponsive connection that you've filled up the socket's send buffer), then the attempt to send the abandon request could block, too. On a related note, the upcoming 4.0.11 release (which will probably be out in a few weeks) will have improved handling for connections that get blocked while trying to send data.

    Note that while the LDAP SDK used to have a single response timeout for all types of operations, recent versions (I think since about 4.0.4) let you control the default timeout on a per-operation-type basis, so you can have different default timeouts for different types of operations. And you can also set the timeout on a per-request basis using the LDAPRequest.setResponseTimeoutMillis method.
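
    For example (the timeout values here are just illustrative, as are the base DN and filter):

    LDAPConnectionOptions options = new LDAPConnectionOptions();
    options.setResponseTimeoutMillis(10000L); // default for all operation types
    options.setResponseTimeoutMillis(OperationType.SEARCH, 30000L); // searches may take longer
    options.setAbandonOnTimeout(false); // don't send abandon requests on timeout

    // Per-request override for a single potentially expensive search.
    SearchRequest searchRequest = new SearchRequest("dc=example,dc=com",
         SearchScope.SUB, Filter.createEqualityFilter("uid", "jdoe"));
    searchRequest.setResponseTimeoutMillis(60000L);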

    The second component is the health checking that you've configured for the pool. Health checks can be invoked at different times in the life of a connection, including:

    • When a new connection is established
    • When a connection is being checked out of the pool
    • When a connection is being released back to the pool
    • If an exception is encountered while processing an operation against the pool (note that this only covers cases where you process operations directly against the pool rather than by checking out a connection and using it to process operations; for example, connectionPool.search instead of connectionPool.getConnection + connection.search + connectionPool.releaseConnection).
    • At regular intervals in the background (as controlled by the health check interval). Note that this one only applies to connections that aren't in active use at the time of the background health check.

    It's these last two that are probably of the most importance in helping out in cases like this. We recommend processing operations against the pool directly rather than checking connections out and processing operations on them yourself, because that allows the pool to do a better job of figuring out whether a connection is still valid, lets you take advantage of other features like automatic retry, is more convenient, and makes it easier to avoid leaking connections. If you also configure a get entry health check with a relatively short timeout to run in those two conditions, then it could help minimize the length of time during which connections run into problems as a result of a completely unresponsive server.
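
    For instance, a sketch of a get entry health check configured to run only for the exception and background cases (a null entry DN means the root DSE, and the 5-second timeout is just an example):

    GetEntryLDAPConnectionPoolHealthCheck healthCheck =
         new GetEntryLDAPConnectionPoolHealthCheck(
              null,   // entryDN (null means the root DSE)
              5000L,  // maxResponseTime in milliseconds
              false,  // invokeOnCreate
              false,  // invokeOnCheckout
              false,  // invokeOnRelease
              true,   // invokeForBackgroundChecks
              true);  // invokeOnException
    connPool.setHealthCheck(healthCheck);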

    One other thing that you could do would be to try to proactively close existing connections to a server if you think that server might not be available anymore. To do that, you could build your own health check that uses the LDAPConnectionPool.getConnection(host, port) method to get each of the connections to a specified server and then releases them all as defunct, which will cause the pool to replace them with new connections created by the configured server set.
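
    As a rough sketch of that idea (the hostname and port are placeholders, and exception handling is omitted):

    // Drain and retire every pooled connection to a server believed to be down.
    LDAPConnection conn;
    while ((conn = connPool.getConnection("ldap1.example.com", 636)) != null)
    {
        // Releasing the connection as defunct closes it and causes the pool to
        // create a replacement using the configured server set.
        connPool.releaseDefunctConnection(conn);
    }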

     
  • Jorge Munoz

    Jorge Munoz - 2019-05-14

    Hi Neil,

    Thanks for all the valuable information. We were able to determine what the issue was.

    Configurations (example):
    LDAPConnectionOptions ldapConnectionOptions = new LDAPConnectionOptions();
    ldapConnectionOptions.setAllowConcurrentSocketFactoryUse(true);
    ldapConnectionOptions.setConnectTimeoutMillis(10000);

    BindRequest bind = new SimpleBindRequest(bindDn, password);
    ServerSet serverSet = new RoundRobinServerSet(hosts, ports, sslUtil.createSSLSocketFactory(), ldapConnectionOptions, bind, null);

    LDAPConnectionPool connPool = new LDAPConnectionPool(serverSet, bind, initPoolSize, maxPoolSize);
    connPool.setRetryFailedOperationsDueToInvalidConnections(true);
    connPool.setHealthCheck(ldapConnectionPoolHealthCheck);
    connPool.setMaxConnectionAgeMillis(150000);
    connPool.setMinDisconnectIntervalMillis(5000);
    connPool.setCreateIfNecessary(true);
    connPool.setMaxWaitTimeMillis(5000);
    connPool.setHealthCheckIntervalMillis(60000);
    connPool.setCheckConnectionAgeOnRelease(true);
    connPool.setMinimumAvailableConnectionGoal(100);

    The above configuration using the RoundRobinServerSet did not work well with LDAP instances that closed idle connections after 45 seconds. Configuring the LDAP nodes to close idle connections after 1.5 minutes gave the API time to manage its own connections. Most of the connections were being closed by the LDAP nodes because they were idle, and the API couldn’t recover until the node that was offline came back online. Since we had a goal of 100 available connections and there was not enough traffic to use them all, they sat idle and were closed by the LDAP instances; with a 45-second server-side idle timeout and a 60-second health check interval, the background health check could not touch every connection in time to keep it from going idle. By the time the health check ran, all the connections were already closed and no new connections could be established.

    Now that the API is allowed to manage its own connections, everything is working as expected.

     
  • Neil Wilson

    Neil Wilson - 2019-05-14

    90 seconds (to say nothing of 45 seconds) seems like an extremely short length of time to allow a connection to remain idle, especially when you're using such a large number of connections. Setting up a TCP connection is substantially more expensive than keeping an already-established connection active, and that's magnified when you bring TLS negotiation into the mix. Unless there's some really compelling reason for such a short timeout, I would strongly recommend something much longer than that. At an absolute minimum, I would recommend making it greater than twice the connection pool's health check interval (with your 60-second interval, that means well over two minutes), and probably with a good buffer on top of that, given that the pool needs to work its way through the available connections in a single-threaded manner. Even if you've got an absolutely terrible application that leaks connections like a sieve, it's hard to imagine that it would be so bad that it would necessitate such a short timeout.

    Do you happen to know if the directory server that you're using supports the notice of disconnection unsolicited notification in cases where it's closing a connection? If it does, then the LDAP SDK should pick up on that right away and replace the connection as soon as it's closed. However, if the server isn't sending that, then the client might not be able to detect the closure until the next time it tries to use the connection.

     
  • Robert Rossiter

    Robert Rossiter - 2019-05-15

    Hi Neil,

    Thank you for your swift responses to our inquiries.

    I’m working with Jorge on the issue we’re having/had and manage the data layer of the application stack in question here.

    In response to your question about a notification when the directory closes idle connections: I did some investigation, and there are no standards, X.500 or LDAP, that require the directory server to send the client a notification when idle connections are closed by the directory. (The client may have just left the connection dangling.) I also verified this with the directory vendor.

    Testing Summary:
    We took note of your recommendation to raise the directory idle timeouts and set them at 3 minutes (at least for testing purposes). There is a historical reason why it is set so ‘low’, and I will not go into that here, but because there is now a health check and maximum connection age configured within UnboundID, I feel the idle time can be raised accordingly to a value that makes sense. Our testing was positive for the most part after raising the directory idle timeout, in that no connections were lost between the application and the directory. Now we are in a fine-tuning phase, I’d say, and would like to understand UnboundID connection management in relation to the RoundRobinServerSet better.

    Description of setup:
    We have 4 directory servers, which are part of the RoundRobinServerSet that is defined in UnboundID. The directory servers are dedicated to the application that uses UnboundID, and Repose is also in the mix as part of the application stack. These directory servers have ‘router’ DSAs to which UnboundID connects, and they in turn are connected to all data DSAs on all 4 servers. So, essentially, UnboundID only knows about these router DSAs.

    High-level testing actions:
    We bleed out client connections to the router DSA on the server we want to take out of service, which seems to go well; however, there are still connections alive on the data DSA on that server from the other router DSAs. After this we notice some sluggishness for about 2 minutes from the time the server is shut down, which can be attributed to the other directory servers picking up the slack at the moment the one directory server is taken down. This can be seen in the response logs of the directory servers. There are also some 503s, which could be attributed to the connections between the remaining router DSAs and the data DSA on the server that was taken out of service (i.e. rebooted). I think we are okay up to this point, and I can probably do something to ease the pain of the 503s, but I’m not sure it all hangs on the directory servers at this time. Still investigating.

    Better Understanding:
    What would help us, though, is a better understanding of how the RoundRobinServerSet works when a directory server is up and available but the DSA on the server is not, and also how it behaves when the directory server is unavailable (i.e. down or network unreachable).
    From the description of the RoundRobinServerSet it is clear that when a server is encountered that is unavailable (i.e. the connection timeout is reached), the next server in the set is tried. So, when a directory server is available but the DSA is down, and UnboundID gets a connection rejection, is it true that UnboundID will still try to connect to that server whenever a connection is required and it is that server's turn (according to the RoundRobinServerSet)? Likewise, if the server is down or network unreachable, will UnboundID still make connection attempts?
    From some debug logging, we have seen that there is retry logic that seems to make multiple attempts to connect to a server if a timeout occurs (I think), possibly causing some delay (or blocking) in processing connections (i.e. for the duration of the connect timeout setting). This also seems to happen when the server is totally unavailable.
    If it doesn't already exist, it might be good if there were logic to control when the next connection attempt should be made to a non-responding/negatively responding server (i.e. a configurable timer for every minute, every 5 minutes, etc.). That way, attempts would only be made once that interval has elapsed, which would hopefully speed up connection attempts. This of course depends on whether my assumption is correct that no such setting exists.
    If we know that a server will be out for a longer period, we can always remove it from the round robin set.

    One of our major tests will be to take a directory server down for a weekend and see what happens.
    There was an incident last year where a directory server went offline during a weekend due to a bad drive, and eventually all connections were aborted between UnboundID and all the directory servers, seemingly because something stalled at the client side (i.e. UnboundID). The directory servers were otherwise still taking in connections.

    Thanks very much for any additional info you can supply.
    Robert

     
  • Neil Wilson

    Neil Wilson - 2019-05-16

    Thanks for your reply.

    It’s certainly the case that an LDAP directory server isn’t required to notify the client before it closes the connection. However, RFC 4511 section 4.4.1 does define a “Notice of Disconnection” unsolicited notification that a directory server can send to a client to notify it that the connection is about to be closed. This is an optional part of the protocol, so a server doesn’t have to use it, but it can be nice in cases like an idle timeout, or cases where a server is shutting down, to help make the loss of the connection more visible. The LDAP SDK does look for this notification, and if it’s received on a pooled connection, then it can automatically replace that no-longer-valid connection with a newly created one, often in a way that is completely invisible to the client.
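
    If you want more visibility into those notifications, a minimal sketch of registering a handler for them (the logging here is just illustrative):

    LDAPConnectionOptions options = new LDAPConnectionOptions();
    options.setUnsolicitedNotificationHandler(new UnsolicitedNotificationHandler()
    {
        @Override
        public void handleUnsolicitedNotification(final LDAPConnection connection,
                                                  final ExtendedResult notification)
        {
            // Invoked for notifications like the RFC 4511 notice of disconnection.
            System.err.println("Unsolicited notification on connection to " +
                 connection.getConnectedAddress() + ": " + notification);
        }
    });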

    To answer your questions about the round-robin set, let me first say that the server set only comes into play when establishing a new connection. If a connection pool has existing connections available that it believes are valid, then it will always try to use one of those first, and the LDAP SDK considers all connections in a pool to be equally suitable for use. But the pool will use the server set to create a new connection under a variety of circumstances, including:

    • If it needs a connection but none are immediately available and fewer than the maximum number of connections have been created in the pool.
    • If it needs a connection but none are immediately available and the pool is configured with createIfNecessary=true (optionally after waiting for up to maxWaitTimeMillis).
    • If you use the pool to process an operation, and that operation fails in a way that suggests that the connection is no longer valid. In that case, the connection will be closed and a new connection created to take its place. Depending on the configuration of the pool, it may either automatically retry the operation on the newly created connection, or it may simply pass the failure back to the client.
    • If you check out a connection and release it back to the pool as defunct, then the connection will be closed and a new one will be created to take its place.

    There are several types of server set implementations, including:

    • Single Server -- Only knows about one server and will always try to connect to that one
    • Round-Robin -- Maintains a circular list and each new connection attempt will go to the next server in the list
    • Fewest Connections -- Keeps track of the number of connections to each server in the list and chooses the server with the fewest connections
    • Fastest Connect -- Tries to connect to multiple servers in parallel and goes with the one that can be established first
    • Failover -- Maintains an ordered list of servers or server sets, and will always try them in the order that they are defined (that is, it will only try the second server/set if the attempt to get a connection to the first fails)
    • DNS SRV Record -- Uses DNS service records to discover the set of available servers and picks one using weights defined in those service records.
    • Round-Robin DNS -- Uses DNS to get all IP addresses associated with a specified name, and then does a round-robin across those addresses.

    Each of them exhibits different behavior, but for most of them (excluding the single server set, which only tries one server, and the fastest connect set, which tries all of them concurrently), the basic premise is that the server set will determine the order in which to try the servers, and then work its way down the list until it establishes a suitable connection or until it runs out of servers. The process of getting a suitable connection involves the following steps:

    1. Establish the TCP connection.
    2. If the connection should use SSL to secure communication, then initiate that negotiation.
    3. Invoke the health check’s ensureNewConnectionValid method to ensure that the new connection is acceptable.
    4. Perform any pre-authentication post-connect processing. For example, this may include using the StartTLS extended operation to convert an insecure connection to a secure one.
    5. Authenticate the connection.
    6. Invoke the health check’s ensureConnectionValidAfterAuthentication method to ensure that the authenticated connection is acceptable.
    7. Invoke any post-authentication post-connect processing.

    The first three of these steps are invoked by the server set, while the rest are done by the connection pool.

    I’m not exactly sure what you mean by “a directory server is up and available, but the DSA on the server is not”, but I assume that you mean one of the following:

    • The system is up and reachable but the LDAP server isn’t running. In this case, the attempt to establish a TCP connection should fail immediately with a “connection refused” error.
    • The system is up and the LDAP server is accepting connections but isn’t in a state suitable for processing requests (e.g., a portion of the DIT isn’t available). In this case, the TCP connection should get established right away, and presumably any appropriate TLS negotiation will succeed. Depending on the nature of the LDAP server’s state, it may be that the authentication attempt would fail, but if that’s not the case, then your best bet would be to have a health check that verifies that the server actually appears to be usable. For example, the LDAP SDK offers the GetEntryLDAPConnectionPoolHealthCheck class that can make sure that it’s possible to retrieve a specified entry in a timely manner.

    If the entire system is down, or if there’s a firewall or networking issue that prevents packets from traveling between the client and the directory server, then an attempt to connect to that system will block until the connect timeout is encountered or the underlying OS gives up on the connection attempt.

    At present, none of the server set implementations offer any kind of blacklisting mechanism that will allow them to remember anything about previous failures. I can consider updating them to provide this functionality, whereby if an attempt to establish a connection to a server fails, then the server set could temporarily take that server out of the set for a while until it becomes available again. I would probably actually start with the fewest connections server set rather than the round-robin set, though, because I think that the fewest connections set is generally better than the round-robin set in just about every case when all the servers are otherwise considered equal, with the one obvious exception being that one of the servers is unavailable (because it would naturally not have any connections to it). Of course, you’re also free to write your own server set implementation that uses whatever logic you want.
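
    If you do want to experiment with the fewest connections approach, a minimal sketch (reusing the variables from the configuration shown earlier in this thread) might be:

    ServerSet serverSet = new FewestConnectionsServerSet(hosts, ports,
         sslUtil.createSSLSocketFactory(), ldapConnectionOptions);
    LDAPConnectionPool connPool = new LDAPConnectionPool(serverSet, bind,
         initPoolSize, maxPoolSize);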

     
  • Robert Rossiter

    Robert Rossiter - 2019-05-16

    Hi Neil,

    Thanks very much for taking the time to respond to our inquiries. Your descriptions are very detailed and helpful here.

    Your first assumption is correct for what I meant by “a directory server is up and available, but the DSA on the server is not”.

    Yesterday we did some testing with the RoundRobinServerSet and were successful in not losing any connections while a server was down (i.e. rebooting in this case). We did determine that the directory idle connection timeout needs to be greater than the connection age timeout for this to be successful, at least for the short term (i.e. reboots); however, the system became very sluggish and request timeouts were occurring.
    Checking at the directory level, there were no indications of high response times (i.e. the highest time was ~500ms). Repose, which is part of the application stack, has a timeout of 30s, and that was being exceeded. I think the connection response timeout was either 2s or 5s; we tried so many different settings I can't remember exactly. At any rate, from the information we are able to analyse at this time, we can only think this is due to the SDK trying to make connections to the unresponsive directory server when its turn comes and waiting out the connect timeout before retrying or moving on. Even with our meager testing at the time, we noticed that the sluggishness increases as time goes on until the directory server returns to service, probably due to the accumulation of requests to the server. There are 6 application servers that make up the service, configured for round robin connections to the directory layer, with 2 of them configured with properties for our testing purposes. So, all are trying to make connections to the unresponsive directory server.

    Just so I am really clear about how the RoundRobinServerSet works when it encounters an unresponsive server: when using the connection pool, are subsequent connection requests (that would, for example, go to the next server in the list) blocked until the connect timeout to the unresponsive server elapses, with the connection then attempted against another server, and so forth? Or, when connection requests are queued, do they simply target the servers in the list sequentially, without waiting, as long as the previous request was successful?
    I hope I asked that in a way you can understand correctly.

    We were going to try the FastestConnectServerSet today because that seemed to solve our problem, except that it seemed to heavily favour one server over the others, and on the quick I can't figure out why that server 'responds quicker' than the others, as the HW is all the same; maybe the network is a few ms faster to that server than the others, though they are all in neighboring cabinets physically. Still to be investigated in more detail. But your suggestion about the fewest connections server set seems to be an option to consider here, though it would probably also result in sluggishness when a server is unavailable, since that server would have '0' connections. I think we are using SDK 4.0.9 R29290. Will chat with Jorge about that option.

    This is a critical situation for us because if we lose a directory server in the round robin server set, as currently configured, for any reason (reboot, maintenance, unexpected server outage), it brings down the entire region using the application until that server returns or is taken out of the server set. At best, our testing with the RoundRobinServerSet has proven so far that we can maintain connections for a short duration, but sluggishness grows the longer the directory server is unavailable, with operational timeouts. When the directory server returns to service, all goes back to normal, with a slow rate of connection distribution, which is not that bad.

    Also, thanks for considering adding a brief 'blacklist' for servers that are unresponsive. It could even be configurable, something like 'after x number of connect failures/rejections, no attempts for x minutes'. I believe this would help us in our situation if it were available. :)

    Thanks very much
    Robert

     

    Last edit: Robert Rossiter 2019-05-16
  • Robert Rossiter

    Robert Rossiter - 2019-05-17

    Hi Neil,
    We did some more testing and found that the round robin server set is not optimal for our use cases.

    It seems the fastest connection server set will be better, though we still need to do some rigorous testing around that one.

    What is the procedure to follow to get a feature, such as the temporary blacklist added?

    Thanks very much for your assistance in our case.
    Robert

     
  • Neil Wilson

    Neil Wilson - 2019-05-20

    I’ve already started working on it and hope to have it checked in within the next couple of days.

     
  • Neil Wilson

    Neil Wilson - 2019-05-21

    I just committed a change that adds blacklisting support for the round-robin and fewest connections server sets (though I would generally recommend using the fewest connections server set over round-robin). This feature will be included in the next release, but you can try it now by checking out and building the LDAP SDK for yourself.

     
  • Jorge Munoz

    Jorge Munoz - 2019-05-21

    Thank you Neil for all the assistance you have provided. We will try the new changes with our current setup and share our results. What is the release cycle of UnboundID?

     
  • Neil Wilson

    Neil Wilson - 2019-05-21

    I'm not sure exactly when the next release will be, because it depends on a couple of other things, but I expect that it will probably be before the end of June.

     
  • Jorge Munoz

    Jorge Munoz - 2019-05-21

    Great, good to know.

    Today we did a quick and dirty test using the LDAP SDK that includes the blacklisting support. We also switched from RoundRobinServerSet to FewestConnectionsServerSet.

    High-level testing actions:
    We bled out client connections to the router DSA on the server we wanted to take out of service. This action seems to put the server on the blacklist, since the connections are now being rejected by the DSA. The other router DSAs started picking up the connections that were being closed. Once no more connections were established to the router DSA, we rebooted the server. Using the default value of 30s for the blacklistCheckIntervalMillis, we no longer noticed any sluggishness or response time increases. This tells us that the server remained on the blacklist for the duration of the reboot. When the server had completely rebooted, it started getting connections and leveled out within minutes.

    The provided changes were exactly what we needed. Thanks again for all the help you provided. We will keep an eye out for the LDAP SDK 4.0.11 release.

     
  • Neil Wilson

    Neil Wilson - 2019-05-21

    FYI, the blacklist implementation doesn't necessarily restore a server to full availability after the specified interval has passed. That just controls how frequently the blacklist manager will check any blacklisted servers to see if they're available again. If a server is offline for more than 30 seconds, it will remain on the blacklist until it's back up and all of the necessary conditions (which always includes establishing a connection, but may also include authenticating, performing post-connect processing, and passing the health check) have been satisfied.

     
  • Neil Wilson

    Neil Wilson - 2019-06-04

    The 4.0.11 release is now available and includes the blacklisting functionality for the round-robin and fewest connections server sets.

     
  • Robert Rossiter

    Robert Rossiter - 2019-06-04

    Thank you Neil!!!
    We are looking to test and schedule for implementation!!

     
