Menu

health check thread continuously throwing socket timeout because of trySynchronousReadDuringHealthCheck

2021-08-03
2024-04-08
  • Jayanth Sanganabatla

    I am using the LDAPConnectionPool in a synchronous mode. The pool currently has only default health check.
    After the pool started, the health check thread throwing Result code 81 with Socket timeout exception for all the connections. The exception says server down, but it is not really.
    Upon analysis, i got to know because of trySynchronousReadDuringHealthCheck set to true by default, the thread try to connect with very short timeout(1millisec) and eventually fails.
    I verified in couple of environments (even in local), i see the same exception.

    What is the real purpose of trySynchronousReadDuringHealthCheck? what happens if it set to false? Do i need to do any supporting logic if i make it false?

     
  • Neil Wilson

    Neil Wilson - 2021-08-03

    The purpose of that read attempt is to make sure that the connection is still valid and usable. It allows a defunct connection to be detected and replaced more quickly, and with less chance that it will adversely affect your application.

    When the LDAP SDK is operating in the default asynchronous mode, there is a background thread that continuously attempts to read data from each connection, and that read attempt will block until there is something available to read. In the event that the connection is closed by the server (for example, if it's been idle for too long or if the server is shutting down) or otherwise becomes invalid in a way that the client's networking stack can detect, that blocking read will fail immediately and the LDAP SDK will be able to take action to try to replace that failed connection.

    However, when the LDAP SDK is operating in synchronous mode, it normally only attempts to read data from a connection after it has sent a request on that connection and is expecting a response. If a connection is closed by the server, then the LDAP SDK would have no way of knowing until the next attempt was made to use that connection, and at that point, depending on how the application is using the LDAP SDK, it may result in a failed operation attempt or a delay as the LDAP SDK detects the failure and needs to try re-establishing a connection to try the operation again.

    For a connection operating in synchronous mode, I'm not aware of any means of detecting this type of closure other than attempting a read on the connection. If the server has closed a connection, then Socket.isConnected will still return true, Socket.isClosed will still return false, and Socket.getInputStream().available will return zero. It's only by attempting to read from the connection that the client will become aware that it's no longer valid. But by default, read attempts will block until either there's something available to read or until the connection is closed, and we don't want to block forever if we're just checking to see if a connection is still valid. To work around that, we use the SO_TIMEOUT socket option with the shortest possible timeout. In the common case in which the connection is still valid, we will have only blocked for one millisecond in a background health check thread, and will have confirmed that the connection is still usable. But if the connection has been closed, we will be able to detect and react to that right away by attempting to create a new connection to take its place.

    If you're using the SO_TIMEOUT socket option and a read times out, then the read attempt will throw a SocketTimeoutException. The LDAP SDK catches this and handles it properly. If that happens, it will log a debug message, but it will do so at Level.FINEST, so you should only ever be aware of it if you've enabled debugging and turned it up to the highest debug level. I wouldn't recommend doing that without a good reason because that's going to generate a lot of noise about completely normal conditions that could be misinterpreted as a problem by someone who isn't really familiar with the inner workings of the LDAP SDK. Under most circumstances, I wouldn't recommend a debug level higher than INFO, and probably even WARNING is enough to see things that may actually be a problem.

    Are you seeing this exception because you've turned on debugging at the highest level, or are you seeing it in some other way?

     
  • Jayanth Sanganabatla

    Thanks Neil.
    I tried with both FINEST, INFO. in both the cases, the exception is thrown with "WARNING:{ debug-type:exception" Result code: 81, server down.
    I did set the ResponseTimeoutmillisec as 5sec.

     
  • Neil Wilson

    Neil Wilson - 2021-08-03

    Could you provide the complete debug message and stack trace for the exception?

     
  • Jayanth Sanganabatla

    WARNING: { "timestamp":"2021-08-03T14:59:34.280Z", "debug-type":"exception", "level":"WARNING", "thread-id":34, "thread-name":"Health Check Thread for LDAPConnectionPool(serverSet=SingleServerSet(server=localhost:636, includesAuthentication=false, includesPostConnectProcessing=false), maxConnections=100)", "caught-exception":{ "message":"An I/O error occurred while trying to read the response from the server: SocketTimeoutException(Read timed out), ldapSDKVersion=6.0.0, revision=524c20f3bbcc0d83fb56b9e136a2fd3a7f60437d", "stack-trace":[ "readLDAPResponseFrom(LDAPMessage.java:1212)", "readResponse(LDAPConnectionReader.java:651)", "readResponse(LDAPConnection.java:5131)", "invokeHealthCheck(LDAPConnectionPool.java:3112)", "invokeHealthCheck(LDAPConnectionPool.java:2953)", "doHealthCheck(LDAPConnectionPool.java:2922)", "run(LDAPConnectionPoolHealthCheckThread.java:112)" ], "cause":{ "message":"Read timed out", "stack-trace":[ "socketRead0(SocketInputStream.java:native)", "socketRead(SocketInputStream.java:116)", "read(SocketInputStream.java:170)", "read(SocketInputStream.java:141)", "readFully(InputRecord.java:465)", "read(InputRecord.java:503)", "readRecord(SSLSocketImpl.java:973)", "readDataRecord(SSLSocketImpl.java:930)", "read(AppInputStream.java:105)", "fill(BufferedInputStream.java:246)", "read(BufferedInputStream.java:265)", "read(ASN1StreamReader.java:1199)", "readType(ASN1StreamReader.java:360)", "beginSequence(ASN1StreamReader.java:1118)", "readLDAPResponseFrom(LDAPMessage.java:1198)", "readResponse(LDAPConnectionReader.java:651)", "readResponse(LDAPConnection.java:5131)", "invokeHealthCheck(LDAPConnectionPool.java:3112)", "invokeHealthCheck(LDAPConnectionPool.java:2953)", "doHealthCheck(LDAPConnectionPool.java:2922)", "run(LDAPConnectionPoolHealthCheckThread.java:112)" ] } }, "ldap-sdk-version":"6.0.0", "ldap-sdk-revision":"524c20f3bbcc0d83fb56b9e136a2fd3a7f60437d" }
    LDAPException(resultCode=81 (server down), errorMessage='An I/O error occurred while trying to read the response from the server: SocketTimeoutException(Read timed out), ldapSDKVersion=6.0.0, revision=524c20f3bbcc0d83fb56b9e136a2fd3a7f60437d')
    at com.unboundid.ldap.protocol.LDAPMessage.readLDAPResponseFrom(LDAPMessage.java:1212)
    at com.unboundid.ldap.sdk.LDAPConnectionReader.readResponse(LDAPConnectionReader.java:651)
    at com.unboundid.ldap.sdk.LDAPConnection.readResponse(LDAPConnection.java:5131)
    at com.unboundid.ldap.sdk.LDAPConnectionPool.invokeHealthCheck(LDAPConnectionPool.java:3112)
    at com.unboundid.ldap.sdk.LDAPConnectionPool.invokeHealthCheck(LDAPConnectionPool.java:2953)
    at com.unboundid.ldap.sdk.LDAPConnectionPool.doHealthCheck(LDAPConnectionPool.java:2922)
    at com.unboundid.ldap.sdk.LDAPConnectionPoolHealthCheckThread.run(LDAPConnectionPoolHealthCheckThread.java:112)
    Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at com.unboundid.asn1.ASN1StreamReader.read(ASN1StreamReader.java:1199)
    at com.unboundid.asn1.ASN1StreamReader.readType(ASN1StreamReader.java:360)
    at com.unboundid.asn1.ASN1StreamReader.beginSequence(ASN1StreamReader.java:1118)
    at com.unboundid.ldap.protocol.LDAPMessage.readLDAPResponseFrom(LDAPMessage.java:1198)
    ... 6 more

     
  • Jayanth Sanganabatla

    just before that, it is printing about, timeout values.

    INFO: { "timestamp":"2021-08-03T14:59:34.265Z", "debug-type":"connect", "level":"INFO", "thread-id":34, "thread-name":"Health Check Thread for LDAPConnectionPool(serverSet=SingleServerSet(server=localhost:636, includesAuthentication=false, includesPostConnectProcessing=false), maxConnections=100)", "message":"Setting the SO_TIMEOUT value for connection LDAPConnection(poolName='PD - SSOA Connection Pool', connected to localhost:636) to 10000ms.", "ldap-sdk-version":"6.0.0", "ldap-sdk-revision":"524c20f3bbcc0d83fb56b9e136a2fd3a7f60437d" }
    Aug 03, 2021 7:59:34 AM com.unboundid.util.Debug log
    INFO: { "timestamp":"2021-08-03T14:59:34.265Z", "debug-type":"connect", "level":"INFO", "thread-id":34, "thread-name":"Health Check Thread for LDAPConnectionPool(serverSet=SingleServerSet(server=localhost:636, includesAuthentication=false, includesPostConnectProcessing=false), maxConnections=100)", "message":"Setting the SO_TIMEOUT value for connection LDAPConnection(poolName='PD - SSOA Connection Pool', connected to localhost:636) to 1ms.", "ldap-sdk-version":"6.0.0", "ldap-sdk-revision":"524c20f3bbcc0d83fb56b9e136a2fd3a7f60437d" }
    Aug 03, 2021 7:59:34 AM com.unboundid.util.Debug log

     
  • Jayanth Sanganabatla

    I am getting timeouts (Response Code 85) for some operations intermittently. Because of that reason debug mode is enabled and saw these exceptions.
    As we can see above, the timeout setting configured by me was 10sec . now the ldap sets it to 1millsec for some internal operation. Is that interfering with the currently running operations and excepts a response in 1 millisec and causing timeouts.
    Just a guess, we can leave if the these two are completely independent.

     
  • Neil Wilson

    Neil Wilson - 2021-08-03

    I looked into this and found a case in which the message could be logged at a WARNING level instead of FINEST. I've just committed a change to address that, and these innocuous debug messages are now properly recorded at the FINEST level.

    Note that the only issue that I found was that the debug messages were recorded at a higher debug level than expected, and it shouldn't have affected anything else. If you are actually getting a timeout result in your application code, then that suggests that the server isn't sending a response to a request within the configured timeout period.

     
  • Jayanth Sanganabatla

    Thank you very much for quick fix . :)

     
  • takapon231

    takapon231 - 2024-04-06

    The persistent issue of the health check thread continuously throwing socket timeout errors raises concerns regarding system reliability and data integrity, especially within healthcare environments where timely access to critical information is paramount. Addressing this challenge requires a thorough investigation into network configurations, application performance optimizations, and potential infrastructure upgrades. Additionally, ensuring compliance with HIPAA checklist guidelines is imperative to safeguard patient privacy and confidentiality amidst technical troubleshooting efforts. By prioritizing both technical resolutions and adherence to regulatory standards, healthcare organizations can mitigate risks and uphold the integrity of their systems effectively.

     
  • Neil Wilson

    Neil Wilson - 2024-04-08

    The LDAP SDK itself never throws this exception. The JVM will throw it if you've set the SO_TIMEOUT socket option (which says that a read operation should not be allowed to block for more than the specified length of time), and if an attempt to read data from the associated socket doesn't yield any results within the allotted time period. This is a completely normal condition that the JVM happens to use an exception to convey.

    The LDAP SDK does look for this exception being thrown, and if it encounters one, it will generate a debug message that is only visible if the most verbose level of debugging is enabled. This isn't an exception that should be conveyed outside of the LDAP SDK (unless you have the most level of debugging enabled and you have something consuming those debug log messages). Unless you've got something that has instrumented the JVM in some way and goes behind the LDAP SDK's back to get information about exceptions that are being thrown in the JVM (and if that's the case, then we can't do anything about that), then applications using the LDAP SDK shouldn't be aware of their existence. If you are seeing them in some other form, then please let me know how you are seeing them (providing the full stack trace from the exception), and I can investigate that further.

    If the SocketTimeoutExceptions that you're specifically talking about only appear to be happening in the health check thread, and if you've configured the LDAP SDK to operate in synchronous mode, then this is probably a result of the "trySynchronousReadDuringHealthCheck" setting in the connection pool. This is something that the connection pool uses to make sure that connections operating in synchronous mode are still valid so that it can detect lost connections more quickly. You can turn that off (by calling the LDAPConnectionPool.setTrySynchronousReadDuringHealthCheck method with a value of false), but unless you've configured a health check that actually tries to communicate over the connection, the LDAP SDK won't be able to detect certain types of connection failures in as timely a manner.

     

Log in to post a comment.