Menu

#1659 InterruptedException handling

current-release
open-later
None
5
2022-08-22
2022-08-05
alex
No

If you catch an InterruptedException you should not call Thread.interrupted() because this will clear the interupted flag as statet in the javadoc.

This is wrongly implemented in org.hsqldb.Session maybe other places too.

Change code like that:

while (true) {
                try {
                    latch.await();
                } catch (InterruptedException e) {
                    interrupted = txInterruptRollback;

                    Thread.interrupted();

                    continue;
                }

                break;
            }

to

while (true) {
                try {
                    latch.await();
                } catch (InterruptedException e) {
                    interrupted = txInterruptRollback;

                    Thread.currentThread().interrupt();

                    continue;
                }

                break;
            }

Discussion

1 2 > >> (Page 1 of 2)
  • foss1024

    foss1024 - 2022-08-05

    I don't fully understand what's going on, but I'm not sure that's the right fix.
    There is code in the outer loop in Session.executeCompiledStatement() does end up calling interrupt() after aborting the transaction.

                if (abortTransaction || interrupted) {
                    Result result = handleAbortTransaction();
    
                    if (interrupted) {
                        Thread.currentThread().interrupt();
                    }
    
                    return result;
                }
    

    We recently upgraded to 2.6.1 from 2.2.9 and ran into a similar problem. The code was getting hung on latch.await() even though we were interrupting the thread.

    We ended up running "set database transaction rollback on interrupt true" or setting hsqldb.tx_interrupt_rollback=true on the connection string. I thought we also needed to patch that code to break out of the inner loop when interrupted and txInterruptRollback is true, but something else appears to be preventing our code from getting stuck there.

    I know I'm missing something.

    I'm attaching a lightly modified version of our test case that demonstrates the issue.
    If you comment out the following in HypersonicCheckpointTest, you will see it get stuck on that latch.await() line.

        try (Statement statement2 = connection.createStatement()) {
          // WHY do I need this here?  Shouldn't the previous "set
          // database transaction..." been sufficient?
          statement2.executeUpdate("set database transaction rollback on interrupt true");
        }
    
     
  • Fred Toussi

    Fred Toussi - 2022-08-05

    The call to Thread.interrupted() is exactly to clear the interrupt. The fact of the interrupt is recorded and later used in the second code block to actually interrupt the thread.

    Please let me know what is your purpose of interrupting an execution thread. In the latest version (2.7.0) there are capabilities such as aborting execution in another connection, cancelling the statement in the current connection, and setting query timeout prior to execution.

     
  • Fred Toussi

    Fred Toussi - 2022-08-05

    Also note the URL property is: hsqldb.tx_interrupt_rollback=true

     
  • foss1024

    foss1024 - 2022-08-06

    We have at least 2 use cases for interrupting a thread:
    . Prevent deadlocks on CountUpDownLatch.await() getting stuck. We patched it so that it would only wait at most 60 seconds before throwing an Exception. Our current patch does not appear to be sufficient for 2.6.1 as it results in permanently locking the database if the Database.CheckpointRunner gets interrupted.

     "HSQLDB Timer @1c00cae" #108 daemon prio=5 os_prio=0 tid=0x007a6400 nid=0x1fae waiting on condition [0x9145c000]
        java.lang.Thread.State: TIMED_WAITING (parking)
           at sun.misc.Unsafe.park(Native Method)
           - parking to wait for  <0x9dcf1568> (a org.hsqldb.lib.CountUpDownLatch$Sync)
           at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
           at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
           at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
           at org.hsqldb.lib.CountUpDownLatch.await(CountUpDownLatch.java:157)
           at org.hsqldb.lib.CountUpDownLatch.await(CountUpDownLatch.java:106)
           at org.hsqldb.Session.executeCompiledStatement(Session.java:1360)
           at org.hsqldb.Database$CheckpointRunner.run(Database.java:845)
           at org.hsqldb.lib.HsqlTimer$TaskRunner.run(HsqlTimer.java:643)
           at java.lang.Thread.run(Thread.java:748
    

    I don't know how to globally set query timeouts or session timeouts, nor do I know if these would prevent the issues we've seen in the past.

    . Users interacting with our (CLI based) application can cancel their current action with Ctrl-C or sometimes force other things to be cancelled that they are waiting on. Sometimes underneath the covers a SQL query is being interrupted.

     
  • alex

    alex - 2022-08-06

    Imagine that I implement a long running background job which is doing stuff against the hsqldb database. Now I want to gracefully stop the job by interrupting the Thread. If the execution is in this part of code, an InterruptedException will be thrown and the interrupted flag will be cleared here.

    In my user code I frequently check if the thread is interrupted and I immeditally gracefully stop the work. But the interrupted flag is not set anymore.

    This problem does not occur with mysql, mariadb, postgressql, mssql and oracle jdbc driver.

     
  • Fred Toussi

    Fred Toussi - 2022-08-06
    • status: open --> open-later
    • assigned_to: Fred Toussi
    • Group: version 2.5.x --> current-release
     
  • Fred Toussi

    Fred Toussi - 2022-08-06

    There has been no support for interrupting a thread accessing a database and we never considered the behaviour upon interrupt until around version 2.4.0, when we got a report that interrupts caused by a framework could cause issues with database so we added code to protect against interrupts during transaction waits. Later we were asked to support interrupts for unit test databases, which are typically single threaded, so we allowed this with the hsqldb.tx_interrupt_rollback setting.

    For timeouts or stopping long running actions, there are now features that I mentioned in my previous reply.

    If you know in advance that your statement must last less than 60 seconds, use setQueryTimeout

    https://hsqldb.org/doc/2.0/apidocs/org.hsqldb/org/hsqldb/jdbc/JDBCStatement.html#setQueryTimeout(int)

    If you want to cancel a long running statement, use ALTER SESSION from another connection.

    http://hsqldb.org/doc/2.0/guide/sessions-chapt.html#snc_statements

    This deliberate usage of interrupts has become an issue because we were not aware people were using them. I will treat this ticket as a feature request and see how far we can support it. As interrupts affect the transaction manager, it is not a trivial feature to support.

     
  • Fred Toussi

    Fred Toussi - 2022-08-06

    @Alex: if you start a HyperSQL server from your app, then connect to the server with jdbc:hsqldb:hsql://localhost instead of direct connection via jdbc:hsqldb:file:filepath, then you should get the same behaviour as the database servers you mentioned.

     
  • foss1024

    foss1024 - 2022-08-08

    I just wanted to add the following...

    If I take the same test I attached earlier and change it so the JDBC URL is "jdbc:hsqldb:test;hsqldb.tx_interrupt_rollback=true" and the code no longer explicitly runs "set database transaction rollback on interrupt true" in 2 places, the test case gets hung in issueBackup() in the Session code identified by alex.

    There appears to be a different code path that running the explicit SQL statement is forcing for subsequent SQL being executed from the same connection.

     
  • Fred Toussi

    Fred Toussi - 2022-08-10

    @foss1024, if you replace the "set database transaction ..." statements with a simple "commit" it work. That statement performs a commit before setting the property, which is already "true". I tested with 2.7.0.

     
  • foss1024

    foss1024 - 2022-08-11

    That is strange. So the property isn't really set until after the first transaction commits?

     
  • Fred Toussi

    Fred Toussi - 2022-08-11

    The property is definitely set with the URL property.

    Also note there is a possible typo in the commented-out part of your code which uses connection1 for statement2. The statement commits connection1.

    Otherwise, the original issue remains. The property =true does not allow interrupts to break out of the wait loop.

     
  • foss1024

    foss1024 - 2022-08-11

    Good catch. Yes, it's a typo. When it properly reads as the following it has no effect.

        try (Statement statement2 = connection2.createStatement()) {
          statement2.executeUpdate("commit");
        }
    

    My typo was fixing the test case. Without it, the code gets stuck in the while loop in Session.executeCompiledStatement() as the backup is waiting on the row level lock on "insert into foo values('2')".

    I couldn't figure out how HSQLDB was either exiting that loop or taking a different code path.

     
  • Fred Toussi

    Fred Toussi - 2022-08-12

    I have checked the code and think it would be OK to break the wait loop on interrupt. You can get the code from SVN and add a few lines to Session.java and test. It does work with your test case when I try.

            boolean interrupted = false;
    
            while (true) {
                try {
                    latch.await();
                } catch (InterruptedException e) {
                    interrupted = txInterruptRollback;
    
                    Thread.interrupted();
    
                    if (interrupted) {
                        break;  // new code
                    }
    
                    continue;
                }
    
                break;
            }
    
     
  • foss1024

    foss1024 - 2022-08-12

    Latest change trunk in SVN appears to be r6587, so I went and made the same change locally and rebuilt.

    While the test case doesn't get stuck, it's now rolling back the first connection as well as the second as it gets a rollback exception on this line:

        insertFoo(connection, "3");
    

    Updated test case to remove the extra statement in there that was committing (or attempting to set database settings) on the wrong connection.

     
  • Fred Toussi

    Fred Toussi - 2022-08-13

    I have committed a change that extends the application of : hsqldb.tx_interrupt_rollback

    When false, any interrupt while waiting for the latch is ignored and cleared. Interrupts during execution cause the current statement to abort and an error is returned. The interrupted flag is not cleared

    When true, any interrupt while waiting for the latch causes the transaction to abort and rollback.

     
  • foss1024

    foss1024 - 2022-08-13

    With the latest change, my test case passes in locks mode, but fails in MVCC mode with this.

    1) testHypersonicMvcc(HypersonicCheckpointTest2)
    junit.framework.AssertionFailedError: Original connection is broken: transaction rollback: serialization failure
        at junit.framework.Assert.fail(Assert.java:57)
        at junit.framework.TestCase.fail(TestCase.java:227)
        at HypersonicCheckpointTest2.runTest(HypersonicCheckpointTest2.java:147)
        at HypersonicCheckpointTest2.testHypersonicMvcc(HypersonicCheckpointTest2.java:96)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at junit.framework.TestCase.runTest(TestCase.java:176)
        at junit.framework.TestCase.runBare(TestCase.java:141)
        at junit.framework.TestResult$1.protect(TestResult.java:122)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.framework.TestResult.run(TestResult.java:125)
        at junit.framework.TestCase.run(TestCase.java:129)
        at junit.framework.TestSuite.runTest(TestSuite.java:252)
        at junit.framework.TestSuite.run(TestSuite.java:247)
        at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:86)
        at org.junit.runners.Suite.runChild(Suite.java:128)
        at org.junit.runners.Suite.runChild(Suite.java:27)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
        at org.junit.runner.JUnitCore.runMain(JUnitCore.java:77)
        at org.junit.runner.JUnitCore.main(JUnitCore.java:36)
    

    It's getting stuck on insertFoo(connection, "4"); and the KickMe thread is having to interrupt it after 2 seconds. If you comment out kickMe.start() in insertFoo, you can see it get hung in MVCC mode.

    I don't know if this helps, but 2.3.2 is the latest version in which this test case succeeds. 2.3.3 and 2.3.4 fail in different ways. 2.3.4 is similar to 2.7.0 in that it gets stuck in the while loop. 2.3.3 ends up breaking the first connection in MVCC mode the same way this does (leaving a lock somewhere).

     
  • Fred Toussi

    Fred Toussi - 2022-08-13

    @foss1024: I think your test is locking itself.

    For concurrent access to database, use a separate thread for each active connection. Otherwise when one connection is holding some lock, then the same thread uses another connection to wait for the same lock, they will not proceed and will get stuck.

     
  • foss1024

    foss1024 - 2022-08-13

    Yes, the test purposely creates a deadlock with connection2.

    What the test is trying to demonstrate is whether the database recovers from a Thread.interrupt() in a deterministic way when 1 connection is doing work and uncommitted and a 2nd connection is stuck waiting on the first connection to complete and must be aborted by an interrupt (either due to a timeout or a user canceling the task).

    Ideally, only the connection being interrupted would be rolled back (releasing any locks it held) and any other open connection would continue to process as if nothing happened.

    I've attached a modified version of the test which creates a separate thread for the first backup to happen, but it doesn't change the behavior. With 2.7.0, the backup thread hangs indefinitely on insertFoo(connection, "2");. With r6589 in MVCC mode, the test hangs on insertFoo(connection, "4"); and has to be interrupted by the KickMe thread or the test case stays stuck. At that point there is no other connection open to the database.

    --
    I hope this makes sense; I'm a little tired.

     
  • Fred Toussi

    Fred Toussi - 2022-08-15

    I will run your latest test (with separate threads) later. But I have modified the previous test to see exactly what happened.

    After the failure to insert the value 4 due to the interrupt, I added code to insert it without going into your method that issues the interrupts. The connection is not broken and the insert succeeds. Then I executed the backup statement with a new connection without your interrupt code, and it performed the backup. So existing and new connections are fine after the previous interrupts.

        debug("");
        try {
            insertFoo(connection, "4");
            connection.commit();
    
        } catch (Throwable ex) {
            log.log(SEVERE, "Original connection is broken", ex);
            // fail("Original connection is broken: " + ex.getMessage());
    
            boolean interrupted = Thread.currentThread().isInterrupted();
    
            log.log(SEVERE, "Interrupt state " + interrupted);
    
        } finally {
            // connection.close();
        }
    
        debug("Reusing connection to insert");
    
        statement.executeUpdate("insert into foo values('" + 4 + "')");
    
        debug("insert successful");
    
        debug("");
        debug("Getting new connection");
        final Connection connection4 = DriverManager.getConnection(CONNECTION_URL, "sa", "");
        try {
    
            File tempDirectory = Files.createTempDirectory("backup").toFile();
            File backupLocation = new File(tempDirectory, "hslqdb-backup.tar.gz");
            statement.executeUpdate("backup database to '" + backupLocation + "' blocking");
            debug("Issuing backup completed.");
    

    // issueBackup(connection4);

        } catch (Throwable ex) {
            log.log(SEVERE, "Even new connection is messed up", ex);
            fail("Even new connection is messed up: " + ex.getMessage());
    
        } finally {
            connection4.close();
        }
    
     
  • foss1024

    foss1024 - 2022-08-15

    It looks like you are reusing the original statement from the original connection to do the last backup.

    Also, if you comment out the entire try/catch around insertFoo(connection, "4"), the next statement hangs whether it is the same statement or a new statement from the same connection.

     
  • Fred Toussi

    Fred Toussi - 2022-08-15

    Thanks. You're right.

    I've just tried using connection4 for backup. If the original connection is not committed, it is normal for the backup statement to go into wait state. But I found even after committing, the backup goes into wait. So this is not normal and I will fix it.

     
  • Fred Toussi

    Fred Toussi - 2022-08-16

    I made a change to get it to work with MVCC as well. Your latest test works. Code committed to SVN.

     
  • foss1024

    foss1024 - 2022-08-17

    HypersonicCheckpointTest2 does pass, but it is not rolling back the first backup as expected after the backup is interrupted.

    I modified the test and created HypersonicCheckpointTest3 which verifies that the first backup is rolled back to check for this.

    Side note: if you comment out these 2 lines, you can also see the backups that are generated.

          backupLocation.delete();
          tempDirectory.delete();
    

    The first backup includes INSERT INTO FOO VALUES('2') which is unexpected as that transaction isn't complete and isolation level is read committed. This is probably not be an issue if the backup is aborted/rolled back.

     

    Last edit: foss1024 2022-08-17
1 2 > >> (Page 1 of 2)

Log in to post a comment.