If you catch an InterruptedException you should not call Thread.interrupted() because this will clear the interupted flag as statet in the javadoc.
This is wrongly implemented in org.hsqldb.Session maybe other places too.
Change code like that:
while (true) {
try {
latch.await();
} catch (InterruptedException e) {
interrupted = txInterruptRollback;
Thread.interrupted();
continue;
}
break;
}
to
while (true) {
try {
latch.await();
} catch (InterruptedException e) {
interrupted = txInterruptRollback;
Thread.currentThread().interrupt();
continue;
}
break;
}
I don't fully understand what's going on, but I'm not sure that's the right fix.
There is code in the outer loop in Session.executeCompiledStatement() does end up calling interrupt() after aborting the transaction.
We recently upgraded to 2.6.1 from 2.2.9 and ran into a similar problem. The code was getting hung on latch.await() even though we were interrupting the thread.
We ended up running "set database transaction rollback on interrupt true" or setting hsqldb.tx_interrupt_rollback=true on the connection string. I thought we also needed to patch that code to break out of the inner loop when interrupted and txInterruptRollback is true, but something else appears to be preventing our code from getting stuck there.
I know I'm missing something.
I'm attaching a lightly modified version of our test case that demonstrates the issue.
If you comment out the following in HypersonicCheckpointTest, you will see it get stuck on that latch.await() line.
The call to Thread.interrupted() is exactly to clear the interrupt. The fact of the interrupt is recorded and later used in the second code block to actually interrupt the thread.
Please let me know what is your purpose of interrupting an execution thread. In the latest version (2.7.0) there are capabilities such as aborting execution in another connection, cancelling the statement in the current connection, and setting query timeout prior to execution.
Also note the URL property is: hsqldb.tx_interrupt_rollback=true
We have at least 2 use cases for interrupting a thread:
. Prevent deadlocks on CountUpDownLatch.await() getting stuck. We patched it so that it would only wait at most 60 seconds before throwing an Exception. Our current patch does not appear to be sufficient for 2.6.1 as it results in permanently locking the database if the Database.CheckpointRunner gets interrupted.
I don't know how to globally set query timeouts or session timeouts, nor do I know if these would prevent the issues we've seen in the past.
. Users interacting with our (CLI based) application can cancel their current action with Ctrl-C or sometimes force other things to be cancelled that they are waiting on. Sometimes underneath the covers a SQL query is being interrupted.
Imagine that I implement a long running background job which is doing stuff against the hsqldb database. Now I want to gracefully stop the job by interrupting the Thread. If the execution is in this part of code, an InterruptedException will be thrown and the interrupted flag will be cleared here.
In my user code I frequently check if the thread is interrupted and I immeditally gracefully stop the work. But the interrupted flag is not set anymore.
This problem does not occur with mysql, mariadb, postgressql, mssql and oracle jdbc driver.
There has been no support for interrupting a thread accessing a database and we never considered the behaviour upon interrupt until around version 2.4.0, when we got a report that interrupts caused by a framework could cause issues with database so we added code to protect against interrupts during transaction waits. Later we were asked to support interrupts for unit test databases, which are typically single threaded, so we allowed this with the hsqldb.tx_interrupt_rollback setting.
For timeouts or stopping long running actions, there are now features that I mentioned in my previous reply.
If you know in advance that your statement must last less than 60 seconds, use setQueryTimeout
https://hsqldb.org/doc/2.0/apidocs/org.hsqldb/org/hsqldb/jdbc/JDBCStatement.html#setQueryTimeout(int)
If you want to cancel a long running statement, use ALTER SESSION from another connection.
http://hsqldb.org/doc/2.0/guide/sessions-chapt.html#snc_statements
This deliberate usage of interrupts has become an issue because we were not aware people were using them. I will treat this ticket as a feature request and see how far we can support it. As interrupts affect the transaction manager, it is not a trivial feature to support.
See also the cancel method, which can be invoked from a separate thread.
http://hsqldb.org/doc/2.0/apidocs/org.hsqldb/org/hsqldb/jdbc/JDBCStatement.html#cancel()
@Alex: if you start a HyperSQL server from your app, then connect to the server with jdbc:hsqldb:hsql://localhost instead of direct connection via jdbc:hsqldb:file:filepath, then you should get the same behaviour as the database servers you mentioned.
I just wanted to add the following...
If I take the same test I attached earlier and change it so the JDBC URL is "jdbc:hsqldb:test;hsqldb.tx_interrupt_rollback=true" and the code no longer explicitly runs "set database transaction rollback on interrupt true" in 2 places, the test case gets hung in issueBackup() in the Session code identified by alex.
There appears to be a different code path that running the explicit SQL statement is forcing for subsequent SQL being executed from the same connection.
@foss1024, if you replace the "set database transaction ..." statements with a simple "commit" it work. That statement performs a commit before setting the property, which is already "true". I tested with 2.7.0.
That is strange. So the property isn't really set until after the first transaction commits?
The property is definitely set with the URL property.
Also note there is a possible typo in the commented-out part of your code which uses connection1 for statement2. The statement commits connection1.
Otherwise, the original issue remains. The property =true does not allow interrupts to break out of the wait loop.
Good catch. Yes, it's a typo. When it properly reads as the following it has no effect.
My typo was fixing the test case. Without it, the code gets stuck in the while loop in Session.executeCompiledStatement() as the backup is waiting on the row level lock on "insert into foo values('2')".
I couldn't figure out how HSQLDB was either exiting that loop or taking a different code path.
I have checked the code and think it would be OK to break the wait loop on interrupt. You can get the code from SVN and add a few lines to Session.java and test. It does work with your test case when I try.
Latest change trunk in SVN appears to be r6587, so I went and made the same change locally and rebuilt.
While the test case doesn't get stuck, it's now rolling back the first connection as well as the second as it gets a rollback exception on this line:
Updated test case to remove the extra statement in there that was committing (or attempting to set database settings) on the wrong connection.
I have committed a change that extends the application of : hsqldb.tx_interrupt_rollback
When false, any interrupt while waiting for the latch is ignored and cleared. Interrupts during execution cause the current statement to abort and an error is returned. The interrupted flag is not cleared
When true, any interrupt while waiting for the latch causes the transaction to abort and rollback.
With the latest change, my test case passes in locks mode, but fails in MVCC mode with this.
It's getting stuck on
insertFoo(connection, "4");
and the KickMe thread is having to interrupt it after 2 seconds. If you comment outkickMe.start()
in insertFoo, you can see it get hung in MVCC mode.I don't know if this helps, but 2.3.2 is the latest version in which this test case succeeds. 2.3.3 and 2.3.4 fail in different ways. 2.3.4 is similar to 2.7.0 in that it gets stuck in the while loop. 2.3.3 ends up breaking the first connection in MVCC mode the same way this does (leaving a lock somewhere).
@foss1024: I think your test is locking itself.
For concurrent access to database, use a separate thread for each active connection. Otherwise when one connection is holding some lock, then the same thread uses another connection to wait for the same lock, they will not proceed and will get stuck.
Yes, the test purposely creates a deadlock with connection2.
What the test is trying to demonstrate is whether the database recovers from a Thread.interrupt() in a deterministic way when 1 connection is doing work and uncommitted and a 2nd connection is stuck waiting on the first connection to complete and must be aborted by an interrupt (either due to a timeout or a user canceling the task).
Ideally, only the connection being interrupted would be rolled back (releasing any locks it held) and any other open connection would continue to process as if nothing happened.
I've attached a modified version of the test which creates a separate thread for the first backup to happen, but it doesn't change the behavior. With 2.7.0, the backup thread hangs indefinitely on
insertFoo(connection, "2");
. With r6589 in MVCC mode, the test hangs oninsertFoo(connection, "4");
and has to be interrupted by the KickMe thread or the test case stays stuck. At that point there is no other connection open to the database.--
I hope this makes sense; I'm a little tired.
I will run your latest test (with separate threads) later. But I have modified the previous test to see exactly what happened.
After the failure to insert the value 4 due to the interrupt, I added code to insert it without going into your method that issues the interrupts. The connection is not broken and the insert succeeds. Then I executed the backup statement with a new connection without your interrupt code, and it performed the backup. So existing and new connections are fine after the previous interrupts.
// issueBackup(connection4);
It looks like you are reusing the original statement from the original connection to do the last backup.
Also, if you comment out the entire try/catch around insertFoo(connection, "4"), the next statement hangs whether it is the same statement or a new statement from the same connection.
Thanks. You're right.
I've just tried using connection4 for backup. If the original connection is not committed, it is normal for the backup statement to go into wait state. But I found even after committing, the backup goes into wait. So this is not normal and I will fix it.
I made a change to get it to work with MVCC as well. Your latest test works. Code committed to SVN.
HypersonicCheckpointTest2 does pass, but it is not rolling back the first backup as expected after the backup is interrupted.
I modified the test and created HypersonicCheckpointTest3 which verifies that the first backup is rolled back to check for this.
Side note: if you comment out these 2 lines, you can also see the backups that are generated.
The first backup includes
INSERT INTO FOO VALUES('2')
which is unexpected as that transaction isn't complete and isolation level is read committed. This is probably not be an issue if the backup is aborted/rolled back.Last edit: foss1024 2022-08-17