HyperSQL Database Engine (HSQLDB) / Bugs / #1418 Checkpoint deadlock in mvcc read_committed mode versions 2.2.6, 2.3.2, 2.3.3

Fred Toussi - 2015-10-08

It looks like the connection with SESSION_ID : 4545 has inserted or deleted a row and has not been committed. The internal session performing the checkpoint is waiting for this to commit. The rest of the sessions are waiting for the internal checkpoint session.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Danny - 2015-11-14

Hi Fred,

Thanks for the reply. I didn't have time to look into this but I'm looking at it now=) Shouldn't the uncommitted session be visible in the SYSTEM_SESSIONS table? The SESSION that seems to block everything doesn't show the query it performed. Is there a way I can log this in more detail?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Fred Toussi - 2015-11-16
  
  The uncommitted session is visible in the SYSTEM_SESSIONS. A query that was completed is not shown, only a query or statement that is current.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Danny - 2015-11-16

So I did some investigation. I use spring and hibernate and I found that when setting Transaction Propagation to NEVER, the 'deadlock' doesn't occur (I put hsqldb.log_size to 1 so that checkpoint occurs often enough).

I tried to narrow down a bit and it seems the issue occurs at a point where data is being inserted together with Propagation REQUIRED or Propagation REQUIRES_NEW. Something I didn't expect at all; I expected it to happen with deletes being rolled back

I plan to rewrite the cache in pure JDBC to see if I get similar issues

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Fred Toussi - 2015-11-16
  
  You can enable SQL logging and find out exactly what each session does.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Danny - 2015-11-16
    
    Thanks I have a look at it
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Danny - 2015-11-17

Hi Fred,

I think I found the issue (without the help of logging=). The image cache is limited to a certain disk space usage. So when an image needs to be cached, the calling thread asks the DiskSpaceEvaluator to ensure enough disk available. The latter is executed in a permanent thread. The calling thread 'joins' the DiskSpaceEvaluator until it is sure enough diskspace is available. In order to free up diskspace, old cached items are evicted. So, some records are deleted from HSQLDB

The calling thread though, performs its action in a transaction; it inserts the cache record in HSQLDB, then requests the diskspace and stores the file. If all succeeded, the transaction is committed. So, when the file store operation fails, the transaction is rolled back and HSQLDB remains unchanged.

The deadlock is caused because the CHECKPOINT operation is sometimes started at the moment the cache is evicted due to an incoming cache request. In that case the DiskSpaceEvaluator thread is blocked by the checkpoint operation; it can't delete record from HSQLDB and free up diskspace. Therefore, the calling thread cannot finish its cache transaction (which was started before eviction ofcourse). As a result, the CHECKPOINT can't finish because it is waiting for the cache transaction to be committed. Which in turn is waiting for the cache eviction, which in turn is waiting for the CHECKPOINT....

Nice =) This explains why the issue didn't occur with autocommit on inserts.

If my analysis is correct, I also think that HSQLDB could be improved a bit by not blocking all transactions during checkpoint operation. Not sure if such a thing is feasible?

I think I can work around it by requesting diskspace before inserting the cache record (with small risk that the cache is drained), or do a manual rollback in case file storage fails. In the worst case I could use transactiontimeout to make the cache operation fail. So fortunately I can still use HSQLDB (which is a great product, thanks for your efforts!!!).

reg,

Danny

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2015-11-17

You can create an SQL TRIGGER written in Java to delete the old files and the HSQLDB records after each insert into the table that holds the file records.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Danny - 2015-11-18

Thanks! How does that work with concurrent inserts? I'd guess that when inserting two or more cache record at nearly the same time, the SQL TRIGGERS would run at nearly the same time and all of them would select the same records for eviction. This would result in not deleting enough records and exceeding the maximum diskspace allowed to be used. This is why I use a single eviction thread that handles all request for diskspace and cleanup.

But for now I swapped the eviction and cache insert order; so diskspace is freed first and then the insert occurs. This way the deadlock can never occur. I'm happy enough with that approach.

Bottom line of what I learned is, I should prevent the following situation from happening:
1) thread A to start a transaction and join thread B
2) thread B to start a transaction and commit it
3) thread A to commit the transaction after thread B exits
because if the CHECKPOINT operation happens to occur between step 1 and 2, it blocks thread B from committing the transaction while it waits for thread A to commit, which won't happen until thread B exits.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2016-02-18

status: open --> open-fixed

assigned_to: Fred Toussi

Priority: 5 --> 1
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2017-03-16

From version 2.3.4 CHECKPOINT does not start untill all transactions that have started at that point perform a commit or rollback.This includes transactions that have not changed the data. After CHECKPOINT starts new transactions wait for its completion.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2017-03-16

Group: current-release --> version 2.3.4
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Danny - 2017-03-17

Wow nice! Thanks for the notification. I'll try to test it this month.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2017-03-17

Please test with the latest SVN code for version 2.3.5 / 2.4.0 and report.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Checkpoint deadlock in mvcc read_committed mode versions 2.2.6, 2.3.2, 2.3.3

Relational Database

Group

Searches

Help

#1418 Checkpoint deadlock in mvcc read_committed mode versions 2.2.6, 2.3.2, 2.3.3

Discussion