Menu

#1674 HSQLDB Deadlock using 2.7.1 and MVCC

current-release
open-fixed
None
5
2024-03-05
2023-01-30
xiaoyx
No

Be similar to https://sourceforge.net/p/hsqldb/bugs/1418/ {Checkpoint deadlock in mvcc read_committed mode versions 2.2.6, 2.3.2, 2.3.3].
If an active transaction is executed, checkpoint will cause a deadlock.
Viewing the image, you can see that the active transaction and checkpoint wait for each other and deadlock

2 Attachments

Discussion

1 2 > >> (Page 1 of 2)
  • Fred Toussi

    Fred Toussi - 2023-01-30
     
  • Fred Toussi

    Fred Toussi - 2023-01-30

    Thanks for reporting. Your screenshots clearly show the deadlock Will be fixed for the next release.

    You can try setting the sessions that perform SELECTs to auto-commit, if this doesn't affect your application.

     
    • xiaoyx

      xiaoyx - 2023-01-30

      thks for your reply

       
  • Sai

    Sai - 2023-02-28

    I am using HSQL 2.7.1 in embedded (file) mode. Most of the tables are in mem, except for a couple of Cached tables. MVCC as transaction control mode.

    I notice regularly during performance/load tests that the application goes unresponsive and whenever this happens, the DB log file size has just crossed the hsqldb.log_size limit. The Checkpoint happens atleast for a couple of rounds when the log file size crosses the log_size limit. I suspect some deadlock but since the application does not respond, I am unable to fetch details from the SYSTEM_SESSIONS table. Checked with multiple settings for the log_size.
    To confirm, I set hsqldb.log_data=false and I do not see this issue.

    Is this related to the issue mentioned in this thread? If so, is there a fix possible? Any workarounds that can be tried?

     

    Last edit: Sai 2023-02-28
  • Fred Toussi

    Fred Toussi - 2023-02-28

    This may be related as there is a checkpoint when the log_size is reached. You could set the log_size to zero to avoid auto checkpoints and do the checkpoints from your app when there is no other database access.

     
    👍
    1
  • Sai

    Sai - 2023-03-20

    Hi @fredt
    May I know is a fix for this issue is planned?

     
  • Fred Toussi

    Fred Toussi - 2023-03-20

    Next release will probably go out in May.

     
    • Sai

      Sai - 2023-03-20

      Thanks!

       
  • Peter Winckles

    Peter Winckles - 2023-06-06

    Sadly, it looks like the fix didn't make the 2.7.2 release. Here's a project that demonstrates the problem, if it helps.

    https://github.com/pwinckles/hsqldb-deadlock

     
  • Fred Toussi

    Fred Toussi - 2023-06-06

    Hello Peter. Thanks for the test case. I will run it and see.

    I looked at Main.java and noticed you have c.setAutoCommit(false); but there is no call to commit on the connection used for the SELECT statements. Obviously connections must all commit to allow a CHECKPOINT to proceed.

     
    • Peter Winckles

      Peter Winckles - 2023-06-06

      Sorry about that. I was porting the example from a test case that was using Hibernate. It doesn't really affect the test one way or the other, but I did just add the manual commit.

       
    • Peter Winckles

      Peter Winckles - 2023-06-06

      To be clear, the code is demonstrating that there is a deadlock with MVCC in the following circumstance, which does not happen with the other strategies:

      1. Thread 1: Start a transaction
      2. Thread 2: While the thread 1 transaction is open, request a CHECKPOINT, which blocks on the open transaction.
      3. Thread 1: After the CHECKPOINT has been requested, spawn thread 3 within the transaction that executes a new query, and join this thread to Thread 1 before closing the original transaction.

      This is probelmatic for our application because we'd like to move to MVCC because there are are circumstances where the LOCKS strategy results in a deadlock. However, moving to MVCC resolves these deadlocks, but is untenable because there is now a new deadlock on CHECKPOINT.

       

      Last edit: Peter Winckles 2023-06-06
  • Fred Toussi

    Fred Toussi - 2023-06-06

    Thanks Peter. When using HSQLDB in embedded mode, there must be one connection per thread. The threads are put to wait by a latch belonging to the connection. I suppose your thread 3 is using its own connection. I don't understand why it joins thread 1, but I have not yet run the code.

     
  • Fred Toussi

    Fred Toussi - 2023-06-28

    Peter, I have run the test and do not understand what the threads in your tests are doing. The execution never reaches the point of the COMMIT that you added. I am sure if the first thread actually executes the COMMIT, it will unblock. After your step 3, the CHECKPOINT is waiting for the first uncommitted SELECT and the second SELECT is waiting for the CHECKPOINT.

     
    • Peter Winckles

      Peter Winckles - 2023-06-28

      @fredt Yes, that's the problem, it will never reach the commit because it's deadlocked. This deadlock happens because a thread with an open tx is waiting to commit the tx on a second thread that also has an open tx. The tx in the second thread is in turn blocked by the pending CHECKPOINT (requested by a third thread).

      The code in the example is contrived to demonstrate the problem. This problem could happen anytime an application waits on another thread within a tx, and the other thread performs db operations.

       

      Last edit: Peter Winckles 2023-06-28
  • Fred Toussi

    Fred Toussi - 2023-06-28

    Peter, with your scenario there will be deadlocks in LOCKS and MVCC. HSQLDB uses threads and latches to schedule the execution. You cannot do something with the threads that interferes with this. You can probably try the client/server model to separate the HSQLDB's threads from your app's.

     
  • Peter Winckles

    Peter Winckles - 2023-06-28

    @fredt That test code works against LOCKS. Try it.

     

    Last edit: Peter Winckles 2023-06-28
  • Fred Toussi

    Fred Toussi - 2023-06-28

    Peter, Transactions are scheduled differently with LOCKS. In this case, Thread 3 needs the same READ lock (a shareable lock) that is held by Thread 1's open transaction, so it goes ahead and executes the SELECT without being paused on the latch.

     
  • Peter Winckles

    Peter Winckles - 2023-06-28

    @fredt The difference seems to be that with MVCC CHECKPOINT blocks all transactions that come in behind it, but with LOCKS this does not happen. The order is:

    1. SELECT
    2. CHECKPOINT
    3. SELECT
     
  • Fred Toussi

    Fred Toussi - 2023-06-28

    Yes. MVCC uses a CATALOG LOCK for CHECKPOINT and DDL. There is no other lock. With LOCKS, there are READ and WRITE locks for each table as well.

     
  • Peter Winckles

    Peter Winckles - 2023-06-28

    Is the LOCKS CHECKPOINT lock the same as MVCC? Would you expect them to behave the same? If so, I'm sorry, but I really don't understand your explaination of why the test case works with LOCKS.

    Here's the output of the test with LOCKS:

    java -jar target/hsqldb-deadlock-1.0-SNAPSHOT-exec.jar LOCKS
    18:05:50.051 [pool-1-thread-1] INFO com.pwinckles.Main - Before query 1
    18:05:50.057 [main] INFO com.pwinckles.Main - Before checkpoint
    18:05:51.058 [pool-1-thread-2] INFO com.pwinckles.Main - Before query 2
    18:05:51.060 [pool-1-thread-2] INFO com.pwinckles.Main - After query 2
    18:05:51.060 [pool-1-thread-1] INFO com.pwinckles.Main - After query 1
    18:05:51.060 [main] INFO com.pwinckles.Main - After checkpoint

    Here's the output with MVCC:

    java -jar target/hsqldb-deadlock-1.0-SNAPSHOT-exec.jar MVCC
    18:06:43.518 [pool-1-thread-1] INFO com.pwinckles.Main - Before query 1
    18:06:43.524 [main] INFO com.pwinckles.Main - Before checkpoint
    18:06:44.526 [pool-1-thread-2] INFO com.pwinckles.Main - Before query 2

    I must be not understanding something, but it sure looks like to me that with LOCKS, CHECKPOINT does not prevent queries that come in behind it from executing, which is not the case with MVCC.

     
  • Fred Toussi

    Fred Toussi - 2023-06-28

    You understand it correctly. With LOCKS, the second SELECT needs the same shareable lock that is held by the first SELECT, so it goes ahead. If instead of the second SELECT, there was a DML statement modifying a separate table, it would be paused until the CHECKPOINT finishes.

     
  • Peter Winckles

    Peter Winckles - 2023-06-28

    @fredt Ah, thank you for taking the time to explain that to me. So, what would it take to reproduce the deadlock with LOCKS? I modified the code to do the following:

    1. SELECT table-1
    2. CHECKPOINT
    3. INSERT table-2

    But, it still didn't deadlock. I wouldn't have expected that writing to a different table would have required a new lock.

    I'm going to try to get detailed logging from a real life MVCC deadlock tomorrow that will hopefully prove to be more interesting than my test scenario.

     
  • Fred Toussi

    Fred Toussi - 2023-06-29

    I may have given the wrong example but a CHECKPOINT with LOCKS can block.
    Locks in SQL can be considered as sets. A transaction may be waiting to obtain multiple locks and it will eventually go through when it can obtain the whole set at once. Or it may obtain the locks one at a time. It releases them all at commit.. There is logically no possibility of deadlock as this is avoided with an exception.

     
  • Peter Winckles

    Peter Winckles - 2023-08-30

    @fredt Okay, I finally have a real (not a contrived test case) MVCC deadlock. The app is running and I can query the system_session table and get any diagnostics as is helpful.

    sql> select count(*) from INFORMATION_SCHEMA.SYSTEM_SESSIONS;
    54
    
    sql> select count(*) from INFORMATION_SCHEMA.SYSTEM_SESSIONS where this_waiting_for = '29';
    51
    
    sql> select autocommit, readonly, last_identity, transaction, transaction_size, this_waiting_for, latch_count, current_statement from INFORMATION_SCHEMA.SYSTEM_SESSIONS where session_id = 29;
    AUTOCOMMIT  READONLY  LAST_IDENTITY  TRANSACTION  TRANSACTION_SIZE  THIS_WAITING_FOR  LATCH_COUNT  CURRENT_STATEMENT
    ----------  --------  -------------  -----------  ----------------  ----------------  -----------  -----------------
    FALSE       FALSE                 1  FALSE                       0  20                          1  CHECKPOINT DEFRAG
    
    sql> select autocommit, readonly, last_identity, transaction, transaction_size, this_waiting_for, latch_count, current_statement from INFORMATION_SCHEMA.SYSTEM_SESSIONS where session_id = 20;
    AUTOCOMMIT  READONLY  LAST_IDENTITY  TRANSACTION  TRANSACTION_SIZE  THIS_WAITING_FOR  LATCH_COUNT  CURRENT_STATEMENT
    ----------  --------  -------------  -----------  ----------------  ----------------  -----------  -----------------
    FALSE       FALSE              5874  TRUE                        0                              0
    

    So, from that I see that there are 51 sessions blocked on session 29, which is attempting to run CHECKPOINT DEFRAG. Session 29 is in turn blocked by session 20. Session 20 does not appear to be blocked by anything.

    Using the value in LAST_IDENTITY, I was able to locate the corresponding record in the hsqldb log file. I took a thread dump, and I do not see a thread in the dump that corresponds with the insertion of this record. To me, this reads like that from the application's perspective the session was successfully committed and it moved on, but for some reason the book keeping got off in HSQLDB and it still thinks the session is active.

     

    Last edit: Peter Winckles 2023-08-30
1 2 > >> (Page 1 of 2)

Log in to post a comment.

MongoDB Logo MongoDB