HyperSQL Database Engine (HSQLDB) / Bugs / #1636 DDL blocking on transaction in other connection with mvcc transaction model

Fred Toussi - 2021-07-24

status: open --> open-works-for-me

assigned_to: Fred Toussi
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2021-07-24

With MVCC model, you need to use different threads for the two connections. In your usage, the connection that attempts to create a table waits until the connection that has done the SELECT commits.

With the default LOCKS model, CREATE TABLE is a statement that is allowed, but many other forms of statement such as DELETE or UPDATE on the same table from the same thread will cause it to hang.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Johan Compagner - 2021-07-27

this is in a real system with thread pools and connection pools really tricky to always get right
So it really would be better if there where no deadlocks like this.

Many times we don't control threads, they are just calls from a Servlet container (rest or rcp) and the server only has a connection pool and a session pool with transactions (connections)

Why would you need to wait for stuff? if you just create or drop a table that has nothing todo with the select statement for a completely different table?
What is the reason that you need to lock?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2021-07-27

You have two simple independent options: (1) Start an HSQLDB server in your app and connect to it. The server will use a separate thread for each connection. (2) Attach each connection in the pool to a single thread.

In your current usage, even if no-wait CREATE TABLE were supported in MVCC, you could still get deadlocks if two transactions update the same row using the same thread.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Johan Compagner - 2021-07-28

i can have 2 transactions on different threads also making a deadlock
thats not the point. It shouldn't matter in what thread you are, i really don't understand that
A connection should juts be a connection not tied to a thread. There are all kind of thread pools in this world and different usage how those are used over many threads in pools..

If i create a deadlock because of transactions hitting the same table, then that is my problem, but doing a create or drop table in any connection even if you have other connections (same thread or not) that have a transaction on other tables should not matter, for me especially in Multiversion concurrency control because its all in the name....

You can discuss what a lock is is it full table locking or just row locking, can't or can we read committed data even if there is already uncomitted data from another transaction.. That stuff can result in locking or no locking.

In our usecases the tables that 1 connection create or drop, will never really be read at the same time by another connection (those things are on a higher level already user separated)
But we have problems that different users lock each other out, and they dont share any tables or data within each other.

A different server is not an option for us, this is really quick single server quick creation of in memory tables. for a specific use that want to cache certain data as SQL

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2021-07-28

A connection must be tied to a thread because in Java we use threads to wait. There is no other mechanism.

Using a different thread for each connections, there is never a deadlock. Thread A waits for thread B to commit. If thread B then attempts to lock an object that is already locked by thread A, thread B is forced to rollback (see the hsqldb.tx_conflict_rollback property for the alternative).

You can start the server in the same JVM to serve in-memory tables. It is not a different server. You then change the connection URL to the jdbc:hsqldb:hsql:.. form.

In the MVCC model there are row-level locks or a total lock for all operations. The total lock is used for CREATE statements. A lot of implementation work is required to introduce a third kind of lock for CREATE statements and we wouldn't do it unless it is funded. In the LOCKS model there are schema-object-level locks or a total lock. Because CREATE only locks the catalog to avoid two CREATE statements running simultaneously, it can run concurrently with the locks on the tables.

Last edit: Fred Toussi 2021-07-28
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Johan Compagner - 2021-07-30

the confusion here is a bit about the focus on "single thread" that is not the problem here
this also hangs:

Thread thread = new Thread(() -> {
try
{
Statement statement2 = connection2.createStatement();
statement2.execute("create table table2 (id integer, data varchar(50), primary key (id))");
statement2.close();
}
catch (SQLException e)
{
e.printStackTrace();
}
});

thread.start(); thread.join();

and thats kind of our situation if we would do it in multiply threads.
Because the 1 with the transaction wants to wait for the create table to be finished that is the flow.

And then i guess you can say, you need to use the same connection (that has the transaction) to create the table. But we don't want that because what does that mean? that the create table can be rollbacked? that's not something that can happen.

So MVCC mode we can't just use, thats not an option for us.

Problem is that we tried to fix this hang:

java.base@15.0.2/jdk.internal.misc.Unsafe.park(Native Method)
java.base@15.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
java.base@15.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:714)
java.base@15.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1046)
org.hsqldb.lib.CountUpDownLatch.await(Unknown Source)
org.hsqldb.Session.executeCompiledStatement(Unknown Source)
org.hsqldb.Session.executeDirectStatement(Unknown Source)
org.hsqldb.Session.execute(Unknown Source)
org.hsqldb.jdbc.JDBCStatement.fetchResult(Unknown Source)
org.hsqldb.jdbc.JDBCStatement.execute(Unknown Source)

that is 1 thread in the whole stackdump of the application that just hangs there.
and this is in default/normal transaction mode (not mvcc)
(but it is create table)
the above is for now not reproducible in a sample, but just in production it sometimes just happens and the java needs to be restarted to get out of this.

Reading up we thought that MVCC mode would be better, but MVCC just made it worse for us it seems.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Johan Compagner - 2021-07-30

oh wait my last statement is wrong

the same hanging stack we get in 2 different modes
in mvcc mode we get 1 thread hanging like that when doing a create statement

but in normal/default mode we get the same hanging stack
but then for drop tables
but then many threads doing that (and they never get out of it)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2021-07-30

It would be helpful if you include the scenarios you mention in the complete test code.

In your new example, there shouldn't be a deadlock. In MVCC the thread with CREATE TABLE simply waits for all the thread that are doing SELECT, INSERT, DELETE, etc. until they all commit, then it executes successfully. It would do the same if you execute DROP TABLE with that thread.

I am not saying you need to use the same connection for CREATE TABLE. All DDL statements such as CREATE and DROP are transactionally isolated. In other words, when it is their turn to execute, they commit the current transaction in the connection (if there is one), then do their work and auto-commit the transaction.

Last edit: Fred Toussi 2021-07-30

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Johan Compagner - 2021-07-30

right and we need to be able to create tables in the middle of transactions and those creates should not block
So this is not possible for MVCC mode so we can't use it.

And with default mode this is possible we don't really see blocks there but for that we really have current in none reproducible sample where drops are just all blocking and never return

Does a drop table in default lock mode also wants to have a full lock? (and create is relaxed?)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2021-07-30

In the LOCKS model, CREATE TABLE doesn't block, and DROP TABLE locks only the table that it is to drop and any other table that references the target table with FOREIGN KEY constraints.

You say you need to create tables in the middle of the transactions. If you are referring to the current transaction, then there is no wait in MVCC or LOCKS. The connection has done some SELECT, INSERT, etc. When it wants to execute the CREATE or DROP, it commits, then tries to execute.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2021-07-30

There is also another way you can fix the problem. You said "In our usecases the tables that 1 connection create or drop, will never really be read at the same time by another connection (those things are on a higher level already user separated)." In that case, you can use DECLARE LOCAL TEMPORARY TABLE in the connection, instead of CREATE TABLE. The local table is not seen by any other connection and you can use different threads to create it and run transactions.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Johan Compagner - 2021-09-27

this is still a bit of a problem for us and i wonder if you have an idea how to fix this then completely

First what we do: we use a hsqldb connection pool to an in memory database where sessions (like an websocket or http session in tomcat) can create quickly tables and store data for their session.

Sessions never see other sessions data, we make sure that we generated tables based on uuid's (the names). We can't use " DECLARE LOCAL TEMPORARY TABLE" because i think that is based on the connection, but the connections are pooled its not specific to a session,

A session can just take a connection from the pool an create or drop tables or start a transaction to update data on its own created tables..

So what transaction mode is then nice for us? we don't want that sessions are waiting on each other for anything.. drop or create tables in 1 session should not have any effect on another sessions queries

if 1 session starts a (long) transaction other session should just ignore (never block) that, they will never touch those tables of that other session anyway, but in the mean time they could just start there one transaction (and touch there own tables) or create or drop tables of there own.

Is really the only fix for this to really use a different hsqldb database per session? so really start a full new database instance so they are really separated?

we are now just using 1 url for the Connection pool datasource like: "jdbc:hsqldb:mem:_sv_inmem"

and i guess to completely separate this we need

jdbc:hsqldb:mem:generated_uuid

but then a lot of stuff must be different because suddenly the connection pool is per session

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2021-09-28

You can use the default transaction model (LOCKS) because each session has its own exclusive tables. Do not drop the tables when the session is finished. You can then use a separate, dedicated session with its exclusive connection to drop the tables created by expired / closed sessions. This is broadly within your current model of connection pool and session pool.

Alternatively, using a separate, dedicated hsqldb database for each session means no connection pool is used. Each session has just one connection to its own database. You can have a pool of databases and reuse a database when its session is closed (you define a data source consisting of a database and its connection and reuse it).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Johan Compagner - 2021-09-28

So if there is just 1 dedicated connection that does all the dropping, then there will never be a session (with a connection from a pool) that will block on others because drop is the one that wants to have an exclusive lock on the whole database?
And it can then happen that the dedicated "hangs" for a while if others are locking, having transactions, but that doesn't really matter because in the end it will go on, and sessions won't be affected?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2021-09-28

I think it will work that way. You need to write a test case with a small connection pool and large number of simultaneous sessions to show it performs well. Also, when a session is finished, it can TRUNCATE all its tables to reduce memory use.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Johan Compagner - 2021-10-18

we finally figured it out what it was, it was not really the drop table statement (that did go through) but what we also did to clean it up was sending a "CHECKPOINT" because of https://sourceforge.net/p/hsqldb/bugs/1578/

and that checkpoint is problematic because that needs a full lock on the database. so if there is at least 1 (long) transaction running , that wil block the CHECKPOINT from being executed.
But if we did send it there is another problem then all other transactions that want to start after that will wait that checkpoint command to finish, so suddenly all transactions block on the first transaction, which normally wouldn't occur.. (they are very likely not touching the same data)

but with the later releases (2.5.1 and higher) we now have that automatic checkpoint when i set the FILES LOG SIZE right?

But will that in the end also try to lock the full database?

I now have code like

when we clean up 1 session (dropping its tables) -> check if there is 1 or more transactions -> if there are don't send checkpoint, else send checkpoint.

Then we just hope that we now and then get through it.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Johan Compagner - 2021-10-18

i guess i dont really need to set it right? If reading the doc it says "the default maximum is 50MB"

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2021-10-18

It is better to set the LOG SIZE explicitly. For mem: databases this can be a lot larger than 50MB if you have more memory available.

You can also change the data types of the lob columns to LONGVARCHAR and LONGVARBINARY (or larger VARCHAR and VARBINARY types) which makes checkpoints unnecessary and performs better in mem: databases.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Johan Compagner - 2021-10-18

i rather do the latter, try not to use clobs at all.
Problem is there are scenario's that we don't know up front the size (so calling varchar(1000) is not something we know.

i am looking into this:

http://hsqldb.org/doc/2.0/guide/sqlgeneral-chapt.html#sgc_char_types

now i do wonder what the difference between them all are

varchar
longvarchar
clob

i guess varchar needs a size right? so if we create

varchar(16M) thats the same as just using longvarchar without a size?
and clob(16M) is really different then longvarchar(16M) or varchar(16M) ?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Fred Toussi - 2021-10-18

The doc covers the types and sizes. The difference between CLOB(maxsize) and VARCHAR(maxsize) is the storage method.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

DDL blocking on transaction in other connection with mvcc transaction model

Group

Searches

Help

#1636 DDL blocking on transaction in other connection with mvcc transaction model

Discussion