
#1578 Using CLOB as a column type results in an out of memory error (or an ever-growing file)

Milestone: version 2.5.x
Status: open-fixed
Owner: None
Priority: 5
Updated: 2020-06-23
Created: 2020-04-16
Private: No

Sample code below.

The problem is that if you use CLOB as a column type, every inserted CLOB immediately takes 32 KB of memory or disk space (depending on mem: or file:).

By itself that is not a big problem (that's why we use CLOBs, right?), though it would be nice if they only took the amount of memory they really need, plus a few bytes of metadata such as the length.

The real problem is that if you delete the row containing the CLOB, or even drop the whole table, nothing is cleaned up; the lob manager (file or memory based) just keeps growing.

If you use mem: and treat HSQLDB as a caching database (you create temp tables, use them in your app, then drop them completely), the memory grows and grows and you eventually get an out of memory error.

This can be "worked around" by using file:, so that at least it is the file that grows and grows.

The test below uses file: because the effect is easier to see, but mem: behaves the same way (file: is even persistent across restarts, so you can watch the file grow and grow over restarts even though there is no table at all in memory at first start).

package test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Main {

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        Class.forName("org.hsqldb.jdbcDriver"); // register the driver

        // if you run this program a few times without deleting the files in the
        // directory specified here, the lobs file will grow and grow and grow
        Connection connection = DriverManager.getConnection("jdbc:hsqldb:file:D:\\temp\\hsql2\\hsqldb", "SA", "");

        String create = "create table atable (rowid integer generated by default as identity (start with 1), reference_number clob, primary key (rowid))";
        Statement createStatement = connection.createStatement();
        createStatement.execute(create);
        createStatement.close();

        String insert = "insert into atable (reference_number) values (?)";

        int p = 0;
        // loop a few times to fill the table and delete everything again. with a
        // single iteration a 3200 KB file is created, but with 10 iterations that
        // file is 10 times as big. don't expect that: a delete should just clean
        // stuff up.
        while (p++ < 10) {
            connection.setAutoCommit(false);
            PreparedStatement insertStatement = connection.prepareStatement(insert);
            int i = 0;
            while (i++ < 100) {
                insertStatement.setString(1, "00057434.5" + Math.random());
                insertStatement.execute();
            }
            connection.setAutoCommit(true); // commits the inserts
            insertStatement.close();
            Statement delete = connection.createStatement();
            // this should delete the clobs
            delete.execute("delete from atable");
            delete.close();
        }

        Statement select = connection.createStatement();
        ResultSet executeQuery = select.executeQuery("select * from atable");
        // this prints false: there really is no more data, but the file is 32000 KB big
        System.err.println(executeQuery.next());
        executeQuery.close();
        select.close();

        Statement drop = connection.createStatement();
        // even after dropping the whole table nothing is cleaned up
        drop.execute("drop table atable");
        drop.close();

        connection.close();
    }
}

Discussion

  • Fred Toussi - 2020-04-16

    Thanks for the test case.

    There are configuration and maintenance settings that deal with the issues you mentioned. You can set the LOB block size to as little as 1kb. If you execute a CHECKPOINT either directly or automatically, the storage for deleted LOBs is cleared.

    I suggest you apply these to your test case and report it again if the memory or disk is not reused.
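As a sketch of these suggestions, the relevant statements would presumably look like this (the value 1 is an assumption, chosen as the smallest LOB scale in KB; consult the HSQLDB Guide for the exact options):

```sql
-- use 1 KB lob blocks instead of the 32 KB default
SET FILES LOB SCALE 1;

-- reclaim the storage held by deleted lobs
CHECKPOINT;
```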

     
  • Johan Compagner - 2020-04-20

    Lowering the block size is not really a solution, because in the end you will have the same problem...

    But yes, executing a "CHECKPOINT" results in the lobs being cleared (at least from what I see on disk).
    But why would I do that manually?

    So with the default behavior we rely on, where we create tables (possibly temp), insert lobs, delete the rows again, and maybe even drop the whole table (for temp tables), the memory will just grow and grow?
    Why doesn't HSQLDB do what CHECKPOINT does every now and then (for example after dropping a table that has a (C)LOB column, or after deleting rows from a table that has a (C)LOB column)?

     
  • Fred Toussi - 2020-04-20

    The CHECKPOINT is performed automatically when the .log file is reset. The frequency is controlled with the SET FILES LOG SIZE statement.

    There are scenarios where auto checkpoint does not occur. Lob creation in TEMP tables is one, use of lobs in mem: databases is another.

    We could automate the mem: database removal of lobs but for TEMP tables, you must add the CHECKPOINT to your processes.
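A sketch of what adding the CHECKPOINT to such a process could look like (table and column names here are made up for illustration):

```sql
-- temp-table lifecycle with an explicit cleanup step
CREATE TEMP TABLE session_cache (id INTEGER, payload CLOB);
-- ... insert into and query session_cache ...
DROP TABLE session_cache;
-- the drop alone does not release the lob storage; the checkpoint does
CHECKPOINT;
```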

     

    Last edit: Fred Toussi 2020-04-20
  • Fred Toussi - 2020-04-20

    As this topic has been raised before, I should add some technical details for future reference.

    The lob store maintains a usage count for each lob. At checkpoint, lobs with usage count 0 are removed from the store and the space is deallocated. If, before the checkpoint, there is an abnormal termination and the redo log (the .log file) is replayed, the data for the lobs created or deleted in the .log file is already in the .lobs file and only the metadata is recreated for those lobs or empty spaces. Therefore so long as the .log file exists, the disk space for those lobs cannot be reclaimed. Once the .log file is reset (deleted), SQL statements are executed to clear the metadata for deleted lobs and the disk space becomes available for reuse.

    A checkpoint is a relatively time-consuming process, as it has to fsync the .data file and rewrite the .script file. Therefore it is not performed very frequently.
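The usage-count scheme described above can be modeled roughly as follows. This is purely an illustrative sketch (class and method names are invented), not HSQLDB's actual implementation: creation allocates space immediately, a delete only drops the usage count, and space is reclaimed lazily at checkpoint time.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of a lob store with usage counts and lazy reclamation.
public class LobStoreSketch {
    private final Map<Long, Integer> usageCount = new HashMap<>();
    private final Map<Long, Integer> lobSize = new HashMap<>();
    private long nextId = 1;
    private int allocatedBytes = 0;

    public long createLob(int sizeBytes) {
        long id = nextId++;
        usageCount.put(id, 1);
        lobSize.put(id, sizeBytes);
        allocatedBytes += sizeBytes; // space is taken as soon as the lob is created
        return id;
    }

    public void deleteLob(long id) {
        // deleting a row only decrements the usage count; no space is freed yet
        usageCount.merge(id, -1, Integer::sum);
    }

    public void checkpoint() {
        // at checkpoint, lobs with usage count 0 are removed and their space deallocated
        usageCount.entrySet().removeIf(e -> {
            if (e.getValue() <= 0) {
                allocatedBytes -= lobSize.remove(e.getKey());
                return true;
            }
            return false;
        });
    }

    public int allocatedBytes() {
        return allocatedBytes;
    }
}
```

This models why the space held by deleted lobs only becomes available again after a checkpoint, never at delete time.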

     
  • Johan Compagner - 2020-04-20

    We use mem: for the most part, and all of those are temp tables (temp in our sense; I think for HSQLDB they are just plain tables that we create or drop for a specific (http) session).
    So HSQLDB is used as a kind of cache for us; we don't really need or want file: persistence.

    The problem now is that if one of those tables uses CLOB, it will run out of memory at some point in time.

    So for us, doing "CHECKPOINT" is not about file size, log files, or syncing to disk in any way;
    it is purely a way to release memory.

     
  • Fred Toussi - 2020-04-20

    As I said, we could automate removal of lobs in the mem: database. Your use case shows this is a useful enhancement and I will try to include it in the next release.

     
  • Fred Toussi - 2020-06-23
    • status: open --> open-fixed
    • assigned_to: Fred Toussi
     
  • Fred Toussi - 2020-06-23

    Support has been added in version 2.5.1 for automatic checkpoints in mem: databases. You need to execute SET FILES LOG SIZE with a non-zero value. When the total size of deleted lobs exceeds that size (in megabytes), a checkpoint takes place and the memory is released. This applies to mem: databases only.
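Based on that description, enabling the new behaviour on a mem: database would presumably look like this (the value 1 MB is just an example threshold, not a recommendation):

```sql
-- after this, a checkpoint runs automatically once the total size of
-- deleted lobs exceeds 1 MB, releasing the memory they held
SET FILES LOG SIZE 1;
```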

     
