Mass inserts, backup & checkpoints, getFromFile out of mem

  • Jerome

    Jerome - 2014-06-30

    I'm kind of stuck with an HSQLDB implementation:
    - HSQLDB 2.3.2
    - I use these parameters: hsqldb.write_delay=true;shutdown=true;hsqldb.sqllog=0;hsqldb.applog=0;hsqldb.log_data=true;hsqldb.log_size=32
    - I use only CACHED tables
    - I have a constraint: I can only use 128M max for both my app and HSQLDB
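
    For reference, this is the full JDBC URL with those parameters combined (the file path is just an example):

    jdbc:hsqldb:file:data/mydb;hsqldb.write_delay=true;shutdown=true;hsqldb.sqllog=0;hsqldb.applog=0;hsqldb.log_data=true;hsqldb.log_size=32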

    My process is: insert 600,000 records into a table, then process them. The insert goes well, but right after it I get this exception when I run a COUNT query on the table:

    2014-06-30T15:02:26.551+0200 SEVERE XXXXX getFromFile out of mem 3835038
    java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.hsqldb.rowio.RowInputBase.readData(Unknown Source)
    at org.hsqldb.rowio.RowInputBinary.readData(Unknown Source)
    at org.hsqldb.rowio.RowInputBinaryDecode.readData(Unknown Source)
    at org.hsqldb.RowAVLDisk.<init>(Unknown Source)
    at org.hsqldb.persist.RowStoreAVLDisk.get(Unknown Source)
    at org.hsqldb.persist.DataFileCache.getFromFile(Unknown Source)
    at org.hsqldb.persist.DataFileCache.get(Unknown Source)
    at org.hsqldb.persist.RowStoreAVLDisk.get(Unknown Source)
    at org.hsqldb.index.NodeAVLDisk.findNode(Unknown Source)
    at org.hsqldb.index.NodeAVLDisk.getLeft(Unknown Source)
    at ...(Unknown Source)
    at ...(Unknown Source)
    at org.hsqldb.index.IndexAVL$IndexRowIterator.getNextRow(Unknown Source)
    at org.hsqldb.RangeVariable$RangeIteratorMain.findNext(Unknown Source)
    at org.hsqldb.RangeVariable$...(Unknown Source)
    at org.hsqldb.QuerySpecification.buildResult(Unknown Source)
    at org.hsqldb.QuerySpecification.getSingleResult(Unknown Source)
    at org.hsqldb.QuerySpecification.getResult(Unknown Source)
    at org.hsqldb.TableDerived.materialise(Unknown Source)
    at org.hsqldb.StatementDMQL.materializeSubQueries(Unknown Source)
    at org.hsqldb.StatementDMQL.execute(Unknown Source)
    at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
    at org.hsqldb.Session.executeDirectStatement(Unknown Source)
    at org.hsqldb.Session.execute(Unknown Source)
    at org.hsqldb.jdbc.JDBCStatement.fetchResult(Unknown Source)
    at org.hsqldb.jdbc.JDBCStatement.executeQuery(Unknown Source)
    at org.apache.commons.dbcp.DelegatingStatement.executeQuery(...)
    at org.apache.commons.dbcp.DelegatingStatement.executeQuery(...)
    at org.apache.commons.dbcp.DelegatingStatement.executeQuery(...)
    at org.springframework.jdbc.core.JdbcTemplate$1QueryStatementCallback.doInStatement(...)
    at org.springframework.jdbc.core.JdbcTemplate.execute(...)
    at org.springframework.jdbc.core.JdbcTemplate.query(...)

    I tried disabling the logs during the insert and got the same error.

    Another thing: I don't want a checkpoint to occur every 32M of log data; I only want data to be committed to the .data file. Is there any way to disable the CHECKPOINT and keep the log feature? Copying the log into the data file, and the data into the backup, really slows down my process.

    Thanks for any advice.

    Last edit: Jerome 2014-06-30
  • Fred Toussi

    Fred Toussi - 2014-06-30

    You need to reduce the CACHE SIZE parameter to reduce the heap memory usage.
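
    For example, a minimal sketch with illustrative values (CACHE SIZE is in units of KB; pick values that fit your 128M budget, and note the new values may only take effect after a checkpoint or restart):

    SET FILES CACHE SIZE 8192;   -- max memory used by the data cache, in KB (illustrative value)
    SET FILES CACHE ROWS 20000;  -- max number of rows kept in the cache (illustrative value)

    The same settings are available as the hsqldb.cache_size and hsqldb.cache_rows connection properties.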

    If you do not want the checkpoint to occur every 32M, change hsqldb.log_size to zero. In this case the log grows very large, but no checkpoint occurs until you perform a CHECKPOINT in your app.
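
    For example:

    SET FILES LOG SIZE 0;   -- disables the size-based automatic checkpoint
    -- ... run the bulk insert ...
    CHECKPOINT;             -- then checkpoint manually at a convenient point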

    • Jerome

      Jerome - 2014-07-01

      Thank you. Setting the log size to zero fits my need perfectly.
      But... I still have memory problems. I think it's mostly because of my SQL, which makes HSQLDB hold a lot of things in RAM, as I can see:

      SELECT fileHistoryID, max(version)
      FROM pendingfileversions
      GROUP BY fileHistoryID

      On a 650k-record table, retrieving the first 10 records produced by this statement requires 200M in the JVM to execute properly, and takes about 70 seconds.

      I have indexes on both fileHistoryID and version.

      Do I have too many records? Have I reached HSQLDB's limits?

      If you want to try it out, I uploaded the DB here:

      SA / No password


  • Fred Toussi

    Fred Toussi - 2014-07-01

    I will try out the selects on your database. But you are selecting from the whole table, so the SELECT has to read all the records. You need more memory to perform this kind of query.

  • Jerome

    Jerome - 2014-07-02

    OK, let me know if you find something that can reduce the load required by this query.
    I'm going to review my DB model to precompute some data, but I would have preferred to keep my initial design.

  • Jerome

    Jerome - 2014-07-02

    And thanks for the support :) HSQLDB is extremely fast compared to the other engines I tried. But aggregates and subqueries are not HSQLDB's best friends :)

  • Fred Toussi

    Fred Toussi - 2014-07-02

    You can change the index on fileHistoryID to a two-column index (fileHistoryID, version). Use ALTER INDEX for this. You may need to add a dummy condition such as (fileHistoryID > -1) to force the use of this index. Use EXPLAIN PLAN to see if it works.
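
    A sketch, assuming your existing index is named idx_filehistory (substitute your own index name):

    ALTER INDEX idx_filehistory AS (fileHistoryID, version);

    EXPLAIN PLAN FOR
    SELECT fileHistoryID, MAX(version)
    FROM pendingfileversions
    WHERE fileHistoryID > -1
    GROUP BY fileHistoryID;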

    For reduced memory use, you can perform the SELECT in chunks with different ranges of fileHistoryID.
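
    For example, with an illustrative chunk of 100000 IDs per query:

    SELECT fileHistoryID, MAX(version)
    FROM pendingfileversions
    WHERE fileHistoryID >= 0 AND fileHistoryID < 100000
    GROUP BY fileHistoryID;
    -- then repeat with the next range (>= 100000 AND < 200000), and so on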

    Last edit: Fred Toussi 2014-07-02
  • Jerome

    Jerome - 2014-07-04

    I finally updated my data model; I couldn't get performance and memory under control without doing that. Also, with log_size set to zero, interrupting the JVM usually leaves HSQLDB unable to repair its DB at the next application start. A 400MB log file is very hard for HSQLDB to manage. So I chose to keep my log_size at 32MB, even though it causes a significant performance loss whenever a checkpoint occurs.

    About checkpoints: I noticed that the .backup file is entirely rebuilt each time (the file has size 0 at the start, then grows). And I read in the documentation that the inc_backup parameter is set to true by default. Is this normal? Why doesn't HSQLDB perform real incremental backups?

    Thanks again

  • Fred Toussi

    Fred Toussi - 2014-07-04

    You can keep hsqldb.log_size=0. You just need to perform a SHUTDOWN just before you close your application. The log file should be needed only after a machine crash, not routinely.
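
    That is, as the last statement before your application exits:

    SHUTDOWN;   -- flushes all changes to the .data file and closes the database files cleanly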

    The checkpoint behavior with inc_backup is normal, and it is faster than a non-incremental backup. The .backup file is deleted at the end of each checkpoint.

    You can switch to HyperXtremeSQL if you need better performance.

