Mass inserts, backup & checkpoints, getFromFile out of mem

  • Jerome
    2014-06-30

    Hi
    I'm kind of stuck with an HSQLDB implementation:
    - HSQLDB 2.3.2
    - I use these connection parameters (full URL below): hsqldb.write_delay=true;shutdown=true;hsqldb.sqllog=0;hsqldb.applog=0;hsqldb.log_data=true;hsqldb.log_size=32
    - I use only CACHED tables
    - I have a constraint: only 128MB of memory max for both my app and HSQLDB
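
    For reference, the full connection URL looks like this (the database file path here is just a placeholder):

    jdbc:hsqldb:file:/path/to/mydb;hsqldb.write_delay=true;shutdown=true;hsqldb.sqllog=0;hsqldb.applog=0;hsqldb.log_data=true;hsqldb.log_size=32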

    My process is: insert 600,000 records into a table, then process them. The insert goes well, but right after it I get this exception when I run a COUNT query on the table:

    2014-06-30T15:02:26.551+0200 SEVERE XXXXX getFromFile out of mem 3835038
    java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.hsqldb.rowio.RowInputBase.readData(Unknown Source)
    at org.hsqldb.rowio.RowInputBinary.readData(Unknown Source)
    at org.hsqldb.rowio.RowInputBinaryDecode.readData(Unknown Source)
    at org.hsqldb.RowAVLDisk.<init>(Unknown Source)
    at org.hsqldb.persist.RowStoreAVLDisk.get(Unknown Source)
    at org.hsqldb.persist.DataFileCache.getFromFile(Unknown Source)
    at org.hsqldb.persist.DataFileCache.get(Unknown Source)
    at org.hsqldb.persist.RowStoreAVLDisk.get(Unknown Source)
    at org.hsqldb.index.NodeAVLDisk.findNode(Unknown Source)
    at org.hsqldb.index.NodeAVLDisk.getLeft(Unknown Source)
    at org.hsqldb.index.IndexAVL.next(Unknown Source)
    at org.hsqldb.index.IndexAVL.next(Unknown Source)
    at org.hsqldb.index.IndexAVL$IndexRowIterator.getNextRow(Unknown Source)
    at org.hsqldb.RangeVariable$RangeIteratorMain.findNext(Unknown Source)
    at org.hsqldb.RangeVariable$RangeIteratorMain.next(Unknown Source)
    at org.hsqldb.QuerySpecification.buildResult(Unknown Source)
    at org.hsqldb.QuerySpecification.getSingleResult(Unknown Source)
    at org.hsqldb.QuerySpecification.getResult(Unknown Source)
    at org.hsqldb.TableDerived.materialise(Unknown Source)
    at org.hsqldb.StatementDMQL.materializeSubQueries(Unknown Source)
    at org.hsqldb.StatementDMQL.execute(Unknown Source)
    at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
    at org.hsqldb.Session.executeDirectStatement(Unknown Source)
    at org.hsqldb.Session.execute(Unknown Source)
    at org.hsqldb.jdbc.JDBCStatement.fetchResult(Unknown Source)
    at org.hsqldb.jdbc.JDBCStatement.executeQuery(Unknown Source)
    at org.apache.commons.dbcp.DelegatingStatement.executeQuery(DelegatingStatement.java:208)
    at org.apache.commons.dbcp.DelegatingStatement.executeQuery(DelegatingStatement.java:208)
    at org.apache.commons.dbcp.DelegatingStatement.executeQuery(DelegatingStatement.java:208)
    at org.springframework.jdbc.core.JdbcTemplate$1QueryStatementCallback.doInStatement(JdbcTemplate.java:452)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:402)
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:468)

    I tried disabling the logs during the insert and got the same result.

    Another thing: I don't want a checkpoint to occur every 32MB of log data. I only want data to be committed to the DATA file. Is there any way to disable the CHECKPOINT and keep the log feature? Copying the log into the data file, and the data into the backup, really slows down my process.

    Thanks for any advice

    Last edit: Jerome 2014-06-30
  • Fred Toussi
    2014-06-30

    You need to reduce the CACHE SIZE parameter to reduce the heap memory usage.
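
    For example (the values here are only illustrative; tune them to your data and available heap):

    SET FILES CACHE ROWS 20000;  -- keep at most 20000 rows in the data cache
    SET FILES CACHE SIZE 5000;   -- keep at most 5000 KB of row data in the cache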

    If you do not want a checkpoint to occur every 32MB, then change hsqldb.log_size to zero. In this case the log grows very large, but no checkpoint occurs until you perform a CHECKPOINT in your app.
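
    For example:

    SET FILES LOG SIZE 0;
    -- the log now grows until you checkpoint yourself, e.g. after the mass insert:
    CHECKPOINT;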

    • Jerome
      2014-07-01

      Thank you. Setting the log size to zero fits my need perfectly.
      But... I still have memory problems. I think it's mostly because of my SQL, which makes HSQLDB hold a lot of data in RAM, as far as I can see:

      SELECT fileHistoryID, max(version)
      FROM pendingfileversions
      GROUP BY fileHistoryID

      On a 650k-record table, retrieving the first 10 records produced by this statement requires 200MB of JVM heap to execute properly, and takes about 70 seconds.

      I have indexes on both fileHistoryID and version.

      Do I have too many records? Have I reached HSQLDB's limits?

      If you want to try it out, I uploaded the DB here: https://docs.google.com/file/d/0BwKccXgA9ZZWelBnWlFwTkhPa2M/edit

      SA / No password

      Thanks

  • Fred Toussi
    2014-07-01

    I will try out the selects on your database. But you are selecting from the whole table, so the SELECT has to read all the records. You need more memory to perform this kind of query.

  • Jerome
    2014-07-02

    OK, let me know if you find something that can reduce the load required by this query.
    I'm going to review my database model to precompute some data, but I would have preferred to keep my initial design.

  • Jerome
    2014-07-02

    And thanks for the support :) HSQLDB is extremely fast compared to the other engines I tried. But aggregates / subqueries are not HSQLDB's best friends :)

  • Fred Toussi
    2014-07-02

    You can change the index on fileHistoryID to a two-column index (fileHistoryID, version). Use ALTER INDEX for this. You may need to add a dummy condition such as (fileHistoryID > -1) to force the use of this index. Use EXPLAIN PLAN to see if it works.
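
    For example, assuming your existing index is named idx_filehistory (substitute the actual name):

    ALTER INDEX idx_filehistory AS (fileHistoryID, version);

    EXPLAIN PLAN FOR
    SELECT fileHistoryID, MAX(version)
    FROM pendingfileversions
    WHERE fileHistoryID > -1
    GROUP BY fileHistoryID;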

    For reduced memory use, you can perform the SELECT in chunks with different ranges of fileHistoryID.
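
    For example (the range bounds are only illustrative):

    SELECT fileHistoryID, MAX(version)
    FROM pendingfileversions
    WHERE fileHistoryID >= 0 AND fileHistoryID < 100000
    GROUP BY fileHistoryID;
    -- then repeat with the next range, and so on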

    Last edit: Fred Toussi 2014-07-02
  • Jerome
    2014-07-04

    I finally updated my data model; I couldn't get performance and memory under control without doing that. Also, with log_size set to zero, interrupting the JVM most of the time leaves HSQLDB unable to repair its database at the next application start. A 400MB log file is very hard for HSQLDB to manage. So I chose to keep my log_size at 32MB, even though it causes a significant performance loss when a checkpoint occurs.

    About checkpoints: I noticed the .backup file is entirely rebuilt each time (the file has size 0 at the start, then grows). And I read in the documentation that the inc_backup parameter is set to true by default. Is this normal? Why doesn't HSQLDB perform real incremental backups?

    Thanks again

  • Fred Toussi
    2014-07-04

    You can keep hsqldb.log_size=0. You just need to perform a SHUTDOWN just before you close your application. The log file should only be needed after a machine crash, not as a routine.
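
    For example:

    SET FILES LOG SIZE 0;
    -- ... normal operation ...
    SHUTDOWN;  -- just before the application exits; writes all data and removes the .log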

    The checkpoint behavior with inc_backup is normal, and it is faster than a non-incremental backup. The .backup file is deleted at the end of each checkpoint.
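
    The relevant setting, if you want to experiment with it, is:

    SET FILES BACKUP INCREMENT TRUE;  -- TRUE is the default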

    You can switch to HyperXtremeSQL if you need better performance.