Excessive heap usage in SessionContext

Help
2012-11-29
2014-01-19
  • Martin Fischer

    Martin Fischer - 2012-11-29

    We are using a very big, pure in memory (:mem protocol) hsqldb database in MVCC mode. This database has thousands of tables where only a small amount of tables (100 or so) contain the base data and thousands of tables are used to keep intermediate results that are generated step by step out of the base data in a butterfly fashion:
    b1 join b2 -> insert into i1
    b1 join b2 -> insert into i2

    i1 join i2 -> insert into i11
    i3 join i4 -> insert into i12

    The base tables are ordinary non-temp tables whereas the intermediate tables are GLOBAL TEMP tables.
    We do this kind of transformation in a single long-running transaction (running for minutes). The tx has to be kept open to keep the MVCC snapshot intact. During a transformation we create millions of tuples.

    Now the problem is: we are running out of heap (which we set to 4GByte as max). Heap dump analysis shows that most of the memory is kept in the SessionContext within the rowActionList (1.3 GByte). The more modifications to the data we make, the bigger the rowActionList gets. Since we have so many modifications in a tx, we run out of heap after some minutes.

    For me this is a bit surprising since I think the rowActionList is a kind of "dirty-list" that is used to keep the MVCC snapshot intact and is used in case of rollback (but I'm nut sure about that). What surprises me is that TEMP tables will loose their content anyway on commit/rollback and their content will never be visible to other transactions. So I would guess that there is no need to keep a "dirty-list" around. This would mean: no need to store actions done on temp tables in the sessions rowActionList.

    My question is: Is there a way to reduce memory consumption in the rowActionList in this very special setup that we use? Does it make sense to try not to add the temp table actions to the rowActionList?

    We use the 2.1.0.rc4SNAPSHOT version of sqldb.

    Regards,
    Martin Fischer

     
  • Fred Toussi

    Fred Toussi - 2012-11-29

    The rowActionList is not used to keep the MVCCC snapshot intact. This is done by the transaction manager inside the actual tables, which are multiversion. The rowActionList is for keeping track of the session's work, used for rollbacks, including rollback to savepoints.

    As for temp tables that are cleared on commit or rollback, there is still a need to keep the work in case there is a partial rollback due to an action involving multiple rows not completing, or a rollback to a savepoint.

    The heap dump analysis is misleading. The data stored in rowActionList is the current data from the tables. Even if rowActionList did not exist, the data would.

    You could try HyperXtremeSQL, which uses less memory for the same memory table data.

     
  • Martin Fischer

    Martin Fischer - 2012-11-30

    Hi Fred,

    thank you for your instant reply.
    I'm willing to give HyperXtremeSQL a try but unfortunately I was not able to find a evaluation license or so.

    Could you please clarify how much memory we could safe roughly by using HyperXtremeSQL. I know, this is hard whithout knowing the details but a rough impression would help me to convince my customer to pay for a commercial offering.

    Are there any memory consumption figures publicly available? Or a testsuite that shows the difference between hsqldb and HyperXtremeSQL regarding memory consumption and query speed for pure in-mem tables?

    Regards
    Martin

     
  • Fred Toussi

    Fred Toussi - 2012-11-30

    Yes, it is not easy to say how much memory is saved, as it depends on the data and the indexes used on each table.

    I will publish the results of a test suite in the next couple of days. However, this may not match your data type.

    I suggest you provide a sample table structure, complete with all index and constraint declarations and 10 sample row data. I can then report on this with a million or so rows.

    Another aspect of HyperXremeSQL is the fact that operations on disk based tables is very fast, allowing these tables to be used instead of memory tables at up to half the speed.

     

Log in to post a comment.

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:

JavaScript is required for this form.





No, thanks