Manual recovery from .data file

  • Luca Santarelli

    Luca Santarelli - 2013-07-22


    I have a 1.8.1 database where, after a hard drive failure, I could only recover the .data file. Using the original .script file, the database starts but shows no data. The question "Can data be recovered?" has already been answered negatively by Fred in the an old topic ( claiming that it is not possible since the missing SET TABLE <name> INDEX <index value…> statements are stored only in the .script file.

    I wonder, however, if the simple structure of my database could allow for a manual/structured recovery of at least part of the data.
    It is a very simple database consisting of 3 CACHED tables, where the first two have 2 rows each and their DML is known. The third table, the one I'd like to recover, is composed of 9 columns, INTEGER, DATE, VARCHAR_IGNORECASE or DECIMAL(18,2) types. VARCHAR_IGNORECASE was created without a declared limit and, as a default setting, varchar length wasn't enforced (I guess this may make things worse).
    Opening the file in an hex editor shows a lot of (almost) readable content.

    I downloaded the old 1.8.1 source code to try and understand the data format used at the time, but it is proving a hard task without some pointers.

    I'd appreciate your thoughts about the following points:

    1. Using existing classes to try and recover data (but I'd need pointers on where to search)
    2. Starting the database and using "brute force" SET TABLE <name> INDEX <values…> SQL commands until data is found.
    3. Anything else I could be missing or giving for granted.

    Thank you for your patience. :)

  • Fred Toussi

    Fred Toussi - 2013-07-22

    The alternatives 1 and 2 might work together.

    The rows are stored with the size in bytes first as a 4-byte integer. Then each index for the row takes 4 times 4-byte integers, then the data. Each data column starts with byte 0 (null) or 1 (data follows). Each string starts with a 4-byte length.

    See the org.hsqldb.rowio package and the DiskNode class.

    Index nodes are addresses in the file. Once you have found a row, use the index parent nodes to go to the root node (this has no parent).

    You can create a very small database with 5 or 6 rows and examine the data before tackling the recovery database.


Log in to post a comment.

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:

JavaScript is required for this form.

No, thanks