Hi Kevin,

I'm sorry to hear you've run into a corruption issue with JDBM.

To determine if the software is at fault, I would first look for cases where JDBM could overwrite existing structures beyond the boundary of a block, similar to so-called "buffer overflows" in C code.  I can't think of specific code that would do this off the top of my head, but that's the first thing I'd check.

Otherwise, JDBM doesn't have any fail-fast protection mechanism, like the checksum idea you mentioned, so it's always possible for a little corruption to go unnoticed and cause bigger damage later.  The transaction mechanism helps contain the damage to some degree, but if you're running a lot of small transactions, the damage can still spread across many commits before anyone notices.

If you have the heart for it, looking at the actual binary file might yield some clues about the corruption.  You might see traces of an overflow, or perhaps a random pattern that would indicate hardware failure.  You should also consider running a disk surface analysis test overnight (most hard disk manufacturers provide this kind of software) to verify the hardware side of things.
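For the binary inspection, here is a minimal sketch of a block dumper.  It is not part of JDBM: the class name, the method names and the 8192-byte block size are assumptions to adjust for your build.  It prints one block as offset, hex bytes and printable ASCII, which makes it easier to spot data running past a boundary or a stretch of random-looking bytes.  One known landmark: a Java serialization stream starts with the bytes AC ED 00 05, and "invalid stream header" means ObjectInputStream did not find them where it expected.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical helper (not part of JDBM): dumps one block of a file as hex and ASCII. */
final class BlockDump {

    /** Formats bytes as rows of: file offset, up to 16 hex bytes, printable ASCII. */
    static String hexDump(byte[] data, long baseOffset) {
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < data.length; row += 16) {
            sb.append(String.format("%08x  ", baseOffset + row));
            StringBuilder ascii = new StringBuilder();
            for (int i = row; i < row + 16; i++) {
                if (i < data.length) {
                    int b = data[i] & 0xff;
                    sb.append(String.format("%02x ", b));
                    ascii.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
                } else {
                    sb.append("   "); // pad a short final row
                }
            }
            sb.append(' ').append(ascii).append('\n');
        }
        return sb.toString();
    }

    /** Reads block number blockIndex; throws EOFException if the file is too short. */
    static byte[] readBlock(String path, long blockIndex, int blockSize) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            byte[] block = new byte[blockSize];
            file.seek(blockIndex * blockSize);
            file.readFully(block);
            return block;
        }
    }

    public static void main(String[] args) throws IOException {
        int blockSize = 8192; // assumption: check the block size used by your JDBM build
        long blockIndex = Long.parseLong(args[1]);
        byte[] block = readBlock(args[0], blockIndex, blockSize);
        System.out.print(hexDump(block, blockIndex * blockSize));
    }
}
```

Run it against a copy of the db file rather than the live one, passing the file path and a block number as the two arguments.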

Keep us informed if you find any meaningful clue.  I'm sure everybody on the list is both curious and interested to know the outcome of your sleuthing.

Good luck,
alex


On 5/15/07, Kevin Day <kevin@trumpetinc.com> wrote:
Hi folks -
 
I'm seeing some odd behavior at a site, and wanted to get your opinion on things.
 
We first noticed a problem when we found some potential corruption in one of our own record structures stored in a jdbm data store - we were winding up with some recids that were way out of line with what would be expected.  During further debugging, I actually encountered the following exception trace while walking a BTree:
 

java.io.StreamCorruptedException: invalid stream header
    at java.io.ObjectInputStream.readStreamHeader(Unknown Source)
    at java.io.ObjectInputStream.<init>(Unknown Source)
    at jdbm.btree.BPage.deserialize(BPage.java:1021)
    at jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:357)
    at jdbm.recman.CacheRecordManager.fetch(CacheRecordManager.java:263)
    at jdbm.btree.BPage.loadBPage(BPage.java:868)
    at jdbm.btree.BPage.access$0(BPage.java:865)
    at jdbm.btree.BPage$Browser.getNext(BPage.java:1232)
 
 
From what I can see, it appears that the content of the db file is actually getting scrambled (definitely not good).
 
Can any of you think of a way that this could happen other than a failure of the disk media?  The logging system should prevent application crashes from corrupting the db file, and I can't think of anything our application could do that would cause a stream corruption like the one I'm seeing above.  But I want to check with everyone else to see if there might be some potential software vectors for this before we start digging into the hardware nasties.
 
 
I'm wondering if it might be a good idea to include a crc32 or adler32 checksum in our page writes - the cost of computation would be pretty small, and it would be nice to be able to detect at the page level whether corruption has occurred...
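A rough sketch of what such a page checksum could look like, using java.util.zip.CRC32.  Nothing here is existing JDBM code: the PageChecksum class, its method names, the choice to reserve the last 4 bytes of each page, and the 8192-byte page size are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical helper (not part of JDBM): keeps a CRC32 in the last 4 bytes of a page. */
final class PageChecksum {
    static final int CHECKSUM_BYTES = 4;

    /** CRC32 over the page payload, i.e. everything except the trailing 4 bytes. */
    static int compute(byte[] page) {
        CRC32 crc = new CRC32();
        crc.update(page, 0, page.length - CHECKSUM_BYTES);
        return (int) crc.getValue();
    }

    /** Stores the checksum in the trailing 4 bytes; call just before writing the page. */
    static void seal(byte[] page) {
        ByteBuffer.wrap(page).putInt(page.length - CHECKSUM_BYTES, compute(page));
    }

    /** Returns true if the stored checksum matches; call right after reading the page. */
    static boolean verify(byte[] page) {
        int stored = ByteBuffer.wrap(page).getInt(page.length - CHECKSUM_BYTES);
        return stored == compute(page);
    }

    public static void main(String[] args) {
        byte[] page = new byte[8192]; // assumed page size
        page[100] = 42;
        seal(page);
        System.out.println(verify(page)); // true
        page[200] ^= 0x01;                // simulate a single flipped bit on disk
        System.out.println(verify(page)); // false
    }
}
```

Swapping in java.util.zip.Adler32 would only change the two lines that create and update the checksum object.  Either way, the on-disk layout would have to give up 4 bytes per page, so existing files would need a migration step.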
 
Thanks much for any thoughts/comments,
 
- Kevin