Hi Kevin,
I'm sorry to hear you've run into a corruption issue with JDBM.
To determine if the software is at fault, I would first look for cases where
JDBM could overwrite existing structures beyond the boundary of a block,
similar to so-called "buffer overflows" in C code. I can't think of
specific code that would do this off the top of my head, but that's the
first thing I'd check.
Otherwise, JDBM doesn't have any fail-fast protection mechanism, like the
checksum idea you mentioned, so it's always possible for a little
corruption to go unnoticed and cause bigger damage later. The transaction
mechanism helps contain the damage to some degree, but if you're running a
lot of small transactions, that degenerate case is a real possibility. If
you have the heart for it, looking at the actual binary file might yield
some clues about the corruption. You might see traces of an overflow, or
perhaps random patterns that would indicate hardware failure. You should
also consider running a disk surface analysis test overnight (most hard
disk manufacturers provide this kind of software) to verify the hardware
side of things.
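For what it's worth, here is a rough sketch of what a page-level checksum
could look like. This is not JDBM code: the PageChecksum class, the choice
to keep the CRC32 in the last 4 bytes of each page, and the idea of calling
seal() just before a page is written and verify() right after it is read
are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical helper, NOT part of JDBM. It assumes a fixed-size page whose
// last 4 bytes are reserved for a CRC32 of everything that precedes them.
final class PageChecksum {
    static final int CHECKSUM_BYTES = 4;

    private PageChecksum() {}

    // CRC32 over the payload, i.e. the page minus its 4-byte trailer.
    static int compute(byte[] page) {
        CRC32 crc = new CRC32();
        crc.update(page, 0, page.length - CHECKSUM_BYTES);
        return (int) crc.getValue();
    }

    // Store the checksum in the trailer; call just before writing the page.
    static void seal(byte[] page) {
        ByteBuffer.wrap(page).putInt(page.length - CHECKSUM_BYTES, compute(page));
    }

    // Recompute and compare; call right after reading the page from disk.
    static boolean verify(byte[] page) {
        int stored = ByteBuffer.wrap(page).getInt(page.length - CHECKSUM_BYTES);
        return stored == compute(page);
    }
}
```

A read path built this way could throw as soon as verify() returns false,
which would point at the damaged page itself rather than at whatever
structure later trips over it. The cost is 4 bytes per page and a change to
the on-disk format.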
Keep us informed if you find any meaningful clues. I'm sure everybody on the
list is curious to know the outcome of your sleuthing.
Good luck,
alex
On 5/15/07, Kevin Day <kevin@...> wrote:
>
> Hi folks -
>
> I'm seeing some odd behavior at a site, and wanted to get your opinion on
> things.
>
> We first noticed a problem when we found some potential corruption in one
> of our own record structures stored in a jdbm data store - we were winding
> up with some recids that were way out of line with what would be expected.
> During further debugging, I actually encountered the following exception
> trace while walking a BTree:
>
> java.io.StreamCorruptedException: invalid stream header
>     at java.io.ObjectInputStream.readStreamHeader(Unknown Source)
>     at java.io.ObjectInputStream.<init>(Unknown Source)
>     at jdbm.btree.BPage.deserialize(BPage.java:1021)
>     at jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:357)
>     at jdbm.recman.CacheRecordManager.fetch(CacheRecordManager.java:263)
>     at jdbm.btree.BPage.loadBPage(BPage.java:868)
>     at jdbm.btree.BPage.access$0(BPage.java:865)
>     at jdbm.btree.BPage$Browser.getNext(BPage.java:1232)
>
>
> From what I can see, it appears that the content of the db file is
> actually getting scrambled (definitely not good).
>
> Can any of you think of a way that this could happen outside of failure of
> the disk media? The logging system should prevent application crashes from
> corrupting the db file, and I can't think of anything that our application
> could do that could cause a stream corruption like I'm seeing above. But I
> want to check with everyone else to see if there might be some potential
> software vectors for this before we start digging into the hardware nasties.
>
>
> I'm wondering if it might be a good idea to include a crc32 or adler32
> checksum in our page writes - the cost of computation would be pretty small,
> and it would be nice to be able to detect at the page level whether
> corruption has occurred...
>
> Thanks much for any thoughts/comments,
>
> - Kevin
>
>