[mdb-dev] mdb_map_find_next returning wrong pages

I found a problem with mdb-export segfaulting on a big table. The database uses the JET3 engine.

CREATE TABLE "DDLogDr"
 (
    "BatchId"           INTEGER,
    "Date"          TIMESTAMP WITHOUT TIME ZONE,
    "iTransactionID"            INTEGER,
    "iConstituentID"            INTEGER,
    "cAmount"           NUMERIC(15,2),
    "dDate"         TIMESTAMP WITHOUT TIME ZONE,
    "sAuthorisation"            VARCHAR (50),
    "iAuthorisationStatusID"            INTEGER,
    "sBankAccountName"          VARCHAR (50),
    "sBankAccountNumber"            VARCHAR (30),
    "iAppealID"         INTEGER
);

Here's the end of output:
Row 0 bytes 8 to 2047  [delflag]
Row 1 bytes 2376 to 7 [lookup] [delflag]
Row 2 bytes 5662 to 2375

Program received signal SIGSEGV, Segmentation fault.
0xb7fdae99 in mdb_crack_row3 (table=0x805f300, row_start=5662, row_end=2375, fields=0xbfffd9f8) at write.c:121
121         var_col_offsets[i] = mdb->pg_buf[col_ptr-i]+(jumps_used*256);
(gdb) info program
    Using the running image of child process 17636.
Program stopped at 0xb7fdae99.
It stopped with signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0xb7fdae99 in mdb_crack_row3 (table=0x805f300, row_start=5662, row_end=2375, fields=0xbfffd9f8) at write.c:121
#1  mdb_crack_row (table=0x805f300, row_start=5662, row_end=2375, fields=0xbfffd9f8) at write.c:185
#2  0xb7fd6595 in mdb_read_row (table=0x805f300, row=2) at data.c:277
#3  0xb7fd689a in mdb_fetch_row (table=0x805f300) at data.c:411
#4  0x080492d2 in main (argc=3, argv=0xbffff3a4) at mdb-export.c:193
(gdb) info locals
i = 134533134
col_ptr = <value optimized out>
num_jumps = 16777202
jumps_used = 295
(gdb) up
#1  mdb_crack_row (table=0x805f300, row_start=5662, row_end=2375, fields=0xbfffd9f8) at write.c:185
185             mdb_crack_row3(mdb, row_start, row_end, bitmask_sz,
(gdb) info locals
col = <value optimized out>
mdb = 0x804c800
row_var_cols = 0
row_cols = 255
nullmask = 0x804d138 ""
bitmask_sz = 32
fixed_cols_found = 134531072
row_fixed_cols = 2342
col_count_size = 1
i = <value optimized out>

Defining SLOW_READ in src/libmdb/data.c fixes the problem.
So I traced the problem back to an incorrect page being returned by mdb_map_find_next.

I noticed that the slow version checks that the page is correct by testing that byte #0 of the page is 1 (MDB_PAGE_DATA) and that bytes #4..7 match entry->table_pg.
So I simply added that test to the non-slow version. This works OK: I get the correct record count (4,170,473) in my problematic case.
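
For readers who don't have the source at hand, here is a minimal standalone sketch of that kind of page check. It is not the code from the patch: read_le32 and page_belongs_to_table are hypothetical names, PAGE_TYPE_DATA stands in for MDB_PAGE_DATA, and reading bytes #4..7 as a little-endian 32-bit integer is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for MDB_PAGE_DATA: byte #0 of a data page is 1 (per the post). */
#define PAGE_TYPE_DATA 0x01

/* Hypothetical helper: read a 32-bit little-endian integer at `offset`.
 * Little-endian storage is an assumption of this sketch. */
uint32_t read_le32(const unsigned char *buf, size_t offset)
{
    return (uint32_t)buf[offset]
         | ((uint32_t)buf[offset + 1] << 8)
         | ((uint32_t)buf[offset + 2] << 16)
         | ((uint32_t)buf[offset + 3] << 24);
}

/* Hypothetical helper: returns 1 if the page buffer is a data page whose
 * bytes #4..7 match `table_pg`, 0 otherwise. */
int page_belongs_to_table(const unsigned char *pg_buf, uint32_t table_pg)
{
    if (pg_buf[0] != PAGE_TYPE_DATA)
        return 0;                       /* not a data page at all */
    return read_le32(pg_buf, 4) == table_pg;
}
```

A page that fails this check would be skipped and the next candidate page tried instead.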

I was a bit confused by the fact that mdb_map_find_next returns -1 on failure, although its return type is guint32.
As a result, the test in mdb_read_next_dpg that produces "Warning: defaulting to brute force read" is never triggered.

In order to fix that, I had to change the return type of mdb_map_find_next from guint32 to gint32. This implies another change in mdb_map_find_next_freepage() and in src/util/prfreemap.c.
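
The signedness issue can be reproduced in isolation. The sketch below is not mdbtools code: the function names are hypothetical, uint32_t/int32_t stand in for guint32/gint32, and it assumes the caller's success test is applied to the unsigned value directly. Some compilers warn that the unsigned comparison is always true, which is exactly the point.

```c
#include <stdint.h>

/* Hypothetical lookup that signals failure with -1 through an unsigned
 * return type: the -1 becomes 4294967295 (UINT32_MAX). */
uint32_t find_next_unsigned(int found)
{
    return found ? 7u : (uint32_t)-1;
}

/* Same lookup with a signed return type: -1 stays -1. */
int32_t find_next_signed(int found)
{
    return found ? 7 : -1;
}

/* Success test applied to an unsigned value: always true, so the
 * failure branch (the "brute force" fallback) can never run. */
int lookup_succeeded_unsigned(uint32_t pg)
{
    return pg >= 0;
}

/* Success test applied to a signed value: -1 is detected as failure. */
int lookup_succeeded_signed(int32_t pg)
{
    return pg >= 0;
}
```

With the unsigned variant a failed lookup still passes the success test, while the signed variant lets the caller see the failure and fall back.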

Attached is a patch, based on "my" Debian-based version: http://nirgal.com/mdbtools
Hopefully, it should also apply to the various trunk/master versions out there...