From: Nirgal V. <con...@ni...> - 2010-10-12 17:12:13
I found a problem with mdb-export segfaulting on a big table. This is the JET3 engine.

    CREATE TABLE "DDLogDr"
    (
        "BatchId" INTEGER,
        "Date" TIMESTAMP WITHOUT TIME ZONE,
        "iTransactionID" INTEGER,
        "iConstituentID" INTEGER,
        "cAmount" NUMERIC(15,2),
        "dDate" TIMESTAMP WITHOUT TIME ZONE,
        "sAuthorisation" VARCHAR (50),
        "iAuthorisationStatusID" INTEGER,
        "sBankAccountName" VARCHAR (50),
        "sBankAccountNumber" VARCHAR (30),
        "iAppealID" INTEGER
    );

Here's the end of the output:

    Row 0 bytes 8 to 2047 [delflag]
    Row 1 bytes 2376 to 7 [lookup] [delflag]
    Row 2 bytes 5662 to 2375

    Program received signal SIGSEGV, Segmentation fault.
    0xb7fdae99 in mdb_crack_row3 (table=0x805f300, row_start=5662, row_end=2375, fields=0xbfffd9f8) at write.c:121
    121 var_col_offsets[i] = mdb->pg_buf[col_ptr-i]+(jumps_used*256);
    (gdb) info program
    Using the running image of child process 17636.
    Program stopped at 0xb7fdae99.
    It stopped with signal SIGSEGV, Segmentation fault.
    (gdb) bt
    #0 0xb7fdae99 in mdb_crack_row3 (table=0x805f300, row_start=5662, row_end=2375, fields=0xbfffd9f8) at write.c:121
    #1 mdb_crack_row (table=0x805f300, row_start=5662, row_end=2375, fields=0xbfffd9f8) at write.c:185
    #2 0xb7fd6595 in mdb_read_row (table=0x805f300, row=2) at data.c:277
    #3 0xb7fd689a in mdb_fetch_row (table=0x805f300) at data.c:411
    #4 0x080492d2 in main (argc=3, argv=0xbffff3a4) at mdb-export.c:193
    (gdb) info locals
    i = 134533134
    col_ptr = <value optimized out>
    num_jumps = 16777202
    jumps_used = 295
    (gdb) up
    #1 mdb_crack_row (table=0x805f300, row_start=5662, row_end=2375, fields=0xbfffd9f8) at write.c:185
    185 mdb_crack_row3(mdb, row_start, row_end, bitmask_sz,
    (gdb) info locals
    col = <value optimized out>
    mdb = 0x804c800
    row_var_cols = 0
    row_cols = 255
    nullmask = 0x804d138 ""
    bitmask_sz = 32
    fixed_cols_found = 134531072
    row_fixed_cols = 2342
    col_count_size = 1
    i = <value optimized out>

Defining SLOW_READ in src/libmdb/data.c fixes the problem. So I traced the problem back to an incorrect page being returned by mdb_map_find_next.
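The row offsets in the trace already hint that the page being decoded is not a real data page of this table: Row 2 starts at byte 5662 and ends at byte 2375. As a rough illustration only (this is not part of the patch and not an mdbtools function), a bounds check along the following lines would reject such a row before it is cracked. The helper name row_bounds_plausible is hypothetical, and the 2048-byte page size is an assumption for JET3 files, consistent with Row 0 ending at byte 2047.

```c
#include <stddef.h>

/* Assumption: JET3 pages are 2048 bytes long. */
#define JET3_PAGE_SIZE 2048

/* Hypothetical helper, not an mdbtools function.
 * Returns 1 when the row lies inside the page and its start does not
 * come after its end, 0 otherwise. */
int row_bounds_plausible(size_t row_start, size_t row_end, size_t pg_size)
{
    return row_start <= row_end && row_end < pg_size;
}
```

With the values from the trace, Row 0 (8 to 2047) passes this check, while Row 2 (5662 to 2375) fails it on both conditions.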
I noticed that the slow version checks that the page is correct: byte #0 of the page must be 1 (MDB_PAGE_DATA) and bytes #4..7 must match entry->table_pg. So I simply added that test to the non-slow version. This works OK: I get the correct record count (4,170,473) in my problematic case.

I was a bit confused by the fact that mdb_map_find_next returns -1 on failure even though its return type was guint32. As a result, the test in mdb_read_next_dpg that produces "Warning: defaulting to brute force read" never triggers. In order to fix that, I had to change the return type of mdb_map_find_next from guint32 to gint32. This implies another change in mdb_map_find_next_freepage() and in src/util/prfreemap.c.

Attached is a patch, based on "my" Debian-based version:
http://nirgal.com/mdbtools
Hopefully, it should also apply to the various trunk/master versions out there...
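For readers without the patch at hand, here is a minimal standalone sketch of the two ideas described above. It is not the patch itself. The names page_belongs_to_table, read_u32_le, find_next_unsigned, find_next_signed and looks_like_failure are all hypothetical, plain stdint types stand in for guint32 and gint32, and bytes #4..7 are assumed to be stored little-endian. How mdb_read_next_dpg actually stores and tests the returned value is not shown here; the last helper simply widens the value to 64 bits so that its sign is preserved.

```c
#include <stddef.h>
#include <stdint.h>

/* The value the post gives for MDB_PAGE_DATA. */
#define PAGE_TYPE_DATA 0x01

/* Read a 32-bit little-endian integer from buf at byte offset off. */
uint32_t read_u32_le(const unsigned char *buf, size_t off)
{
    return (uint32_t)buf[off]
         | ((uint32_t)buf[off + 1] << 8)
         | ((uint32_t)buf[off + 2] << 16)
         | ((uint32_t)buf[off + 3] << 24);
}

/* Hypothetical helper: returns 1 if pg is a data page (byte #0 is 1)
 * whose owner field (bytes #4..7) equals table_pg, 0 otherwise. */
int page_belongs_to_table(const unsigned char *pg, uint32_t table_pg)
{
    return pg[0] == PAGE_TYPE_DATA && read_u32_le(pg, 4) == table_pg;
}

/* Stand-ins for the old and new return types of the page finder.
 * Both always report failure by returning -1. */
uint32_t find_next_unsigned(void) { return (uint32_t)-1; }
int32_t  find_next_signed(void)   { return -1; }

/* What a "negative means failure" test sees once the result has been
 * widened to 64 bits: the unsigned -1 arrives as 4294967295. */
int looks_like_failure(int64_t result)
{
    return result < 0;
}
```

With these definitions, looks_like_failure(find_next_unsigned()) is 0 and looks_like_failure(find_next_signed()) is 1. That is the reason for switching the return type to a signed one: a -1 stored in an unsigned 32-bit value is a large positive number, so a test for a negative result cannot fire.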