From: Neal R. <ne...@ri...> - 2002-05-09 05:51:58
So here's the fix for the "file size not multiple of pagesize" error that crops up in cygwin binaries. The problem is deep in the Berkeley DB code, in db/os_rw.c.

PROBLEM:

  [CDB___os_openhandle] name=[c:/htdig/demo.db/db.words.db] mode=[0x81b6] fhp->fd=[6]
  [CDB___os_write] lseek(fhp->fd, 0, SEEK_CUR)=[0]
  [CDB___os_write] fhp->fd=[6], len=[8192]
  [CDB___os_write] 8192 bytes written, wanted to write 8192
  [CDB___os_write] lseek(fhp->fd, 0, SEEK_CUR)=[8192]
  [CDB___os_write] lseek(fhp->fd, 0, SEEK_CUR)=[8192]
  [CDB___os_write] fhp->fd=[6], len=[8192]
  [CDB___os_write] 8192 bytes written, wanted to write 8192
  [CDB___os_write] lseek(fhp->fd, 0, SEEK_CUR)=[16489]

Notice the 0x8000 (_O_BINARY) flag is set in 'mode'. Notice that the first write of 8K (the first page in the DB) is fine. BUT WAIT! Look at the second write: we write 8K starting at the 8K offset and finish at 16489, which is 16K + 105 extra bytes!

So here's the HACK: put this on line 124 of os_rw.c:

  setmode(fhp->fd, 0x8000)

FIXED:

  [CDB___os_openhandle] name=[c:/htdig/demo.db/db.words.db] mode=[0x81b6] fhp->fd=[6]
  [CDB___os_write] lseek(fhp->fd, 0, SEEK_CUR)=[0]
  [CDB___os_write] fhp->fd=[6], len=[8192]
  [CDB___os_write] current mode=[0x8000]
  [CDB___os_write] 8192 bytes written, wanted to write 8192
  [CDB___os_write] lseek(fhp->fd, 0, SEEK_CUR)=[8192]
  [CDB___os_write] lseek(fhp->fd, 0, SEEK_CUR)=[8192]
  [CDB___os_write] fhp->fd=[6], len=[8192]
  [CDB___os_write] current mode=[0x8000]
  [CDB___os_write] 8192 bytes written, wanted to write 8192
  [CDB___os_write] lseek(fhp->fd, 0, SEEK_CUR)=[16384]

We finish at the correct 16K offset after the second write.

setmode() returns the previous mode. In the fixed run the return value of setmode() turns up at one point as 0x4000 (_O_TEXT). So somewhere the file is opened, probably for reading, as _O_TEXT. This causes each NL (LF) to be translated to CR-LF during a subsequent write, which would account for the 105 extra bytes: one added CR for each LF byte in that page.
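The workaround described above can be sketched as a small standalone C program fragment. This is only an illustration, not the actual os_rw.c patch: the helper names write_page_binary() and demo_two_pages(), the DEMO_PAGE_SIZE constant, and the demo file name are all made up for this example, and it assumes a POSIX-style environment (Cygwin, MinGW, or a Unix where O_BINARY does not exist and the setmode() call is simply skipped).

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef O_BINARY
#include <io.h>   /* assumed to declare setmode() on Cygwin and MinGW */
#endif

enum { DEMO_PAGE_SIZE = 8192 };   /* same page size as in the logs */

/*
 * Hypothetical helper: force the descriptor into binary mode, write one
 * page, and return the resulting file offset (or -1 on error).
 */
static off_t write_page_binary(int fd, const void *page, size_t len)
{
    ssize_t n;

#ifdef O_BINARY
    /* O_BINARY is 0x8000 on the platforms discussed in this thread. */
    if (setmode(fd, O_BINARY) == -1)
        return (off_t)-1;
#endif
    n = write(fd, page, len);
    if (n < 0 || (size_t)n != len)
        return (off_t)-1;
    return lseek(fd, 0, SEEK_CUR);
}

/*
 * Hypothetical demo: write two pages made entirely of LF bytes, the worst
 * case for text-mode translation, and return the final offset.
 */
static off_t demo_two_pages(const char *path)
{
    unsigned char page[DEMO_PAGE_SIZE];
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    off_t end = (off_t)-1;
    int fd;

#ifdef O_TEXT
    flags |= O_TEXT;   /* mimic the suspected text-mode open */
#endif
    memset(page, '\n', sizeof page);

    fd = open(path, flags, 0666);
    if (fd < 0)
        return (off_t)-1;

    if (write_page_binary(fd, page, sizeof page) != (off_t)-1)
        end = write_page_binary(fd, page, sizeof page);

    close(fd);
    unlink(path);      /* remove the scratch file */
    return end;
}
```

With the mode forced to binary before each write, demo_two_pages("pagesize_demo.db") should come back as 16384, i.e. exactly two pages. Without the setmode() call, a descriptor left in text mode would be expected to end up past that offset, as in the PROBLEM log.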
For good measure we also put a setmode(fhp->fd, 0x8000) on line 94 of os_rw.c (CDB___os_read). This prevents the reverse CR-LF -> LF translation on read.

This fixed the problem, and we have a working native Windows htdig & htsearch. This should also fix the problem for cygwin binaries.

This is definitely a HACK, but we'll track down the mode change soon. It's probably in the HtDig code somewhere: we noticed that the db.words.db file is written, closed, opened for read, and rewritten during the course of htdig.exe spidering a single web page. Basically, the BDB files should never be opened in TEXT mode.

Anyway, I should be submitting a separate set of xx_config.h, code patches, & WIN32 makefiles by early next week. And hopefully a makefile to build a DLL version of libhtdig.so that will be callable from WIN32 programs.

Thanks.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site