From: Geoff H. <ghu...@ws...> - 2001-12-01 23:57:18
At 9:46 PM +0100 12/1/01, Ralph Ballier wrote:
>I have a big problem using htdig-3.2.0b3.

First off, I'd suggest grabbing a snapshot of 3.2.0b4, not least because it includes a variety of bugfixes and an important security fix.

>Last night there was a big trouble: I got a big mail (2 GByte !!!),
>containing 37.006.820 lines(!!!) with the same content:
> WordKey::Compare: key length for a or b < info.num_length

I don't know what your max_doc_size attribute is set to, but one reason for this attribute is to prevent problems arising from "mail bombing" and the like.

In your case, I wonder how large your databases are. Remember that on some operating systems (Linux on Intel in particular), files are limited to 2GB in size. So if your word database grows past that size, there's little htdig can do when indexing: it gets strange error messages back from the OS because the file is too large, and things come to a halt.

You may also have somewhat corrupted databases from your earlier problems. One thing you can do with the 3.2 betas is to run the "htdump" program to write the databases out to ASCII text files. Then you can delete the binary files (db.words.db, db.docs.index, db.docdb and db.excerpts) and run the "htload" program to rebuild them from the text archives.

(Please note that the ASCII files are almost always significantly larger than the old databases, so if your databases are large and you face the 2GB limit, this won't help. It also won't help if you don't have enough free disk space.)

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
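
For anyone wanting to check whether their databases are close to the 2GB limit mentioned above, here is a minimal sketch in Python. It is not part of ht://Dig. The file names are the ones listed in this message; the function name, the 90% warning threshold and the idea of passing the database directory as an argument are assumptions made for illustration.

```python
import os

# 2 GiB - 1: the largest file size on systems without large-file support
LIMIT = 2**31 - 1

# Database file names as listed in the message; adjust if yours differ
DB_FILES = ["db.words.db", "db.docs.index", "db.docdb", "db.excerpts"]


def check_db_sizes(db_dir, warn_fraction=0.9, limit=LIMIT):
    """Return (name, size, status) for each database file found in db_dir.

    Hypothetical helper: status is "at limit", "near limit" or "ok".
    Files that do not exist are skipped.
    """
    report = []
    for name in DB_FILES:
        path = os.path.join(db_dir, name)
        if not os.path.isfile(path):
            continue
        size = os.path.getsize(path)
        if size >= limit:
            status = "at limit"
        elif size >= warn_fraction * limit:
            status = "near limit"
        else:
            status = "ok"
        report.append((name, size, status))
    return report
```

Calling check_db_sizes() on your database directory and printing the result shows which files, if any, are at or near the limit. Since the ASCII dumps are usually larger than the binary files, a file reported as "near limit" is a sign that the dump may not fit in a single file either.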