From: Jim C. <li...@yg...> - 2003-02-19 22:35:10
On Wednesday, February 19, 2003, at 11:57 AM, Abbie Greene wrote:

> I just tried running HTDig 3.2.0b4 on a new set of files and received
> what seemed like millions of the following error message:
>
> WordKey::Compare: key length for a or b < info.num_length

If you search the mailing list archives for 'wordkey compare', you will
find that this problem has been reported in the past. However, I am not
aware of any final resolution regarding the problem. First, you might
want to just delete your current databases and start indexing from
scratch. Past reports seem to indicate that this problem is often
intermittent. Reindexing will also cover the possibility that the
databases were somehow corrupted due to something external to ht://Dig.
If the problem is repeatable, that might be of interest to some of the
developers.

Aside from reindexing, you might want to ensure that you have the most
recent snapshot (or at least a relatively recent one). If you are
encountering this problem repeatedly with a recent snapshot, that would
also likely be of interest.

> HTdig was working perfectly yesterday for me, however it was also on a
> set of 4000 files... I now have 26000 files it's running against. I
> realize this is MOST likely the problem as I've

The number of documents you are dealing with should not itself be a
problem. ht://Dig is regularly used for much larger collections with
nothing particularly exotic in the way of hardware.

> reached the outer limits of this system. Now my question is
> this... Would it make a difference to set up separate databases to
> reduce the size of each database (however each database would still be
> an index of at least 5000 files). I'm just curious how other people have
> dealt with this issue in the past.

Unless it is advantageous for you to build multiple databases for
organizational purposes, a single database is most likely sufficient
for the amount of data you appear to be dealing with.

Jim