On Wed, 10 Mar 2004, Anton Donner wrote:
> as start_url. But now it seems that I have found a magical 2GByte limit,
> because indexing (a htdig run) stops as soon as db.docdb reaches a size
> of 2147483647 (2^31 - 1) bytes. I can see in the log-files (htdig -vv)
> that htdig simply stops and does not process the remaining PDFs.
As I recall, the limit in the 3.1.x branch of the code has to do with the
associated Berkeley DB code. You of course also need the proper support at
the system library level, but it sounds like you already have that part
covered. While the BDB code did include options for building a large-file
version, I think the conclusion the last time this was investigated was
that this support either wasn't available or didn't work correctly under
Linux; it was, at least in theory, available for some other platforms if
you configured the build correctly.
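
For what it's worth, the exact size you hit (2^31 - 1) is what you would expect if file offsets are held in a signed 32-bit integer somewhere along the way. I have not confirmed that this is the precise mechanism in the 3.1.x BDB code, so take the following only as a small sketch of the arithmetic. The names are made up for illustration and are not part of htdig or Berkeley DB:

```python
import ctypes

# Largest value a signed 32-bit integer can hold: 2^31 - 1 bytes.
# This matches the db.docdb size reported above.
INT32_MAX = 2**31 - 1


def fits_in_int32_offset(offset: int) -> bool:
    """True if a byte offset is representable as a signed 32-bit integer."""
    return -2**31 <= offset <= INT32_MAX


def wrap_to_int32(offset: int) -> int:
    """Show what a signed 32-bit field holds after storing the offset.

    ctypes does no overflow checking, so out-of-range values wrap
    around the same way they would in a 32-bit C integer field.
    """
    return ctypes.c_int32(offset).value


if __name__ == "__main__":
    print(INT32_MAX)
    print(fits_in_int32_offset(INT32_MAX))
    print(fits_in_int32_offset(INT32_MAX + 1))
    print(wrap_to_int32(INT32_MAX + 1))
```

One byte past the limit the offset no longer fits and wraps to a negative number, which is why code without large-file support typically has to stop (or fail) at exactly this size.
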
The newer version of the BDB code in the 3.2.x branch should support large
files under Linux. However, 3.2 is still in beta and has other issues that
might be showstoppers for very large document collections.