I have a rather strange problem indexing about 8000 PDFs.
The files are indexed through a local_urls setting, which works perfectly
(every file is found as the local equivalent of its URL), but according
to htdig all of the files have always changed.
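For reference, the htdig.conf attributes involved in this setup look roughly like the fragment below. It is a sketch only: the URL, directory and converter path are hypothetical placeholders, not the ones actually used here.

```
# Hypothetical start URL
start_url:        http://www.example.com/pdf/
# Read URLs under this prefix directly from the local filesystem
local_urls:       http://www.example.com/pdf/=/var/www/pdf/
# Let META date tags in the parsed document override the file date
use_doc_date:     true
# Hand PDFs to an external converter that emits HTML (path is hypothetical)
external_parsers: application/pdf->text/html /usr/local/bin/pdf2html.php
```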
To index the PDFs I use an executable PHP script, which in turn calls
pdfinfo and pdftotext (both version 3.xx) and queries a database to
retrieve additional meta information (such as the correct title). All
gathered information is rendered as HTML, which htdig then indexes. The
script also adds three META items, "Last-Modified", "Date" and "DC.Date",
to force the modification date. Combined with use_doc_date, that should
make it clear to htdig whether or not the document has changed.
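A minimal sketch of the HTML-rendering step, written in Python rather than the PHP actually used. It assumes the converter can find the original PDF's own path. All three date META tags are derived from one fixed timestamp (the PDF's mtime), so repeated runs on an unchanged file emit identical dates. The function names are hypothetical. Which date formats htdig's use_doc_date handling actually accepts is an assumption to verify against the htdig documentation.

```python
import html
import os
import time
from email.utils import formatdate


def pdf_dates(mtime: float) -> dict:
    """Format one modification time (seconds since the epoch) for each META tag."""
    return {
        # HTTP-date style (RFC 1123), e.g. "Thu, 01 Jan 1970 00:00:00 GMT"
        "Last-Modified": formatdate(mtime, usegmt=True),
        "Date": formatdate(mtime, usegmt=True),
        # ISO 8601 calendar date, the usual form for Dublin Core dates
        "DC.Date": time.strftime("%Y-%m-%d", time.gmtime(mtime)),
    }


def render_html(title: str, body_text: str, mtime: float) -> str:
    """Wrap extracted PDF text in HTML whose date META tags all come from mtime."""
    metas = "\n".join(
        f'<meta name="{name}" content="{html.escape(value, quote=True)}">'
        for name, value in pdf_dates(mtime).items()
    )
    return (
        "<html><head>\n"
        f"<title>{html.escape(title)}</title>\n"
        f"{metas}\n"
        "</head><body><pre>\n"
        f"{html.escape(body_text)}\n"
        "</pre></body></html>\n"
    )


def html_for_pdf(pdf_path: str, text: str, title: str) -> str:
    """Hypothetical helper: stat the ORIGINAL PDF, never a temporary copy of it."""
    return render_html(title, text, os.stat(pdf_path).st_mtime)
```

The point of the design is that nothing in the output depends on when the converter runs or on which temporary file it was handed. If the dates still change between digs, the cause lies elsewhere.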
I can't figure out why htdig thinks the PDFs have changed every day (they
haven't), but I suspect that htdig takes the file time of the temporary
file as the document's modification date.
Wim Kosten <wim@...>
ibuildings.nl BV - information technology
http://www.ibuildings.nl - 0118 42 95 50