From: Wim K. <wi...@ib...> - 2006-03-14 09:35:52
Good morning,

I have a rather odd problem indexing about 8000 PDFs. The files are indexed through a local_urls= setting, which works perfectly (every file is found as the local equivalent of its URL), but htdig always treats all of the files as changed.

To index the PDFs I use an executable PHP script, which in turn calls pdfinfo and pdftotext (both version 3.xx) and queries a database for some additional meta information (such as the correct title). All of the gathered information is rendered as HTML, which htdig then indexes. The script also adds three meta items, "Last-Modified", "Date" and "DC.Date", to force the modification date. Combined with use_doc_date, this should make it clear to htdig whether the document has changed.

I can't figure out why the PDFs are reported as changed every day (they are not), but I suspect that htdig takes the file time of the temporary file as the last-modified date.

Any clues?

Regards,
Wim

--
Wim Kosten <wi...@ib...>
ibuildings.nl BV - information technology
http://www.ibuildings.nl - 0118 42 95 50
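For illustration only, the HTML wrapper described above might look roughly like the sketch below. It is written in Python rather than PHP and is not the actual script. The function name render_wrapper, the use of name= attributes for all three meta items, and the date formats are assumptions; the post does not say which date format htdig expects.

```python
import html
from datetime import datetime, timezone
from email.utils import format_datetime


def render_wrapper(title: str, body_text: str, pdf_mtime: float) -> str:
    """Return the HTML handed to the indexer, dated with the PDF's own mtime."""
    # Use the source PDF's modification time, never the time of a temporary file.
    modified = datetime.fromtimestamp(pdf_mtime, tz=timezone.utc)
    http_date = format_datetime(modified, usegmt=True)  # HTTP-style date in GMT
    iso_date = modified.strftime("%Y-%m-%d")            # assumed format for the date metas

    return (
        "<html><head>\n"
        f"<title>{html.escape(title)}</title>\n"
        f'<meta name="Last-Modified" content="{http_date}">\n'
        f'<meta name="Date" content="{iso_date}">\n'
        f'<meta name="DC.Date" content="{iso_date}">\n'
        "</head><body>\n"
        f"<p>{html.escape(body_text)}</p>\n"
        "</body></html>\n"
    )
```

A caller would pass the modification time of the original PDF, for example os.path.getmtime(pdf_path), so that the emitted dates stay the same from one indexing run to the next as long as the PDF itself is untouched.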