From: Jim <li...@yg...> - 2005-02-24 20:52:59
On Fri, 18 Feb 2005, Janine Sisk wrote:

> On Feb 17, 2005, at 8:58 PM, Jim wrote:
>
>> On Thu, 17 Feb 2005, Chuck Phillips (Console, Inc.) wrote:
>>
>>> I expected that enabling use_doc_date would make my modified rundig (no
>>> -i, no -a) only update the index for pages that have newer meta dates.
>>
>> I don't think that use_doc_date is intended to be used in this way.
>
> So then what is the "correct" way to do this? I have a site that takes about
> 30 hours to fully index, so obviously I'd like to just do an update most of
> the time, but it sounds like this isn't going to be as easy as dropping the
> -i and -a. I've run that a couple of times on a subset of my pages and it
> looks like they are all being processed each time.

The technique htdig uses to determine whether a document needs to be
reindexed involves the 'If-Modified-Since:' header. If the server you are
contacting respects this header and returns correct last-modified dates
when documents are retrieved, then dropping the -i option should prevent
unmodified documents from being reindexed.

If the server is not configured to meet these conditions, or the pages are
dynamically generated in a manner that results in there being no associated
date of last-modification, then htdig assumes that the document needs to be
reindexed.

Jim
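
For anyone who wants to see the mechanism spelled out, here is a minimal Python sketch of the general HTTP conditional-request logic described above. It is not htdig's actual code. The function names (conditional_headers, needs_reindex) and the rule of comparing Last-Modified against the time of the previous index run are assumptions made for illustration only.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_headers(last_indexed: datetime) -> dict:
    """Build the request header for a conditional GET.

    last_indexed must be a timezone-aware datetime (hypothetical input:
    the time the document was last indexed).
    """
    stamp = format_datetime(last_indexed.astimezone(timezone.utc), usegmt=True)
    return {"If-Modified-Since": stamp}


def needs_reindex(status: int, last_modified: Optional[str],
                  last_indexed: datetime) -> bool:
    """Decide whether a fetched document should be reindexed.

    status        -- HTTP status code of the response
    last_modified -- value of the Last-Modified response header, or None
    last_indexed  -- timezone-aware time of the previous index run
    """
    # 304 Not Modified: the server honoured If-Modified-Since.
    if status == 304:
        return False
    # No modification date at all (typical for dynamic pages):
    # assume the document has to be reindexed.
    if last_modified is None:
        return True
    try:
        modified = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        # Unparseable date: treat it like a missing one.
        return True
    if modified.tzinfo is None:
        modified = modified.replace(tzinfo=timezone.utc)
    # Assumption of this sketch: compare against the previous index time.
    return modified > last_indexed


if __name__ == "__main__":
    previous_run = datetime(2005, 2, 17, 20, 58, 0, tzinfo=timezone.utc)
    print(conditional_headers(previous_run))
    print(needs_reindex(304, None, previous_run))
    print(needs_reindex(200, None, previous_run))
```

A quick way to check whether a server meets the conditions is to send a request with an If-Modified-Since header by hand and see whether it answers 304 for a page you know has not changed. If it always answers 200 with no Last-Modified header, every page will look new on each run.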