From: Chuck P. (C. Inc.) <chu...@co...> - 2005-02-17 17:45:31
>>>> I've been going nuts for two days and must be missing something
>>>> simple or possibly misunderstanding the function of use_doc_date.
>>>> In my htdig.conf file I have the following lines:
>>>> # use meta date to determine a new page
>>>> use_doc_date: true
>>>> Within the head of each of the pages indexed I have something like
>>>> this:
>>>> <META NAME="Date" CONTENT="2005-02-16">
>>>> I've tried using META name dc.date, dc.date.created and
>>>> dc.date.modified. I've also tried dates in the format YYYYMMDD,
>>>> YYMMDD, YYYY-MM-DD HH:MM:SS, YYYY MM DD, etc....
>>> You are reindexing after each change aren't you?
>> I'm reindexing after each change with a modified version of rundig
>> that doesn't use the -i flag for htdig (I've also disabled the -a
>> flag for the modified rundig). I've tried using the default rundig as
>> well. No luck.
>
> You might want to try running with the -i (or manually removing the
> database) in order to verify that everything is being retrieved and
> indexed from scratch.
>
> Using the meta tag you list above, exactly as you have typed it (except
> with a date that is not today), gives me the results I would expect
> when I index a page with use_doc_date enabled.
>
> What exactly are you expecting when you enable the use_doc_date
> attribute and in what way are the results differing from your
> expectations?
>
> Btw, if you run with a couple -v options tacked on, and use_doc_date is
> working as expected, you should see some evidence in the debug output;
> there should be a line that says something like 'time: 2000-01-02'.

I tried manually deleting the database as well as running with -i. No
change.

I expected that enabling use_doc_date would make my modified rundig (no
-i, no -a) only update the index for pages that have newer meta dates.
This page for example has the following meta date tag:

<META NAME="Date" CONTENT="2005-02-16">

but it always gets reindexed.

5:28:1:http://nbcmv4.console.net/release_detail.dev.nbc/nbcuniversalcable-20050216000000-academy-awardnomin.html: (changed) ------------------------ size = 13255

Running rundig and grepping out the times matched gives me the
following:

homeplate local/htdig/bin $ sudo ./rundig.new -vvvv | grep time:
time: 2005-02-16
time: 2005-02-16
time: 2005-02-16
time: 2005-02-16
time: 2005-02-16
time: 2005-02-16

I read in an archive that this output isn't confirmation that htdig is
using the meta date, that if it fails and defaults to now it will do so
silently.

Thanks again,
Chuck
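
For reference, the behaviour expected above (skip a page unless its meta date is newer than the last index run) can be sketched outside of htdig. This is only an illustration of that expectation, not htdig's actual update logic. The names extract_meta_date and needs_reindex and the last_indexed parameter are made up for the example, and only the YYYY-MM-DD form of the tag is handled.

```python
from datetime import date, datetime
from html.parser import HTMLParser
from typing import Optional


class MetaDateParser(HTMLParser):
    """Collects the CONTENT of the first <META NAME="Date"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_date: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "meta" or self.meta_date is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "date":
            self.meta_date = attr.get("content")


def extract_meta_date(html: str) -> Optional[date]:
    """Return the page's meta date, or None if missing or unparseable."""
    parser = MetaDateParser()
    parser.feed(html)
    if parser.meta_date is None:
        return None
    try:
        # Assumption: dates use the YYYY-MM-DD form shown in the message.
        return datetime.strptime(parser.meta_date.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


def needs_reindex(html: str, last_indexed: Optional[date]) -> bool:
    """Hypothetical rule: reindex only when the meta date is newer.

    Pages without a usable date, or never indexed before, are always
    reindexed so that nothing is skipped by mistake.
    """
    doc_date = extract_meta_date(html)
    if doc_date is None or last_indexed is None:
        return True
    return doc_date > last_indexed
```

Running extract_meta_date over a saved copy of a page is also a quick way to confirm that the tag itself parses, independently of what the indexer does with it.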