From: Tod T. <tt...@ch...> - 2002-09-10 15:59:55
Geoff Hutchison wrote:
>
> On Mon, 9 Sep 2002, Tod Thomas wrote:
>
> > I have a document from last year that is getting indexed with the
> > current date as the modified date. I've checked the file myself and it's
> > a year old and hasn't been touched since then.
>
> Right, but are you sure the server is actually sending a date in the
> Last-Modified: header? If the file is sent as dynamic content,
> e.g. .shtml, .cgi, .php, .asp, .jsp, etc. then the server will not send a
> date by default and the only logical thing for htdig to do is pick the
> current time.

Ok, thanks. I wasn't sure about the Last-Modified part of the header,
but I also wanted to know htDig's behavior in its absence.

> > I used htdig -t to get an ASCII dump to look at the modification date
> > and it looks like this - m:1031587378. Could somebody help me out
> > with the format of this date?
>
> This is the number of seconds since 1970-01-01, a commonly used UNIX date
> format. (If you have GNU date, you can get this with date +"%s" from a
> command-line.)
>
> Since I got 1031609537 just now, this was about 6 hours ago, give or take.

Duh, I knew that. I don't have GNU date, but

    perl -e 'print localtime(1031587378) . "\n";'

worked just fine.

> > I imagine htdig uses a number of different dates in priority order -
> > maybe a META date followed by an internally stored binary date, or
>
> If you have a META date in the document, you should take a look at the
> use_doc_date attribute in 3.1.6:
> <http://www.htdig.org/attrs.html#use_doc_date>

I will do that, thanks.

> But remember, if it's using the current date on an old file, that's
> because the server is not sending a correct Last-Modified: header. You can
> see the headers returned from a server if you run "htdig -vvv" (which also
> outputs a variety of additional debugging information as the indexing
> proceeds).
>

Did that (-vvv), and that's the case with about 87% of our content on this
particular site. I did a Usenet search, and apparently a missing
Last-Modified header is a common complaint among Netscape Enterprise
shops :(

Thanks again - Tod.
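
For anyone else without GNU date who wants to decode the m: value from an htdig -t dump, here is a minimal Python sketch of the same conversion. The function names epoch_to_utc and hours_between are made up for illustration and are not part of htdig; the two timestamps are the ones quoted in this thread. It prints UTC rather than local time, so the hour will differ from what the perl localtime one-liner shows on your machine.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> datetime:
    """Interpret a count of seconds since 1970-01-01 00:00:00 UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def hours_between(earlier: int, later: int) -> float:
    """Elapsed hours between two epoch timestamps."""
    return (later - earlier) / 3600


if __name__ == "__main__":
    indexed = 1031587378   # m: value from the htdig -t dump
    checked = 1031609537   # value Geoff got from date +"%s"

    print(epoch_to_utc(indexed).isoformat())   # 2002-09-09T16:02:58+00:00
    print(f"{hours_between(indexed, checked):.1f} hours apart")   # 6.2 hours apart
```

The gap works out to 22159 seconds, which matches the "about 6 hours ago" estimate above.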