From: Gilles D. <gr...@sc...> - 2002-01-03 18:36:34
According to Geoff Hutchison:
> On Sun, 23 Dec 2001, David Melton wrote:
> > Is there any way to change the way that ht://Dig determines the date
> > for a file that it's searching? In the case of the list archive, it
> > would be simple to write an external script to get the date from the
> > message's X-Date: header. I would expect that this might be widely
> > useful. In the case of my other application, I could also write a
> > simple program to extract a date from the html file.
>
> Sure. This has come up several times. There's even an HTML META tag to
> handle dates. So if you can get a <META name="date" ...> tag into the
> documents, then you'll be fine.
> ftp://ftp.ccsf.org/htdig-patches/3.1.5/SortMetaDate.0
>
> Offhand, I don't know if the use_doc_date attribute has been added to the
> 3.1.6 snapshots, but if not, I'll make sure it's in there when I get back
> from vacation.

Yes, it's in 3.1.6. 3.1.6 also adds support for Dublin Core date fields,
i.e. name="dc.date" and a few others, not just name="date". If you can
somehow get the site you're indexing to put out these meta tags from the
X-Date field, that should do the trick for you.

> > My other case could be more of a problem, since the historical files
> > contain data going back to 1757, which is a long time before 1970...
>
> I don't know what standards are available for this. I know some systems
> have time_t as a signed variable type, so it can count before Jan 1, 1970,
> but it's not cross-platform. (Similarly not all UNIX-like platforms have
> switched to 64-bit times and older platforms, will of course hit the 2038
> barrier.)

Yeah, to handle dates going back that far, you'd need a system that
supports 64-bit signed time_t fields (a signed 32-bit time_t only reaches
back to December 1901), as well as an strftime() function that interprets
negative time_t values as pre-1970 dates. You'd also likely need to make
a few tweaks to parsedcdate() in 3.1.6's Retriever.cc so it allows years
before 1900, and so it does the 64-bit arithmetic correctly.
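
To put rough numbers on the time_t range issue, here is a minimal Python
sketch. It is not ht://Dig code, and the helper names seconds_since_epoch()
and fits_signed() are made up for this illustration. It only computes the
signed seconds-since-1970 value for a date and checks which integer widths
can hold it, assuming the proleptic Gregorian calendar and UTC midnight.

```python
from datetime import datetime

# Unix epoch as a naive datetime; all dates here are treated as UTC.
EPOCH = datetime(1970, 1, 1)


def seconds_since_epoch(year, month, day):
    """Signed seconds from the epoch to midnight of the given date.

    Negative for dates before 1970. Uses Python's proleptic Gregorian
    calendar, which is an assumption for historical dates.
    """
    delta = datetime(year, month, day) - EPOCH
    return delta.days * 86400 + delta.seconds


def fits_signed(value, bits):
    """True if value fits in a two's-complement signed integer of that width."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


if __name__ == "__main__":
    t = seconds_since_epoch(1757, 1, 1)
    print("seconds since epoch:", t)
    print("fits in signed 32-bit:", fits_signed(t, 32))
    print("fits in signed 64-bit:", fits_signed(t, 64))
```

A date in 1757 comes out at roughly -6.7 billion seconds, well outside the
signed 32-bit range, so the value has to be carried in 64 bits all the way
through parsing, storage, and formatting.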
--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930