From: David A. <D.J...@so...> - 2004-03-25 11:17:51
Gilles,

I can confirm that I am using ht://Dig version 3.1.6. My htdig/Retriever.cc is:

// $Id: Retriever.cc,v 1.36.2.28 2002/01/25 04:44:33 ghutchis Exp $

It has been patched by a patch written by yourself which reads:

"This patch fixes a problem introduced in 3.1.6's handling of use_doc_date,
which wasn't in the 3.1.5 patches for this feature. The new date parsing
code in 3.1.6 didn't allow a '-' character after the year in the content
attribute of meta date tags, but only allowed white space, which is
obviously not in accordance with the ISO 8601 date format standard."

which does not sound relevant.

I do have .jpg in the file listing bad_extensions. The -v output ONLY lists
pages like

http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg

and no other pages containing .jpg. I do not have a valid_extensions:
statement.

David Adams
Corporate Information Services
Information Systems Services
University of Southampton

----- Original Message -----
From: "Gilles Detillieux" <gr...@sc...>
To: "David Adams" <D.J...@so...>
Cc: <htd...@li...>; "Toby Thain" <to...@te...>
Sent: Wednesday, March 24, 2004 9:42 PM
Subject: Re: [htdig] query parameters should be ignored by extension filter?

> According to David Adams:
> > I am also using ht://Dig version 3.1.6 and for me it IS indexing URLs like
> >
> > http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg
> >
> > even though I have .jpg in my bad_extensions: list.
>
> Actually, I find this surprising. Upon looking at the code that handles
> bad_extensions, in both 3.1.6 and 3.2.0b5, it seems to me that there is
> indeed a bug in the way htdig locates filename extensions in URLs, as
> Toby described. Can you confirm that you're running vanilla 3.1.6 with
> no patches to htdig/Retriever.cc which might correct this bug?
>
> The fix to the code should be pretty simple, but I haven't had the time
> to sit down and stare at it long enough to get the fix coded yet. I'll
> try to get around to it by Friday, so it'll be in the next development
> snapshot for the 3.2 betas, and posted to the list.
>
> > ----- Original Message -----
> > From: "Toby Thain" <to...@te...>
> ...
> > > I noticed today that htdig is not indexing URLs like:
> > >
> > > /foo/page.php3?f=bar.jpg
> > >
> > > because it notices the URL ends with ".jpg". I am surprised that it's
> > > not smart enough to realise that the fetched object is actually a
> > > ".php3", and I definitely want that URL followed.
> > >
> > > Is this fixed in a recent version (I am using ht://Dig 3.1.6)? Or is
> > > there a simple configuration fix?
> >
> --
> Gilles R. Detillieux              E-mail: <gr...@sc...>
> Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
> Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
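
For illustration only, here is a minimal sketch of the difference between matching an extension against the whole URL (the behaviour Toby describes) and matching it against the path component alone. It is written in Python and is not the ht://Dig code in htdig/Retriever.cc, nor the eventual fix; the function names and the bad_extensions set are hypothetical.

```python
from urllib.parse import urlsplit
import posixpath


def naive_is_excluded(url, bad_extensions):
    """Hypothetical buggy check: looks at the end of the whole URL,
    so a query string ending in ".jpg" triggers the filter."""
    lowered = url.lower()
    return any(lowered.endswith(ext) for ext in bad_extensions)


def path_extension(url):
    """Return the lowercased extension of the URL's path component only.
    The query string and fragment are ignored."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lower()


def is_excluded(url, bad_extensions):
    """Hypothetical corrected check: compare only the path's extension."""
    return path_extension(url) in bad_extensions


if __name__ == "__main__":
    bad = {".jpg"}  # assumed to mirror a bad_extensions: list containing .jpg
    for u in (
        "http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg",
        "/foo/page.php3?f=bar.jpg",
    ):
        print(u, naive_is_excluded(u, bad), is_excluded(u, bad))
```

With this approach both URLs quoted in the thread are treated as .php and .php3 documents respectively, so a .jpg entry in the exclusion list would no longer apply to them, while a URL whose path itself ends in .jpg would still be skipped.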