From: Lachlan A. <lh...@us...> - 2003-12-21 11:57:49
Greetings all,

I've applied this patch to 3.2.0 CVS. Since encoding will usually be
used fairly consistently, I think it is OK just to compare encoded
strings (since the alternative of decoding every matching URL is less
efficient).

What do others think?

Lachlan

On Tue, 18 Nov 2003 01:34, Jean-Sebastien Morisset wrote:
> I found a bug in v3.1.6, and probably in all future versions too.
> Here it is:
>
> If you enter a "restrict" value in the URL for htsearch (not in the
> config file), it will be compared UNENCODED to the ENCODED URLs in
> htdig's database.
>
> For example, the following query:
>
> http://www.mvpix.com/cgi-bin/perl/search?words=%2A&restrict=/photos/021/Netherland%20Antilles/Bonaire/Places/Urban/&method=and&sort=date&format=short
>
> Will never match:
>
> http://www.mvpix.com/photos/021/Netherland%20Antilles/Bonaire/Places/Urban/Industry/20030511-062204.jpg.html
>
> I've fixed htsearch temporarily with the following code, but some
> thought probably should be given on how to address this. I suspect
> the solution is to compare both strings in their unencoded form.
>
> My snippet:
>
> root@dent:/mnt/lan/src/htdig-3.1.6$ diff htsearch/htsearch.cc-orig htsearch/htsearch.cc
> 23a24
> > #include "URL.h"
> 169,170c170,174
> <     if (input.exists("restrict"))
> <         config.Add("restrict", input["restrict"]);
> ---
> >     if (input.exists("restrict")) {
> >         String restrict_url = input["restrict"];
> >         encodeURL(restrict_url, "-_./");
> >         config.Add("restrict", restrict_url);
> >     }
> root@dent:/mnt/lan/src/htdig-3.1.6$
>
> Another side-effect of using 'config.Add("restrict",
> input["restrict"]);' un-encoded is that any spaces will be treated
> as ORs later on by this line 'urllist.Create(config["restrict"], "|
> \t\r\n\001");'.
>
> BTW, this same bug affects the "exclude" value too.

-- 
lh...@us...
ht://Dig developer DownUnder
(http://www.htdig.org)
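
For anyone who wants to see the mismatch outside of htsearch, here is a minimal standalone sketch. It is not ht://Dig code: percentEncode() and matchesRestrict() are hypothetical stand-ins (for encodeURL() and for the restrict test, respectively), and the sketch assumes that restrict is a plain substring match and that escapes are written as uppercase hex.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for encodeURL(): percent-encode every byte that is
// neither alphanumeric nor listed in `keep` (assumption: uppercase hex).
std::string percentEncode(const std::string& in, const std::string& keep) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c) || keep.find(static_cast<char>(c)) != std::string::npos) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Assumption for this sketch: a restrict pattern matches when it occurs
// anywhere inside the stored (already encoded) URL.
bool matchesRestrict(const std::string& storedUrl, const std::string& pattern) {
    return storedUrl.find(pattern) != std::string::npos;
}

// Replays the example from the report: the CGI layer has already decoded
// %20 to a space by the time the restrict value is compared.
void demo() {
    const std::string stored =
        "http://www.mvpix.com/photos/021/Netherland%20Antilles/Bonaire/"
        "Places/Urban/Industry/20030511-062204.jpg.html";
    const std::string decoded =
        "/photos/021/Netherland Antilles/Bonaire/Places/Urban/";

    assert(!matchesRestrict(stored, decoded));                       // the bug
    assert(matchesRestrict(stored, percentEncode(decoded, "-_./"))); // the fix
}
```

Re-encoding the single restrict value once per query is the cheaper of the two options discussed above. Decoding both sides would mean decoding every candidate URL taken from the database.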