From: Uta B. <ub...@ci...> - 2001-10-29 13:38:41
Hi Jamie,

thanks a lot for your help. I solved my problem with 'url_rewrite_rules' and I'm very happy :-).

I could also remove the parameter and its value with a regular expression:

url_rewrite_rules: (.*)&_last=[0-9]*&(.*) \\1&\\2

--
Uta Becht

-----Original Message-----
From: Jamie Anstice [mailto:jam...@sl...]
Sent: Thursday, 25 October 2001 23:48
To: htd...@li...
Subject: Re: [htdig-dev] Problemes with bad query-string

Won't this just reject the whole URL? If I understand the problem, Uta wants to throw out the session parameter but leave the rest intact, so the page is fetched only once.

I've come across this issue before, and I've got a patch for 3.2.x (which I ported from our 3.1.6 version), but I think you could use url_rewrite_rules too. My patch is somewhat more specific than url_rewrite_rules, in that it just removes unwanted parameters and their values from the URL. I'll post it in a few days, along with my patch for ignoring the alt text from images (which is driven from a config option).

I'm currently experimenting with tweaking the scoring to make an alternate 'or' behaviour that scores up results containing more than one search term. Short explanation by way of example: say I'm indexing a university website. The chemistry department has a whole bunch of pages with the word chemistry all through them, and one page telling students where to buy replacement lab glassware. An 'and' search for 'chemistry glassware' finds this page and nothing else; an 'and' search for 'chemistry glassware sales' finds nothing. An 'or' search for 'chemistry glassware sales' is swamped by the occurrences of 'chemistry', and 'glassware' is lost in the noise. What I'm doing is factoring in the number of distinct words from the search phrase found in the result, to bump up the score for pages with more than one search term.
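The scoring idea described above can be illustrated with a small sketch. This is not Jamie's patch and not ht://Dig's scoring code; the function names, the `exponent` parameter, and the formula (occurrence count multiplied by a power of the number of distinct matched terms) are assumptions chosen only to show the effect on the chemistry example.

```python
from collections import Counter


def plain_or_score(text, terms):
    """Plain 'or' scoring: total occurrences of any search term."""
    counts = Counter(text.lower().split())
    return sum(counts[t] for t in set(terms))


def boosted_or_score(text, terms, exponent=2):
    """Hypothetical 'or' scoring that rewards distinct matched terms.

    The base score is multiplied by (distinct terms found) ** exponent.
    Both the formula and the default exponent are illustrative assumptions.
    """
    counts = Counter(text.lower().split())
    base = sum(counts[t] for t in set(terms))
    distinct = sum(1 for t in set(terms) if counts[t] > 0)
    return base * distinct ** exponent


# Two made-up pages: one saturated with 'chemistry', one about glassware.
intro_page = "chemistry " * 12
glassware_page = "chemistry chemistry glassware glassware glassware lab"
query = ["chemistry", "glassware", "sales"]

for name, page in [("intro", intro_page), ("glassware", glassware_page)]:
    print(name, plain_or_score(page, query), boosted_or_score(page, query))
# intro 12 12
# glassware 5 20
```

With plain occurrence counting the 'chemistry'-heavy page wins (12 against 5); once the distinct-term factor is applied, the glassware page ranks first (20 against 12).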
Initial results look quite promising, but will need a bit of tuning to improve search speed.

Jamie Anstice
Search Engineer
S.L.I. Systems
jam...@sl...
ph: 64 961 3262
mobile: 64 21 264 9347

Geoff Hutchison <ghu...@ws...>
Sent by: htd...@li...
26/10/01 01:49

To: "Uta Becht" <uta...@we...>
cc: "htdig" <htd...@li...>
Subject: Re: [htdig-dev] Problemes with bad query-string

At 11:12 AM +0200 10/25/01, Uta Becht wrote:
>Can someone give me an idea at which position of htdig I should
>eliminate this bad query_parameter ??

Why not use bad_querystr:
<http://www.htdig.org/attrs.html#bad_querystr>

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

_______________________________________________
htdig-dev mailing list
htd...@li...
https://lists.sourceforge.net/lists/listinfo/htdig-dev
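The rewrite rule from Uta's message at the top of this thread can be tried outside ht://Dig with a short sketch. It uses Python's `re` module, which is an assumption for illustration and not the regex engine ht://Dig itself uses; the example URLs and the helper name `strip_last` are made up. The doubled backslashes of the config file become single backreferences in a Python raw string.

```python
import re

# Pattern and replacement taken from the url_rewrite_rules line in the thread.
PATTERN = re.compile(r"(.*)&_last=[0-9]*&(.*)")
REPLACEMENT = r"\1&\2"


def strip_last(url):
    """Remove the '_last' parameter and its value when it sits between two '&'."""
    return PATTERN.sub(REPLACEMENT, url)


# Hypothetical URLs, for illustration only.
print(strip_last("http://example.com/page?id=5&_last=12345&lang=en"))
# http://example.com/page?id=5&lang=en

print(strip_last("http://example.com/page?id=5&_last=12345"))
# http://example.com/page?id=5&_last=12345
```

As the second call shows, the pattern requires an `&` on both sides of `_last`, so a URL where `_last` is the final parameter is left unchanged; a second rule would be needed if such URLs occur.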