|
From: BOOTH, N. F. <Nic...@rb...> - 2003-11-16 22:15:22
|
I've noticed the following, which may or may not be "real" bugs, when testing the new Beta version:

1] The htdig.conf file seems to be very, very sensitive to whitespace at the end of lines. In particular, with a multiline attribute as illustrated just below, if there is white space (tested with tabs) after the \ character, htdig _and_ htsearch will fail:

    server_aliases: www.cbfm.rbs.co.uk=www.cbfm.rbsgrp.net \
                    www.cib.rbs.co.uk=www.cib.rbsgrp.net

2] I can't seem to get any sensible changes to results with htsearch using url_seed_score:

    url_seed_score: cbfm|fmintranet|cib. *500,+1000 \
                    manufacturing.|retail|technology.|wealthmanagement.|rbs.|group *.1,

Even stupidly high factors (like 100,000) don't seem to have an effect. (Tried with and without commas and spaces separating values.)

3] If there is _not_ a return after the last line in the config file then htsearch causes a CGI error. Results from the Apache error log:

    Unknown char in line 224: #[Fri Nov 14 23:51:46 2003] [error] [client 147.114.74.200] malformed header from script. Bad header=syntax error: /var/www/cgi-bin/htsearch32

4] If you search for a phrase and it forms part of a longer string then the results are not highlighted in the extract displayed. This is most apparent when the second word of the search phrase is singular but the match found is a plural.

    Search for "animal feedstuff":
        finds "animal feedstuff"s --- no highlight
        finds "animal feedstuff"  --- highlight as expected

Hope this makes sense!

Lastly, are the cookies.txt mechanism and check_unique_md5 actually known to work?

Running 3.2.0b5 on:

    Linux lon3561xus 2.4.9-31smp #1 SMP Tue Feb 26 06:55:00 EST 2002 i686 unknown

It has happily indexed a multi-server intranet with fewer than 50k pages, including parsing PDFs and Word docs, but, as ever, it seems limited by my web server responses/network latency, so this took over 18 hours.
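
As a possible workaround for items 1] and 3] until the parser is more forgiving, the config file could be checked before htdig or htsearch is run. The following is only a minimal sketch in Python, not part of ht://Dig: the function name lint_htdig_conf and the warning texts are hypothetical, and it assumes the only problems of interest are whitespace after a continuation backslash and a missing newline at the end of the file.

```python
import re

# A backslash followed only by spaces/tabs before the end of the line,
# i.e. the case reported in item 1].
_WS_AFTER_BACKSLASH = re.compile(r"\\[ \t]+$")


def lint_htdig_conf(text):
    """Return a list of (line_number, message) warnings for config text.

    Hypothetical helper: it does not parse attributes, it only looks for
    the two formatting problems described in items 1] and 3].
    """
    warnings = []
    lines = text.split("\n")
    for number, line in enumerate(lines, start=1):
        # Ignore a carriage return left over from DOS-style line endings.
        if _WS_AFTER_BACKSLASH.search(line.rstrip("\r")):
            warnings.append((number, "whitespace after continuation backslash"))
    # Item 3]: the last line must be terminated by a newline.
    if text and not text.endswith("\n"):
        warnings.append((len(lines), "no newline at end of file"))
    return warnings
```

It could be called with the contents of the config file, for example lint_htdig_conf(open(path).read()), where path is wherever htdig.conf lives on the system in question; an empty list means neither problem was found.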
I'm really very happy with what I've seen so far - especially the phrase search which is crucial for me to keep this product in place.

Best regards

Nicholas Booth
Royal Bank of Scotland, Corporate Banking
280 Bishopsgate
London
|