From: Budd, S. <s....@im...> - 2003-01-24 17:50:01
Using htdig 3.2.0b4-20021110 on Solaris 8.

Dig with the following bits of config file:

............................
minimum_word_length: 2
external_protocols: https /home/ppp/htdig-test-3.2.0b4.1110.play/bin/handler.pl
allow_in_form: search_algorithm
allow_numbers: true
database_dir: /3.2.0b4/1110/helpdesk
conf: /home/ppp//htdig-test-3.2.0b4.1110/special-runs/helpdesk-dig
max_hop_count: 14
check_unique_md5: true
external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl \
                  application/msword->/text/html /usr/local/bin/doc2html.pl
# the following configuration variable will prevent any unaccessed url's from being deleted.
#remove_bad_urls: false
remove_bad_urls: true
..................................

The md5 sums appear in the vvv listing ok.

The results of a search may have two, sometimes three, entries with identical URL and extract.

Can you indicate what is wrong?