From: Christopher M. <chr...@mc...> - 2004-04-07 15:53:30
|
Greetings htdig folks, In my search.conf my exclude_urls is set to: exclude_urls: `/usr/local/htdig/common/excludeURL/exclude_urls` This works as expected, and all URLs that are in that file are excluded from being indexed. However, I've noticed that adding URLs *seriously* degrades digging performance, to the point that with 30 or so patterns, I got 8k pages in over 8 hours, whereas without them, I could do 15k pages in an hour. With such a drastic difference, I'm assuming that there's a bug somewhere. I'll try to go digging through the code to find it, but I imagine that someone on this list will have better luck than me. :-) Cheers, Chris -- Christopher Murtagh Enterprise Systems Administrator ISR / Web Communications Group McGill University Montreal, Quebec Canada Tel.: (514) 398-3122 Fax: (514) 398-2017 |
From: Lachlan A. <lh...@us...> - 2004-04-10 04:23:23
|
Greetings Chris and team, Profiling should help a lot here. One possibility is that it is because ht://Dig now allows attributes to be specified on a per-host basis. As a result, all of the exclude_urls patterns are re-compiled for each url checked (even if you don't specify host-dependent attributes). If (for example) there is a memory leak in the regex compiler, this would slow the code down as you describe. Even if it is not causing Chris's problem, I've been thinking for a long time that this re-parsing may partially account for the big slowdown in digging with recent 3.2.0 betas. From speaking with Gabriele, I know that he finds the feature very useful. However, I can't find the config file format documented, so I don't think that many people can benefit from it, and it is currently just (unquantified) bloat. If Chris finds that that is in fact the problem, I suggest that: 1. We set a flag if per-host attributes are used at all 2. If no per-host attributes are used, all expensive-to-parse attributes (like regular expressions) should be cached in their parsed forms. Thoughts? Lachlan On Thu, 8 Apr 2004 01:52, Christopher Murtagh wrote: > I've noticed that adding URLs *seriously* degrades > digging performance. To a point that with 30 or so patterns, I got > 8k pages in over 8 hours, and without them, I could do 15k pages in > an hour. > > With such a drastic difference, I'm assuming that there's a bug > somewhere. I'll try to go digging through the code to find it, but > I imagine that someone on this list will have better luck than me. > :-) -- lh...@us... ht://Dig developer DownUnder (http://www.htdig.org) |
From: Gilles D. <gr...@sc...> - 2004-04-14 18:53:04
|
According to Lachlan Andrew: > Even if it is not causing Chris's problem, I've been thinking for a > long time that this re-parsing may partially account for the big > slowdown in digging with recent 3.2.0 betas. From speaking with > Gabriele, I know that he finds the feature very useful. However, I > can't find the config file format documented, so I don't think that > many people can benefit from it, and it is currently just > (unquantified) bloat. > > If Chris finds that that is in fact the problem, I suggest that: > 1. We set a flag if per-host attributes are used at all > 2. If no per-host attributes are used, all expensive-to-parse > attributes (like regular expressions) should be cached in their > parsed forms. Remember that there are URL blocks as well as server blocks and globals, so we actually have 3 different levels. It might be useful to maintain a status for each attribute that tells us the lowest level at which the attribute is defined. That way even if some attributes are used in URL blocks, we don't need to reparse the ones that aren't. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
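
A rough sketch of the bookkeeping Gilles describes above, purely for illustration (the enum, class and method names are invented and are not ht://Dig code): the configuration parser records the narrowest scope each attribute is assigned in, and the retriever only re-reads and re-compiles attributes that can actually vary between blocks.

    // Hypothetical sketch: track the lowest level at which each attribute is
    // defined, so attributes that are only set globally are parsed just once.
    #include <map>
    #include <string>

    enum AttrScope { SCOPE_GLOBAL = 0, SCOPE_SERVER = 1, SCOPE_URL = 2 };

    class AttrScopeTable
    {
    public:
        // Called by the config parser whenever an attribute is assigned.
        void Note(const std::string &name, AttrScope where)
        {
            AttrScope &s = table[name];      // defaults to SCOPE_GLOBAL
            if (where > s)
                s = where;                   // remember the narrowest scope seen
        }
        // The retriever asks this before deciding whether a cached, compiled
        // value (e.g. an HtRegexList) can be reused for the next URL.
        bool MayVaryByBlock(const std::string &name) const
        {
            std::map<std::string, AttrScope>::const_iterator it = table.find(name);
            return it != table.end() && it->second != SCOPE_GLOBAL;
        }
    private:
        std::map<std::string, AttrScope> table;
    };
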
From: Lachlan A. <lh...@us...> - 2004-04-17 00:00:09
|
Greetings Gilles, That is a very good suggestion. I suppose it comes down to the question of how many people actually use per-server blocks. If they are rare enough, then I vote we optimise for the common case. The cost of having per-attribute flags is (a) an extra two bits per attribute (which probably becomes 32 bits after alignment issues), (b) an extra attribute-lookup each time the attribute may be used, and (c) extra code to write/debug/document/maintain. If, say, under 1% of users use per-host attributes, then we could use a single flag. If it is more than 1%, we could "do the right thing" as Gilles suggests. As Robert Ribnitz recently pointed out, many people will only index documentation on their own box (that is what brought me to ht://Dig). Of the rest, most probably only index a single site. Of the rest, most probably don't know about the per-host option, because its documentation isn't very prominent. Thoughts? Lachlan On Thu, 15 Apr 2004 04:52, Gilles Detillieux wrote: > According to Lachlan Andrew: > > If Chris finds that that is in fact the problem, I suggest that: > > 1. We set a flag if per-host attributes are used at all > > 2. If no per-host attributes are used, all expensive-to-parse > > attributes (like regular expressions) should be cached in > > their parsed forms. > > Remember that there are URL blocks as well as server blocks and > globals, so we actually have 3 different levels. It might be > useful to maintain a status for each attribute that tells us the > lowest level at which the attribute is defined. That way even if > some attributes are used in URL blocks, we don't need to reparse > the ones that aren't. -- lh...@us... ht://Dig developer DownUnder (http://www.htdig.org) |
From: Lachlan A. <lh...@us...> - 2004-04-10 05:02:41
|
Greetings all, One of the biggest show-stoppers for 3.2.0rc1 seems to be speed. We had agreed to complete testing and then release, but ht://Dig would get a very bad reputation if we released the current slow product. I propose that we suspend testing and all concentrate on getting performance to within a factor of 2 of the 3.1.6 speed. Some possible approaches are: 1. Keep the database format entirely unchanged, but write to it more efficiently. Neal reported good results by caching updates using the STL. That would improve the "locality" of the database accesses, both improving its own caching and reducing the performance hit from compression. Neal, could you tell us a bit more about this? 2. Give users the option to store only the first occurrence of each word in each document. That will kill phrase searching, but it should make the database smaller, and (if done correctly) eliminate most of the writes to it in the build phase. This keeps the database format nominally the same (so search/purge are unaffected). 3. Totally rewrite the database format to avoid the significant redundancy. This really should be done at some stage, and I vote the sooner the better. As I understand it, the entries are all of the format Word, Doc ID (32 bits), flags (8 bits), location (16 bits). Does anyone know how well BDB handles variable-length records? If they are OK, how about a format like: Word, Doc ID (32 bits), count (16), <flags (8), offset (16)>+ Here, the "offset" field is the *difference* between the locations of consecutive occurrences. These numbers will be more likely to be under 255, and so should compress better. Because entries are made for cross-references from other documents, we could allow multiple entries of that form for the same word/document, but still massively reduce the number of redundant Word/DocID fields, and (more importantly?) the number of database writes. Could I ask for a "show of hands" of people who can help here? We need people who know why the current database format was selected, understand the current code, understand BDB, or are willing to help code. (I'm only in the last of those categories, unfortunately.) Cheers, Lachlan -- lh...@us... ht://Dig developer DownUnder (http://www.htdig.org) |
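
For concreteness, here is one way an entry in the proposed format could be serialised. This is purely illustrative (the struct, the typedefs and the helper are assumptions, not existing ht://Dig code), but it shows the delta-encoded offsets that should keep most stored values small:

    #include <cstddef>
    #include <vector>

    typedef unsigned char  uint8;
    typedef unsigned short uint16;
    typedef unsigned int   uint32;

    struct Occurrence
    {
        uint8  flags;     // per-occurrence flags, as in the current format
        uint16 location;  // absolute word position within the document
    };

    static void put16(std::vector<uint8> &out, uint16 v)
    {
        out.push_back((uint8)(v & 0xff));
        out.push_back((uint8)(v >> 8));
    }

    // Layout: DocID (32) | count (16) | { flags (8) | offset (16) } * count,
    // where each offset is the gap since the previous occurrence, so most
    // values are small and should compress well.
    std::vector<uint8> EncodeEntry(uint32 docId, const std::vector<Occurrence> &occ)
    {
        std::vector<uint8> out;
        put16(out, (uint16)(docId & 0xffff));
        put16(out, (uint16)(docId >> 16));
        put16(out, (uint16)occ.size());
        uint16 prev = 0;
        for (size_t i = 0; i < occ.size(); ++i)
        {
            out.push_back(occ[i].flags);
            put16(out, (uint16)(occ[i].location - prev));   // delta, not absolute
            prev = occ[i].location;
        }
        return out;
    }

The word itself would remain the BDB key, so only the per-document payload changes; whether BDB copes well with such variable-length values is exactly the open question above.
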
From: Lachlan A. <lh...@us...> - 2004-04-12 02:54:29
|
Greetings Chris, What are the actual 30 patterns that cause problems? It could well be a single pattern which is giving the regular-expression handler grief. Cheers, Lachlan On Thu, 8 Apr 2004 01:52, Christopher Murtagh wrote: > In my search.conf my exculde_urls is set to: > exclude_urls: `/usr/local/htdig/common/excludeURL/exclude_urls` > > However, I've noticed that adding URLs *seriously* degrades > digging performance, to the point that with 30 or so patterns, I got > 8k pages in over 8 hours, and without them, I could do 15k pages in > an hour. > > With such a drastic difference, I'm assuming that there's a bug > somewhere. -- lh...@us... ht://Dig developer DownUnder (http://www.htdig.org) |
From: Christopher M. <chr...@mc...> - 2004-04-13 12:04:16
|
On Sun, 2004-04-11 at 22:50, Lachlan Andrew wrote: > Greetings Chris, > > What are the actual 30 patterns that cause problems? It could well be > a single pattern which is giving the regular-expression handler > grief. Yeah, I had thought the same as well. However, I tried changing the patterns, and also leaving very simple patterns. They all ended up the same. Also, the more patterns I added, the slower the performance was. Cheers, Chris -- Christopher Murtagh Enterprise Systems Administrator ISR / Web Communications Group McGill University Montreal, Quebec Canada Tel.: (514) 398-3122 Fax: (514) 398-2017 |
From: Christopher M. <chr...@mc...> - 2004-04-13 12:06:17
|
On Sat, 2004-04-10 at 00:19, Lachlan Andrew wrote: > If Chris finds that that is in fact the problem, I suggest that: > 1. We set a flag if per-host attributes are used at all > 2. If no per-host attributes are used, all expensive-to-parse > attributes (like regular expressions) should be cached in their > parsed forms. > > Thoughts? Just let me know what you want/need me to do. I can poke my head into the code again if need be, but my C++ is starting to get really rusty. If you have any general direction/pointers to throw my way, it would be much appreciated, and I'll do what I can to help out. Thanks for looking into this. Cheers, Chris -- Christopher Murtagh Enterprise Systems Administrator ISR / Web Communications Group McGill University Montreal, Quebec Canada Tel.: (514) 398-3122 Fax: (514) 398-2017 |
From: Lachlan A. <lh...@us...> - 2004-04-16 14:32:21
|
Greetings Chris, As I recall, you listed the patterns in a file, and included that file in the htdig.conf file using backquotes. It just occurred to me that the file listing the patterns is probably being read in each time the attribute is read (each time a url is parsed). What happens to the speed if you list the patterns explicitly in the htdig.conf file? If that is the problem, one solution would be for HtConfig::Find() to replace any `file` argument by the contents of the file the first time it is read. That may chew up some memory for things like long lists for start_url, but it would only use memory for attributes that are actually used. (Of course, it would still be better not to re-parse the attributes unnecessarily.) Opinions? Cheers, Lachlan On Tue, 13 Apr 2004 22:04, Christopher Murtagh wrote: > On Sun, 2004-04-11 at 22:50, Lachlan Andrew wrote: > > Greetings Chris, > > > > What are the actual 30 patterns that cause problems? It could > > well be a single pattern which is giving the regular-expression > > handler grief. > > Yeah, I had thought the same as well. However, I tried changing > the patterns, and also leaving very simple patterns. They all ended > up the same. Also, the more patterns I added, the slower the > performance was. > > Cheers, > > Chris -- lh...@us... ht://Dig developer DownUnder (http://www.htdig.org) |
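
A sketch of what that expansion might look like, with an invented helper name; the real change would live wherever HtConfiguration resolves attribute values, and would store the expanded string back so the file is read only once per dig:

    #include <fstream>
    #include <sstream>
    #include <string>

    // Hypothetical helper: expand a `filename` value into the file's contents.
    std::string ExpandBackquotedFile(const std::string &value)
    {
        if (value.size() < 2 || value[0] != '`' || value[value.size() - 1] != '`')
            return value;                    // not a backquoted file reference
        std::ifstream in(value.substr(1, value.size() - 2).c_str());
        std::ostringstream patterns;
        std::string line;
        while (std::getline(in, line))
            patterns << line << ' ';         // file becomes one space-separated list
        return patterns.str();               // caller caches this as the new value
    }
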
From: Christopher M. <chr...@mc...> - 2004-04-16 16:26:12
|
On Fri, 2004-04-16 at 10:27, Lachlan Andrew wrote: > As I recall, you listed the patterns in a file, and included that file > in the htdig.conf file using backquotes. It just occurred to me that > the file listing the patterns is probably being read in each time the > attribute is read (each time a url is parsed). > > What happens to the speed if you list the patterns explicitly in the > htdig.conf file? Yeah, I thought the same thing and had tried that. Same results. From what I can see in the source of htdig/Retriever.cc (line 998-1000), the URL list is re-parsed at *every* URL: // // If the URL contains any of the patterns in the exclude list, // mark it as invalid // tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t"); HtRegexList excludes; excludes.setEscaped(tmpList, config->Boolean("case_sensitive")); This would explain some of the problems that it is having. The other place I have been looking at is the HtRegexList::setEscaped method (htlib/HtRegexList.cc) which seems to be really expensive, but I've really lost touch with C++ and I'm definitely not a good judge of it anymore. Cheers, Chris -- Christopher Murtagh Enterprise Systems Administrator ISR / Web Communications Group McGill University Montreal, Quebec Canada Tel.: (514) 398-3122 Fax: (514) 398-2017 |
From: Christopher M. <chr...@mc...> - 2004-04-19 19:51:48
|
Ok, I've solved my problem, and can now have a list of working exclude_urls without the serious performance decrease. Here are the changes I made (sorry I'm not sending a proper diff file... need guidance on how to do that properly): htdig/htdig.h -------------------- added: extern int exclude_checked; extern int badquerystr_checked; extern HtRegexList excludes; extern HtRegexList badquerystr; htdig/htdig.cc ---------------------- added these as global variable definitions: int exclude_checked = 0; int badquerystr_checked = 0; HtRegexList excludes; HtRegexList badquerystr; htdig/Retriever.cc added these conditionals and removed the previous tmplist creates and .setEscaped() calls: if(!(exclude_checked)){ //only parse this once and store into global variable tmpList.Destroy(); tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t"); excludes.setEscaped(tmpList, config->Boolean("case_sensitive")); exclude_checked = 1; } if(!(badquerystr_checked)){ //only parse this once and store into global variable tmpList.Destroy(); tmpList.Create(config->Find(&aUrl, "bad_querystr"), " \t"); badquerystr.setEscaped(tmpList, config->Boolean("case_sensitive")); badquerystr_checked = 1; } The difference in performance is night and day, and the excludes list is only parsed once per dig rather than at *every* URL found. If this is at all useful to anyone, let me know. I can send files or if someone would enlighten me (even RTFM me) I can send diff/patches. Cheers, Chris -- Christopher Murtagh Enterprise Systems Administrator ISR / Web Communications Group McGill University Montreal, Quebec Canada Tel.: (514) 398-3122 Fax: (514) 398-2017 |
From: Gilles D. <gr...@sc...> - 2004-04-20 21:45:12
|
Hi, Chris and other developers. The problem with this fix is that exclude_urls and bad_querystr can no longer be used in server blocks or URL blocks, as they'll only be parsed once regardless of how they're used. That's OK if you don't use them in blocks, but for the distributed code, we need to find a more generalized solution. Also, there may be other regex-based attributes that also need optimizing. However, your fix and the discussion that led up to it did give me an inspiration for a fairly simple fix to the Regex class to optimize all uses of it in a general way. Instead of just having a flag in the Regex object that says if the pattern has been compiled, why not have it hold on to a copy of the compiled string? That way, whenever the set() method is called again, it can check to see if it's being asked to compile the same pattern over again. If it is, it knows it doesn't need to call regcomp() again. Presumably, it's regcomp() that's the real time killer here, and not all the attribute and string handling done at a higher level. The repeated string comparisons should be cheap in comparison to all the regcomp() calls on those strings. I'll see if I can find some time to work out a patch for this. Complications: 1) The variables that use the HtRegex and HtRegexList classes will need to be global, or otherwise made persistent, so that they can take advantage of the optimization. Is this going to be a problem with libhtdig, Neal? What is the best way to approach this issue? 2) The savings in not calling regcomp(), which likely will be substantial, may not be entirely enough to get the performance gain that the quick fix below gets. If so, we could look into higher-level fixes as well, e.g. in HtRegexList::setEscaped() as well, to save us some of the string handling that takes place there. Chris, would you be willing to do some comparative performance testing on various patches, if it comes to that? 3) We may also need to determine if the repeated calls to config->Find() at each URL are having an impact on performance as well. E.g. what is the performance cost of doing thousands of calls like this one? tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t"); We might need to do some more profiling as well. Thoughts? According to Christopher Murtagh: > Ok, I've solved my problem, and can now have a list of working > exclude_urls without the serious performance decrease. Here are the > changes I made (sorry I'm not sending a proper diff file... 
need > guidance on how to do that properly): > > > htdig/htdig.h > -------------------- > > added: > > extern int exclude_checked; > extern int badquerystr_checked; > extern HtRegexList excludes; > extern HtRegexList badquerystr; > > > > htdig/htdig.cc > ---------------------- > > added these as global variable definitions: > > int exclude_checked = 0; > int badquerystr_checked = 0; > > HtRegexList excludes; > HtRegexList badquerystr; > > > htdig/Retriever.cc > > added these conditionals and removed the previous tmplist creates and > .setEscaped() calls: > > if(!(exclude_checked)){ > //only parse this once and store into global variable > tmpList.Destroy(); > tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t"); > excludes.setEscaped(tmpList, config->Boolean("case_sensitive")); > exclude_checked = 1; > } > > if(!(badquerystr_checked)){ > //only parse this once and store into global variable > tmpList.Destroy(); > tmpList.Create(config->Find(&aUrl, "bad_querystr"), " \t"); > badquerystr.setEscaped(tmpList, config->Boolean("case_sensitive")); > badquerystr_checked = 1; > } > > The difference in performance is night and day, and the excludes list > is only parsed once per dig rather than at *every* URL found. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
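
A rough sketch of the optimisation Gilles proposes, assuming the real class wraps POSIX regcomp()/regexec(); the class and member names here are invented and this is not the actual HtRegex code:

    #include <regex.h>
    #include <string>

    class CachedRegex
    {
    public:
        CachedRegex() : compiled(false) { }
        ~CachedRegex() { if (compiled) regfree(&re); }

        // Recompile only if the pattern actually changed since the last set().
        int set(const std::string &pattern, bool caseSensitive)
        {
            if (compiled && pattern == lastPattern)
                return 0;                    // same pattern: skip regcomp()
            if (compiled)
                regfree(&re);
            int flags = REG_EXTENDED | (caseSensitive ? 0 : REG_ICASE);
            int err = regcomp(&re, pattern.c_str(), flags);
            compiled = (err == 0);
            lastPattern = pattern;
            return err;
        }
        bool match(const char *s) const
        {
            return compiled && regexec(&re, s, 0, 0, 0) == 0;
        }
    private:
        regex_t re;
        bool compiled;
        std::string lastPattern;
    };

The repeated string comparisons should be cheap next to regcomp(), as noted above, so the win depends entirely on how often set() is handed the same pattern twice in a row.
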
From: Lachlan A. <lh...@us...> - 2004-04-21 13:14:11
Attachments:
slightly-better.0
|
Greetings Gilles + all, Yes, I agree that we need a more "polished" patch for the distribution. I still like my intermediate path: If *any* server blocks or URL blocks are used, then the user takes the performance hit and re-parses each time. If *no* server/URL blocks are used, we use Chris's patch. This should be just as fast as Chris's patch (in the "3.1-compatibility mode" without server/URL blocks), and just as flexible as the current status (if blocks are used). If that can get ht://Dig fast enough to get into sarge, then I suggest we implement it first, and then work on Gilles's more complete solution at more leisure. A first hack at this (not even compile-tested) is attached, patched relative to Chris's patched version, so you can see what I mean. If people are in favour, I'll try to work on it over the weekend. One issue with caching input strings is that we would have to have some sort of cache-flushing, or just let the storage grow as HtRegex is called repeatedly. Cheers, Lachlan On Wed, 21 Apr 2004 07:45 am, Gilles Detillieux wrote: > Hi, Chris and other developers. The problem with this fix is that > exclude_urls and bad_querystr can no longer be used in server > blocks or URL blocks, as they'll only be parsed once regardless of > how they're used. -- lh...@us... ht://Dig developer DownUnder (http://www.htdig.org) |
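
In outline (only the config_server_URL_blocks variable comes from the attached patch; the helper around it is an invented illustration):

    // Set by the configuration parser whenever a server or URL block is seen.
    int config_server_URL_blocks = 0;

    // Retriever-side guard: if no blocks exist at all, every URL sees the same
    // attribute values, so compiled regex lists can be built once and reused.
    bool CanReuseCompiledAttributes()
    {
        return config_server_URL_blocks == 0;
    }
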
From: Christopher M. <chr...@mc...> - 2004-04-21 15:07:00
|
On Wed, 2004-04-21 at 09:13, Lachlan Andrew wrote: > Yes, I agree that we need a more "polished" patch for the > distribution. I still like my intermediate path: If *any* server > blocks or URL blocks are used, then the user takes the performance > hit and re-parses each time. That sounds like a decent plan to me. However, 'performance hit' is a serious understatement with the current code. Without my patch, my Dual 3GHz Xeon had one CPU pegged at 100% for 8 hours and still only managed to index a little under 1000 pages per hour. Once I stopped re-parsing the URL list for each URL, the CPU usage was down to about 6-10%, essentially making it an I/O-bound problem rather than a CPU-bound one. I suspect, as Gilles suggests, that there are probably other optimizations in the Regex code that could help out a lot in this matter. > If *no* server/URL blocks are used, we use Chris's patch. This should > be just as fast as Chris's patch (in the "3.1-compatibly mode" without > server/URL blocks), and just as flexible as the current status > (if blocks are used). I can't believe it took me until today to find the block configuration in the documentation. Once I found it, it seems to be in an obvious place, but perhaps it needs a mention as a feature on the front page of the docs? Cheers, Chris -- Christopher Murtagh Enterprise Systems Administrator ISR / Web Communications Group McGill University Montreal, Quebec Canada Tel.: (514) 398-3122 Fax: (514) 398-2017 |
From: Gilles D. <gr...@sc...> - 2004-04-22 03:02:20
|
According to Christopher Murtagh: > On Wed, 2004-04-21 at 09:13, Lachlan Andrew wrote: > > Yes, I agree that we need a more "polished" patch for the > > distribution. I still like my intermediate path: If *any* server > > blocks or URL blocks are used, then the user takes the performance > > hit and re-parses each time. > > That sounds like a decent plan to me. However, 'performance hit' is a > serious understatement with the current code. Without my patch, my Dual > 3GHz Xeon had one CPU pegged at 100% for 8 hours and still only managed > to index a little under 1000 pages per hour. Once I stopped re-parsing > the URL list at for each URL, the CPU usage was down to about 6-10%, > essentially making it an I/O bound problem rather than CPU. > > I suspect, as Gilles suggests that there are probably other > optimizations in the Regex code that could help out a lot in this > matter. Yes, what I initially had in mind was pretty simple. The Regex object already stores the compiled pattern for the last value used. I'd just add to that object the string that was compiled to get that binary pattern, so that we could avoid repeatedly compiling the same pattern. That, in itself, would be a trivial addition. It wouldn't be a full caching scheme with multiple patterns stored, so we don't need to worry about cache flushing. However, it occurs to me that this fix wouldn't be a huge help to users who do use exclude_urls or bad_querystr within URL or server blocks. The reason for this is that htdig will alternate between servers, to distribute the load, so once it has used a pattern within an URL or server block it will move quickly on to another server, causing htdig to have to go to another pattern -- forcing a recompile of the pattern. Still, it would likely be an improvement over what it does now, for the vast majority of users. > > If *no* server/URL blocks are used, we use Chris's patch. This should > > be just as fast as Chris's patch (in the "3.1-compatibly mode" without > > server/URL blocks), and just as flexible as the current status > > (if blocks are used). > > I can't believe it took me until today to find the block configuration > in the documentation. Once I found it, it seems to be in an obvious > place, but perhaps it needs a mention as a feature in the front page of > the docs? Yeah, it's not very prominent in the docs right now, and it ought to be made more obvious. Most users of 3.2 don't even know about this feature. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
From: Lachlan A. <lh...@us...> - 2004-04-22 11:55:24
|
Yes, the heavy I/O is why we should also look at optimising the database format / access. Cheers, Lachlan On Thu, 22 Apr 2004 01:05 am, Christopher Murtagh wrote: > Once I stopped re-parsing the URL list at for each URL, the CPU > usage was down to about 6-10%, essentially making it an I/O bound > problem rather than CPU. -- lh...@us... ht://Dig developer DownUnder (http://www.htdig.org) |
From: Joe R. J. <jj...@cl...> - 2004-04-21 23:16:42
|
On Wed, 21 Apr 2004, Lachlan Andrew wrote: > Date: Wed, 21 Apr 2004 23:13:27 +1000 > From: Lachlan Andrew <lh...@us...> > To: Gilles Detillieux <gr...@sc...>, Christopher Murtagh <chr...@mc...> > Cc: htd...@li... > Subject: [htdig-dev] Re: Performance issue with exclude_urls > > Greetings Gilles + all, > > Yes, I agree that we need a more "polished" patch for the > distribution. I still like my intermediate path: If *any* server > blocks or URL blocks are used, then the user takes the performance > hit and re-parses each time. If *no* server/URL blocks are used, we > use Chris's patch. This should be just as fast as Chris's patch (in > the "3.1-compatibly mode" without server/URL blocks), and just as > flexible as the current status (if blocks are used). If that can get > ht://Dig fast enough to get into sarge, then I suggest we implement > it first, and then work on Gilles's more complete solution at more > leisure. I applied Chris' patch and ran htdig on the same site as before for profile; htdig ran ~40% faster than last time;) Here is the profile: ftp://ftp.ccsf.org/htdig-patches/3.2.0b5/htdig.gmon.exclude_perform.gz > A first hack at this (not even compile-tested) is attached, patched > relative to Chris's patched version, so you can see what I mean. If > people are in favour, I'll try to work on it over the weekend. The "slightly-better.0" patch applies, but it does not compile: Retriever.cc: In method `int Retriever::IsValidURL(const String &)': Retriever.cc:998: `config_server_URL_blocks' undeclared (first use this function) Retriever.cc:998: (Each undeclared identifier is reported only once Retriever.cc:998: for each function it appears in.) gmake[1]: *** [Retriever.o] Error 1 Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... > One issue with caching input strings is that we would have to have > some sort of cache-flushing, or just let the storage grow as HtRegEx > is called repeatedly. > > Cheers, > Lachlan > > On Wed, 21 Apr 2004 07:45 am, Gilles Detillieux wrote: > > Hi, Chris and other developers. The problem with this fix is that > > exclude_urls and bad_querystr can no longer be used in server > > blocks or URL blocks, as they'll only be parsed once regardless of > > how they're used. |
From: Gilles D. <gr...@sc...> - 2004-04-22 03:33:47
|
According to Joe R. Jah: > On Wed, 21 Apr 2004, Lachlan Andrew wrote: > > Date: Wed, 21 Apr 2004 23:13:27 +1000 > > From: Lachlan Andrew <lh...@us...> > > To: Gilles Detillieux <gr...@sc...>, > Christopher Murtagh <chr...@mc...> > > Cc: htd...@li... > > Subject: [htdig-dev] Re: Performance issue with exclude_urls > > > > Greetings Gilles + all, > > > > Yes, I agree that we need a more "polished" patch for the > > distribution. I still like my intermediate path: If *any* server > > blocks or URL blocks are used, then the user takes the performance > > hit and re-parses each time. If *no* server/URL blocks are used, we > > use Chris's patch. This should be just as fast as Chris's patch (in > > the "3.1-compatibly mode" without server/URL blocks), and just as > > flexible as the current status (if blocks are used). If that can get > > ht://Dig fast enough to get into sarge, then I suggest we implement > > it first, and then work on Gilles's more complete solution at more > > leisure. > > I applied Chris' patch and ran htdig on the same site as before for > profile; htdig ran ~40% faster than last time;) Here is the profile: > > ftp://ftp.ccsf.org/htdig-patches/3.2.0b5/htdig.gmon.exclude_perform.gz > > > A first hack at this (not even compile-tested) is attached, patched > > relative to Chris's patched version, so you can see what I mean. If > > people are in favour, I'll try to work on it over the weekend. > > The "slightly-better.0" patch applies, but it does not compile: > > Retriever.cc: In method `int Retriever::IsValidURL(const String &)': > Retriever.cc:998: `config_server_URL_blocks' undeclared (first use this function) > Retriever.cc:998: (Each undeclared identifier is reported only once > Retriever.cc:998: for each function it appears in.) > gmake[1]: *** [Retriever.o] Error 1 The patch declares config_server_URL_blocks in conf_parser.h, but not in any header file that Retriever.cc includes. Try either adding an include of conf_parser.h to Retriever.cc, or copy the declaration of config_server_URL_blocks from conf_parser.h to Retriever.cc. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
From: Christopher M. <chr...@mc...> - 2004-04-21 15:14:39
|
On Tue, 2004-04-20 at 17:45, Gilles Detillieux wrote: > Hi, Chris and other developers. The problem with this fix is that > exclude_urls and bad_querystr can no longer be used in server blocks or > URL blocks, as they'll only be parsed once regardless of how they're used. > That's OK if you don't use them in blocks, but for the distributed code, > we need to find a more generalized solution. Right. Having just found the block documentation, I can indeed see this as a useful feature, and probably something that I would use if the performance hit wasn't so bad. One thing I could think of that could help performance quite considerably is to have an array of HtRegexList pointers that could contain the parsed exclude/bad-querystr lists, etc., per block. Or perhaps a struct that contains all parsed config attributes per block, with an array of pointers to it. This way the config could be loaded and still only need to be parsed once. The only downside I could see is that this would mean htdig would have a slightly larger memory footprint, but I don't really see that as a big problem. We're probably talking about a couple of KB more; by today's standards, even a couple of MB more wouldn't be a big deal. > 3) We may also need to determine if the repeated calls to config->Find() > at each URL are having an impact on performance as well. E.g. what is > the performance cost of doing thousands of calls like this one? > > tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t"); Easy thing to test. I'll give it a try later this week if I can, perhaps tomorrow, and report back. Cheers, Chris -- Christopher Murtagh Enterprise Systems Administrator ISR / Web Communications Group McGill University Montreal, Quebec Canada Tel.: (514) 398-3122 Fax: (514) 398-2017 |
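
A rough sketch of the per-block caching Chris suggests, keyed by the raw attribute value so each distinct exclude_urls or bad_querystr string is compiled at most once per dig. Everything here is invented for illustration; in particular, CompiledPatterns is a stand-in that only does substring matching, where the real code would build an HtRegexList:

    #include <cstddef>
    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    // Stand-in for a compiled HtRegexList.
    struct CompiledPatterns
    {
        std::vector<std::string> patterns;
        void Compile(const std::string &rawValue)
        {
            std::istringstream in(rawValue);
            std::string p;
            while (in >> p)
                patterns.push_back(p);       // real code would regcomp() here
        }
        bool Match(const std::string &url) const
        {
            for (size_t i = 0; i < patterns.size(); ++i)
                if (url.find(patterns[i]) != std::string::npos)
                    return true;
            return false;
        }
    };

    // One compiled list per distinct attribute value, shared by all URLs that
    // resolve to the same server/URL block.
    class PerBlockRegexCache
    {
    public:
        const CompiledPatterns &Lookup(const std::string &rawValue)
        {
            std::map<std::string, CompiledPatterns>::iterator it = cache.find(rawValue);
            if (it == cache.end())
            {
                it = cache.insert(std::make_pair(rawValue, CompiledPatterns())).first;
                it->second.Compile(rawValue);
            }
            return it->second;
        }
    private:
        std::map<std::string, CompiledPatterns> cache;
    };

The memory cost is one compiled list per distinct value, which is in line with the "couple of KB" estimate above for typical configurations.
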
From: Gilles D. <gr...@sc...> - 2004-04-22 03:25:22
|
According to Christopher Murtagh: > On Tue, 2004-04-20 at 17:45, Gilles Detillieux wrote: > > Hi, Chris and other developers. The problem with this fix is that > > exclude_urls and bad_querystr can no longer be used in server blocks or > > URL blocks, as they'll only be parsed once regardless of how they're used. > > That's OK if you don't use them in blocks, but for the distributed code, > > we need to find a more generalized solution. > > Right. Having just found the block documentation, I can indeed see this > as a useful feature, and probably something that I would use if the > performance hit wasn't so bad. > > One thing I could think of that could help performance quite > considerably is to have an array of type *HtRegexList that could contain > the parsed excludes list/badquery lists, etc. per block. Or perhaps a > struct that contains all parsed config attributes per block and have an > array of pointers to it. This way the config could be loaded and still > only need to be parsed once. The only downside I could see is that this > would mean htdig would have a slightly larger memory footprint, but I > don't really see that as a big problem. We're probably talking about a > couple k more, by today's standards, even a couple meg more wouldn't be > a big deal. There's an idea worth considering. It's quite a bit more complicated than the quick fix I had in mind, but probably much simpler than a full-blown caching scheme. It would also help out the case where regex-based attributes are used in URL or server blocks, which my proposed fix would only marginally help. > > 3) We may also need to determine if the repeated calls to config->Find() > > at each URL are having an impact on performance as well. E.g. what is > > the performance cost of doing thousands of calls like this one? > > > > tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t"); > > Easy thing to test. I'll give it a try later this week if I can, > perhaps tomorrow, and report back. Great. I'll try to get my fix to Regex.cc in by the end of the week too, so it would be great if you could give it a whirl. It would probably mean having to back out your own patch, though, or it wouldn't really get tested. Neal, I'd still like your opinion on the matter of making these HtRegexList variables global, and whether that will be a problem for libhtdig. Looking at the code, I see that "limits" and "limitsn", set by limit_urls_to and limit_normalized, are already global. But these are defined in htdig.cc, rather than Retriever.cc. Does this matter? I imagine it just means making parallel changes to libhtdig_htdig.cc, but right now it doesn't even seem to be making use of URL blocks, as it doesn't pass aUrl to HtConfiguration::Find(). Is this an oversight, or am I missing something? -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
From: Gilles D. <gr...@sc...> - 2004-04-22 05:05:39
|
Earlier I wrote: > According to Christopher Murtagh: > > Easy thing to test. I'll give it a try later this week if I can, > > perhaps tomorrow, and report back. > > Great. I'll try to get my fix to Regex.cc in by the end of the week too, > so it would be great if you could give it a whirl. It would probably > mean having to back out your own patch, though, or it wouldn't really > get tested. OK, the simple fix I had in mind was indeed very simple to implement, but unfortunately ineffective. The problem is the way HtRegexList uses (or abuses) the HtRegex class. It repeatedly calls HtRegex::set() with an increasingly complex pattern, until it fails, to see how big a pattern it can build before breaking it up. Of course, this completely defeats any attempt to fix the problem at the level of HtRegex. Here's the result of a trace print I added to HtRegex::set() which shows how htdig deals with these two simple patterns: exclude_urls: /cgi-bin/ .cgi bad_querystr: C=D C=M C=N C=S O=A O=D For every URL htdig encountered within every document it parsed, it spat out the following: compiling pattern: /cgi-bin/ compiling pattern: /cgi-bin/|\.cgi compiling pattern: C=D compiling pattern: C=D|C=M compiling pattern: C=D|C=M|C=N compiling pattern: C=D|C=M|C=N|C=S compiling pattern: C=D|C=M|C=N|C=S|O=A compiling pattern: C=D|C=M|C=N|C=S|O=A|O=D You can see how this would rapidly degenerate with lots of documents containing lots of links, and with more complex patterns in those two attributes! No wonder this was such a big problem. I'll try again tomorrow with a similar fix in HtRegexList::setEscaped() to see how much that helps matters. It's a bit more complicated there because we're dealing with lists instead of strings, but it shouldn't be too nasty. On the other hand, if these two attributes are the only ones that pose a problem, maybe we should just deal with them in Retriever.cc and be done with it! Rather than using a flag, as Chris and Lachlan's patches do, I was thinking of saving the string returned by config->Find(&aUrl, "exclude_urls") and comparing it to the string we had the last time through. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
From: Lachlan A. <lh...@us...> - 2004-04-22 11:52:36
|
Greetings Gilles, Your patch sounds nice -- totally contained within Retriever.cc and yet still working (and providing some benefit) with server blocks. If the overhead of a Find is significant, we could still disable even that test when no server blocks are used. We could also eliminate the generation of the dictionaries of valid and invalid extensions... I believe in the "software RISC" principle -- the fastest, least buggy and most memory efficient function calls are those that aren't there. Anyway, Gilles's elegant solution will certainly get rid of almost all of the problem. Cheers, Lachlan On Thu, 22 Apr 2004 02:55 pm, Gilles Detillieux wrote: > On the other hand, if these two attributes are the only ones that > pose a problem, maybe we should just deal with them in Retriever.cc > and be done with it! Rather than using a flag, as Chris and > Lachlan's patches do, I was thinking of saving the string returned > by > > config->Find(&aUrl, "exclude_urls") > > and comparing it to the string we had the last time through. -- lh...@us... ht://Dig developer DownUnder (http://www.htdig.org) |
From: Christopher M. <chr...@mc...> - 2004-04-22 20:25:51
|
On Wed, 2004-04-21 at 23:21, Gilles Detillieux wrote: > > > 3) We may also need to determine if the repeated calls to config->Find() > > > at each URL are having an impact on performance as well. E.g. what is > > > the performance cost of doing thousands of calls like this one? > > > > > > tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t"); > > > > Easy thing to test. I'll give it a try later this week if I can, > > perhaps tomorrow, and report back. > > Great. I'll try to get my fix to Regex.cc in by the end of the week too, > so it would be great if you could give it a whirl. It would probably > mean having to back out your own patch, though, or it wouldn't really > get tested. Here's what I did. I simply added a new variable called 'fooList' of type StringList. Then I inserted extra Create() method calls: //these lines are simply to test the CPU load of the Create() method fooList.Create(config->Find(&aUrl, "exclude_urls"), " \t"); fooList.Destroy(); //these lines are simply to test the CPU load of the Create() method fooList.Destroy(); fooList.Create(config->Find(&aUrl, "bad_querystr"), " \t"); fooList.Destroy(); Then I re-compiled and ran with my normal excludes URL list. It didn't seem to have much of an impact on performance. This means that the performance hit is definitely in the HtRegexList::setEscaped method. Cheers, Chris -- Christopher Murtagh Enterprise Systems Administrator ISR / Web Communications Group McGill University Montreal, Quebec Canada Tel.: (514) 398-3122 Fax: (514) 398-2017 |
From: Gilles D. <gr...@sc...> - 2004-04-22 21:12:16
|
According to Christopher Murtagh: > On Wed, 2004-04-21 at 23:21, Gilles Detillieux wrote: > > > > 3) We may also need to determine if the repeated calls to config->Find() > > > > at each URL are having an impact on performance as well. E.g. what is > > > > the performance cost of doing thousands of calls like this one? > > > > > > > > tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t"); > > > > > > Easy thing to test. I'll give it a try later this week if I can, > > > perhaps tomorrow, and report back. > > > > Great. I'll try to get my fix to Regex.cc in by the end of the week too, > > so it would be great if you could give it a whirl. It would probably > > mean having to back out your own patch, though, or it wouldn't really > > get tested. > > Here's what I did. I simply added a new variable called 'fooList' of > type StringList. Then I inserted extra Create() method calls: > > > //these lines are simply to test the CPU load of the Create() method > fooList.Create(config->Find(&aUrl, "exclude_urls"), " \t"); > fooList.Destroy(); > > //these lines are simply to test the CPU load of the Create() method > fooList.Destroy(); > fooList.Create(config->Find(&aUrl, "bad_querystr"), " \t"); > fooList.Destroy(); > > Then I re-compiled and ran with my normal excludes URL list. It didn't > seem to have much of an impact on performance. This means that the > performance hit is definitely in the HtRegexList::setEscaped method. Thanks, Chris. That's good to know! Maybe then we don't need to use Lachlan's patch to the config parser to track whether server or URL blocks were defined. I'll try a quick fix to Retriever.cc, but I'll also try to find if there are other uses of HtRegexList that may need attention. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
From: Gilles D. <gr...@sc...> - 2004-04-22 22:17:06
|
According to me: > According to Christopher Murtagh: > > Then I re-compiled and ran with my normal excludes URL list. It didn't > > seem to have much of an impact on performance. This means that the > > performance hit is definitely in the HtRegexList::setEscaped method. > > Thanks, Chris. That's good to know! Maybe then we don't need to use > Lachlan's patch to the config parser to track whether server or URL > blocks were defined. I'll try a quick fix to Retriever.cc, but I'll > also try to find if there are other uses of HtRegexList that may need > attention. Well, there are some uses of it in htsearch related code that I'm not all that sure about, but let's just cross that bridge when we get to it. In any case, the optimizations tend to need to be done right where the HtRegexList object is created anyway, rather than buried in the class definition, because you need to make sure the object sticks around or any optimization will have no effect. So, here's my simple stab at fixing Retriever.cc, with no other files needing patches. It should be used instead of Chris's and Lachlan's patches from earlier this week, not put on top of them. I've built this patch over the current CVS code for Retriever.cc (as of Apr 7), but it does seem to apply to a vanilla 3.2.0b5 source as well. Please test this out and make sure it doesn't cause any problems, and that it helps! Apply using "patch -p0 << this-message-file". --- htdig/Retriever.cc.orig 2004-04-07 17:02:00.000000000 -0500 +++ htdig/Retriever.cc 2004-04-22 16:45:52.000000000 -0500 @@ -995,10 +995,21 @@ int Retriever::IsValidURL(const String & // If the URL contains any of the patterns in the exclude list, // mark it as invalid // - tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t"); - HtRegexList excludes; - excludes.setEscaped(tmpList, config->Boolean("case_sensitive")); - if (excludes.match(url, 0, 0) != 0) + String exclude_urls = config->Find(&aUrl, "exclude_urls"); + static String *prevexcludes = 0; + static HtRegexList *excludes = 0; + if (!excludes || !prevexcludes || prevexcludes->compare(exclude_urls) != 0) + { + if (!excludes) + excludes = new HtRegexList; + if (prevexcludes) + delete prevexcludes; + prevexcludes = new String(exclude_urls); + tmpList.Create(exclude_urls, " \t"); + excludes->setEscaped(tmpList, config->Boolean("case_sensitive")); + tmpList.Destroy(); + } + if (excludes->match(url, 0, 0) != 0) { if (debug > 2) cout << endl << " Rejected: item in exclude list "; @@ -1009,12 +1020,22 @@ int Retriever::IsValidURL(const String & // If the URL has a query string and it is in the bad query list // mark it as invalid // - tmpList.Destroy(); - tmpList.Create(config->Find(&aUrl, "bad_querystr"), " \t"); - HtRegexList badquerystr; - badquerystr.setEscaped(tmpList, config->Boolean("case_sensitive")); + String bad_querystr = config->Find(&aUrl, "bad_querystr"); + static String *prevbadquerystr = 0; + static HtRegexList *badquerystr = 0; + if (!badquerystr || !prevbadquerystr || prevbadquerystr->compare(bad_querystr) != 0) + { + if (!badquerystr) + badquerystr = new HtRegexList; + if (prevbadquerystr) + delete prevbadquerystr; + prevbadquerystr = new String(bad_querystr); + tmpList.Create(bad_querystr, " \t"); + badquerystr->setEscaped(tmpList, config->Boolean("case_sensitive")); + tmpList.Destroy(); + } char *ext = strrchr((char *) url, '?'); - if (ext && badquerystr.match(ext, 0, 0) != 0) + if (ext && badquerystr->match(ext, 0, 0) != 0) { if (debug > 2) cout << endl << " Rejected: item in bad query list "; -- Gilles R. 
Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |