From: Harrell, Roger <rjharrel@be...> - 2004-02-23 20:41:20
> It is my understanding that the exclude_urls: attribute excludes matching
> URLs from the dig. So I have a conf file that has:
> exclude_urls: /cgi-bin/ .cgi /home/ /artists/ /products/ /bible/
> /music/MusicNote /members/
> But when I run htdig -v, the dig is clearly hitting /members/xxxx.
> Unfortunately, this gets it caught in an infinite loop: one of the
> sections sets a new PHP session ID for each request, so as far as htdig is
> concerned it's a new page. It just bounces back and forth between two
> pages. What am I missing about the exclude_urls config?
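A minimal Python sketch of the filtering the poster expects, assuming (as the htdig attribute documentation describes exclude_urls) that a URL is rejected when it contains any of the space-separated patterns as a plain substring. It also shows the hypothetical case where only the first physical line of the folded entry is read, so /members/ never becomes a pattern. The function names and the example URL are made up for illustration.

```python
# Sketch of exclude_urls-style filtering. Assumption: a URL is rejected if
# any space-separated pattern occurs anywhere in it (plain substring match).


def parse_patterns(value: str) -> list[str]:
    """Split an attribute value on whitespace into individual patterns."""
    return value.split()


def is_excluded(url: str, patterns: list[str]) -> bool:
    """Return True if any pattern occurs anywhere in the URL."""
    return any(p in url for p in patterns)


full = "/cgi-bin/ .cgi /home/ /artists/ /products/ /bible/ /music/MusicNote /members/"
# Hypothetical case: only the first physical line of a split entry is read.
truncated = "/cgi-bin/ .cgi /home/ /artists/ /products/ /bible/"

# Hypothetical URL; the changing session ID does not affect substring matching.
url = "http://www.example.com/members/page.php?PHPSESSID=abc123"

print(is_excluded(url, parse_patterns(full)))       # True
print(is_excluded(url, parse_patterns(truncated)))  # False
```

Because the match is a substring test, a new session ID on every request does not help the URL slip past /members/ once that pattern is actually in effect.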
>Does it appear as two lines in your htdig.conf, or was it simply
>folded by your mail program? See http://www.htdig.org/FAQ.html#q5.31
>for this and other possible causes. Does an htdig -vvv give any
>indication of which exclude_urls patterns, if any, are being excluded?
>It might also help to know what version of htdig you're running, if
>indeed this is a bug. See http://www.htdig.org/FAQ.html#q5.33
>I vaguely recall problems with this attribute in older 3.2 betas,
>or perhaps it was in development code.
It was folded by my mail program. I found the problem:
"Another, more subtle latent effect occurs with releases 3.1.6 and 3.2
betas: when you interrupt htdig (i.e. with Control-C or a kill command), it
stores the list of currently queued URLs in db.log, in your database
directory, so that the next time you invoke htdig it can resume the
interrupted dig. A side-effect of this file is that if you change some
attributes like limit_urls_to or exclude_urls before restarting, the URLs in
the file are still taken as-is, having been checked against the old settings
of limit_urls_to or exclude_urls before being queued. This might explain one
reason htdig seems to ignore your new settings of these."
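The resume behavior quoted above can be modeled with a toy Python sketch. This is not htdig's code: the helper names, the URL paths, and the in-memory list standing in for db.log are all hypothetical. The point it illustrates is that URLs are checked only when queued, so a resumed queue keeps anything that passed the old exclude_urls.

```python
# Toy model (not htdig's implementation) of checking URLs only at enqueue
# time and trusting a saved queue as-is when a dig is resumed.


def is_excluded(url, patterns):
    """Substring match against each exclude pattern."""
    return any(p in url for p in patterns)


def enqueue(queue, urls, patterns):
    """Queue only URLs that pass the exclude check in effect right now."""
    for url in urls:
        if not is_excluded(url, patterns):
            queue.append(url)


def resume(saved_queue):
    """Reload an interrupted queue without re-checking any URL."""
    return list(saved_queue)


links = ["/index.html", "/members/a.php", "/cgi-bin/x"]
old_patterns = ["/cgi-bin/"]
new_patterns = ["/cgi-bin/", "/members/"]

queue = []
enqueue(queue, links, old_patterns)
saved = list(queue)  # stands in for the db.log written on interrupt

# Resuming ignores new_patterns entirely: /members/a.php is still queued.
resumed = resume(saved)
print(resumed)  # ['/index.html', '/members/a.php']

# Deleting the saved queue and starting fresh applies the new settings.
fresh = []
enqueue(fresh, links, new_patterns)
print(fresh)  # ['/index.html']
```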
I was interrupting with Control-C during testing. I deleted the files and
restarted, and it seems to be going well now. I'll let you know. FYI, I'm
running 3.1.6 on your recommendation.