From: Harrell, Roger <rjharrel@be...> - 2004-02-23 20:41:20
> It is my understanding that the exclude_urls: var excludes the urls from
> dig. So I have a conf file that has:
> exclude_urls: /cgi-bin/ .cgi /home/ /artists/ /products/ /bible/
> /music/MusicNote /members/
> But when I run htdig -v the dig is clearly hitting /members/xxxx
> Unfortunately this gets it caught in an infinite loop because one of the
> sections sets a new php session ID for each request so as far as htdig is
> concerned it's a new page. So it just bounces back and forth between two
> pages. What am I missing about the exclude_urls config?
>Does it appear as two lines in your htdig.conf, or was it simply
>folded by your mail program? See http://www.htdig.org/FAQ.html#q5.31
>for this and other possible causes. Does an htdig -vvv give any
>indication of which exclude_urls patterns, if any, are being excluded?
>It might also help to know what version of htdig you're running, if
>indeed this is a bug. See http://www.htdig.org/FAQ.html#q5.33
>I vaguely recall problems with this attribute in older 3.2 betas,
>or perhaps it was in development code.
Folded by mail program. I found the problem:
"Another, more subtle latent effect occurs with releases 3.1.6 and 3.2
betas: when you interrupt htdig (i.e. with Control-C or a kill command), it
stores the list of currently queued URLs in db.log, in your database
directory, so that the next time you invoke htdig it can resume the
interrupted dig. A side-effect of this file is that if you change some
attributes like limit_urls_to or exclude_urls before restarting, the URLs in
the file are still taken as-is, having been checked against the old settings
of limit_urls_to or exclude_urls before being queued. This might explain one
reason htdig seems to ignore your new settings of these."
I was interrupting with Control-C during testing. Deleted the files and
restarted. Seems to be going well now. I'll let you know. FYI, I'm running
3.1.6 on your recommendation.
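
For anyone who hits the same thing while testing, here is a rough sketch of
the cleanup step as a small Python helper. It only assumes what the FAQ
passage says, namely that the saved queue lives in db.log in the database
directory. The function name clear_stale_queue and the directory path in the
usage line are hypothetical; substitute whatever your database_dir is set to.

```python
from pathlib import Path


def clear_stale_queue(database_dir: str, log_name: str = "db.log") -> bool:
    """Remove the URL queue saved by an interrupted dig, if one exists.

    database_dir is assumed to be the directory htdig writes its databases
    to; log_name defaults to db.log, the file named in the FAQ passage.
    Returns True if a file was removed, False if there was nothing to remove.
    """
    log_path = Path(database_dir) / log_name
    if log_path.is_file():
        # Deleting the file discards URLs that were queued under the old
        # limit_urls_to / exclude_urls settings.
        log_path.unlink()
        return True
    return False
```

Calling clear_stale_queue("/path/to/htdig/db") before re-running htdig after
a config change should have the same effect as deleting the file by hand.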