From: Willy C. <Wil...@uk...> - 2002-04-15 09:25:53
OK, I see what you mean. Thanks for the info!
If I create ${common_dir}/index.txt as a regular text file containing valid
starting URLs (presumably the URLs are separated by commas or
semi-colons?), then the htdig spider should crawl through these and build a
database from them. Right? I'd rather do that than mess around
with Apache's config directives at the moment ;-)
This leads me to another question: what output should I get if htdig
works? For example, when I run rundig -vvv the system seems to hang for a
few minutes until I kill it (Ctrl-C), like so:
=======================================================
hostname# rundig -vvv
1:1:http://hostname/index.txt
New server: hostname, 80
Retrieval command for http://hostname/robots.txt: GET /robots.txt HTTP/1.0
User-Agent: htdig/3.1.6 (unc...@ht...)
Host: hostname
Header line: HTTP/1.1 404 Not found
Header line: Server: Netscape-Enterprise/3.0
Header line: Date: Sun, 14 Apr 2002 09:48:28 GMT
Header line: Content-type: text/html
Header line: Content-length: 207
Header line: Connection: close
Header line:
returnStatus = 1
pushed
pick: hostname, # servers = 1
0:0:0:http://hostname/index.txt: Retrieval command for
http://hostname/index.txt: GET /index.txt HTTP/1.0
User-Agent: htdig/3.1.6 (unc...@ht...)
Host: hostname
Header line: HTTP/1.1 200 OK
Header line: Server: Netscape-Enterprise/3.0
Header line: Date: Sun, 14 Apr 2002 09:48:28 GMT
Header line: Content-type: text/plain
Header line: Last-modified: Sun, 14 Apr 2002 09:48:26 GMT
Converted Sun, 14 Apr 2002 09:48:26 GMT to Sun, 14 Apr 2002 09:48:26
Header line: Content-length: 977
Header line: Accept-ranges: bytes
Header line: Connection: close
Header line:
returnStatus = 0
Read 977 from document
Read a total of 977 bytes
size = 977
pick: hostname, # servers = 1
htmerge: Sorting...
htmerge: Merging...
htmerge: 100:ukaboutusabo
0/http://hostname/index.txt
^C
hostname #
=======================================================
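
A verbose trace like the one above is easier to read once each request is paired with the status code the server returned. Below is a minimal sketch in Python, not part of htdig itself. The regular expressions are assumptions based only on the "Retrieval command for" and "Header line:" lines in this transcript, the function name summarize is made up for illustration, and other htdig versions may print something different.

```python
import re

# Patterns inferred from the transcript in this message only (an assumption,
# not a documented htdig log format).
REQUEST = re.compile(r"Retrieval command for\s+(\S+):\s+GET")
STATUS = re.compile(r"Header line: HTTP/\d\.\d\s+(\d{3})")


def summarize(log_text):
    """Return (url, status_code) pairs in the order they appear in the log."""
    events = [(m.start(), "req", m.group(1)) for m in REQUEST.finditer(log_text)]
    events += [(m.start(), "status", int(m.group(1)))
               for m in STATUS.finditer(log_text)]
    results = []
    pending_url = None
    for _, kind, value in sorted(events, key=lambda e: e[0]):
        if kind == "req":
            pending_url = value          # remember the URL being fetched
        elif pending_url is not None:
            results.append((pending_url, value))
            pending_url = None           # one status line per request
    return results


# Excerpt of the log lines quoted in this message; \s+ in REQUEST lets the
# pattern match even when a mail client wraps the request line.
sample = """\
Retrieval command for http://hostname/robots.txt: GET /robots.txt HTTP/1.0
Header line: HTTP/1.1 404 Not found
0:0:0:http://hostname/index.txt: Retrieval command for
http://hostname/index.txt: GET /index.txt HTTP/1.0
Header line: HTTP/1.1 200 OK
"""

for url, code in summarize(sample):
    print(code, url)
```

On the excerpt this prints 404 for robots.txt and 200 for index.txt. Read that way, the trace shows that the server has no robots.txt and that index.txt itself was fetched as a single text/plain document of 977 bytes.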
At 14:00 12/04/02 -0600, Jim Cole wrote:
>Willy Calderon's bits of Fri, 12 Apr 2002 translated to:
>
> >I've got a few output lines from doing a rundig in which I'm being asked
> >what to index.
> >
> >=========================
> >host# rundig -vvv
> > 1:1:
> >New server: , 0
> >Unknown host: 0/robots.txt
> > pushed
> >pick: , # servers = 1
> >htmerge: Unable to open word list file '/opt/www/htdig/db/db.allwords.text'.
> > Did you index anything?
> > Check your config file and try running htdig again.
> >==========================
> >
> >At the moment my htdig.conf file looks something like this
>...
> >start_url: ${common_dir}/index.html
>
>What does the index.html file in ${common_dir} look like? It
>shouldn't be HTML. It should just be a regular text file that
>lists all of the starting URL's for your indexing run. If on
>the other hand you actually want to start with a single HTML
>file and dig from there, then specify a valid URL in start_url.
>For example
>
>start_url: http://www.somedomain.com/index.html
>
>
>Jim