From: Gilles D. <gr...@sc...> - 2002-04-12 20:27:23
According to Jim Cole:
> Willy Calderon's bits of Fri, 12 Apr 2002 translated to:
...
> >At the moment my htdig.conf file looks something like this
> ...
> >start_url: ${common_dir}/index.html
>
> What does the index.html file in ${common_dir} look like? It
> shouldn't be HTML. It should just be a regular text file that
> lists all of the starting URL's for your indexing run.
That's not quite right. If you want to feed a list of URLs from a
plain text file into htdig's start_url attribute, you have to do it
something like:
start_url: `${common_dir}/urllist.txt`
Of course, you can put any file pathname within the backquotes, but the
backquotes themselves are necessary for feeding a file into an attribute
like this. (This works for any attribute, by the way.)
See http://www.htdig.org/cf_variables.html
and http://www.htdig.org/FAQ.html#q5.25
> If on
> the other hand you actually want to start with a single HTML
> file and dig from there, then specify a valid URL in start_url.
> For example
>
> start_url: http://www.somedomain.com/index.html
Yes, that's the key point here. Regardless of how you set start_url, or
how many entries you put in start_url, each entry must be a valid URL which
specifies the protocol and server. In 3.1.x, the protocol must be "http:".
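If it helps, here is a rough sketch of a sanity check you could run over a URL list before handing it to htdig. It is not part of ht://Dig; the function name invalid_start_urls and the sample entries are made up for illustration, and it assumes the entries in the list are separated by whitespace. It only checks that each entry names a protocol and a server, defaulting to "http" as for 3.1.x.

```python
from urllib.parse import urlsplit


def invalid_start_urls(text, allowed_schemes=("http",)):
    """Return the whitespace-separated entries that are not full URLs.

    Hypothetical helper, not part of ht://Dig. An entry passes only if it
    names an allowed protocol and a server.
    """
    bad = []
    for entry in text.split():
        try:
            parts = urlsplit(entry)
        except ValueError:
            # urlsplit rejects some malformed entries outright.
            bad.append(entry)
            continue
        if parts.scheme.lower() not in allowed_schemes or not parts.netloc:
            bad.append(entry)
    return bad


if __name__ == "__main__":
    # Illustrative entries: one full URL and one bare file path.
    sample = "http://www.somedomain.com/index.html\n/home/www/common/index.html\n"
    for entry in invalid_start_urls(sample):
        print("not a usable start URL:", entry)
```

A bare pathname such as the second sample entry is reported, because it has neither a protocol nor a server.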
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930