Re: [htdig] PDF-SEARCH

According to Gustave Stresen-Reuter:
> If I'm not mistaken, since your start_url is a pdf document, it's the
> only document that will get parsed and as far as I know, htdig is unable
> to follow links in a pdf document. Htdig is only able to follow links in
> html documents. Please correct me if I'm wrong on this last statement.
> 
> You'll probably need to create some sort of index document that has
> links to all the pdf files you want to index.
> 
> On Wednesday, October 8, 2003, at 09:19  AM, Natalya Kolesnikova wrote:
> 
> start_url:		http://intranet.panasonic.de/pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf

Natalya set the start_url this way at my recommendation (see earlier
postings in the thread) to determine whether the problem lies in htdig's
ability to index PDF files when given their URLs, or in finding the URLs
to the PDFs in the first place.  Her test showed that indexing failed
even for a single PDF file, which suggests a problem either with that
PDF file or with the setup of the external parser.  That's the next
stage of testing to tackle.
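One quick way to rule out a bad PDF file is to fetch a copy by hand and
check that it really is a complete PDF and not, say, an HTML error page
or a truncated download.  The following is only a rough sanity check,
not part of htdig; the function name is hypothetical.  It looks for the
"%PDF-" header at the start and the "%%EOF" marker near the end:

```python
import sys


def looks_like_complete_pdf(data: bytes) -> bool:
    """Rough check: PDF header at the start, %%EOF marker near the end."""
    # A real PDF begins with "%PDF-" followed by the version, e.g. "%PDF-1.4".
    if not data.startswith(b"%PDF-"):
        return False
    # A complete file ends with "%%EOF", possibly followed by whitespace;
    # a truncated download usually lacks it.
    return b"%%EOF" in data[-1024:]


def check_pdf_file(path: str) -> bool:
    """Read a local copy of the document and run the check on it."""
    with open(path, "rb") as f:
        return looks_like_complete_pdf(f.read())


if __name__ == "__main__":
    for name in sys.argv[1:]:
        status = "looks OK" if check_pdf_file(name) else "NOT a complete PDF"
        print(f"{name}: {status}")
```

If the file passes this check, the external parser configuration is the
more likely culprit.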

Once her configuration reliably indexes a single PDF given its URL,
she'll be in a better position to check whether htdig also has trouble
finding the PDF URLs from links in other documents.
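For that later test, Gustave's suggestion of an index document that
links to all the PDFs can be generated rather than written by hand.  A
minimal sketch, assuming you already have a list of the PDF URLs (the
example.com URLs and the function name below are hypothetical).  The
resulting HTML page would then be published on the server and used as
the start_url, so htdig follows its links to each PDF:

```python
from html import escape


def build_index_page(pdf_urls, title="PDF index"):
    """Return a plain HTML page with one link per PDF URL."""
    # Escape URLs so characters like "&" don't break the HTML attributes.
    items = "\n".join(
        f'  <li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in pdf_urls
    )
    return (
        "<html><head><title>" + escape(title) + "</title></head>\n"
        "<body>\n<ul>\n" + items + "\n</ul>\n</body></html>\n"
    )


if __name__ == "__main__":
    urls = [
        "http://example.com/docs/introduction.pdf",
        "http://example.com/docs/chapter2.pdf",
    ]
    print(build_index_page(urls))
```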

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)