From: Gilles D. <gr...@sc...> - 2003-10-08 21:51:28
According to Gustave Stresen-Reuter:
> If I'm not mistaken, since your start_url is a pdf document, it's the
> only document that will get parsed and as far as I know, htdig is unable
> to follow links in a pdf document. Htdig is only able to follow links in
> html documents. Please correct me if I'm wrong on this last statement.
>
> You'll probably need to create some sort of index document that has
> links to all the pdf files you want to index.
>
> On Wednesday, October 8, 2003, at 09:19 AM, Natalya Kolesnikova wrote:
> > start_url: http://intranet.panasonic.de/pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf

Natalya set the start_url this way at my recommendation (see earlier
postings in the thread) to rule out whether it's a problem with htdig
being able to actually index PDF files given the URLs, as opposed to a
problem with finding the URLs to the PDFs. Her test showed that it
failed with a single PDF file, which suggests a problem either with
that PDF file or with the setup of the external parser. That's the next
stage of testing to tackle.

Once her configuration is working reliably for a single PDF, given the
URL, she'll be in a better position to try and see if it's also having
problems finding the URLs from links in other documents.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)