From: David A. <D.J...@so...> - 2003-10-08 14:10:22
Thank you, that output establishes that htdig is reading a .pdf file.
The next question is: what is it doing with it? To answer that we need
to see what you have in your configuration file.

David Adams
Corporate Information Services
Information Systems Services
University of Southampton

----- Original Message -----
From: "Natalya Kolesnikova" <Ja...@gm...>
To: "Gilles Detillieux" <gr...@sc...>
Cc: <htd...@li...>
Sent: Wednesday, October 08, 2003 10:22 AM
Subject: Re: [htdig] PDF-SEARCH

> Thank you very much for your help!
> I don't get an error message, but I never have .pdf files in my search list!!!
> Here is the htdig -ivvv output when start_url is a single PDF file.
> What is wrong???
>
> natalya.kolesnikova@intranet:~> htdig -ivvv
>
> 1:1:http://intranet.panasonic.de/pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf
> New server: intranet.panasonic.de, 80
> Retrieval command for http://intranet.panasonic.de/robots.txt: GET /robots.txt HTTP/1.0
> User-Agent: htdig/3.1.6 (kol...@pa...)
> Host: intranet.panasonic.de
>
> Header line: HTTP/1.1 200 OK
> Header line: Date: Wed, 08 Oct 2003 08:36:24 GMT
> Header line: Server: Apache/1.3.27 (Linux/SuSE) PHP/4.3.1
> Header line: Last-Modified: Tue, 21 Aug 2001 22:00:00 GMT
> Converted Tue, 21 Aug 2001 22:00:00 GMT to Tue, 21 Aug 2001 22:00:00
> Header line: ETag: "44005-e7-3b82d9e0"
> Header line: Accept-Ranges: bytes
> Header line: Content-Length: 231
> Header line: Connection: close
> Header line: Content-Type: text/plain
> Header line:
> returnStatus = 0
> Read 231 from document
> Read a total of 231 bytes
> Parsing robots.txt file using myname = htdig
> Robots.txt line: # exclude help system from robots
> Robots.txt line: User-agent: *
> Found 'user-agent' line: *
> Robots.txt line: Disallow: /manual/
> Found 'disallow' line: /manual/
> Robots.txt line: Disallow: /doc/
> Found 'disallow' line: /doc/
> Robots.txt line: Disallow: /gif/
> Found 'disallow' line: /gif/
> Robots.txt line: # but allow htdig to index our doc-tree
> Robots.txt line: User-agent: susedig
> Found 'user-agent' line: susedig
> Robots.txt line: Disallow:
> Robots.txt line: # disallow stress test
> Robots.txt line: user-agent: stress-agent
> Found 'user-agent' line: stress-agent
> Robots.txt line: Disallow: /
> Pattern: /manual/|/doc/|/gif/
> pushed
> pick: intranet.panasonic.de, # servers = 1
> 0:0:0:http://intranet.panasonic.de/pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf: Retrieval command for http://intranet.panasonic.de/pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf: GET /pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf HTTP/1.0
> User-Agent: htdig/3.1.6 (kol...@pa...)
> Host: intranet.panasonic.de
>
> Header line: HTTP/1.1 200 OK
> Header line: Date: Wed, 08 Oct 2003 08:36:24 GMT
> Header line: Server: Apache/1.3.27 (Linux/SuSE) PHP/4.3.1
> Header line: Last-Modified: Fri, 29 Aug 2003 11:25:19 GMT
> Converted Fri, 29 Aug 2003 11:25:19 GMT to Fri, 29 Aug 2003 11:25:19
> Header line: ETag: "314005-51e38-3f4f381f"
> Header line: Accept-Ranges: bytes
> Header line: Content-Length: 335416
> Header line: Connection: close
> Header line: Content-Type: application/pdf
> Header line:
> returnStatus = 0
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 7736 from document
> Read a total of 335416 bytes
> size = 335416
> pick: intranet.panasonic.de, # servers = 1
> natalya.kolesnikova@intranet:~>
>
> > According to Natalya Kolesnikova:
> > > Maybe I am stupid, but it doesn't work for me! Can somebody help me? I have
> > > tried with acroread and with the external parser xpdf, but it doesn't work!!!!
> > > I need the Installation Guide!!! :)))
> >
> > See http://www.htdig.org/FAQ.html#q4.9
> >
> > That is the installation guide for PDF indexing. If you've carefully read
> > and implemented everything recommended there, and checked out FAQs 5.2
> > and 5.37 as David recommended (twice), then please provide more details,
> > such as what error messages you get, or give us an excerpt of htdig -ivvv
> > output when start_url is set to point to just one single PDF file.
> >
> > There are dozens of potential points of failure in this process, so simply
> > saying "it doesn't work" gives us no information that can help pinpoint
> > which point of failure is the one that needs to be addressed.
> >
> > Also, make sure you have links in your HTML files to all PDF files you
> > want to index. (See http://www.htdig.org/FAQ.html#q5.25)
> >
> > --
> > Gilles R. Detillieux              E-mail: <gr...@sc...>
> > Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
> > Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
> >
> > -------------------------------------------------------
> > This sf.net email is sponsored by:ThinkGeek
> > Welcome to geek heaven.
> > http://thinkgeek.com/sf
> > _______________________________________________
> > ht://Dig general mailing list: <htd...@li...>
> > ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> > List information (subscribe/unsubscribe, etc.)
> > https://lists.sourceforge.net/lists/listinfo/htdig-general
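
For anyone repeating this check on their own server, the quoted htdig -ivvv output can be condensed to one line per fetched URL. What follows is only a minimal sketch in Python, not part of ht://Dig. The function name summarize and the SAMPLE text are made up for illustration, and the patterns assume the line shapes seen in the htdig 3.1.6 output quoted in this thread; other versions may print something different.

```python
import re

# Assumption: these patterns mirror the htdig 3.1.6 -vvv lines quoted in this
# thread. They are not an official log format and may differ between versions.
RETRIEVAL = re.compile(r"Retrieval command for (\S+): GET")
CONTENT_TYPE = re.compile(r"Header line: Content-Type: (\S+)")
TOTAL = re.compile(r"Read a total of (\d+) bytes")


def summarize(log_text):
    """Return one (url, content_type, total_bytes) tuple per retrieval."""
    fetches = []
    current = None
    for raw in log_text.splitlines():
        # Drop mail-quoting prefixes ("> ", "> > ") before matching.
        line = raw.lstrip("> ").strip()
        match = RETRIEVAL.search(line)
        if match:
            current = {"url": match.group(1), "type": None, "bytes": None}
            fetches.append(current)
            continue
        if current is None:
            continue  # nothing fetched yet, ignore preamble lines
        match = CONTENT_TYPE.search(line)
        if match:
            current["type"] = match.group(1)
            continue
        match = TOTAL.search(line)
        if match:
            current["bytes"] = int(match.group(1))
    return [(f["url"], f["type"], f["bytes"]) for f in fetches]


# Hypothetical sample: a heavily abbreviated copy of the output quoted above.
SAMPLE = """
> Retrieval command for http://intranet.panasonic.de/robots.txt: GET /robots.txt HTTP/1.0
> Header line: Content-Type: text/plain
> Read a total of 231 bytes
> Retrieval command for http://intranet.panasonic.de/pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf: GET /pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf HTTP/1.0
> Header line: Content-Type: application/pdf
> Read a total of 335416 bytes
"""

if __name__ == "__main__":
    for url, content_type, total in summarize(SAMPLE):
        print(url, content_type, total)
```

A summary like this only confirms that a document was fetched, with which Content-Type, and how many bytes were read. It says nothing about what happens to the document after retrieval, which is why the configuration file is the next thing to look at.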