From: Christian F. <fre...@en...> - 2002-04-04 16:52:57
Well, I tried this to no avail. I still receive no errors, but I do see:

Deleted, no excerpt:

for every PDF file. All my Word docs are parsed fine using doc2html. Yes, this is version 3.1.6. Any other ideas? This is driving me nuts, and many of the documents are in PDF format, so I have to have them parsed.

Chris

-----Original Message-----
From: htd...@li... [mailto:htd...@li...]On Behalf Of Rzepa, Henry
Sent: Thursday, April 04, 2002 12:12 AM
To: htd...@li...
Subject: Re: [htdig] PDF problems

>I have switched to using the conv_doc.pl to parse my pdf files, I have run
>this and the pdftotext to make certain the output was text and everything
>ran correctly. It all works perfectly, but when running htdig I see:
>Deleted, no excerpt: 7/http://
>for all the PDF files. WHY? I need to have the PDF documents parsed, but I
>get correct data when running conv_doc.pl, but nothing with htdig.
>

I presume we are talking version 3.1.6 here? I had a lot of difficulties with this version in running external parsers, with the same sort of syndrome, i.e. excerpt deleted. I disabled the acroread invocation (which had worked, as above, when invoked manually to test) and moved directly to pdf2html.pl as below.

Curiously, we have only been able to get external parsers to work if they are invoked from a script, as below. Our attempts to run executables directly (as in the disabled Acroread example below) all result in the above syndrome. So we now call the executables from a small script, which calls them with four arguments.

I might mention that we did not get this problem with v 3.1.4, and currently remain baffled as to the difference.

The below is from our conf file:

#pdf_parser: /usr/adobe/Acrobat4.0/bin/acroread -toPostScript
external_parsers: application/pdf->text/html /var/www/htdig/scripts/doc2html/pdf2html.pl

--
Henry Rzepa.
+44 (0870) 132 3747 (eFax) +44 0778 6268 220 (Mobile)
http://www.ch.ic.ac.uk/rzepa/
Dept. Chemistry, Imperial College, London, SW7 2AY, UK.
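
For anyone trying the "call it from a small script" workaround described above, here is a rough sketch of what such a wrapper might look like. It is not the script from the message, which was not posted. It assumes that htdig passes the four arguments in the order temporary file, MIME type, URL, config file (my reading of the external_parsers documentation), that pdftotext is on the PATH, and that the converter is expected to write text/html to stdout. The function names are made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper for an htdig external converter (a sketch only)."""
import html
import subprocess
import sys

# Assumption: pdftotext is installed and on PATH.
CONVERTER = ["pdftotext"]


def build_command(input_path):
    """Return the argv list for extracting text; "-" sends output to stdout."""
    return CONVERTER + [input_path, "-"]


def to_html(text, url):
    """Wrap extracted text in a minimal HTML page, escaping markup characters."""
    return (
        "<html><head><title>{}</title></head>\n"
        "<body><pre>\n{}\n</pre></body></html>\n"
    ).format(html.escape(url), html.escape(text))


def main(argv):
    # Assumed argument order: temp file, MIME type, URL, config file.
    if len(argv) != 5:
        sys.stderr.write("usage: wrapper FILE CONTENT-TYPE URL CONFIG\n")
        return 2
    input_path, _content_type, url, _config = argv[1:5]
    result = subprocess.run(
        build_command(input_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        # Pass the converter's complaint through so it is visible in htdig -v output.
        sys.stderr.write(result.stderr.decode(errors="replace"))
        return result.returncode
    sys.stdout.write(to_html(result.stdout.decode("utf-8", errors="replace"), url))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

It would be hooked up the same way as pdf2html.pl in the conf line above, with the script path (wherever you choose to install it) in place of the pdf2html.pl path. Running the wrapper by hand with four dummy arguments is a quick way to confirm it prints HTML before pointing htdig at it.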
_______________________________________________
htdig-general mailing list <htd...@li...>
To unsubscribe, send a message to <htd...@li...> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html