From: Gilles D. <gr...@sc...> - 2002-06-04 20:56:54
According to ch...@sp...:
> Hello Gilles:
>
> Thanks for pointing out my faulty external parser configuration yesterday....
> I'm successfully indexing over 350 pdf files, however we have about a
> dozen where we get the 'Error (0): PDF file is damaged..." error.
> Each of these files display correctly in Acrobat on PC's and Mac's,
> perhaps coincidentally these errors occur on PDF files created on a
> Macintosh. The error occurs running pdf2html.pl directly or via HTDIG
> with a large max_doc_size. I've read through the archives and don't
> think it's the max_docsize. Any suggestions?

If you get the error while running pdf2html.pl directly, then it's not
a problem with max_doc_size, because pdf2html.pl doesn't use that (or any)
config attribute. The definitive test would be to run pdftotext and/or
xpdf directly on one of the PDF files that's giving you problems. Likely
that would give you the same error.

I'd recommend you first make sure you're running a recent version of xpdf
on your system. Current version is 1.01. If you're running an older
version, it's worth trying a more recent one to see if it can handle PDFs
that older versions couldn't. If the latest version still has problems
with some PDFs that you know are correct and readable in Acrobat, then
you may want to contact xpdf's author about this.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
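
The direct test suggested above (running pdftotext on the problem files, bypassing pdf2html.pl and htdig entirely) could be scripted roughly as follows. This is only a sketch under some assumptions: that pdftotext is on the PATH, that the installed version accepts "-" as the output file to write text to stdout, and that a failure shows up as a non-zero exit status or an "Error" line on stderr. The function name check_pdfs and its parameters are made up for illustration.

```python
import subprocess
import sys


def check_pdfs(paths, tool="pdftotext", run=subprocess.run):
    """Run `tool` directly on each PDF and return (path, message) for failures.

    `run` is injectable so the logic can be exercised without the tool installed.
    """
    failures = []
    for path in paths:
        try:
            # Assumption: "<tool> file.pdf -" sends the extracted text to stdout.
            result = run([tool, path, "-"], capture_output=True, text=True)
        except FileNotFoundError:
            # Raised by subprocess when the executable itself cannot be found.
            failures.append((path, f"{tool} not found on PATH"))
            continue
        # Assumption: problems appear as a non-zero exit or an "Error" on stderr.
        if result.returncode != 0 or "Error" in result.stderr:
            failures.append((path, result.stderr.strip()))
    return failures


if __name__ == "__main__":
    # Usage: python check_pdfs.py file1.pdf file2.pdf ...
    for path, message in check_pdfs(sys.argv[1:]):
        print(f"{path}: {message}")
```

If the same dozen files are reported here, the problem lies with xpdf's handling of those PDFs rather than with the htdig configuration, and the output is the kind of detail worth including in a report to xpdf's author.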