[htdig-dev] Patch to pdf2html.pl for better error checking


A client's Web site, which is indexed with htdig, was having problems
with some corrupt PDF files.  Unfortunately, the errors indicated only
that some PDF files weren't being indexed properly, not which files
were causing the problem.  This patch to pdf2html.pl makes it check the
exit code of each PDF conversion program it runs and, if that program
exits with a failure code, report an error to stdout and exit with a
failure code itself:

    http://www.suspectclass.com/~sgifford/htdig/htdig-3.1.6-pdf2html-checkerrorcode.patch 
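
For anyone who wants the idea without reading the patch, here is a
rough sketch of the same check in Python.  It is not the Perl from
pdf2html.pl and not the patch itself; the names convert() and main()
and the argument layout (URL first, then the converter command) are
made up for illustration.

```python
import subprocess
import sys


def convert(cmd, url):
    """Run a converter command and return its stdout, or None on failure.

    cmd is the full argument list of the converter (hypothetical layout);
    url is only used to make the error message identify the document.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Report which document failed instead of failing silently.
        print(f"error: {cmd[0]} exited with status {result.returncode} "
              f"while converting {url}")
        return None
    return result.stdout


def main(argv):
    """argv is [url, converter, converter-args...]; returns an exit code."""
    if len(argv) < 2:
        print("usage: URL CONVERTER [ARGS...]")
        return 2
    url, cmd = argv[0], argv[1:]
    text = convert(cmd, url)
    if text is None:
        return 1  # pass the failure on to the caller
    sys.stdout.write(text)
    return 0
```

A wrapper script would call sys.exit(main(sys.argv[1:])) so that the
converter's failure shows up in the wrapper's own exit status.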

Adding to the confusion, xpdf's pdftotext doesn't exit with an error
code when it fails to parse a document.  This patch to xpdf fixes
that:

    http://www.suspectclass.com/~sgifford/htdig/xpdf-1.01-pdftotext-exitstatus.patch 

With these two patches, the stderr output of htdig includes the
temporary filename and the URL of the document whose conversion failed,
which makes tracking down problems much easier.

Let me know if you have any problems, questions, etc.

----ScottG.