We were getting a lot of errors from PDF Indexing.
There are two issues:
1. Many of our PDF files cannot be parsed by the PDFBox
library, for various reasons, so an exception is
thrown.
2. The file handle is not closed when PDFBox fails to
parse a file, because the thrown exception skips the
code that closes it.
The patch is designed to address both issues by:
1. Taking less severe action when an item is not
parsable: logging the details but returning empty
content to the indexer.
2. Surrounding the parser call with try/finally so the
file handle is always closed.
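Below is a minimal Java sketch of the two changes described above, assuming the indexer hands each PDF to a single extraction method. The patch itself is not reproduced here: `PdfContentExtractor`, `DocumentParser` and `extractContent` are hypothetical names, and `DocumentParser` is a stand-in for the PDFBox parse and text-stripping step so that the example compiles without the library. Logging through `java.util.logging` is also an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

final class PdfContentExtractor {

    /** Hypothetical stand-in for the PDFBox parse + text-extraction step. */
    public interface DocumentParser {
        String extractText(InputStream in) throws Exception;
    }

    private static final Logger LOG =
            Logger.getLogger(PdfContentExtractor.class.getName());

    /**
     * Opens the file and delegates. A missing file still throws here,
     * because that is not a parse failure (an assumption of this sketch).
     */
    public static String extractContent(File file, DocumentParser parser)
            throws IOException {
        return extractContent(file.getPath(), new FileInputStream(file), parser);
    }

    /**
     * Returns the extracted text, or "" if the parser fails.
     * The stream is closed in every case, including when the parser throws.
     */
    public static String extractContent(String name, InputStream in,
                                        DocumentParser parser) {
        try {
            String text = parser.extractText(in);
            return text == null ? "" : text;
        } catch (Exception e) {
            // Issue 1: log the unparsable PDF and give the indexer empty
            // content instead of propagating the exception.
            LOG.log(Level.WARNING,
                    "Could not parse PDF " + name + "; indexing empty content", e);
            return "";
        } finally {
            // Issue 2: always release the file handle, even if parsing failed.
            try {
                in.close();
            } catch (IOException e) {
                LOG.log(Level.FINE, "Failed to close " + name, e);
            }
        }
    }
}
```

Catching `Exception` means runtime exceptions thrown by the parser are also turned into empty content. On Java 7 or later, try-with-resources would close the stream the same way; the explicit try/finally mirrors the approach the patch describes.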
By the way, we have also upgraded PDFBox to version
PDFBox-0.7.2-log4j, which seems to be working OK.