The Google Search Appliance uses pdftohtml version
0.33a. That version is unable to read some OCR'ed
files, so their converted output is blank and the
appliance does not index them. We have approximately
10,000 files that we want to run through 0.33a.
Those that are blank will be re-scanned with a
Do you know of a way to use your software to
process multiple files in one batch? Additionally,
how can we tell which converted HTML files are blank,
other than by opening and viewing each one?
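
One possible approach, sketched in Python under some assumptions: pdftohtml is on the PATH, it accepts the `-q` and `-noframes` options, and it writes `<base>.html` when given an output base name (check `pdftohtml -h` for your build, since 0.33a may differ). The function names `visible_text`, `is_blank_html`, and `convert_all` are made up for this example. A file is treated as blank when no visible text remains after dropping tags, the `<title>`, scripts, and styles.

```python
import subprocess
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring <script>, <style>, and <title> content."""

    SKIP = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def is_blank_html(path, min_chars=1):
    """True if the HTML file has fewer than min_chars of visible text."""
    html = Path(path).read_text(encoding="utf-8", errors="replace")
    return len(visible_text(html)) < min_chars


def convert_all(pdf_dir, out_dir, converter="pdftohtml"):
    """Convert every *.pdf in pdf_dir; return a list of (pdf, status) pairs.

    status is "ok", "blank", or "failed". Raises FileNotFoundError if the
    converter executable cannot be found.
    """
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        base = out_dir / pdf.stem
        # ASSUMPTION: -q (quiet) and -noframes (single HTML file) are
        # supported, and the output lands in <base>.html.
        proc = subprocess.run([converter, "-q", "-noframes", str(pdf), str(base)])
        html = out_dir / (pdf.stem + ".html")
        if proc.returncode != 0 or not html.exists():
            status = "failed"
        elif is_blank_html(html):
            status = "blank"
        else:
            status = "ok"
        results.append((pdf, status))
    return results
```

Calling `convert_all("pdfs", "html")` (both directory names are placeholders) and keeping the entries whose status is "blank" would give the list of files to re-scan. Raising `min_chars` may help if the converter emits a few stray characters for otherwise empty pages.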