#20 Batch process and Viewing results

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2004-12-07
Created: 2004-12-07
Creator: Anonymous
Private: No

The Google Search Appliance uses pdftohtml version
0.33a. That version is unable to read some OCR'ed
files, so the appliance does not index them (the
converted output is blank). We have approximately
10,000 files that we want to run through 0.33a.
Those that come out blank will be re-scanned with
different software.

Do you know of a way to use your software to
process multiple files? Additionally, how can you
tell if there are blank HTML files, other than
opening and viewing each converted file?
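
One possible approach to both questions, sketched in Python under stated assumptions: loop over the PDFs, call pdftohtml once per file, then flag every converted HTML file that contains no visible text. The helper names (`convert_all`, `find_blank_html`, `is_blank_html`) are hypothetical and not part of pdftohtml. The command line assumes the basic `pdftohtml <PDF-file> <html-file>` form and one HTML file per PDF. Options and output file names differ between versions, so check `pdftohtml -h` for 0.33a before relying on it.

```python
import subprocess
from html.parser import HTMLParser
from pathlib import Path


class _TextCollector(HTMLParser):
    """Collects visible text, ignoring <title>, <script> and <style>."""

    SKIP = {"title", "script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def is_blank_html(html_text):
    """Return True if the HTML has no visible text (whitespace and &nbsp; ignored)."""
    parser = _TextCollector()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks).strip() == ""


def convert_all(pdf_dir, out_dir, command=("pdftohtml",)):
    """Run the converter once per PDF. The command line is an assumption;
    pass extra options through `command` if your version needs them."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        # check=False: one failed conversion should not stop the whole batch.
        subprocess.run([*command, str(pdf), str(out_dir / pdf.stem)], check=False)


def find_blank_html(out_dir):
    """Return the names of converted HTML files that contain no visible text."""
    blank = []
    for page in sorted(Path(out_dir).glob("*.html")):
        text = page.read_text(encoding="utf-8", errors="replace")
        if is_blank_html(text):
            blank.append(page.name)
    return blank
```

A run might call `convert_all("scans", "html_out")` and then print `find_blank_html("html_out")` to get the list of files to re-scan; both directory names are placeholders. If the converter writes a frameset plus separate content files, the frameset page has no text of its own and would be reported as blank, so the glob pattern would need narrowing to the content files.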

Thank you,
wongn@metro.net

