We use a Toshiba e-Studio 281c to scan documents to a server share, where they are OCR-processed by a Toshiba application called Re-Rite. I believe this software is based on libraries by Nuance. I found a thread where my problem was discussed before (http://www.pdfsam.org/bbforum/viewtopic.php?f=2&t=714); it refers to software called Paper-Port, which IIRC is also based on the Nuance libs. The recommended solution ("Use FoxitReader and reprint as PDF") is not satisfactory for me for two reasons:
1. I cannot and do not want to deploy FoxitReader in our system just for that purpose
2. Reprinting as PDF (to get rid of the text layer) produces *much* larger files (by a factor of 10)
Furthermore, I do *not* think that the explanation ("...image layer is damaged. Adobe Reader is overchallenged with this problem") applies. The image layer is *not* damaged, as the fact that it can be reprinted with other software shows. Adobe Reader is also able to display the OCRed pdfs correctly. The problem occurs only after such pdfs are *merged* using pdfsam. I think the problem lies in the way pdfsam deals with additional layers, or at least with an additional OCRed text layer.
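A rough way to test this hypothesis would be to check whether the text layer is still present after the merge. The sketch below is only a heuristic and rests on assumptions that are not confirmed for Re-Rite: it assumes the OCR text is written with the PDF text rendering mode 3 (invisible text, operator `3 Tr`), which is the usual way OCR tools hide text behind the scanned image, and it assumes the content streams are either uncompressed or Flate-compressed. The function name `count_invisible_text_streams` is made up for this example, and it uses only the Python standard library.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

# "3 Tr" selects text rendering mode 3 (invisible text).
# Assumption: the OCR text layer is written in this mode.
INVISIBLE_TEXT_RE = re.compile(rb"(?<![\d.])3\s+Tr\b")


def count_invisible_text_streams(pdf_bytes: bytes) -> int:
    """Count streams that switch to invisible text rendering (heuristic)."""
    count = 0
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # Flate-compressed streams; decompressobj tolerates a stream
            # that was cut slightly short by the regex above.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not zlib data: scan the raw bytes as they are.
            data = raw
        if INVISIBLE_TEXT_RE.search(data):
            count += 1
    return count
```

Running this on the bytes of "test_page.pdf" and on the bytes of the merged file (for example via `open(path, "rb").read()`) and comparing the two counts would give a first hint. A merged file built from two copies should report about twice the count of the original if the text layer survived. Streams inside object streams or using other filters are not seen by this sketch, so a count of zero proves nothing on its own.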
I have attached all the files you need to reproduce the problem. Steps to reproduce:
1. merge the original OCRed pdf file "test_page.pdf" twice into a single pdf (see screenshot.png)