We use a Toshiba e-Studio 281c to scan documents to a server share, where they are OCR-processed by a Toshiba application called Re-Rite. AFAIK this software is based on libraries by Nuance. I found a thread where my problem was discussed before (http://www.pdfsam.org/bbforum/viewtopic.php?f=2&t=714), referring to a software called Paper-Port, which IIRC is also based on the Nuance libs. The recommended solution ("Use FoxitReader and reprint as PDF") is not satisfactory for me, for two reasons:
1. I cannot and do not want to deploy FoxitReader in our system just for that reason
2. Reprinting as PDF (to get rid of the text layer) produces *much* larger files (by a factor of 10)
Furthermore, I do *not* think that the explanation ("...image layer is damaged. Adobe Reader is overchallenged with this problem") applies. The image layer is *not* damaged, as the fact that it can be reprinted with another software shows. Adobe Reader is also able to display the OCRed pdfs correctly. The problem occurs only after such pdfs are *merged* using pdfsam. I think the cause is the way pdfsam deals with additional layers, or at least with an additional OCRed text layer.
Here I supply all the files you need to reproduce the problem:
"test_page.pdf": the original OCRed pdf created by Re-Rite
"screenshot.png": the settings used for merging
"result.pdf": the merged result pdfsam creates
"error.png": the error message Adobe Reader displays after opening the result
the log messages pdfsam created during processing
Steps to reproduce:
1. Merge the original OCRed pdf file "test_page.pdf" twice into a single pdf (see "screenshot.png" for the settings)
2. Open the merged result ("result.pdf") using Adobe Reader (I used v10.1.1)
3. Adobe Reader displays the error message shown in "error.png"
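For anyone who wants to compare "test_page.pdf" and "result.pdf" without opening them in a viewer, here is a rough Python sketch. It only counts a few raw name tokens in the file bytes, so it is a crude heuristic and not a real PDF parser: the marker list is an assumption chosen for illustration, "/Font" also matches longer names such as "/FontDescriptor", and anything stored inside compressed object streams will not be counted. If the counts in the merged file are not roughly double those of the original, that might hint that resources belonging to the text layer were dropped or changed during the merge.

```python
import sys
from pathlib import Path

# Raw name tokens to look for; an illustrative choice, not an exhaustive list.
MARKERS = (b"/Font", b"/XObject")


def count_markers(data: bytes) -> dict:
    """Count non-overlapping occurrences of each marker in the raw bytes."""
    return {m.decode("ascii"): data.count(m) for m in MARKERS}


def summarize(path: str) -> dict:
    """Return the file size and the marker counts for one PDF file."""
    data = Path(path).read_bytes()
    return {"size": len(data), **count_markers(data)}


if __name__ == "__main__":
    # Pass the files to compare as arguments, e.g. test_page.pdf result.pdf
    for name in sys.argv[1:]:
        print(name, summarize(name))
```

Running it with both file names as arguments prints one summary line per file, which can be pasted into a reply alongside the pdfsam log.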