pdfsandwich generates "sandwich" OCR PDF files: PDF files that contain only images (and no editable text) are processed by optical character recognition (OCR), and the recognized text is added to each page invisibly "behind" the images.

pdfsandwich is a command-line tool intended for OCR of scanned books or journals. It can recognize the page layout, even for multi-column text.

Essentially, pdfsandwich is a wrapper script that calls the following binaries: convert, unpaper, tesseract, gs, and hocr2pdf (the last only if the tesseract version is older than 3.03). It is known to run on Unix systems and has been tested on Linux and Mac OS X. It supports parallel processing on multiprocessor systems.
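Because pdfsandwich relies on external binaries, it can help to check that they are installed before running it. The following is a minimal sketch, not part of pdfsandwich itself: the function names find_binaries and missing_binaries are hypothetical, and the only assumption is that the binaries are looked up on the PATH under the names listed above.

```python
import shutil

# Binaries named in the description above; hocr2pdf is only needed
# with tesseract versions older than 3.03.
REQUIRED = ["convert", "unpaper", "tesseract", "gs"]
OPTIONAL = ["hocr2pdf"]


def find_binaries(names, which=shutil.which):
    """Map each binary name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in names}


def missing_binaries(names, which=shutil.which):
    """Return the names that could not be found, in the order given."""
    found = find_binaries(names, which)
    return [name for name, path in found.items() if path is None]


if __name__ == "__main__":
    # Print one line per binary so missing dependencies are easy to spot.
    for name, path in find_binaries(REQUIRED + OPTIONAL).items():
        print(f"{name}: {path or 'not found'}")
```

The lookup function is passed in as a parameter only so that the check can be exercised without the real binaries being installed.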

In contrast to most competing sandwich programs, it preprocesses the scanned images, for example by de-skewing them or removing dark edges.

For further information please read the manual: http://www.tobias-elze.de/pdfsandwich/index.html

User Ratings

ease: 4 / 5
features: 4 / 5
design: 4 / 5
support: 4 / 5

User Reviews

  • I have been looking for a long time for this exact utility. I very often have a need to convert pdf files to a searchable format. Thank you for putting this together.

  • Pdfsandwich does exactly what I was always missing in Tesseract. Great little piece of software with many good ideas!

  • I have been looking for something like this

  • Excellent tool. It did exactly what I wanted - performing OCR on a PDF that I had scanned and creating a new PDF with the original image but also text that could be searched and cut/pasted. It required absolutely no effort to configure or operate.

  • *Just* what I was looking for! Thanks!

Additional Project Details



Intended Audience

End Users/Desktop

Programming Language

OCaml (Objective Caml)