pdfsandwich generates "sandwich" OCR PDF files: PDF files that contain only images (and no editable text) are processed by optical character recognition (OCR), and the recognized text is added invisibly to each page, "behind" the images.

pdfsandwich is a command-line tool intended for OCR of scanned books and journals. It can recognize the page layout, even for multicolumn text.

Essentially, pdfsandwich is a wrapper script that calls the following binaries: convert, unpaper, tesseract, gs, and, for tesseract versions older than 3.03, hocr2pdf. It is known to run on Unix systems and has been tested on Linux and Mac OS X. It supports parallel processing on multiprocessor systems.
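To make the wrapper architecture concrete, here is a minimal Python sketch of a pdfsandwich-style pipeline. It is not pdfsandwich's actual code (the tool itself is written in OCaml) and does not reproduce its options. The function names (`page_commands`, `merge_command`, `sandwich`), the intermediate file names and the 300 dpi resolution are hypothetical. Each page is rasterized with convert, cleaned with unpaper, and OCRed by tesseract into a one-page PDF using tesseract's `pdf` output (available since 3.03, so the older hocr2pdf path is not modeled); gs then concatenates the pages. Pages are processed in parallel, and by default the sketch only returns the commands instead of running them.

```python
"""Hypothetical sketch of a pdfsandwich-style OCR wrapper (not the real tool)."""
import subprocess
from concurrent.futures import ThreadPoolExecutor


def page_commands(pdf, page, lang="eng", dpi=300):
    """Per-page command lines; file names and dpi are illustrative assumptions."""
    raw = f"page{page:04d}.ppm"
    clean = f"page{page:04d}_unpaper.ppm"
    base = f"page{page:04d}_ocr"  # tesseract appends ".pdf" to this base name
    return [
        # Rasterize a single page (ImageMagick page indices are 0-based).
        ["convert", "-density", str(dpi), f"{pdf}[{page}]", raw],
        # Preprocess: unpaper can deskew pages and remove dark borders.
        ["unpaper", raw, clean],
        # OCR into a one-page image+text PDF via tesseract's "pdf" config.
        ["tesseract", clean, base, "-l", lang, "pdf"],
    ]


def merge_command(num_pages, output):
    """Concatenate the per-page PDFs with Ghostscript's pdfwrite device."""
    parts = [f"page{p:04d}_ocr.pdf" for p in range(num_pages)]
    return ["gs", "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite",
            f"-sOutputFile={output}", *parts]


def run_page(cmds, dry_run):
    """Run one page's commands sequentially (or skip them in a dry run)."""
    for cmd in cmds:
        if not dry_run:
            subprocess.run(cmd, check=True)
    return cmds


def sandwich(pdf, num_pages, output, lang="eng", threads=2, dry_run=True):
    """Process pages in parallel, then merge; returns every command in order."""
    jobs = [page_commands(pdf, p, lang) for p in range(num_pages)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves page order even though pages run concurrently.
        done = list(pool.map(lambda cmds: run_page(cmds, dry_run), jobs))
    merge = merge_command(num_pages, output)
    if not dry_run:
        subprocess.run(merge, check=True)
    return [cmd for cmds in done for cmd in cmds] + [merge]
```

For example, `sandwich("scan.pdf", 2, "scan_ocr.pdf", lang="rus")` returns the seven command lines for a two-page document; passing `dry_run=False` executes them, which requires convert, unpaper, tesseract and gs on the `PATH`. Threads are sufficient for parallelism here because the actual work happens in external processes.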

Unlike most competing sandwich programs, it preprocesses the scanned images, for example by de-skewing them or removing dark edges.

For further information, please read the manual: http://www.tobias-elze.de/pdfsandwich/index.html


License

GNU General Public License version 2.0 (GPLv2)



User Ratings

5 stars: 6
4 stars: 1
3 stars: 0
2 stars: 0
1 star: 1

Ease: 4 / 5
Features: 4 / 5
Design: 4 / 5
Support: 4 / 5

User Reviews

  • Doesn't work with Russian PDFs, despite the -l rus parameter.

Additional Project Details

Operating Systems

Linux, BSD

Languages

English

Intended Audience

End Users/Desktop

User Interface

Command-line

Programming Language

OCaml (Objective Caml)

Related Categories

OCaml (Objective Caml) Business Software, OCaml (Objective Caml) Command Line Tools, OCaml (Objective Caml) OCR Software

Registered

2012-05-13