pdfsandwich

3.8 Stars (4)
45 Downloads (This Week)
Download pdfsandwich-0.1.3.tar.bz2
BSD Linux

Description

pdfsandwich generates "sandwich" OCR PDF files: PDF files that contain only images (and no editable text) are processed by optical character recognition (OCR), and the recognized text is added to each page invisibly "behind" the image.

pdfsandwich is a command-line tool intended for OCR of scanned books and journals. It can recognize the page layout, even for multicolumn text.

Essentially, pdfsandwich is a wrapper script that calls the following binaries: convert, unpaper, tesseract, gs, and hocr2pdf (the last only if the tesseract version is older than 3.03). It is known to run on Unix systems and has been tested on Linux and Mac OS X. It supports parallel processing on multiprocessor systems.
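
Because the tool depends on several external binaries, it can help to check that they are installed before running it. The following is a minimal Python sketch of such a check, not code from pdfsandwich itself. The function names (parse_version, required_binaries, missing_binaries) are hypothetical, and the version string is assumed to be supplied by the caller; only the list of binaries and the 3.03 cutoff come from the description above.

```python
import re
import shutil

# Binaries named in the description above. hocr2pdf is listed there as
# needed only when tesseract is older than 3.03.
ALWAYS_REQUIRED = ["convert", "unpaper", "tesseract", "gs"]
HOCR2PDF_CUTOFF = (3, 3)


def parse_version(text):
    """Turn a version string such as '3.02.02' into a tuple like (3, 2, 2)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def required_binaries(tesseract_version):
    """List the binaries the description implies for a tesseract version."""
    needed = list(ALWAYS_REQUIRED)
    if parse_version(tesseract_version) < HOCR2PDF_CUTOFF:
        needed.append("hocr2pdf")
    return needed


def missing_binaries(tesseract_version, which=shutil.which):
    """Return the required binaries that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [name for name in required_binaries(tesseract_version)
            if which(name) is None]
```

Calling missing_binaries with the installed tesseract version returns the names that still need to be installed; an empty list means every binary named above was found on PATH.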

In contrast to most competing sandwich programs, it preprocesses the scanned images, for example by de-skewing them and removing dark edges.

For further information please read the manual: http://www.tobias-elze.de/pdfsandwich/index.html

User Ratings

5 stars: 2
4 stars: 1
3 stars: 0
2 stars: 0
1 star: 1

ease: 5 / 5
features: 4 / 5
design: 4 / 5
support: 0 / 5

User Reviews

  • tfileme

    Thanks for updates ;)

    Posted 05/14/2013
  • blakewells

    great program pdfsandwich, thanks.

    Posted 09/22/2012

Additional Project Details

Languages: English
Intended Audience: End Users/Desktop
User Interface: Command-line
Programming Language: OCaml (Objective Caml)
Registered: 2012-05-13