
#8 stdin/stdout/stderr support

open
nobody
None
5
2006-12-11
2006-12-11
No

Ideally, tesseract should be able to accept a TIFF file piped in on stdin, write its best-guess output to stdout, and write errors to stderr. Additional output (.map files, .raw files, etc.) should be easy to disable, or should be sent to stderr in a format that can be parsed line by line to tell what each line of output is (e.g. prefixed with 'E:' for errors, 'M:' for map output lines, etc.).

This would allow single-pass pipeline processing, like so:

### example taken from an imaginary HylaFAX faxrcvd
# script. Use GraphicsMagick to make sure we're monochrome
# and in tiff format -- we support receiving color and
# PDF-format faxes -- then pipe through tesseract; our
# stdout is already going into the email being sent to
# the intended recipient of this fax.
echo "--$MIMEBOUNDARY"
echo "Content-Type: text/plain; name=\"${FILENAME}.txt\""
echo "Content-Description: Fax document text"
echo ""
gm convert "${INFILE}" -monochrome tiff:- \
  | tesseract
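
On the consuming side, the prefixed stderr format suggested above could be split apart with a few lines of shell. This is only a rough sketch of the idea: the 'E:' and 'M:' prefixes are the ones proposed in this request, not something tesseract is known to emit, and the function name `demux_diag` and its two file arguments are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read prefixed diagnostic lines on stdin and
# sort them into separate files. The 'E:' (error) and 'M:' (map)
# prefixes are the ones proposed in this request, not an existing
# tesseract output format.
demux_diag() {
    err_out=$1   # file that receives error lines, prefix stripped
    map_out=$2   # file that receives map lines, prefix stripped
    : > "$err_out"
    : > "$map_out"
    while IFS= read -r line; do
        case $line in
            E:*) printf '%s\n' "${line#E:}" >> "$err_out" ;;
            M:*) printf '%s\n' "${line#M:}" >> "$map_out" ;;
            # Anything without a known prefix is passed through on stderr.
            *)   printf '%s\n' "$line" >&2 ;;
        esac
    done
}
```

For example, `printf 'E:bad header\nM:10 20 30\n' | demux_diag errors.log page.map` would leave one line in each file. In a real script the function would read the stderr of the pipeline above, while stdout keeps flowing into the email.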

I looked briefly at doing this myself, but the mechanism for adding configuration hooks is a bit less than fully documented.
