#3 form recognition

open
nobody
None
5
2006-09-07
2006-09-07
Anonymous
No

Tesseract does a good job of reading plain text. Is
there any way that I could OCR forms (e.g. hospital
forms, job applications, etc.)?

Discussion

  • Ray Smith
    2006-09-07

    Logged In: YES
    user_id=1515161

    The requested feature is unlikely to be supported any time soon,
    so please do not hold your breath!

     
  • Logged In: YES
    user_id=37894

    Well, if you're not lazy, you could use a netpbm chain to mask
    off parts of the original image and feed them to tess. That
    requires a bit of coordination on YOUR part - tess doesn't care
    at all.

    I use this in a fax OCR application where I need to snip off
    the sending-fax-machine header to determine whether the actual
    fax body is right-side-up or upside-down! Just run:

        g3topbm blah.g3 | pamcut -left x -top y -width xx -height yy | pnmtotiff > blah.tif

    Easy as pie, very stable, and very fast.

    Why reinvent the wheel *in* tess when this works just as
    well? I am assuming that you're scanning many forms with a
    similar layout.

    References:
    http://netpbm.sourceforge.net/doc/g3topbm.html
    http://netpbm.sourceforge.net/doc/pamcut.html
    http://netpbm.sourceforge.net/doc/pnmtotiff.html
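
For forms that share a layout, the pamcut approach described above could be repeated once per field. The following is only a rough sketch under several assumptions: the field names and pixel coordinates are hypothetical and would need to be measured on a sample scan, the scan is assumed to already be a PNM file, and tesseract is assumed to accept the command form `tesseract <image> <output base>` and write `<output base>.txt`.

```python
import subprocess
from typing import NamedTuple


class Field(NamedTuple):
    """A rectangular region of the form, in pixels (hypothetical layout)."""
    name: str
    left: int
    top: int
    width: int
    height: int


# Hypothetical coordinates; measure these once on a sample scan.
FIELDS = [
    Field("patient_name", left=120, top=200, width=600, height=60),
    Field("date_of_birth", left=120, top=280, width=300, height=60),
]


def crop_command(field: Field, source: str) -> list:
    """Build the pamcut invocation that cuts one field out of a PNM image."""
    return [
        "pamcut",
        "-left", str(field.left),
        "-top", str(field.top),
        "-width", str(field.width),
        "-height", str(field.height),
        source,
    ]


def ocr_field(field: Field, source: str) -> str:
    """Crop one field, convert it to TIFF, and run tesseract on it."""
    tif = f"{field.name}.tif"
    # pamcut writes the cropped PNM image to stdout.
    cropped = subprocess.run(
        crop_command(field, source), check=True, capture_output=True
    ).stdout
    # pnmtotiff reads PNM on stdin and writes TIFF to stdout.
    with open(tif, "wb") as out:
        subprocess.run(["pnmtotiff"], input=cropped, stdout=out, check=True)
    # Assumed CLI form: tesseract <image> <output base>, producing <base>.txt.
    subprocess.run(["tesseract", tif, field.name], check=True)
    with open(f"{field.name}.txt", encoding="utf-8") as result:
        return result.read().strip()
```

Calling `ocr_field(field, "form.pnm")` for each entry in `FIELDS` would then give one text result per field, provided netpbm and tesseract are installed and on the PATH. Keeping the command construction in `crop_command` makes the coordinates easy to check without running any external tool.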