#5 Modularity

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2006-09-11

Created: 2006-09-11

Creator: Anonymous

Private: No

Can we separate out the various steps like baselining and deskewing,
perhaps with a plugin architecture so we can substitute our own
algorithms for any particular task? Clear developer docs would help.
Thanks.
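
To make the request concrete, here is a minimal sketch of what a plugin-style pipeline could look like. It is illustrative only and written in Python for brevity; `Pipeline`, `register`, `run`, and the stage functions are hypothetical names, not part of Tesseract's code or API.

```python
from typing import Callable, Dict, List

# A stage takes a page description and returns an updated one.
# The dict-based "page" is an assumption made for this sketch.
Stage = Callable[[dict], dict]


class Pipeline:
    """Runs named stages in a fixed order; any stage can be swapped out."""

    def __init__(self, order: List[str]) -> None:
        self._order = list(order)
        self._stages: Dict[str, Stage] = {}

    def register(self, name: str, stage: Stage) -> None:
        # Registering a name twice replaces the earlier implementation.
        if name not in self._order:
            raise KeyError(f"unknown stage: {name}")
        self._stages[name] = stage

    def run(self, page: dict) -> dict:
        for name in self._order:
            stage = self._stages.get(name)
            if stage is not None:  # stages with no implementation are skipped
                page = stage(page)
        return page


# Hypothetical stand-ins for the built-in algorithms.
def default_deskew(page: dict) -> dict:
    return {**page, "deskew": "default"}


def default_baseline(page: dict) -> dict:
    return {**page, "baseline": "default"}


# A user-supplied replacement for one step.
def my_baseline(page: dict) -> dict:
    return {**page, "baseline": "custom"}


pipeline = Pipeline(["deskew", "baseline"])
pipeline.register("deskew", default_deskew)
pipeline.register("baseline", default_baseline)
pipeline.register("baseline", my_baseline)  # substitute our own algorithm

result = pipeline.run({"image": "page.png"})
```

The point of the design is that each step is addressed by name, so replacing baselining does not require touching deskewing or anything downstream.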

Discussion

JetsoftDev.com - 2006-09-16

I think the first order of business would be for someone
to go through the code and thoroughly comment it. There
are sparse comments in some places and virtually none in
others.

I think baselining and deskewing could be separated out at
the function level.

In the meantime, you could always deskew the document yourself
before sending it to Tesseract.
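
As a rough illustration of deskewing outside the engine, the sketch below estimates a page's skew angle with a simple projection-profile search. This is a generic textbook technique, not the method Tesseract uses, and `estimate_skew_degrees` and its parameters are names made up for this example. It assumes the page has already been reduced to a list of (x, y) coordinates of dark pixels, with y increasing downward.

```python
import math


def estimate_skew_degrees(pixels, max_angle=10.0, step=0.5):
    """Return the candidate angle whose shear best flattens the text lines.

    pixels: iterable of (x, y) foreground (dark) pixel coordinates.
    For each candidate angle, every pixel is sheared back onto a row index
    and the rows are histogrammed. Straight, level text lines concentrate
    pixels into few rows, which maximizes the sum of squared row counts.
    """
    points = list(pixels)
    best_angle, best_score = 0.0, -1.0
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        slope = math.tan(math.radians(angle))
        counts = {}
        for x, y in points:
            row = round(y - x * slope)
            counts[row] = counts.get(row, 0) + 1
        score = sum(c * c for c in counts.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Synthetic test page: three "text lines" drawn with a 3 degree slope.
true_slope = math.tan(math.radians(3.0))
page = [(x, round(y0 + x * true_slope))
        for y0 in (20, 50, 80)
        for x in range(200)]

estimated = estimate_skew_degrees(page)
```

Once the angle is known, the image can be rotated by the opposite amount with whatever imaging tool is at hand, and the corrected image passed to the OCR engine.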

Filip Gieszczykiewicz - 2006-10-19

I uploaded the results of my own digging to the docs section:
a list of all the files and what they do, as well as a list
of all the variables.

BTW, the variable list is nice for getting a feel for which
algorithms they're using :-), which is why there's code in
place to strip that info out when sending the code to
"nevada" (beta testing?).
