Can we separate out the various steps like baselining and deskewing,
perhaps with a plugin architecture so we can substitute our own
algorithms for any particular task? Clear developer docs would help.
Thanks.
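
To make the request concrete, here is a minimal sketch of what a plugin-style pipeline could look like. It is written in Python only for brevity and does not reflect how the existing code is organized. Every name in it (`register`, `run_pipeline`, the stage names "deskew" and "baseline", and the toy image type) is hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List

# Toy image type for this sketch: a list of pixel rows (0 = background, 1 = ink).
Image = List[List[int]]
Stage = Callable[[Image], Image]

# Hypothetical registry mapping a stage name to its current implementation.
_REGISTRY: Dict[str, Stage] = {}


def register(name: str) -> Callable[[Stage], Stage]:
    """Decorator that installs (or replaces) the implementation of a stage."""
    def decorator(func: Stage) -> Stage:
        _REGISTRY[name] = func
        return func
    return decorator


@register("deskew")
def default_deskew(image: Image) -> Image:
    # Placeholder: a real implementation would correct the page skew here.
    return image


@register("baseline")
def default_baseline(image: Image) -> Image:
    # Placeholder: a real implementation would fit text baselines here.
    return image


def run_pipeline(image: Image, stage_names: List[str]) -> Image:
    """Run the named stages in order; an unknown name raises KeyError."""
    for name in stage_names:
        image = _REGISTRY[name](image)
    return image
```

Under this design, substituting your own algorithm for one task means registering a new function under the same stage name; the other stages and the pipeline driver stay untouched.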
Logged In: YES
user_id=1599597
I think the first order of business would be for someone
to go through the code and comment it thoroughly. Comments
are sparse in some places and virtually absent in
others.
I think baselining and deskewing could be separated out at
the function level.
You could always deskew the document yourself before sending
it to tesseract.
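
As an illustration of deskewing outside the engine, here is a minimal sketch in Python. It assumes the page has already been reduced to a list of (x, y) ink-pixel coordinates, and it uses a projection-profile search with a vertical shear instead of a true rotation (a reasonable approximation only for small angles). This is not the method used in the tesseract code; the function names, the angle range and the step size are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (x, y) of an ink pixel, with y growing downward


def profile_score(pixels: List[Pixel], angle_deg: float) -> int:
    """Score how sharply the ink concentrates into rows after a trial shear."""
    slope = math.tan(math.radians(angle_deg))
    counts: Dict[int, int] = {}
    for x, y in pixels:
        row = round(y - x * slope)  # row the pixel would land in
        counts[row] = counts.get(row, 0) + 1
    # Sum of squares is largest when the ink falls into few, dense rows.
    return sum(c * c for c in counts.values())


def estimate_skew(pixels: List[Pixel], max_angle: float = 5.0,
                  step: float = 0.5) -> float:
    """Try angles in [-max_angle, max_angle] and return the best-scoring one."""
    steps = int(round(max_angle / step))
    candidates = [i * step for i in range(-steps, steps + 1)]
    return max(candidates, key=lambda a: profile_score(pixels, a))


def deskew(pixels: List[Pixel], angle_deg: float) -> List[Pixel]:
    """Shear the pixels vertically so that text lines become horizontal."""
    slope = math.tan(math.radians(angle_deg))
    return [(x, round(y - x * slope)) for x, y in pixels]
```

The corrected pixels would then be written back to an image file and passed to tesseract as usual. A real preprocessing step would more likely use an image library and a proper rotation with interpolation.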
Logged In: YES
user_id=37894
I uploaded the results of my own digging to the docs section. They
contain a list of all the files and what they do, as well as
a list of all the variables.
BTW, the variable list is nice for getting a feeling for which
algorithms they're using :-) which is why there's code in
place to strip that info out when sending the code to
"nevada" (beta-testing?)