Tesseract OCR / Feature Requests / #9 Layout recognition and modular scanning framework

Werner Höhrer - 2006-12-30

Version 1.0 (2006-12-30)

scan_layout_parser_requirements.pdf

Filip Gieszczykiewicz - 2007-01-18

Logged In: YES
user_id=37894
Originator: NO

Well, you don't have to be a skilled C++ programmer to make a stab at it - if that disqualified one from helping out I should leave :-)

Get libtiff and look through it. Get libpng and look through it. Look at how tesseractmain.cpp (in ccmain/) uses libtiff, wrap that code in an #ifdef blah / #else / #endif, and try to get it to read PNG. I don't think that would be hard.
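A minimal sketch of that #ifdef idea, not taken from tesseractmain.cpp: the macro HAVE_PNG_INPUT and both function names are made up for illustration. It only shows the compile-time switch and a check of the file's leading bytes (the PNG signature and the two TIFF byte-order headers); the actual libtiff/libpng calls are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical names throughout; none of this is an existing Tesseract API.
enum class InputFormat { Tiff, Png, Unknown };

// Guess the container format from the first bytes of the file.
InputFormat sniff_format(const unsigned char* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  // Standard 8-byte PNG signature.
  static const unsigned char kPng[8] = {0x89, 'P', 'N', 'G',
                                        0x0D, 0x0A, 0x1A, 0x0A};
  if (len >= 8 && std::memcmp(data, kPng, 8) == 0) return InputFormat::Png;
  if (len >= 4) {
    // TIFF header: "II" + 42 (little-endian) or "MM" + 42 (big-endian).
    bool little = data[0] == 'I' && data[1] == 'I' &&
                  data[2] == 42 && data[3] == 0;
    bool big = data[0] == 'M' && data[1] == 'M' &&
               data[2] == 0 && data[3] == 42;
    if (little || big) return InputFormat::Tiff;
  }
  return InputFormat::Unknown;
}

// Compile-time choice of reader. HAVE_PNG_INPUT is an assumed build flag
// (e.g. -DHAVE_PNG_INPUT), not something the real build system defines.
const char* configured_reader() {
#ifdef HAVE_PNG_INPUT
  return "png";   // the libpng-based reading code would go here
#else
  return "tiff";  // the existing libtiff-based path stays here
#endif
}
```

Built without the flag, configured_reader() reports "tiff"; built with -DHAVE_PNG_INPUT it reports "png".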

Form recognition is NOT something someone can just "hack". That requires a decent review of literature and decent programming skills - say, a grad student looking for a nice Masters, using tess as a building block :-) Zoning that WORKS takes brains.

Modularity you can forget about. The best you can hope for is for tess to remain an "engine" and not tied to some all-inclusive GUI system... ;-) [inside joke for now]

Barcode recognition has to wait until zoning... OR until someone takes gocr and "extracts" its barcode support and integrates it into tess.

Cheers,
Fil

P.S. Also, please realize that for most sourceforgers, these projects are a pastime. I hope that when the tess website comes online, you can convert the PDF file into more of a road-map with an analysis of existing GNU tools "plotted" on it. That would give folks looking for something to hack on an excellent "overview" of what works, what does not, and what needs more work/rework to improve.

Werner Höhrer - 2007-01-19

Logged In: YES
user_id=1434318
Originator: YES

I actually know quite well how sourceforge projects work, since I'm already involved in some in my spare time alongside work ... which takes up most of what is left of it, so I just summed up my thoughts on this (and I'm sure most of it has been suggested a hundred times before). Use it or not, it's up to whoever it might help. :)

> Well, you don't have to be a skilled C++ programmer to make a stab at it -
> if that disqualified one from helping out I should leave :-)
> [SNIP]
> Form recognition is NOT something someone can just "hack". That requires a
> decent review of literature and decent programming skills - say, a grad
> student looking for a nice Masters, using tess as a building block :-)
> Zoning that WORKS takes brains.

So which is it now? "Not a skilled programmer" or "decent programming skills"? ;) But I know what you mean, don't worry :)

Re the modular approach (and in general):
I specifically mentioned that this is not centered on Tesseract (but where else to post this?).
It's actually as you said: Tesseract (or whatever OCR software) would only be used as a (text) engine inside the (new?) framework. That's exactly what I'm talking about. Modifying an engine to turn it into a modular framework is not really a good approach anyway.
Each stage feeds the next: ImageReader -> LayoutRecognition -> text/table/whatever recognition
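The chain above could be sketched roughly like this. All class and function names here are invented for illustration; they are not from the PDF or from Tesseract. The point is only that the layout stage hands regions to whichever engine is registered for that kind of region, so an OCR engine is just one pluggable piece.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types; a sketch of the staging, not a real framework.
struct Image { int width = 0; int height = 0; std::vector<unsigned char> gray; };

enum class RegionKind { Text, Table, Barcode };
struct Region { RegionKind kind; int x, y, w, h; };

// Stage 2: splits a page into typed regions.
class LayoutRecognizer {
 public:
  virtual ~LayoutRecognizer() = default;
  virtual std::vector<Region> analyze(const Image& img) const = 0;
};

// Stage 3: one engine per region kind (an OCR engine would sit behind Text).
class RegionRecognizer {
 public:
  virtual ~RegionRecognizer() = default;
  virtual std::string recognize(const Image& img, const Region& r) const = 0;
};

class Pipeline {
 public:
  explicit Pipeline(std::unique_ptr<LayoutRecognizer> layout)
      : layout_(std::move(layout)) {
    assert(layout_ != nullptr);
  }
  void add_engine(RegionKind kind, std::unique_ptr<RegionRecognizer> engine) {
    engines_[kind] = std::move(engine);
  }
  // Runs layout analysis, then dispatches each region to its engine.
  // Regions with no registered engine are skipped.
  std::vector<std::string> run(const Image& img) const {
    std::vector<std::string> results;
    for (const Region& r : layout_->analyze(img)) {
      auto it = engines_.find(r.kind);
      if (it != engines_.end()) results.push_back(it->second->recognize(img, r));
    }
    return results;
  }
 private:
  std::unique_ptr<LayoutRecognizer> layout_;
  std::map<RegionKind, std::unique_ptr<RegionRecognizer>> engines_;
};

// Trivial stand-ins so the sketch can be exercised without any real engine.
class WholePageLayout : public LayoutRecognizer {
 public:
  std::vector<Region> analyze(const Image& img) const override {
    return {Region{RegionKind::Text, 0, 0, img.width, img.height}};
  }
};

class EchoEngine : public RegionRecognizer {
 public:
  std::string recognize(const Image&, const Region& r) const override {
    return "text " + std::to_string(r.w) + "x" + std::to_string(r.h);
  }
};
```

With the stand-ins, a 4x2 page run through Pipeline yields a single result string for the one whole-page text region. A barcode or table engine would be added with another add_engine() call, without touching the text engine.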

Image conversion:
Hacking different image formats into Tesseract is not really a good idea. Using a generic framework to feed Tesseract (the engine) a unified image format would be the better approach.
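As a rough illustration of "unified image format": assume, purely for the sake of example, that the framework settles on an 8-bit grayscale buffer and that every reader converts into it before the engine sees anything. The struct and function below are hypothetical, and the weights are the common integer approximation of the 0.299/0.587/0.114 luma formula.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical unified format: one byte per pixel, row-major.
struct GrayImage {
  int width;
  int height;
  std::vector<unsigned char> pixels;
};

// Converts interleaved 8-bit RGB (as any decoder might deliver it)
// into the unified grayscale buffer.
GrayImage rgb_to_gray(const std::vector<unsigned char>& rgb,
                      int width, int height) {
  assert(width >= 0 && height >= 0);
  assert(rgb.size() == static_cast<std::size_t>(width) * height * 3);
  GrayImage out{width, height, {}};
  out.pixels.reserve(rgb.size() / 3);
  for (std::size_t i = 0; i + 2 < rgb.size(); i += 3) {
    unsigned r = rgb[i], g = rgb[i + 1], b = rgb[i + 2];
    // 77 + 150 + 29 == 256, so white maps to 255 and black to 0.
    out.pixels.push_back(
        static_cast<unsigned char>((77 * r + 150 * g + 29 * b) >> 8));
  }
  return out;
}
```

The engine then only ever needs one input path, and supporting a new file format means writing one more converter outside the engine.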

Barcodes:
The same is true here. Extracting barcode support from gocr is only a good idea if it doesn't work out inside gocr (otherwise, why extract it and integrate it into another OCR program in the first place?). I rather think it should be used as a lib as well. If it can already be used that way, even better.

GUI:
GUI? What a joke ;) That would be the last thing on my list ... if there is a library/framework like the one in my summary, it is (or should be) easy to integrate into whatever UI you like (e.g. the GNOME and KDE guys are quite good at this).

Werner

PS: If somebody wants the source of the PDF (OpenOffice Draw) at some point, just ask.
