Tesseract OCR / Discussion / Open Discussion: Docs for v1.03 + doxyfied source

Filip Gieszczykiewicz - 2007-03-01

You can access the doxygen-docs for tesseract v1.03 at:

http://tesseract-ocr.repairfaq.org/

* most of my progress has been on edge detection
* also the 'objects' found in tesseract. See the glossary:
http://tesseract-ocr.repairfaq.org/tess_glossary.html
* added a todo list of my hackings/questions/ask-Ray-isms :)
http://tesseract-ocr.repairfaq.org/todo.html
* doxyfied the training folder
* also see configure.patch (apply it manually)

The sources are at:
http://tesseract-ocr.repairfaq.org/downloads/tesseract-1.03f3.tar.gz (6.4MB)
(which allows you to do the TEXT_VERBOSE tracing and includes the
whole testing/ folder - you can regenerate the docs I put up on
the website with JUST this file)

If you don't want to play with doxygen/sources, you can get the docs:
http://tesseract-ocr.repairfaq.org/downloads/docs_1.03f3.tar.gz
(WARNING: It's 12MB because it includes the whole docs directory)

Not sure how useful these will be, but I also built the XML and RTF
output formats that doxygen supports:
http://tesseract-ocr.repairfaq.org/downloads/tessRTF03f3.tar.gz (3MB)
http://tesseract-ocr.repairfaq.org/downloads/tessXML03f3.tar.gz (6MB)

Cheers,
Fil

- aunghtain - 2007-04-04
  
  Hi Fil,
  Thanks for your documentation. I'm trying to figure out how Tesseract works. It looks to me like the instructions for seeing how Tesseract works are for V1.02. They talk about defining TEXT_PROGRESS and TEXT_VERBOSE, but in the V1.03 source I could not find any references to them. Could you tell me how to define them in V1.03?
  
  Thanks
  -will
  
- Filip Gieszczykiewicz - 2007-04-09
  
  Since the move to Google, I have been busy with other projects. I don't believe that any of my changes will get integrated into the main release. That means that you will need to download my patched version:
  
  http://tesseract-ocr.repairfaq.org/downloads/tesseract-1.03f3.tar.gz
  
  and build that. It already has the tracing code in place. See the configure script
  for more info: search for VERBOSE.
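
For anyone who prefers a script to a manual search, here is a minimal sketch of that "search for VERBOSE" step. It assumes the tarball is already unpacked and that you point it at the configure script yourself; the helper name `find_lines` is made up for illustration and is not part of tesseract.

```python
from pathlib import Path


def find_lines(path, needle="VERBOSE"):
    """Return (line_number, text) pairs for every line that contains needle.

    Hypothetical helper: it only scans a text file, it does not build anything.
    """
    hits = []
    # errors="replace" keeps the scan going if the file has odd bytes in it
    text = Path(path).read_text(errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        if needle in line:
            hits.append((number, line.strip()))
    return hits
```

Calling `find_lines("configure")` from the top of the unpacked source tree should list the lines that mention the tracing switches; passing `"TEXT_PROGRESS"` as the second argument narrows the search to that name.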
  
  The docs are for v1.03 and also include a simple doxyfication of the new training directory. I am waiting for the new release of OCRopus (http://code.google.com/p/ocropus/), which will include tesseract as the OCR engine. I will see how much the code has changed before I decide if the doxyfied docs can be merged in.
  
  It looks like Ray is actively working on tesseract in preparation for integration with ocropus; see, for example,
  http://code.google.com/p/tesseract-ocr/issues/detail?id=18&can=2&q=
  
  Cheers,
  Fil
  
- Filip Gieszczykiewicz - 2007-04-22
  
  Oops, I had 2 versions of the docs (1.02 and 1.03). I put a redirect in place of the
  old version. BTW, be sure to get ocropus, it really extends tesseract noticeably (esp.
  in layout analysis).
  
  See: http://code.google.com/p/ocropus/
  
  Cheers,
  Fil
  