Tesseract OCR

Docs for v1.03 + doxyfied source

  1. 2007-03-01 02:47:11 UTC
    You can access the doxygen-docs for tesseract v1.03 at:

    http://tesseract-ocr.repairfaq.org/

    * most of my progress has been on edge detection
    * also the 'objects' found in tesseract. See the glossary:
    http://tesseract-ocr.repairfaq.org/tess_glossary.html
    * added a todo list of my hackings/questions/ask-Ray-isms :)
    http://tesseract-ocr.repairfaq.org/todo.html
    * doxyfied the training folder
    * Also see configure.patch (do it manually)

    The sources are at:
    http://tesseract-ocr.repairfaq.org/downloads/tesseract-1.03f3.tar.gz (6.4MB)
    (which allows you to do the TEXT_VERBOSE tracing and includes the
    whole testing/ folder - you can regenerate the docs I put up on
    the website with JUST this file)

    If you don't want to play with doxygen/sources, you can get the docs:
    http://tesseract-ocr.repairfaq.org/downloads/docs_1.03f3.tar.gz
    (WARNING: It's 12MB because it includes the whole docs directory)

    Not sure how useful these will be, but I also built the XML & RTF versions that
    doxygen supports:
    http://tesseract-ocr.repairfaq.org/downloads/tessRTF03f3.tar.gz (3MB)
    http://tesseract-ocr.repairfaq.org/downloads/tessXML03f3.tar.gz (6MB)

    Cheers,
    Fil
  2. 2007-04-04 07:25:28 UTC
    Hi Fil,
    Thanks for your documentation. I'm trying to figure out how Tesseract works. It looks to me as though the instructions for seeing how Tesseract works are for v1.02. They talk about defining TEXT_PROGRESS and TEXT_VERBOSE, but I could not find any references to them in the v1.03 source. Could you tell me how to define them in v1.03?

    Thanks
    -will
  3. 2007-04-09 22:47:51 UTC
    Since the move to Google, I have been busy with other projects. I don't believe that any of my changes will get integrated into the main release. That means that you will need to download my patched version:

    http://tesseract-ocr.repairfaq.org/downloads/tesseract-1.03f3.tar.gz

    and build that. It already has the tracing stuff in place. See the configure script
    for more info: search for VERBOSE.
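For anyone who wants to follow the "search for VERBOSE" hint without opening the script by hand, here is a minimal sketch. It assumes the tarball was unpacked into a directory called `tesseract-1.03f3` (a guess based on the archive name, not something confirmed in this thread); the helper name `find_mentions` is made up for illustration.

```python
from pathlib import Path


def find_mentions(lines, needle="VERBOSE"):
    """Return (line_number, text) pairs for every line containing needle."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if needle in line:
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumed location of the configure script after unpacking the tarball.
    configure = Path("tesseract-1.03f3") / "configure"
    if configure.exists():
        with open(configure, encoding="utf-8", errors="replace") as fh:
            for number, text in find_mentions(fh):
                print(f"{number}: {text}")
    else:
        print(f"{configure} not found; adjust the path to your unpacked sources")
```

The matching lines should point at the places in configure where the tracing defines are discussed; what they actually say depends on the patched sources.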

    The docs are for v1.03 and also include a simple doxyfication of the new training directory. I am waiting for the new release of OCRopus (http://code.google.com/p/ocropus/), which will include tesseract as the OCR engine. I will see how much the code has changed before I decide whether the doxyfied docs can be merged in.

    It looks like Ray is actively working on tesseract in preparation for integration with ocropus; see, for example,
    http://code.google.com/p/tesseract-ocr/issues/detail?id=18&can=2&q=

    Cheers,
    Fil
  4. 2007-04-22 23:32:07 UTC
    Oops, I had 2 versions of the docs (1.02 and 1.03). I put a redirect in place of the
    old version. BTW, be sure to get ocropus, it really extends tesseract noticeably (esp.
    in layout analysis).

    See: http://code.google.com/p/ocropus/

    Cheers,
    Fil