Docs for v1.03 + doxyfied source

2007-03-01
2013-04-25
  • Filip Gieszczykiewicz

    You can access the doxygen-docs for tesseract v1.03 at:

    http://tesseract-ocr.repairfaq.org/

    * most of my progress has been on edge detection
    * also the 'objects' found in tesseract. See the glossary:
    http://tesseract-ocr.repairfaq.org/tess_glossary.html
    * added a todo list of my hackings/questions/ask-Ray-isms :)
    http://tesseract-ocr.repairfaq.org/todo.html
    * doxyfied the training folder
    * Also see configure.patch (do it manually)

    The sources are at:
    http://tesseract-ocr.repairfaq.org/downloads/tesseract-1.03f3.tar.gz (6.4MB)
    (which allows you to do the TEXT_VERBOSE tracing and includes the
    whole testing/ folder - you can regenerate the docs I put up on
    the website with JUST this file)

    If you don't want to play with doxygen/sources, you can get the docs:
    http://tesseract-ocr.repairfaq.org/downloads/docs_1.03f3.tar.gz
    (WARNING: It's 12MB because it includes the whole docs directory)

    Not sure what use these will be, but I built the XML & RTF versions that
    doxygen supports:
    http://tesseract-ocr.repairfaq.org/downloads/tessRTF03f3.tar.gz (3MB)
    http://tesseract-ocr.repairfaq.org/downloads/tessXML03f3.tar.gz (6MB)

    Cheers,
    Fil

     
    • aunghtain

      aunghtain - 2007-04-04

      Hi Fil,
            Thanks for your documentation. I'm trying to figure out how Tesseract works. It looks to me like the instructions for seeing how Tesseract works are for V1.02. The instructions talk about defining TEXT_PROGRESS and TEXT_VERBOSE, but in the V1.03 source I could not find any references to them. Could you tell me how to define them in V1.03?

      Thanks
      -will

       
    • Filip Gieszczykiewicz

      Since the move to Google, I have been busy with other projects. I don't believe that any of my changes will get integrated into the main release. That means that you will need to download my patched version:

      http://tesseract-ocr.repairfaq.org/downloads/tesseract-1.03f3.tar.gz

      and build that. This already has the tracing stuff in place. See the configure script
      for more info - search for VERBOSE.
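
      As a rough sketch of the "search for VERBOSE" step, the Python snippet below lists every line of a file that contains a given string. The helper name find_lines and the unpacked directory name tesseract-1.03f3 are assumptions made for illustration, not something taken from the tarball itself; adjust the path to wherever the configure script actually ends up.

```python
from pathlib import Path


def find_lines(path, needle="VERBOSE"):
    """Return (line_number, text) pairs for lines in `path` containing `needle`."""
    hits = []
    # errors="replace" keeps the scan going if the file has odd bytes in it.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line:
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumed location: the configure script at the top of the unpacked tarball.
    script = Path("tesseract-1.03f3") / "configure"
    if script.exists():
        for number, text in find_lines(script):
            print(f"{number}: {text}")
    else:
        print(f"{script} not found; unpack the tarball first")
```

      Running a plain text search (for example with grep) on the configure script does the same job; the point is only to locate the lines that mention the tracing options.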

      The docs are for v1.03 and also include just a simple doxyfication of the new training directory. I am waiting for the new release of OCRopus (http://code.google.com/p/ocropus/) which will include tesseract as the OCR engine. I will see how much the code has changed before I decide if the doxyfied docs can be merged in.

      It looks like Ray is actively working on tesseract in preparation for integration with ocropus, see, for example,
      http://code.google.com/p/tesseract-ocr/issues/detail?id=18&can=2&q=

      Cheers,
      Fil

       
    • Filip Gieszczykiewicz

      Oops, I had 2 versions of the docs (1.02 and 1.03). I put a redirect in place of the
      old version. BTW, be sure to get ocropus, it really extends tesseract noticeably (esp.
      in layout analysis).

      See: http://code.google.com/p/ocropus/

      Cheers,
      Fil

       
