

Problems with 1.03 - build & garbage out

  • Hmmm... building with --enable-debug died with:

    if g++ -DHAVE_CONFIG_H -I. -I. -I..  -I../ccstruct -I../ccutil -I../cutil -I../classify -I../image -I../dict -I../viewer   -g -Wall -MT tface.o -MD -MP -MF ".deps/tface.Tpo" -c -o tface.o tface.cpp; \
    then mv -f ".deps/tface.Tpo" ".deps/tface.Po"; else rm -f ".deps/tface.Tpo"; exit 1; fi
    ../cutil/globals.h:46: error: previous declaration of 'int optind' with 'C++' linkage
    ../ccutil/getopt.h:23: error: conflicts with new declaration with 'C' linkage
    ../cutil/globals.h:47: error: previous declaration of 'char* optarg' with 'C++' linkage
    ../ccutil/getopt.h:24: error: conflicts with new declaration with 'C' linkage
    make[3]: *** [tface.o] Error 1
    make[3]: Leaving directory `/workspace/OCR/tesseract-1.03f/wordrec'
    make[2]: *** [all-recursive] Error 1
    make[2]: Leaving directory `/workspace/OCR/tesseract-1.03f/wordrec'
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory `/workspace/OCR/tesseract-1.03f'
    make: *** [all] Error 2

    I just commented out the two offending lines and it finished compiling. However,
    upon building and execution using the provided tif, I got rubbish:

    pmorvxu qo6 jnwbeq oAeL we gas?` ;ox~
    ]F1LUbGq OAGL QJG {SEA {OX` j_}.IG dF1!C}(
    OAGL [{16 {SEA J`OX~ j_}JG ClI'1!C}( pLOMU qo6
    gas?` ;ox~ ipe dngcg pkorvxu qod jnuabeq
    j_}JG ClI'1!C}( pLOMU qo6 ]f1!JJbGq OAGL HJG
    0% HIS J=OHiJ9I~
    OCL COqG *3Uq 266 QJG ![ MOLK2 OU *3}} []xbG2
    J.!J!e !e 9 lot 0% JS bO!U{ IGXI to [Gel {IJG

    I will see what's up but comments are welcome.

    I'm using stock Fedora 6 - I tried 1.02 and it works as expected.
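
The linkage errors in the build log come from the same two globals being declared once with C++ linkage and once with C linkage. A minimal sketch of declarations that agree, assuming a POSIX system where `<unistd.h>` provides `getopt`, `optind` and `optarg`; the helper `FirstNonOptionIndex` is hypothetical and not part of the Tesseract sources:

```cpp
#include <unistd.h>  // on POSIX systems this declares optind and optarg with C linkage

// Redeclaring the globals is harmless as long as the linkage matches the
// system header. A plain `extern int optind;` (C++ linkage) seen *before* a
// C-linkage declaration is what produces the "conflicts with new
// declaration" error shown in the log.
extern "C" {
extern int optind;
extern char* optarg;
}

// Hypothetical helper: returns the index of the first non-option argument.
int FirstNonOptionIndex(int argc, char* const argv[]) {
  optind = 1;  // start scanning at the first argument
  while (getopt(argc, argv, "") != -1) {
    // No options are accepted in this sketch; just skip past any that appear.
  }
  return optind;
}
```

Commenting out the duplicate declarations, as described above, has the same effect of leaving a single consistent declaration.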

    • Nayfe

      We have not tested --enable-debug mode yet, but of the three people on my project, two of us get the same rubbish result with 1.03.
      On my side, I get pretty much the same results as before...

      back to hack :)

    • Oops, I didn't check without enable-debug... will do.

      I did try a few different images (created with pbmtext) and noticed
      that the garbage follows the "general" outline of the text in the
      image - maybe the new grayscale/color -> binary routines are causing


      P.S. Finished merging my docs and text-progress changes into 1.03 and will
      post both by mid-week.

    • On the Help forum, several folks point out that compiling with libtiff gives garbage results:

      "Tesseract 1.03 works if you dont compile with libtiff.
      With libtiff there's just garbage in the output-file."

      Will try without.


    • Ray Smith

      I have just replaced the package. It will take time to propagate to the mirrors, but the problem is fixed. The image was upside-down!

    • Tom Buehlmann

      OK, thanks. Perhaps I should have checked the image first! *shame on me*
      But the problem remains. I can reproduce the error with "tface.c" on openSUSE 10.2 and Tesseract 1.03.
      I had a successful compilation on this system before, so this may be due to an automatic patch by SUSE?
      It does compile on Debian Sarge (3.1), but with libtiff there's still the garbage output. (Yes, I checked the test image this time! *g*)
      I also tried Tess 1.02 on both systems with the same result (garbage when using libtiff).
      If I can spare some time, I'll try to compile using an earlier version of libtiff.
      I am not a C/C++ programmer, so unfortunately there is little chance for me to look deeper into this issue :-(

    • There's a quick bug in the code.

      When TessBaseAPI::TesseractRect() returns NULL because
      width < kMinRectSize || height < kMinRectSize,
      main() should test for that; otherwise strlen(text) is called
      on a NULL pointer and there's a fault at line 133:

      129       outfile = argv[2];
      130       outfile += ".txt";
      131       FILE* fp = fopen(outfile.string(), "w");
      132       if (fp != NULL) {
      133         fwrite(text, 1, strlen(text), fp);
      134         fclose(fp);
      135       }
      136       delete [] text;
      137       TessBaseAPI::End();
      139       return 0;                      //Normal exit

      How do I know? I'm trying to feed it an 8 by 15 pixel tif (one (1) letter) :-)
      Works for 18 font but dies for 12.
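
A minimal sketch of the missing check, written as a standalone helper so it can be compiled on its own. `WriteRecognizedText` is a hypothetical name, not a function in the Tesseract sources; the idea is only that a NULL result is tested before `strlen()` is reached:

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: writes `text` to `path`.
// Returns false (and writes nothing) when `text` is NULL, which is what the
// recognizer hands back for an image smaller than its minimum size.
bool WriteRecognizedText(const char* text, const std::string& path) {
  if (text == nullptr) {
    std::fprintf(stderr, "No text returned; is the image too small?\n");
    return false;
  }
  FILE* fp = std::fopen(path.c_str(), "w");
  if (fp == nullptr) {
    return false;  // could not open the output file
  }
  std::fwrite(text, 1, std::strlen(text), fp);
  std::fclose(fp);
  return true;
}
```

In main() the equivalent would be to check `text` right after the TesseractRect() call and exit with an error instead of falling through to the fwrite().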


      P.S. Actually, in ISOLATION, tesseract recognizes about half the letters regardless of the orientation (right-side up or not).

    • Jay Ro

      1.03 has a bug when building it with libtiff, see the help forum for "instalation successful but no output".

      Just compile it without libtiff4-dev.

    • Well, I tried it without libtiff and it's not improving. I'm not at home so I only have Solaris (ugh) and a speedy SMP Linux machine. The latter is hanging on EVERY tif image:

      #0  0x080a78ca in compute_line_occupation (block=0x82878b0, gradient=0,
          min_y=25, max_y=327, occupation=0x8283730, deltas=0x8283270)
          at /usr/include/bits/mathinline.h:530
      #1  0x080a7bcc in delete_non_dropout_rows (block=0x82878b0, gradient=0,
          rotation={xcoord = 1, ycoord = 0}, block_edge=0, testing_on=0 '\0')
          at makerow.cpp:644
      #2  0x080aecbb in cleanup_rows (page_tr={xcoord = 819, ycoord = 352},
          block=0x82878b0, gradient=0, rotation={xcoord = 1, ycoord = 0},
          block_edge=0, testing_on=1 '\001') at ../ccutil/varable.h:172
      #3  0x080b0081 in make_rows (page_tr={xcoord = 819, ycoord = 352},
          blocks=0xbffff7f0, land_blocks=0xbffff670, port_blocks=0xbffff660)
          at ../ccstruct/blobbox.h:401
      #4  0x08096d43 in textord_page (page_tr={xcoord = 819, ycoord = 352},
          blocks=0xbffff7f0, land_blocks=0xbffff670, port_blocks=0xbffff660)
          at tordmain.cpp:491
      #5  0x08097558 in edges_and_textord (filename=0x81a7608 "noname.tif",
          blocks=0xbffff7f0) at ../ccstruct/rect.h:80
      #6  0x08088287 in pgeditor_read_file (name=@0xbffff7b0, blocks=0xbffff7f0)
          at ../ccutil/strngs.h:101
      #7  0x0804ad25 in TessBaseAPI::FindLines (block_list=0xbffff7f0)
          at baseapi.cpp:337
      #8  0x0804aeff in TessBaseAPI::TesseractRect (
          imagedata=0x8259918 'ÿ' <repeats 102 times>, "à", 'ÿ' <repeats 97 times>..., bytes_per_pixel=0, bytes_per_line=103, left=0, top=0, width=819, height=352)
          at baseapi.cpp:324
      #9  0x08049e87 in main (argc=3, argv=0xbffff934) at ../image/img.h:119

      /usr/include/bits/mathinline.h:530 happens to be an inline floor() macro. It's
      stuck in there forever.

      Hey... Yup, "-O3" strikes again! Disabling all optimization + no LIBTIFF gets me results again!

      ccmain/tesseract testing/FOX.tif out
      gkTesseract Open Source OCR Engine
      Image has 1 bit  per pixel and size (53,21)
      speedy> cat out.txt

      Yup, it really does work!

      ccmain/tesseract testing/tess_lic.tif out
      gkTesseract Open Source OCR Engine
      Image has 1 bit  per pixel and size (819,352)
      Image is 819 by 352 at 0 bpp and 103 bpl
      speedy> cat out.txt
      This package contains the Tesseract Open Source OCR Engine.
      Orignally developed at Hewlett Packard Laboratories Bristol and
      at Hewlett Packard Co, Greeley Colorado, all the code
      in this distribution is now licensed under the Apache License:
      ** Licensed under the Apache License, Version 2.0 (the "License");
      ** you may not use this file except in compliance with the License.
      ** You may obtain a copy of the License at
      ** http:IIwww.apache.orgIIicensesILICENSE-2.0
      ** Unless required by applicable law or agreed to in writing, software
      ** distributed under the License is distributed on an "AS IS" BASIS,
      ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      ** See the License for the specific language governing permissions and
      ** Iimitations under the License.

      Thanks Ray!