
#27 MultiPNM + database

Status: closed-accepted
Owner: nobody
Labels: None
Priority: 5
Updated: 2010-02-26
Created: 2010-02-23
Creator: Anonymous
Private: No

GOCR Version: 0.48 (Windows precompiled binary)
Environment: Windows XP

This bug appears to stem from the reuse of the static file pointer f1 in the function readpgm (the #else branch of #ifdef HAVE_PAM_H in pnm.c). When GOCR is given a multi-image PBM file as input, readpgm leaves the file open so that later calls can read the remaining images. Loading a character database interferes with this, because load_db also calls readpgm, which then reads from f1 and closes it prematurely.
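The suspected mechanism can be sketched in isolation. The code below is not taken from pnm.c; sketch_read_image is a hypothetical stand-in for readpgm, and each "image" is simplified to one line of text. It only assumes what is described above: a single static stream that is reused while open and closed once exhausted.

```c
#include <stdio.h>
#include <string.h>

/* One stream shared by every caller, as described for f1 in the report. */
static FILE *f1 = NULL;

/* Hypothetical stand-in for readpgm. Reads one "image" (here: one text line).
 * Returns 1 if more data follows (stream left open), 0 if this was the last
 * item (stream closed), -1 on error. */
int sketch_read_image(const char *name, char *out, size_t outlen)
{
    int c;

    if (f1 == NULL) {                 /* open only if no stream is pending */
        f1 = fopen(name, "r");
        if (f1 == NULL)
            return -1;
    }
    /* If f1 was still open from an earlier call, `name` is silently ignored. */
    if (fgets(out, (int)outlen, f1) == NULL) {
        fclose(f1);
        f1 = NULL;
        return -1;
    }
    out[strcspn(out, "\n")] = '\0';   /* strip the trailing newline */

    c = fgetc(f1);                    /* peek: is there another item? */
    if (c == EOF) {
        fclose(f1);
        f1 = NULL;
        return 0;
    }
    ungetc(c, f1);
    return 1;
}
```

With a two-line input file and a separate one-line database file, reading the input and then the database hands the input's second line to the database caller and closes the stream, so the next read of the input starts again at the first line. That matches the symptoms reported here, under the stated assumptions.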

When GOCR is run with a custom database and a valid multi-image PBM input file (i.e. two PBM images concatenated together), it goes into an infinite loop: it repeatedly matches the first image, with the first entry in the database not recognized. It seems that the second image is erroneously read as a database entry and the multi-image file is then closed. The next iteration of the main read_picture loop therefore re-reads the input file from the start and again gets a positive multipnm result from readpgm, so the loop repeats indefinitely.

When run with an input file of an invalid multi-image PBM (one valid PBM followed by the mangled partial header "P^"), GOCR produces no OCR output, but reports errors loading database files. This appears to confirm that load_db (and not read_picture) encounters the mangled header in the input file.
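For anyone without the attachment, structurally similar inputs can be generated with a few lines of C. This is only a sketch: the file names, the 8x2 image size, and the pixel patterns are arbitrary assumptions, not the contents of the attached samples, and the images contain no recognizable characters. It writes raw (P4) PBM data, one file with two concatenated images and one with a valid image followed by the partial header "P^".

```c
#include <stdio.h>

/* Append one raw (P4) PBM image of 8x2 pixels to an open binary stream.
 * Each row is exactly one byte, so no row padding is needed. */
static int write_pbm_8x2(FILE *fp, unsigned char row0, unsigned char row1)
{
    if (fprintf(fp, "P4\n8 2\n") < 0)
        return -1;
    if (fputc(row0, fp) == EOF || fputc(row1, fp) == EOF)
        return -1;
    return 0;
}

/* Writes the two test inputs described above. Returns 0 on success. */
int make_test_inputs(const char *valid_name, const char *invalid_name)
{
    FILE *fp;
    int ok;

    /* Valid multi-image file: two PBM images back to back. */
    fp = fopen(valid_name, "wb");
    if (fp == NULL)
        return -1;
    ok = write_pbm_8x2(fp, 0xF0, 0x0F) == 0 &&
         write_pbm_8x2(fp, 0x0F, 0xF0) == 0;
    if (fclose(fp) != 0 || !ok)
        return -1;

    /* Invalid multi-image file: one valid image, then a mangled header. */
    fp = fopen(invalid_name, "wb");
    if (fp == NULL)
        return -1;
    ok = write_pbm_8x2(fp, 0xF0, 0x0F) == 0 && fputs("P^", fp) != EOF;
    if (fclose(fp) != 0 || !ok)
        return -1;

    return 0;
}
```

Calling make_test_inputs("two_images.pbm", "mangled.pbm") produces an 18-byte and an 11-byte file respectively; a custom database is still needed to trigger the behaviour described above.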

The attached zip file contains sample images and database that illustrate this bug.

Discussion

  • Nobody/Anonymous

    Example images and database

     
  • Joerg Schulenburg

    will be fixed in next version 0.49, thanks

     
  • Joerg Schulenburg

    • status: open --> closed-accepted
     
