#2 Tesseract crashed in edge_char_of at dawg.cpp:56

open
nobody
None
5
2006-08-26
2006-08-26
Anonymous
No

Tesseract crashed on a specific file. After rebuilding
it with --enable-debug I ran gdb on it:

Starting program: /tmp/tesseract-1.0/tesseract test.tif
test batch
Reading symbols from shared object read from target
memory...done.
Loaded system supplied DSO at 0x4f0a4000
Tesseract Open Source OCR Engine

Program received signal SIGSEGV, Segmentation fault.
0x08102a8e in edge_char_of (dawg=0xb7f3d008,
node=143000, character=105,
word_end=0) at dawg.cpp:56
56 if (edge_occupied (dawg, edge)) {
(gdb) bt
#0 0x08102a8e in edge_char_of (dawg=0xb7f3d008,
node=143000, character=105,
word_end=0) at dawg.cpp:56
#1 0x08102f56 in letter_is_okay (dawg=0xb7f3d008,
node=0xbfdf8cf4,
char_index=3, prevchar=0 '\0', word=0xbfdf8f6b
"DudI", word_end=0)
at dawg.cpp:145
#2 0x080fa781 in append_next_choice (dawg=0xb7f3d008,
node=143000,
permuter=5 '\005', word=0xbfdf8f6b "DudI",
choices=0x8d67360,
char_index=3, this_choice=0x8d3e448, prevchar=0
'\0', limit=0xbfdf8f94,
rating=8.22761822, certainty=-2.4420526,
rating_array=0xbfdf8e20,
certainty_array=0xbfdf8ec4, word_ending=0,
last_word=0, result=0xbfdf8d84)
at permdawg.cpp:188
#3 0x080fabaf in dawg_permute (dawg=0xb7f3d008,
node=143000,
permuter=5 '\005', choices=0x8d67360, char_index=3,
limit=0xbfdf8f94,
word=0xbfdf8f6b "DudI", rating=0, certainty=0,
rating_array=0xbfdf8e20,
certainty_array=0xbfdf8ec4, last_word=0) at
permdawg.cpp:256
#4 0x080fad82 in dawg_permute_and_select
(string=0x815fade "system words:",
dawg=0xb7f3d008, permuter=5 '\005',
character_choices=0x8d67360,
best_choice=0x8d3e498, system_words=1) at
permdawg.cpp:306
#5 0x080fc522 in permute_words
(char_choices=0x8d67360, rating_limit=1000)
at permute.cpp:1542
#6 0x080fda0f in permute_all (char_choices=0x8d67360,
rating_limit=1000,
raw_choice=0xbfdf91bc) at permute.cpp:1046
#7 0x080fdfc2 in permute_characters
(char_choices=0x8d67360, limit=1000,
best_choice=0xbfdf91cc, raw_choice=0xbfdf91bc) at
permute.cpp:1099
#8 0x080d95bd in chop_word_main (word=0x8d2ea28, fx=1,
best_choice=0xbfdf91cc, raw_choice=0xbfdf91bc,
tester=0 '\0',
trainer=0 '\0') at chopper.cpp:436
#9 0x080d744d in cc_recog (tessword=0x8d2ea28,
best_choice=0xbfdf91cc,
best_raw_choice=0xbfdf91bc, tester=0 '\0',
trainer=0 '\0') at tface.cpp:242
#10 0x08070920 in recog_word_recursive (word=0x8d35a78,
denorm=0x8d2e964,
matcher=0x806f860 <tess_default_matcher(PBLOB*,
PBLOB*, PBLOB*, WERD*, DENORM*, BLOB_CHOICE_LIST&)>,
tester=0, trainer=0, testing=0 '\0',
raw_choice=@0x8d2e98c, blob_choices=0xbfdf9308,
outword=@0x8d2e960)
at tfacepp.cpp:165
#11 0x080712e2 in recog_word (word=0x8d35a78,
denorm=0x8d2e964,
matcher=0x806f860 <tess_default_matcher(PBLOB*,
PBLOB*, PBLOB*, WERD*, DENORM*, BLOB_CHOICE_LIST&)>,
tester=0, trainer=0, testing=0 '\0',
raw_choice=@0x8d2e98c, blob_choices=0xbfdf9308,
outword=@0x8d2e960)
at tfacepp.cpp:74
#12 0x0806fc59 in tess_segment_pass2 (word=0x8d35a78,
denorm=0x8d2e964,
matcher=0x806f860 <tess_default_matcher(PBLOB*,
PBLOB*, PBLOB*, WERD*, DENORM*, BLOB_CHOICE_LIST&)>,
raw_choice=@0x8d2e98c, blob_choices=0xbfdf9308,
outword=@0x8d2e960) at tessbox.cpp:95
#13 0x08053ba4 in match_word_pass2 (word=0x8d2e958,
row=0x8c1ea50, x_height=22)
at control.cpp:859
#14 0x080542f3 in classify_word_pass2 (word=0x8d2e958,
row=0x8c1ea50)
at control.cpp:663
#15 0x08055bd6 in recog_all_words (page_res=0xbfdf95a4,
monitor=0x0)
at control.cpp:355
#16 0x0804bb6c in recognize_page
(image_name=@0xbfdf95fc) at tessedit.cpp:159
#17 0x0804a9eb in main (argc=4, argv=0xbfdf96b4) at
tesseractmain.cpp:93

I reduced the .tif to contain only the words that seem
to cause the crash.

Discussion

  • tesseract crashes with this file

     
  • Logged In: YES
    user_id=37894
    Originator: NO

    The submitter's test.tif contains exactly:
    +----------------------------+
    |                        Dud-|
    |ley Observatory             |
    +----------------------------+
    and sure enough it crashes. However, the problem can be reduced to just five letters: three letters followed by a hyphen on the first line, then a fourth letter, a space, and a fifth letter on the second line.

    This is a puzzling fault: it's triggered only by some combinations of letters, and case matters in an equally odd way. Case does NOT matter for combinations that don't trigger the fault (e.g., no case variation of A, B, & K crashes), but it DOES matter for letters that do crash (e.g., the ONLY combinations of B, E, & Q that DID crash were: beb E q, beb E Q, beB e q, beB E Q, bEb e q, bEb E q, bEb E Q, bEB E Q, Beb e Q, BEb e q, & BEB E Q).
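To make the size of the case search concrete, here is a minimal sketch that enumerates every upper/lower-case variant of a five-letter trigger in the "beb E q" notation used above. The function name `case_variants` is hypothetical and not part of tesseract; it only generates candidate strings and does not run the OCR engine.

```python
from itertools import product

def case_variants(first_line, fourth, fifth):
    """Yield every case variant in the 'beb E q' notation.

    first_line: the three letters that precede the hyphen (e.g. "beb")
    fourth, fifth: the two single letters on the second line
    """
    letters = first_line + fourth + fifth  # five letters in total
    for flags in product((False, True), repeat=len(letters)):
        cased = [c.upper() if up else c.lower() for c, up in zip(letters, flags)]
        yield f"{''.join(cased[:3])} {cased[3]} {cased[4]}"

variants = list(case_variants("beb", "e", "q"))
print(len(variants))  # 2**5 = 32 candidates to test
```

Of those 32 candidates for B, E, & Q, the comment above reports 11 that crashed.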

    Noting that the combination "Beb e Q" matches the provided test.tif, I let my PC do some crunching (3 nested for loops from A to Z running each combination through tesseract :-) and the following letter-combinations cause tesseract to crash:

    1 = B, D, E, [G - J], [L - P], R, U, W, Z
    2 = [B - E], G, H, J, [L - R], U, [W - Z]
    3 = [A - Z]

    I attached three files: a) the partial set that causes faults, b) a gdb trace of trigger.txt which contains exactly:
    +------+
    | Byb- |
    | y Q  |
    +------+
    (created trigger.tiff with: "cat trigger.txt | pbmtext -font testing/2helvR18.bdf | pgmtopbm | pnmtotiff > trigger.tiff"), and c) the trigger.tiff itself.
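The brute-force sweep described above could look roughly like the sketch below. It is not the script that was actually used: `itertools.product` stands in for the three nested loops, the capitalisation in `trigger_text` is assumed from the "Byb- / y Q" example, and `crashes_in_tesseract` is a hypothetical runner that assumes a POSIX shell, the netpbm tools, and a tesseract 1.x binary on the PATH, invoked the same way as in the original gdb session (`tesseract <image> <output base> batch`).

```python
import signal
import subprocess
from itertools import product
from string import ascii_uppercase

def trigger_text(one, two, three):
    """Build the two-line test text for letters in positions 1, 2 and 3.

    Assumption: capitalisation follows the "Byb- / y Q" example.
    """
    return f"{one.upper()}{two.lower()}{one.lower()}-\n{two.lower()} {three.upper()}\n"

def crashes_in_tesseract(text, font="testing/2helvR18.bdf"):
    """Hypothetical runner: render text to TIFF, run tesseract, report SIGSEGV."""
    render = f"pbmtext -font {font} | pgmtopbm | pnmtotiff > trigger.tiff"
    subprocess.run(render, shell=True, input=text.encode(), check=True)
    result = subprocess.run(["tesseract", "trigger.tiff", "out", "batch"],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # On POSIX a negative return code means the child was killed by that signal.
    return result.returncode == -signal.SIGSEGV

def sweep(crashes, alphabet=ascii_uppercase):
    """Return every (1, 2, 3) letter triple for which crashes(text) is true."""
    return [combo for combo in product(alphabet, repeat=3)
            if crashes(trigger_text(*combo))]
```

Calling `sweep(crashes_in_tesseract)` would launch tesseract 26**3 = 17,576 times, so passing a shorter `alphabet` is a reasonable way to try it first.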

    In my opinion, this is a logic fault or programming error. My hardware is a speedy Athlon under Fedora Core 6 (stock) - nothing fancy.

     
  • Logged In: YES
    user_id=37894
    Originator: NO

    Clarification: see the attached file "DUDLEY_fault.txt" for an explanation of 1, 2, and 3. Quickly, the numbers:

    1 = B, D, E, [G - J], [L - P], R, U, W, Z
    2 = [B - E], G, H, J, [L - R], U, [W - Z]
    3 = [A - Z]

    These numbers refer to the positions where the letters were placed and caused the fault.

    +------+                   +------+
    | 121- |---+--+            | Byb- |
    | 2 3  |-----------+       | y Q  |
    +------+   |  |    |       +------+
               v  v    v          ^
               1  2    3          |
    Faults for B, E, & Q:         |
    Faults for B, G, & Q:         |
    Faults for B, H, & Q:         |
    Faults for B, P, & Q:         |
    Faults for B, Q, & Q:         |
    Faults for B, Y, & Q:---------+ for example
    [...]
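As a quick way to check a letter triple against the three lists above, the following sketch expands the bracketed ranges into sets. Treating the lists as independent per-position sets is an assumption made here for illustration (the attached fault list was described as partial), and the names `expand`, `POSITIONS`, and `reported` are hypothetical.

```python
import re
from string import ascii_uppercase

def expand(spec):
    """Expand a list such as "B, D, [G - J]" into a set of letters."""
    letters = set()
    for item in spec.split(","):
        item = item.strip()
        span = re.fullmatch(r"\[([A-Z]) - ([A-Z])\]", item)
        if span:
            lo, hi = (ascii_uppercase.index(c) for c in span.groups())
            letters.update(ascii_uppercase[lo:hi + 1])
        else:
            letters.add(item)
    return letters

# The three per-position lists, copied from the comment above.
POSITIONS = {
    1: expand("B, D, E, [G - J], [L - P], R, U, W, Z"),
    2: expand("[B - E], G, H, J, [L - R], U, [W - Z]"),
    3: expand("[A - Z]"),
}

def reported(one, two, three):
    """True if each letter appears in the list for its position (case ignored)."""
    triple = (one, two, three)
    return all(letter.upper() in POSITIONS[pos]
               for pos, letter in enumerate(triple, start=1))

print(reported("B", "Y", "Q"))  # True: the "Byb- / y Q" example
print(reported("A", "B", "K"))  # False: A is not in list 1
```

This only restates the reported lists; it says nothing about which case variants of a given triple actually crash.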

     
  • Logged In: YES
    user_id=37894
    Originator: NO

    Can't attach files here so I put them under item 1633726 in Tracker->Patches.

    Hope this helps, Mr. Smith :-) The bug seems to be real and will likely show up
    again when tesseract gains a wider audience; i.e., it will need to be tracked down
    and squashed, but since it's in DAWG and its ilk, I won't be its squasher :-)

    Cheers,
    File

     
  • J. Caldwell
    2009-11-01

    Program received signal SIGSEGV, Segmentation fault.
    edge_char_of (dawg=0x7ffff7d89010, node=247836, character=45, word_end=0)
    at dawg.cpp:63
    63 if (edge_occupied (dawg, edge)) {

    I'll attach files if I can.