I wrote a really simple .NET wrapper for Tesseract, and I'm quite happy that it works great for me. I have one small problem, though: every time it detects a non-ASCII character, an assertion fails. I'm not able to suppress this assertion; building with NDEBUG doesn't help. Can anyone please help me with this one? Here is the assertion:
Line: 56/68 (two different assertions, actually)
Expression: (unsigned)(c + 1) <= 256
Thanks a lot.
I have found a minor bug in the function def_letter_is_okay() in dawg.cpp.
Sometimes an assertion fires when running tesseract.exe in Debug
mode. This assertion comes from isalpha() at line 146 of dawg.cpp.
isalpha() takes its input character as an int, but
dummy_word[char_index] returns a char value. If that value is less
than 0x80, everything is OK. But if the char value is greater than
0x7F, the assertion fires. char is signed by default here, so
values greater than 0x7F become negative ints when passed to isalpha(),
and isalpha() treats them as illegal input. To resolve the problem, we
should simply cast the char to unsigned char before it is converted to
int.
To remove the bug:
Change line 146 of dawg.cpp from
isalpha (dummy_word [char_index])
to
isalpha ((unsigned char)dummy_word [char_index])
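For anyone who wants to see why the cast matters, here is a minimal sketch, not the actual Tesseract code. The function names passes_range_check, is_alpha_safe and count_alpha are made up for illustration. The first one just repeats the expression from the assertion quoted above: it accepts -1 (EOF) and 0..255, and rejects everything else. A byte such as 0xE9 stored in a signed char becomes -23 when promoted to int, which fails that check, while the same byte taken as unsigned char is 233 and passes.

```cpp
#include <cctype>
#include <cstddef>

// Repeats the range check quoted in the assertion message:
// only EOF (-1) and the values 0..255 are accepted.
bool passes_range_check(int c) {
    return static_cast<unsigned>(c + 1) <= 256;
}

// Hypothetical helper: converts through unsigned char first, so the
// value handed to isalpha() is always in 0..255, never negative.
bool is_alpha_safe(char ch) {
    return std::isalpha(static_cast<unsigned char>(ch)) != 0;
}

// Hypothetical helper: counts the alphabetic bytes in a
// NUL-terminated string using the safe wrapper above.
std::size_t count_alpha(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (is_alpha_safe(*s)) {
            ++n;
        }
    }
    return n;
}
```

Using a small wrapper like is_alpha_safe() in place of the bare isalpha() call keeps the cast in one place, so other call sites with the same problem can be fixed the same way.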