Get text together with coordinates information

2013-06-05
2013-06-07
  • Dmitry Katsubo

    Dmitry Katsubo - 2013-06-05

    I am not sure whether this is available through the Tesseract C++ API, but it would be useful to be able to iterate over the page segments and get the rectangle coordinates together with the text recognized within each rectangle.

    For example, I want to disregard headers and footers during OCR. I can estimate where the header and footer are on the page (and also analyse the text, which is usually typical), but for that I need the rectangle coordinates. Once headers and footers are merged into the output string, it is very difficult to cut them out.

    Another scenario: I have text images with a line-number column (i.e. every text line is followed by a number, a line counter). Visually it is very easy to cut this column away, but it is hardly possible to remove the line numbers from the resulting text without the risk of removing numbers that belong to the actual text.
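
Once a rectangle is known for every recognized word, both scenarios reduce to a geometric filter. The following is only a minimal sketch of that idea: `WordBox`, `ContentArea`, and `KeepBodyWords` are hypothetical names introduced for illustration (they are not part of Tesseract), and the content-area bounds are assumed to come from the caller's own estimate of the page layout.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record: one recognized word and its bounding box in image
// pixels, with the origin at the top-left corner of the page image.
struct WordBox {
    std::string text;
    int left, top, right, bottom;
};

// Hypothetical layout estimate supplied by the caller.
struct ContentArea {
    int top;     // words above this y belong to the header
    int bottom;  // words below this y belong to the footer
    int right;   // words right of this x belong to the line-number column
};

// Keep only the words whose box lies entirely inside the content area.
std::vector<WordBox> KeepBodyWords(const std::vector<WordBox>& words,
                                   const ContentArea& area) {
    assert(area.top <= area.bottom);
    std::vector<WordBox> body;
    for (const WordBox& w : words) {
        const bool inside = w.top >= area.top &&
                            w.bottom <= area.bottom &&
                            w.right <= area.right;
        if (inside) {
            body.push_back(w);
        }
    }
    return body;
}
```

Because the filter looks only at position, a number inside the body text is kept even though it looks just like a line counter.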

     
  • Quan Nguyen

    Quan Nguyen - 2013-06-07

    With ResultIterator you can get the coordinates at block, paragraph, word, or symbol (character) level. See the unit test cases for an example.

     
