#38 Allow usage of own dictionary

Milestone: 4.X
Status: pending
Owner: nobody
Labels: rfe (1)
Updated: 2018-10-21
Created: 2018-02-02
Private: No

I was reading about the custom configuration of Tesseract that allows custom words to be recognized:
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#config-files-and-augmenting-with-user-data

Would it be possible for Capture2Text to support a custom dictionary, so we could improve the results even further?

Thanks in advance

Discussion

  • Pavol Brilla

    Pavol Brilla - 2018-02-02

    OK, so I made my own eng.user-words, put it in tessdata, and loaded a custom Tesseract config. All fine, but it seems that the capture box disappeared (I can capture through the shortcut, but I don't see the overlay box).

     
  • cb4960

    cb4960 - 2018-04-20
    • status: open --> pending
    • assigned_to: Christopher Brochtrup --> nobody
     
  • Giacomo Cocchella

    I have more or less the same request. In my case, I need to recognize keywords and nicknames that are not tied to any language. Capture2Text is quite good at recognizing those words, but it is sometimes wrong. For example, if the word is Tany65, it sometimes decodes it as Tanyb5. That's just one example. Since these words have no meaning, a dictionary containing all the nicknames and keywords to be recognized would probably help Capture2Text improve its accuracy. In my case, I have to detect roughly 250 such words. The best option would be to create a new OCR language; otherwise, a dictionary. Do you think that's possible?

     
  • Giacomo Cocchella

    While searching for a way to match recognized words to keywords, I found that one option could be the Levenshtein distance algorithm: https://en.wikipedia.org/wiki/Levenshtein_distance
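
    To make the idea concrete, here is a minimal Python sketch of that kind of post-processing. It is not part of Capture2Text or Tesseract; the function names, the keyword list, and the distance threshold of 1 are all assumptions made up for illustration. It takes the OCR output and snaps it to the nearest known keyword if it is close enough.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_keyword(ocr_text, keywords, max_distance=1):
    """Return the keyword nearest to ocr_text, or None if none is within max_distance."""
    best = min(keywords, key=lambda k: levenshtein(ocr_text, k), default=None)
    if best is None or levenshtein(ocr_text, best) > max_distance:
        return None
    return best


# Hypothetical nickname list; only "Tany65" comes from the example above.
keywords = ["Tany65", "Marco_88", "xXLunaXx"]
print(closest_keyword("Tanyb5", keywords))  # prints Tany65
```

    With around 250 keywords a brute-force comparison like this should be fast enough. The threshold probably needs tuning: too high, and unrelated text gets "corrected" into a keyword.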

     