Allow usage of own dictionary
Quickly OCR part of the screen and save resulting text to clipboard
Brought to you by:
cb4960
I was reading about Tesseract's custom configuration, which allows custom words to be recognized:
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#config-files-and-augmenting-with-user-data
Would it be possible for Capture2Text to support a user dictionary, so we could improve the results even more?
Thanks in advance
OK, so I made my eng.user-words file, put it in tessdata, and loaded a custom Tesseract config. It all works, but the capture box seems to have disappeared (I can still capture through the shortcut, but I don't see the overlay box).
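For anyone who wants to try the same setup, here is a rough Python sketch that writes the two files involved. It assumes the layout described in the Tesseract man page linked above: a `<lang>.user-words` file in tessdata with one word per line, and a config file under `tessdata/configs` that sets `user_words_suffix`. The function name `write_user_dictionary` and the config name `user_dict` are made up for illustration, and whether a given Capture2Text build picks up the config is something to verify yourself.

```python
from pathlib import Path


def write_user_dictionary(tessdata_dir, words, lang="eng", config_name="user_dict"):
    """Write a Tesseract user-words file plus a config that points to it.

    Hypothetical helper: file names follow the Tesseract man page
    (<lang>.user-words and a config in tessdata/configs).
    """
    tessdata = Path(tessdata_dir)
    configs = tessdata / "configs"
    configs.mkdir(parents=True, exist_ok=True)

    # One word per line, blank entries skipped.
    words_file = tessdata / f"{lang}.user-words"
    cleaned = [w.strip() for w in words if w.strip()]
    words_file.write_text("\n".join(cleaned) + "\n", encoding="utf-8")

    # Variable/value pairs, space separated. Disabling the system
    # dictionaries is optional; it is shown in the man page example.
    config_file = configs / config_name
    config_file.write_text(
        "load_system_dawg F\n"
        "load_freq_dawg F\n"
        "user_words_suffix user-words\n",
        encoding="utf-8",
    )
    return words_file, config_file
```

Calling `write_user_dictionary("path/to/tessdata", ["Tany65", "Mika12"])` would create `eng.user-words` and `configs/user_dict` under that folder; the path and the word list here are placeholders.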
I have more or less the same request. In my case, I need to recognize keywords and nicknames that are not tied to any language. Capture2Text is quite good at recognizing those words, but sometimes it gets them wrong. For example, the word Tany65 is sometimes decoded as Tanyb5. That's just one example. Since these words have no meaning, a dictionary with all the nicknames and keywords to be recognized would probably help Capture2Text improve its results. In my case, I have to detect roughly 250 such words. The best option would be to create a new OCR language; otherwise, a dictionary. Do you think that's possible?
While searching for a way to match recognized words to keywords, I found that one option could be the Levenshtein algorithm: https://en.wikipedia.org/wiki/Levenshtein_distance
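To make the idea concrete, here is a minimal Python sketch of post-processing the OCR output against a keyword list with Levenshtein distance. This is not something Capture2Text does itself; the function names, the `max_distance` threshold, and the sample keywords (apart from Tany65/Tanyb5 from the post above) are assumptions for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def closest_keyword(ocr_text, keywords, max_distance=2):
    """Return the keyword nearest to ocr_text, or None if none is close enough."""
    best = min(keywords, key=lambda k: levenshtein(ocr_text, k), default=None)
    if best is None or levenshtein(ocr_text, best) > max_distance:
        return None
    return best


if __name__ == "__main__":
    nicknames = ["Tany65", "Mika12"]  # placeholder keyword list
    print(closest_keyword("Tanyb5", nicknames))
```

With a list of about 250 keywords, comparing each OCR result against every entry this way should be fast enough. The threshold needs tuning: too high and unrelated text gets snapped to a nickname, too low and real misreads are missed.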