#135 My gratitude, feature requests, and questions

Milestone: 4.6.2

Status: open

Owner: nobody

Labels: None

Updated: 2020-02-15

Created: 2020-02-08

Creator: Random Person

Private: No

First off, I'd like to thank the developer for all the effort they've put into this program; it is greatly appreciated. I've mainly been using Capture2Text for rough translation of manga and doujinshi from Japanese into English, and while it works well in many cases, it also struggles under quite a few circumstances. There are several features I believe would be extremely helpful, though I am aware that they're likely harder to program than I imagine.

First, Capture2Text struggles greatly with text on colored backgrounds, especially when the text is something other than black. Would it be possible to add a setting where the user picks a color, either through a built-in color picker or just a box for entering (r,g,b) values, so that Capture2Text treats the selected color as black (0,0,0) and all other colors as white? This would greatly reduce or eliminate the image manipulation currently necessary to OCR colored text on a colored background.

Next, I noticed (like other users) that there seems to be a limit on in-application translation: the popup stops showing a box with translated results after a certain number of translations in a given time frame. I may be wrong, but based on cursory research I believe this is caused by quotas/limits on the Google Cloud Translation API key. If so, would it be possible to have Capture2Text switch to a different translation service (I believe there are several, including a few free ones) while Google Translate is not accepting further requests from the client? This would let users continue in-application translation with a different service, rather than either waiting for Google Translate to become available in Capture2Text again or manually pasting the OCR'd text into a translation service such as Google Translate in the browser.
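The translation fallback requested above could look roughly like the following. This is only a sketch of the idea, not how Capture2Text works internally: `QuotaExceeded`, `translate_with_fallback`, and the two stand-in backends are hypothetical names made up for illustration, and a real backend would call a web API instead of returning a fixed string.

```python
from typing import Callable, Optional, Sequence


class QuotaExceeded(Exception):
    """Hypothetical error raised by a backend that is refusing requests."""


def translate_with_fallback(
    text: str,
    backends: Sequence[Callable[[str], str]],
) -> Optional[str]:
    """Try each backend in order and return the first successful result."""
    for backend in backends:
        try:
            return backend(text)
        except QuotaExceeded:
            continue  # this service is rate-limited; try the next one
    return None  # every service refused; the caller can show the raw OCR text


# Stand-in backends for illustration only.
def primary(text: str) -> str:
    raise QuotaExceeded("request limit reached")


def secondary(text: str) -> str:
    return f"[secondary] {text}"


print(translate_with_fallback("こんにちは", [primary, secondary]))
```

Returning `None` when every service refuses keeps the decision about what to display (for example, the untranslated OCR text) with the caller.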
Lastly, and of greatest import in my opinion, Capture2Text often has enormous difficulty correctly identifying Japanese characters that are written very slightly differently from the standard font. Even as a native English speaker with essentially no knowledge of Japanese and kanji, I am almost always able to recognize the character easily, while Capture2Text takes many dozens of attempts with image manipulation, scaling, and other tweaks to eventually recognize the correct character, if it ever does. This is immensely frustrating, as it requires manually using a virtual keyboard to input the correct characters every time they appear in the text I OCR, which, depending on the specific quirks of the text, may be quite often. It would be incredibly useful to be able to OCR a character and then manually equate it to the proper character. Essentially, being able to select the image of a character captured by Capture2Text as something to be replaced automatically, as in the "Replace" tab in settings, would let the user identify a character written a certain way once and then receive the proper result on every subsequent OCR, saving vast amounts of time and effort.

Next, I have a few questions about how Capture2Text works. First, I have noticed that in the vast majority of my OCR attempts, the capture box gives more accurate results than text line capture. Additionally, when using the capture box, accuracy suffers tremendously when I capture multiple lines in the box at once. Is there any way to mitigate this so that I could use text line capture, or capture multiple lines of text at once, without compromising OCR accuracy?

I am also curious about methods for determining the optimum preprocessing and scale factor for a given image to improve OCR accuracy with Capture2Text.
I generally OCR with the image at its original size (100% zoom) and apply Lanczos (softer) smoothing and sharpening depending on text appearance, aliasing, etc. Scale factor, to my knowledge, simply enlarges the image without actually improving image quality; however, I've found that changing it has unpredictable effects on OCR accuracy depending on the source image. I'm aware that in some cases (particularly with small images) enlarging the image improves OCR accuracy. For a 1920 x 1080 screen, is there any general guideline for when to enlarge an image and by how much? Also, is it best to use Capture2Text's built-in scale factor setting, or should I use the built-in Lanczos or bicubic interpolation in my image viewing software?

That's all from me at the moment. Thank you again for your work, and I'm hoping that the 4 months since your account's last action isn't an indication of this project being dropped.

Discussion

alexander - 2020-02-15

Hello dear friends.
I also vote for a feature that can OCR only text of a certain color, maybe with threshold control. It would be very useful for gamers.
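To make the request concrete, here is a minimal sketch of a color filter with a threshold. It is an illustration only, not Capture2Text code: the name `isolate_color` and the per-channel `tolerance` rule are assumptions, and the image is represented as plain rows of (r, g, b) tuples to keep the example free of dependencies.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]

BLACK: Pixel = (0, 0, 0)
WHITE: Pixel = (255, 255, 255)


def isolate_color(
    pixels: List[List[Pixel]],
    target: Pixel,
    tolerance: int = 0,
) -> List[List[Pixel]]:
    """Map pixels close to `target` to black and every other pixel to white."""

    def matches(pixel: Pixel) -> bool:
        # Assumed rule: each channel must be within `tolerance` of the target.
        return all(abs(c - t) <= tolerance for c, t in zip(pixel, target))

    return [[BLACK if matches(p) else WHITE for p in row] for row in pixels]


# Red-ish text pixels on a white and blue background.
image = [
    [(200, 30, 30), (255, 255, 255)],
    [(190, 40, 25), (0, 0, 255)],
]
for row in isolate_color(image, target=(200, 30, 30), tolerance=15):
    print(row)
```

With `tolerance=0` only exact matches turn black; raising it also catches the slightly different shades that anti-aliasing and compression produce around text edges.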

Last edit: alexander 2020-02-15
