#395 Some suggestions regarding OCR interface/workflow

Milestone: v1.0_(example)

Status: open

Owner: nobody

Labels: ocr (2)

Priority: 5

Updated: 2022-01-24

Created: 2022-01-19

Creator: FeRD

Private: No

The report to follow will be a sort of catch-all list of feedback and suggestions I had after spending some time working with the OCR functionality in gscan2pdf. Very little (possibly nothing) here is a bug report per se; for the most part the features and tools in question are working as designed.

Please discount anything I write that's based on my ignorance or misunderstanding, feel free to reject anything you disagree with or simply don't feel is worth devoting resources to, and above all I'd encourage you to tell me to shut up once I've exhausted your patience on the topic(s). I probably won't have realized, but I promise I will immediately accept it without complaint.

My comment in #382 regarding the OCR-driven auto-navigation / auto-positioning functionality can be considered part of this list, as it would have been if there hadn't been a discussion already in progress on that issue.

Working with the OCR tools, in terms of UX:

First, it's a minor thing, but extending the correction toolbar's text-input/edit field across the entire width of the toolbar, from the "Go to most confident text" button all the way to the "Add text" button, would be a major convenience. It doesn't matter that it only holds a single word, or how big/tiny that word may be. A click anywhere in the toolbar's center region would be enough to focus the edit field and prepare it to receive keystrokes, instead of having to target a single, often tiny word nestled on one shore of a veritable ocean of non-responsive application chrome. A double-click anywhere in that wide space would select the entire contents, readying it for a quick copy/paste or type-over replacement.
Sometimes, when tesseract isolates a single letter or punctuation mark, actually finding the sliver of an input field in order to click it becomes an adventure in precision mousing.

Zoom response and responsiveness

The main content view's zoom functionality may actually be a little too flexible, and could probably do with a few constraints to protect the user from themselves. Unless it's a useful thing in some way that I'm simply not grasping, there doesn't seem to be any reason to continue zooming out past the "Zoom to fit" level. It would actually be helpful if gscan2pdf stopped at that point and ignored further zoom-out commands, rather than dutifully reducing the page to a vanishingly small speck in the middle of the window.
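
As a rough illustration of the constraint suggested above, here is a minimal Python sketch (not gscan2pdf's actual code). The function names `zoom_to_fit` and `clamp_zoom_out` are hypothetical, and it assumes the zoom level is a plain scale factor where 1.0 means one page pixel per screen pixel:

```python
def zoom_to_fit(page_w, page_h, view_w, view_h):
    """Largest scale factor at which the whole page still fits in the view."""
    return min(view_w / page_w, view_h / page_h)


def clamp_zoom_out(requested, page_size, view_size):
    """Return the requested zoom, but never smaller than the zoom-to-fit level.

    page_size and view_size are (width, height) tuples in pixels.
    Both names are hypothetical helpers for illustration only.
    """
    floor = zoom_to_fit(*page_size, *view_size)
    return max(requested, floor)
```

With a 1000x2000 page in a 500x500 view, the fit level is 0.25, so a request for 0.1 would be held at 0.25 while a request for 0.8 would pass through unchanged.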

For me, this came up as part of what I discussed at #382. Because gscan2pdf kept hyper-zooming in on single words in the OCR view, I found myself frequently zooming out for a wider view. That became the setup for a frustrating sequence of events, which I found myself stupidly repeating time and time again:

1. I spin the mousewheel back a bit, to get a wider view of the current text
2. The view refreshes, and I assume it's reached the end result of my mousewheel zoom-out
3. I attempt to interact with the page content as it now appears on screen, only to find the page/text view unresponsive
4. ...*Because* the current view was rendered with the zoom change only partially completed, meaning the main canvas UI immediately went offline as it prepared to redraw with a smaller version of the page view
5. But I don't immediately notice the incomplete zoom (that's an after-the-fact realization), and attempt to click an OCR word. Which leads me to expect that gscan2pdf is about to hyperzoom inward until the word I clicked fills up most of the view
6. That expectation/assumption leads to me preemptively scrolling back a few notches
7. However, I never actually clicked an OCR word in step 5. The page zoom had changed again by the time my click was processed, meaning the word I intended to click was no longer in the same position as my mouse click. gscan2pdf therefore never zoomed in to that word, and my anticipatory second mousewheel zoom took effect _after_ the completion of my previous zoom-out scroll.
8. The end result being that the entire page has shrunk to the size of a microSD card on the screen, and I've completely lost my bearings.

Yes, that's a dumb sequence of events that's entirely my fault, and the best solution would be for me to just be less impatient in how I interact with the interface. But gscan2pdf could also do more to assist with that, and it could do less at times when it'll prevent lag due to storms of events that start piling up in unexpected or confusing ways. Ignoring additional zoom-out events once it's reached the zoom-to-fit level where the entire page is visible is a first, most obvious way to address this. Having the option to set a zoom level that gscan2pdf won't automatically change, like I described in #382, would be another.
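
To make the "do less during event storms" idea a bit more concrete, here is a toolkit-free Python sketch of one possible policy. It is not how gscan2pdf works: the class `ZoomCoalescer` and its methods are hypothetical. The assumptions are that wheel steps can be accumulated and applied as a single zoom change per redraw, that the result is clamped at a minimum zoom such as the zoom-to-fit level, and that clicks arriving while a zoom is still pending are dropped because the on-screen positions are stale.

```python
class ZoomCoalescer:
    """Hypothetical helper: batches wheel steps into one zoom change per redraw."""

    def __init__(self, zoom, min_zoom, step=1.25):
        self.zoom = zoom            # current scale factor
        self.min_zoom = min_zoom    # e.g. the zoom-to-fit level
        self.step = step            # zoom multiplier per wheel notch
        self.pending_steps = 0      # notches received since the last redraw

    @property
    def busy(self):
        """True while wheel input is waiting to be applied."""
        return self.pending_steps != 0

    def on_wheel(self, steps):
        # Positive steps zoom in, negative steps zoom out. Nothing is redrawn here.
        self.pending_steps += steps

    def on_click(self, handler):
        # Drop the click if a zoom is pending: the word under the pointer
        # may move before the click could be processed.
        if self.busy:
            return False
        handler()
        return True

    def flush(self):
        # Called once per redraw: apply all pending notches in one go,
        # then clamp so the page never shrinks below min_zoom.
        target = self.zoom * self.step ** self.pending_steps
        self.pending_steps = 0
        self.zoom = max(target, self.min_zoom)
        return self.zoom
```

Under this policy the sequence in steps 1 to 8 would end differently: the click in step 5 would be ignored rather than landing on the wrong spot, and the extra scroll in step 6 would be absorbed by the clamp instead of shrinking the page further.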

Bounding boxes in the OCR text view

This is another "unless this is useful in some way I'm just not seeing" thing. Which is certainly possible. But I've tried to think of reasons why it would be useful to know the exact outline of tesseract's recognition area for an individual text element... and I can't think of any scenario where that information is useful to me.

But whether or not the boxes are useful, they're unquestionably distracting. And there are numerous times where they partially obscure their contents, making it more difficult to read the text. I uploaded a few examples to Imgur. These four words are all more difficult to make out clearly, in varying degrees, simply due to the box outline drawn around them:

https://imgur.com/a/4N9yAVA

The last one is a great example of a worst-case scenario, or close to it. The text in the box is "ITG", the original printed text on the page was "JTG", and that all would've been so, so much clearer without that box in the way. If there's some reason the box needs to be drawn, the only thing I can suggest is making it radically less obtrusive so that it won't obscure the text inside. But before making any design changes it really feels worth considering whether the box needs to be shown, period.

Discussion

Jeffrey Ratcliffe - 2022-01-24

You make a load of fair points. I'll split them out into separate bug reports.

I find the bounding boxes useful in the case where the font for the OCR text is so different from that used in the original image that the text is not positioned where I would expect.

The main points are:

1. extend text-input/edit field width
2. option to prevent scrolling out further than zoom to fit
3. option to toggle the bounding boxes
4. option to limit/adjust the zoom as discussed in bug #382
5. in general make the OCR text rendering quicker and much more responsive (not easy)

Does that cover things?
- FeRD - 2022-01-24
  
  Does that cover things?
  
  I think so, yeah. Sorry about lumping everything together.
  
  I definitely appreciate the difficulties inherent in #5, believe me, that's why I didn't even really suggest it per se — obviously, it would always be nice if everything is quicker and more responsive, so it feels barely constructive to even bother pointing that out. That's why I tried to focus on fixes that could minimize the impact of rendering lag on the UX, and make it easier to live with.
  
  On that front, I've been experimenting with the zoom responsiveness a bit more and I have some ideas... which I'll open a separate report for, because they're not even directly related to any of the others here so far (except #5 which is mostly implicit).