gscan2pdf progress bar disappears before processing ends
When multiple pages are being processed (scanned, OCRed, etc.), the progress bar at the bottom completes and is removed 1-2 tasks PRIOR to when processing is actually complete, actual completion being indicated by the end of processor activity, several seconds later. It is disconcerting to have the progress bar end early, because it becomes uncertain when it is safe to initiate the Save dialog. For now, I just wait for the processor activity to end before I save. gscan2pdf did not behave this way a few years ago (sorry, I don't know in which version it changed).
Fedora 31, gscan2pdf 2.6.4 (the Fedora distributed version). I mostly scan with a Fujitsu iX500, though occasionally with an HP 8720.
I noticed this, too, and I fixed it in v2.6.5. Please test it (perhaps from source) and let me know whether it fixes things for you, too.
I have updated to 2.6.5 from Fedora "updates-testing". Using Fujitsu iX500.
I see NO change in behavior. The test case is to scan multiple pages with dense text (5-7 pages is sufficient), with OCR/Tesseract enabled. The Fujitsu scans fast and gets well ahead of Tesseract. As OCR progresses, the progress bar advances to the right as pages are processed, and then disappears, while the last 2 pages are still processing. My computer (quad core, hyperthreaded) continues with heavy processing for an additional 7 sec, while gscan2pdf shows the next-to-last page selected (in the left-most page pane), and then a few seconds later the last page is selected. In case it helps, I've attached a log file for this test case.
Last edit: auv-ee 2020-03-11
Now I know what you are talking about :-)
There are a couple of effects here:
In an attempt to make the GUI more responsive, it doesn't block whilst drawing the OCR text. As a result, the drawing takes several seconds and continues for a couple of seconds after the tesseract process has finished.
Progress bar counts from 0 to n-1, i.e. it disappears when it reaches n, and therefore might look as though it skips the last step.
Does the progress bar get to n-1? If not, does the attached patch solve the problem?
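The counting effect described above can be illustrated with a minimal sketch. This is not gscan2pdf's code (which is Perl); the function name `progress_labels` and its `one_based` parameter are hypothetical, and the label format only imitates the "Process n of m" text quoted in this thread. It assumes the bar is hidden as soon as every job has finished.

```python
def progress_labels(total_jobs, one_based):
    """Return the labels shown while jobs run; the bar is hidden once all are done."""
    labels = []
    for completed in range(total_jobs):
        # 'completed' jobs have finished; job number completed + 1 is running now.
        shown = completed + 1 if one_based else completed
        labels.append(f"Process {shown} of {total_jobs}")
    return labels


# A 0 to n-1 counter never displays "n of n", so the last step can look skipped.
zero_based = progress_labels(3, one_based=False)
# A 1 to n counter displays "n of n" while the final job is still running.
one_based = progress_labels(3, one_based=True)
print(zero_based)
print(one_based)
```

With either convention the bar vanishes at the same moment; only the last label the user sees differs.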
Hmmm... Scanning 7 pages. version 2.6.5, unmodified.
It starts with 2 progress bars, one for scanning, one for everything else, I guess. That's nice; I like that. The first appearance of the second progress bar displays "Process 1 of 1 (to-png)". After scanning completes, it settles on one bar, showing "n of m" tasks, with m gradually increasing to 12, and n chasing m. Note that the progress bar sometimes shows just "Process n of m" without a named task, sometimes "...n of m (to-png)", and sometimes "...n of m (tesseract)" (I'm not asking for any other processing). Toward the end it displays "11 of 12 (tesseract)" for a couple of seconds, then "12 of 12 (tesseract)" for a couple more seconds, and then disappears.
At that point the processor continues to run hard (multiple cores for the first few seconds, then one core for the last few seconds), for a total of about 7 seconds after the bar disappears. During these 7 seconds, the left-most pane, showing the page thumbnails, shows the next-to-last and then the last thumbnail highlighted (selected); I'm not sure what that means. Is it highlighting the page it is rendering or the page for which OCR is still running? I'm guessing it's the page it is rendering, based on your last response and the fact that the bar disappears AFTER it displays "12 of 12 (tesseract)" for a few seconds. The progress bar never behaves as 0 to m-1, but always as 1 to m.
So maybe the bar disappears when Tesseract is done and the file can be safely saved, and the additional processing/highlighting is only with respect to rendering. That would be fine, if somehow that state was indicated. Maybe just a statement in the help file that says it's safe to save when the bar disappears, or maybe block the save function until all pending operations are complete (OCR, Cleanup, Threshold, etc.). I can't find any mention of the progress bar in the help file.
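The suggestion to block saving until all pending operations are complete could be sketched roughly as follows. This is a hedged illustration only, not how gscan2pdf is implemented: the class `PendingJobs`, its methods, and the job names "tesseract" and "render-ocr" are all hypothetical. The idea is simply to count rendering as a pending job too, so that "safe to save" has one unambiguous meaning.

```python
class PendingJobs:
    """Count started-but-unfinished jobs so a save can wait for all of them."""

    def __init__(self):
        self._pending = {}  # job name -> number still outstanding

    def start(self, name):
        self._pending[name] = self._pending.get(name, 0) + 1

    def finish(self, name):
        if self._pending.get(name, 0) == 0:
            raise ValueError(f"no pending job named {name!r}")
        self._pending[name] -= 1
        if self._pending[name] == 0:
            del self._pending[name]

    def can_save(self):
        # Saving is allowed only when nothing at all is outstanding.
        return not self._pending

    def describe(self):
        if not self._pending:
            return "idle, safe to save"
        return "busy: " + ", ".join(sorted(self._pending))


jobs = PendingJobs()
jobs.start("tesseract")
jobs.start("render-ocr")   # hypothetical name for drawing the OCR text
jobs.finish("tesseract")
print(jobs.can_save(), jobs.describe())   # rendering is still outstanding
jobs.finish("render-ocr")
print(jobs.can_save(), jobs.describe())
```

A status line driven by something like `describe()` would also cover the simpler request of indicating when it is safe to save.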
I tried saving in the midst of processing the 7 pages, and what I got was 5 pages with OCR complete. On a second attempt, all 7 pages were saved, but the last two had no OCR text. On a third attempt, 4 pages were saved, with OCR for 3. A second save, after all processing had finished, has 7 pages with OCR for all pages. All of these initial saves (not counting the one after all processing) were done while the progress bar was still displayed. If I save with the result that all pages are scanned and saved, but not all pages have completed OCR (as in the second trial), then when I try to Clear/Exit, I get a warning that not all pages have been saved; that is reassuring.
Attempting to save in the 7 sec interval after the bar disappears and before the end of processing is not feasible, because the application is not very responsive in that interval (busy rendering, I guess).
I will next consider the patch, but that requires preparing to compile from source.
This is Perl. No compilation. You can just run it from the source tree. Easiest is to download the source for 2.6.5 and untar it with:
Download the patch into the same directory. Patch the source with:
Then start gscan2pdf from the source tree with:
However, if the progress bar already reaches the end, then I don't think it will help.
In your case, I really assume that simply rendering the OCR is what is using the CPU after the progress bar disappears.
Trying to compile source, first without the patch:
Tests hang at:
t/1113_save_pdf_with_error.t
With "ps auxww" showing:
root 184889 0.2 1.6 445020 129272 pts/1 Sl+ 13:20 0:02 /usr/bin/perl t/1113_save_pdf_with_error.t
That is: sleeping waiting for an event. If I kill that task, it then hangs at:
t/126_save_djvu_with_error.t
kill, then hang at:
t/133_save_tiff_with_error.t
then same at each of the following:
t/1602_import_DjVu_with_error.t
t/1612_import_TIFF_with_error.t
t/1626_import_PDF_with_error.t
t/1632_import_ppm_with_error.t
t/213_rotate_with_error.t
t/243_threshold_with_error.t
t/253_negate_with_error.t
t/263_unsharp_mask_with_error.t
t/273_crop_with_error.t
t/283_to_png_with_error.t
t/354_unpaper_with_error.t
t/377_user_defined_with_error.t
t/414_tesseract_with_error.t
t/434_gocr_with_error.t
There is obviously a pattern here. A few of the other tests ending in with_error.t did not hang, but complained of failing return values. It's likely I am missing some dependency.
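Rather than killing each hung test by hand, one option is to run every test script under a time limit. The sketch below is an assumption-laden illustration, not part of the gscan2pdf test suite: the helper `run_with_timeout` is hypothetical, and the two commands are stand-ins that use the Python interpreter so the example runs anywhere.

```python
import subprocess
import sys


def run_with_timeout(command, seconds):
    """Run one command and report 'passed', 'failed', or 'timed out'."""
    try:
        result = subprocess.run(command, timeout=seconds,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return "timed out"
    return "passed" if result.returncode == 0 else "failed"


# Stand-in commands. Against the source tree, a hypothetical invocation
# might be ["perl", "-Ilib", "t/1113_save_pdf_with_error.t"].
quick = [sys.executable, "-c", "pass"]
stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(quick, 10))
print(run_with_timeout(stuck, 1))
```

Looping this over the test files would list the hanging ones in a single pass.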
This version compiled from source does seem to run, although at startup a popup window says I'm missing pdftk.
I applied the patch, and executed:
make
without "test". The output showed skipping lots of unaltered .pm files, and ended with:
cp lib/Gscan2pdf/Document.pm blib/lib/Gscan2pdf/Document.pm
While running with this patched version, I cannot detect any change; it seems to still behave as described in my previous response.
This is Perl - you can't compile it.
The tests require extra dependencies, so it is quite possible that they will fail if you have not installed them.
Won't help here. To run from the source tree:
Well, make did perform the cp to blib, if that helps.
I reexecuted with PERL5LIB="blib:blib/arch:lib:$PERL5LIB" perl bin/gscan2pdf --log=log, and the result is the same, as you suspected: the behavior is the same as without the patch.
I think at this point, what makes sense is to say something in the help file about the meaning of the progress bar.
Thanks for your attention.