Re: [gscan2pdf-help] scantpaper v3.0.0 released

*Hello Jeff, hello All,*

*I’ve been a big fan of gscan2pdf for many years! Thank you so much, Jeff!

For about three years now, I’ve been using gscan2pdf for historical
research, particularly when working with the file formats that
archives produce.

gscan2pdf is a great tool for historians!

For example, the German Federal Archives makes digitized documents 
available for download on the web in JPG format. For some archival 
materials, several hundred pages need to be processed; for others (more 
rarely), over a thousand pages.
I import these files into gscan2pdf, use Tesseract to add an OCR layer,
and save them in PDF format.
In my experience, gscan2pdf has trouble processing more than 300 pages.
That’s why I split the jobs into batches of 250 to 300 pages, depending
on the quality of the original, and then merge the individual PDF files
using a PDF editor.
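
The batching described above can be sketched roughly as follows. This is
only an illustration under assumptions: the function name
split_into_batches and the page_NNNN.jpg file names are hypothetical and
not part of gscan2pdf or Scantpaper, and the sketch only plans the
batches; it does not import, OCR, or merge anything.

```python
from pathlib import Path


def split_into_batches(pages, batch_size=250):
    """Split an ordered list of page files into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Slice the list in steps of batch_size; the last batch may be shorter.
    return [pages[i:i + batch_size] for i in range(0, len(pages), batch_size)]


# Hypothetical file names standing in for an archive download of 1,100 pages.
pages = [Path(f"page_{n:04d}.jpg") for n in range(1, 1101)]
batches = split_into_batches(pages, batch_size=250)

for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch[0].name} to {batch[-1].name} "
          f"({len(batch)} pages)")
```

Each batch would then be loaded, OCRed, and saved as its own PDF, and
the resulting files merged afterwards.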

A second example is data sets that you create yourself on-site at a 
state archive using a microfilm reader. A third example is scan 
collections created on-site in the archives using smartphone apps.

gscan2pdf works well for all these purposes.

As of today, I’ve been testing Scantpaper v3.0.0 on a desktop running 
the Debian 13 operating system. Scantpaper was installed using a 
pre-built deb package.

The first thing I noticed was an error message when launching Scantpaper 
from the console:
"Error retrieving scanner options: expected string or bytes-like object, 
got ‘NoneType’"

Subjectively, Scantpaper seems slower than gscan2pdf when loading files
and performing text recognition.

When loading JPG files, I receive the following error messages for each 
file or page (excerpt for illustrative purposes):*
"WARNING:py.warnings:/usr/lib/python3/dist-packages/gi/_propertyhelper.py:220: 
Warning: value "137" of type 'gint' is invalid or out of range for 
property 'page-number-start' of type 'gint'
   instance.set_property(self.name, value)

WARNING:py.warnings:/usr/lib/python3/dist-packages/gi/_propertyhelper.py:220: 
Warning: value "138" of type 'gint' is invalid or out of range for 
property 'page-number-start' of type 'gint'
   instance.set_property(self.name, value)

WARNING:py.warnings:/usr/lib/python3/dist-packages/gi/_propertyhelper.py:220: 
Warning: value "139" of type 'gint' is invalid or out of range for 
property 'page-number-start' of type 'gint'
   instance.set_property(self.name, value)"

*During the text recognition step with Tesseract, error messages are
also “normal” for gscan2pdf, since scanned documents from archives are
often of very poor quality.
When using Scantpaper, I receive the following error messages (excerpt
for illustrative purposes):*
Estimating resolution as 765
ERROR:tools_menu_mixins:Can't display page with uuid 13: page not found
Estimating resolution as 110
Estimating resolution as 773
ERROR:tools_menu_mixins:Can't display page with uuid 14: page not found
Estimating resolution as 132
Estimating resolution as 774
Estimating resolution as 136
ERROR:tools_menu_mixins:Can't display page with uuid 15: page not found
Estimating resolution as 780
Estimating resolution as 142
Estimating resolution as 160
ERROR:tools_menu_mixins:Can't display page with uuid 16: page not found
Estimating resolution as 159
Estimating resolution as 341
Estimating resolution as 1125

*However, judging by the GUI, text recognition seems to have worked.

When saving, the following error messages appear:*
(com.github.scantpaper:266239): Gtk-WARNING **: 13:25:07.349: Failed to 
measure available space: Fehler beim Einlesen der 
Dateisystem-Information für /home/martin/pCloudDrive: Der Socket ist 
nicht verbunden

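[ In English: “Error reading file system information for
/home/martin/pCloudDrive: The socket is not connected”. I specified a
completely different save location than “/home/martin/pCloudDrive” ]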
However, the GUI shows: “2257 of 1000 being processed (save_pdf)”

*The save stops before completing and displays the error message:*
ERROR:basethread:Error running process 'save_pdf':
ERROR:session_mixins:Error running 'error' callback for 'save_pdf' process:

*Comparison with gscan2pdf v2.13.4: When launched from the console,
the same error as above appears:*
(net.sourceforge.gscan2pdf:274582): Gtk-WARNING **: 13:45:14.772: Failed 
to measure available space: Fehler beim Einlesen der 
Dateisystem-Information für /home/martin/pCloudDrive: Der Socket ist 
nicht verbunden

*However, the files load without any errors and much faster than with 
Scantpaper.*
ERROR - 3d3cd0a0-2aea-4dc5-9e6c-488f13c0b049, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - dada053f-10b6-486b-bd7f-bc3dbc945fcf, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - 98e70d0a-ae6a-4a00-abcf-dfae35307bf2, tesseract, Detected 71 
diacritics
ERROR - 94f1260e-afd6-4b14-97bf-65e3e6df4157, tesseract, Detected 70 
diacritics
ERROR - df423c95-d498-4365-82a9-e597dbe81548, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - 9d691361-c61b-437a-8c97-d2e008c3f40c, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - cfdff534-256c-46fd-a49c-536eaea6878f, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - 88d76958-b801-4cfb-8f42-baf9c207d7bd, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - 1ba5fc3f-92dd-444e-84db-428b58b5d1d0, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - 5ff7b239-bfdd-450c-a29e-55cf2b362a54, tesseract, Detected 386 
diacritics
ERROR - fec6ec58-0af9-434f-8e73-1070540f0bd7, tesseract, Detected 298 
diacritics
ERROR - deleting corrupt text layer: []
ERROR - 08b05dca-9059-44f3-8930-125f2ce3d2bc, tesseract, Detected 185 
diacritics
ERROR - 7b392f03-0e1d-4323-9537-aeb7211c6d61, tesseract, Detected 345 
diacritics
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - 3744de25-4be6-45e8-ac00-333fa09a8a7e, tesseract, Detected 321 
diacritics
ERROR - deleting corrupt text layer: []
ERROR - ea03244b-47cb-4180-bc76-86f0620822c5, tesseract, Detected 589 
diacritics
ERROR - e510a717-0c27-4a19-bc65-c5b5c86d3774, tesseract, Detected 279 
diacritics
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - ce97adc8-2d4a-4971-94fb-d7d6d6f1285e, tesseract, Detected 451 
diacritics
ERROR - 12cb3785-6e7e-45a7-bc75-e7cf7575abb0, tesseract, Detected 364 
diacritics
ERROR - f91aa7e9-5a68-46c2-8f63-0bd9541daeed, tesseract, Detected 212 
diacritics
ERROR - deleting corrupt text layer: []
ERROR - 936951c5-99e1-42da-ac89-11d5b4815a05, tesseract, Detected 184 
diacritics
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - 17d7d17b-cf44-4107-bf6b-9aa29729ec88, tesseract, Detected 160 
diacritics
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - ef542d62-b784-4836-aca4-cb2aea909537, tesseract, Detected 196 
diacritics
ERROR - 051e2b35-9a03-4710-9143-8ba0fab843d6, tesseract, Detected 332 
diacritics
ERROR - 3e2c4220-7f7f-46ea-8ec9-dfab30ac6f2d, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - 263b918c-0a30-4bd2-8d41-a3fa157428e2, tesseract, Detected 12 
diacritics
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - 9bc817f9-45d2-4028-9b7e-1b4a3bcdc19c, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - b478c1ba-932a-4223-b4d2-d0f10ed58292, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []

*When saving, gscan2pdf displays the “usual error messages,” but the
process completes successfully.*

*Regards*

*Martin*

On 11.03.26 at 22:26, Jeff wrote:
> Changes compared to v3.0.0-rc5
>
> * + minimal en_US translation to prevent warnings. Closes #41
> * Be graceful if the previous current working directory no longer
>   exists. Closes #42
>
> Available on Github:
>
> https://github.com/carygravel/scantpaper/releases/tag/v3.0.0
>
> Or via the PPA:
>
> https://launchpad.net/~jeffreyratcliffe/+archive/ubuntu/ppa