Re: [gscan2pdf-help] scantpaper v3.0.0 released
From: Martin H. <her...@po...> - 2026-03-19 13:58:21
Hello Jeff, hello all,

I've been a big fan of gscan2pdf for many years. Thank you so much, Jeff! For about three years now, I've been using gscan2pdf for historical research, particularly when working with file formats created in archives. gscan2pdf is a great tool for historians!

For example, the German Federal Archives makes digitized documents available for download on the web in JPG format. For some archival materials, several hundred pages need to be processed; for others (more rarely), over a thousand pages. I import these files into gscan2pdf, use Tesseract to add an OCR layer, and save them in PDF format. In my experience, gscan2pdf has trouble processing more than 300 pages. That's why I split the jobs into batches of 250 to 300 pages, depending on the quality of the original, and then merge the individual PDF files using a PDF editor.

A second example is data sets that you create yourself on-site at a state archive using a microfilm reader. A third example is scan collections created on-site in the archives using smartphone apps. gscan2pdf works well for all these purposes.

As of today, I've been testing scantpaper v3.0.0 on a desktop running Debian 13. Scantpaper was installed using a pre-built deb package.

The first thing I noticed was an error message when launching scantpaper from the console:

"Error retrieving scanner options: expected string or bytes-like object, got ‘NoneType’"

Subjectively, scantpaper seems to me to work more slowly than gscan2pdf when loading files and performing text recognition.

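The batching workaround described above (jobs of 250 to 300 pages, merged afterwards) could be prepared with a small script. The following is only a sketch under stated assumptions: the function names `split_into_batches` and `plan_batches` are hypothetical, the pages are assumed to be JPG files in one directory whose sorted file names reflect page order, and the merge step itself is left to whatever PDF editor is already in use.

```python
from pathlib import Path
from typing import Iterator, List, Sequence


def split_into_batches(pages: Sequence[Path], batch_size: int = 250) -> Iterator[List[Path]]:
    """Yield consecutive groups of at most batch_size pages, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        yield list(pages[start:start + batch_size])


def plan_batches(directory: Path, batch_size: int = 250) -> List[List[Path]]:
    """Collect the JPG files in a directory, sorted by name, and group them.

    Assumption: sorting by file name gives the correct page order.
    """
    pages = sorted(
        p for p in directory.iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg"}
    )
    return list(split_into_batches(pages, batch_size))


if __name__ == "__main__":
    # Hypothetical example: 620 page images give batches of 250, 250 and 120.
    demo = [Path(f"page_{i:04d}.jpg") for i in range(620)]
    for number, batch in enumerate(split_into_batches(demo, 250), start=1):
        print(f"batch {number}: {batch[0].name} .. {batch[-1].name} ({len(batch)} pages)")
```

Each batch can then be imported, run through OCR and saved as its own PDF, and the batch numbers give the order in which the partial PDFs have to be merged.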
When loading JPG files, I receive the following error messages for each file or page (excerpt for illustrative purposes):

WARNING:py.warnings:/usr/lib/python3/dist-packages/gi/_propertyhelper.py:220: Warning: value "137" of type 'gint' is invalid or out of range for property 'page-number-start' of type 'gint'
  instance.set_property(self.name, value)
WARNING:py.warnings:/usr/lib/python3/dist-packages/gi/_propertyhelper.py:220: Warning: value "138" of type 'gint' is invalid or out of range for property 'page-number-start' of type 'gint'
  instance.set_property(self.name, value)
WARNING:py.warnings:/usr/lib/python3/dist-packages/gi/_propertyhelper.py:220: Warning: value "139" of type 'gint' is invalid or out of range for property 'page-number-start' of type 'gint'
  instance.set_property(self.name, value)

During the text recognition step with Tesseract, error messages are also “normal” for gscan2pdf, since scanned documents from archives are often of very poor quality. When using scantpaper, I receive the following error messages (excerpt for illustrative purposes):

Estimating resolution as 765
ERROR:tools_menu_mixins:Can't display page with uuid 13: page not found
Estimating resolution as 110
Estimating resolution as 773
ERROR:tools_menu_mixins:Can't display page with uuid 14: page not found
Estimating resolution as 132
Estimating resolution as 774
Estimating resolution as 136
ERROR:tools_menu_mixins:Can't display page with uuid 15: page not found
Estimating resolution as 780
Estimating resolution as 142
Estimating resolution as 160
ERROR:tools_menu_mixins:Can't display page with uuid 16: page not found
Estimating resolution as 159
Estimating resolution as 341
Estimating resolution as 1125

However, text recognition seems to have worked in the GUI.

When saving, the following error messages appear:

(com.github.scantpaper:266239): Gtk-WARNING **: 13:25:07.349: Failed to measure available space: Fehler beim Einlesen der Dateisystem-Information für /home/martin/pCloudDrive: Der Socket ist nicht verbunden

[ The German part reads: "Error reading the file system information for /home/martin/pCloudDrive: The socket is not connected". I specified a completely different save location than “/home/martin/pCloudDrive”. ]

However, the GUI shows: “2257 of 1000 being processed (save_pdf)”

The saving process ends without completing, displaying the error message:

ERROR:basethread:Error running process 'save_pdf':
ERROR:session_mixins:Error running 'error' callback for 'save_pdf' process:

Comparison with gscan2pdf v2.13.4: When launched from the console, the same warning as above appears:

(net.sourceforge.gscan2pdf:274582): Gtk-WARNING **: 13:45:14.772: Failed to measure available space: Fehler beim Einlesen der Dateisystem-Information für /home/martin/pCloudDrive: Der Socket ist nicht verbunden

However, the files load without any errors and much faster than with scantpaper.

ERROR - 3d3cd0a0-2aea-4dc5-9e6c-488f13c0b049, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - dada053f-10b6-486b-bd7f-bc3dbc945fcf, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - 98e70d0a-ae6a-4a00-abcf-dfae35307bf2, tesseract, Detected 71 diacritics
ERROR - 94f1260e-afd6-4b14-97bf-65e3e6df4157, tesseract, Detected 70 diacritics
ERROR - df423c95-d498-4365-82a9-e597dbe81548, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - 9d691361-c61b-437a-8c97-d2e008c3f40c, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - cfdff534-256c-46fd-a49c-536eaea6878f, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - 88d76958-b801-4cfb-8f42-baf9c207d7bd, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - 1ba5fc3f-92dd-444e-84db-428b58b5d1d0, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - 5ff7b239-bfdd-450c-a29e-55cf2b362a54, tesseract, Detected 386 diacritics
ERROR - fec6ec58-0af9-434f-8e73-1070540f0bd7, tesseract, Detected 298 diacritics
ERROR - deleting corrupt text layer: []
ERROR - 08b05dca-9059-44f3-8930-125f2ce3d2bc, tesseract, Detected 185 diacritics
ERROR - 7b392f03-0e1d-4323-9537-aeb7211c6d61, tesseract, Detected 345 diacritics
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - 3744de25-4be6-45e8-ac00-333fa09a8a7e, tesseract, Detected 321 diacritics
ERROR - deleting corrupt text layer: []
ERROR - ea03244b-47cb-4180-bc76-86f0620822c5, tesseract, Detected 589 diacritics
ERROR - e510a717-0c27-4a19-bc65-c5b5c86d3774, tesseract, Detected 279 diacritics
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - ce97adc8-2d4a-4971-94fb-d7d6d6f1285e, tesseract, Detected 451 diacritics
ERROR - 12cb3785-6e7e-45a7-bc75-e7cf7575abb0, tesseract, Detected 364 diacritics
ERROR - f91aa7e9-5a68-46c2-8f63-0bd9541daeed, tesseract, Detected 212 diacritics
ERROR - deleting corrupt text layer: []
ERROR - 936951c5-99e1-42da-ac89-11d5b4815a05, tesseract, Detected 184 diacritics
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - 17d7d17b-cf44-4107-bf6b-9aa29729ec88, tesseract, Detected 160 diacritics
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - ef542d62-b784-4836-aca4-cb2aea909537, tesseract, Detected 196 diacritics
ERROR - 051e2b35-9a03-4710-9143-8ba0fab843d6, tesseract, Detected 332 diacritics
ERROR - 3e2c4220-7f7f-46ea-8ec9-dfab30ac6f2d, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - 263b918c-0a30-4bd2-8d41-a3fa157428e2, tesseract, Detected 12 diacritics
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - 9bc817f9-45d2-4028-9b7e-1b4a3bcdc19c, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - b478c1ba-932a-4223-b4d2-d0f10ed58292, tesseract, Empty page!!
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []
ERROR - deleting corrupt text layer: []

When saving, gscan2pdf displays the “usual error messages,” but the process completes successfully.

Regards,
Martin

On 11.03.26 at 22:26, Jeff wrote:
> Changes compared to v3.0.0-rc5
>
> * + minimal en_US translation to prevent warnings. Closes #41
> * Be graceful if the previous current working directory no longer exists. Closes #42
>
> Available on Github:
>
> https://github.com/carygravel/scantpaper/releases/tag/v3.0.0
>
> Or via the PPA:
>
> https://launchpad.net/~jeffreyratcliffe/+archive/ubuntu/ppa
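
The gscan2pdf console output quoted in the message above mixes three kinds of entries: "Empty page!!", "Detected N diacritics", and "deleting corrupt text layer". When comparing runs, it may help to tally them instead of reading the log line by line. The following is a rough sketch, not part of either program: the function name `tally_tesseract_errors` and the category labels are hypothetical, and it assumes the log was captured with one entry per line.

```python
import re
from collections import Counter

# Matches entries such as:
#   ERROR - 3d3cd0a0-2aea-4dc5-9e6c-488f13c0b049, tesseract, Empty page!!
PAGE_ERROR = re.compile(r"ERROR - (?P<uuid>[0-9a-f-]{36}), tesseract, (?P<msg>.+)")


def tally_tesseract_errors(log_text: str) -> Counter:
    """Count the entry types seen in a captured gscan2pdf console log.

    Assumption: each log entry sits on its own line.
    """
    counts: Counter = Counter()
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.startswith("ERROR - deleting corrupt text layer"):
            counts["corrupt_text_layer"] += 1
            continue
        match = PAGE_ERROR.match(line)
        if match is None:
            continue  # not a per-page tesseract entry
        message = match.group("msg")
        if message.startswith("Empty page"):
            counts["empty_page"] += 1
        elif message.startswith("Detected") and message.endswith("diacritics"):
            counts["diacritics"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "ERROR - 3d3cd0a0-2aea-4dc5-9e6c-488f13c0b049, tesseract, Empty page!!",
        "ERROR - deleting corrupt text layer: []",
        "ERROR - 98e70d0a-ae6a-4a00-abcf-dfae35307bf2, tesseract, Detected 71 diacritics",
    ])
    for category, count in sorted(tally_tesseract_errors(sample).items()):
        print(f"{category}: {count}")
```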