
#386 gscan2pdf needs help parsing output from opencl-enabled tesseract

Milestone: v1.0_(example)

Status: closed-fixed

Owner: nobody

Labels: None

Priority: 5

Updated: 2021-05-06

Created: 2021-05-04

Creator: sean dreilinger

Private: No

I am working with gscan2pdf version 2.12.1. I recently recompiled tesseract with OpenCL support. When OpenCL is enabled, gscan2pdf does not correctly detect the languages supported by tesseract version 4.1.1.

When I run:

tesseract --list-langs

from the command line, tesseract returns:

[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 0.399570
[DS] Selected Device[1]: "(null)" (Native)
List of available languages (3):
eng
enm
osd

(NB: the lines beginning with [DS] appear only when OpenCL support is enabled.)

When I start gscan2pdf from the command line with --log=/path/to/logfile, the resulting log contains:
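
As a rough illustration of what a tolerant parser needs to do with the output above, here is a minimal sketch in Python (not the Perl that gscan2pdf itself uses, and not the attached patch). The function name parse_list_langs is hypothetical, and the sketch assumes that every diagnostic line starts with "[DS]" and that the header starts with "List of available languages", exactly as in the sample shown.

```python
def parse_list_langs(output):
    """Return language codes from `tesseract --list-langs` style output.

    Assumption: OpenCL diagnostics start with "[DS]" and the header line
    starts with "List of available languages", as in the report above.
    """
    codes = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, OpenCL device-selection diagnostics, and the header.
        if not line:
            continue
        if line.startswith("[DS]"):
            continue
        if line.startswith("List of available languages"):
            continue
        codes.append(line)
    return codes


# Sample output copied from the report above.
sample = """\
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 0.399570
[DS] Selected Device[1]: "(null)" (Native)
List of available languages (3):
eng
enm
osd
"""

print(parse_list_langs(sample))  # ['eng', 'enm', 'osd']
```

Filtering on a known prefix is deliberately narrow: it handles the [DS] lines seen here but would need extending if other builds print different diagnostics.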

INFO - tesseract -v
INFO - Found tesseract version v4.1.1.
INFO - tesseract --list-langs
INFO - **Found tesseract language** [DS] Profile read from file (tesseract_opencl_profile_devices.dat). ([DS] Profile read from file (tesseract_opencl_profile_devices.dat).)
INFO - **Found tesseract language** [DS] Device[1] 0:(null) score is 0.399570 ([DS] Device[1] 0:(null) score is 0.399570)
INFO - **Found tesseract language** [DS] Selected Device[1]: "(null)" (Native) ([DS] Selected Device[1]: "(null)" (Native))
WARN - You are using locale 'en_US.UTF-8'. Please install tesseract package 'tesseract-ocr-eng' and restart gscan2pdf for OCR for English with tesseract.

If I choose 'OK' on the warning message dialog:

Warning: missing packages
You are using locale 'en_US.UTF-8'. Please install tesseract package 'tesseract-ocr-eng' and restart gscan2pdf for OCR for English with tesseract.

and initiate an OCR process using gscan2pdf, the OCR setup dialog offers the following three entries as the selectable languages:

[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 0.399570
[DS] Selected Device[1]: "(null)" (Native)

I made a quick workaround (attached), after which gscan2pdf recognized the languages supported by tesseract and successfully processed a 580-page document very quickly using the OpenCL-enabled tesseract.

I hope you will consider adapting gscan2pdf to support OpenCL-enabled tesseract.

1 Attachments

gscan2pdf-2.12.1.tesseract-with-opencl.diff

Discussion

Jeffrey Ratcliffe - 2021-05-06

Committed. Thanks for the patch.


Jeffrey Ratcliffe - 2021-05-06

status: open --> closed-fixed

Uwe Brinkhoff - 2021-05-06

Sorry, I don't see the fix.

I have the same problem, and a proposal for a solution.

The output of the command "tesseract --list-langs" that contains the language information goes to standard output; the other output goes to standard error.

In the package "Gscan2pdf::Tesseract", in the subroutine "language", the split command tests whether $err has content and, if it does, uses it. With OpenCL enabled, $err holds the OpenCL information. If the command uses $out instead, it gets the language information.

So I swapped the variables in my local copy. The line now reads
@codes = split /\n/xsm, $out ? $out : $err;
and tesseract now recognises the scan for me. An error box still pops up with the content of standard error, including the OpenCL output, but tesseract on the command line delivers the same output.
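
The stdout-first selection described in this comment can be sketched as follows. This is a Python illustration only (gscan2pdf is written in Perl), and the helper names choose_listing and list_langs_raw are hypothetical. It mirrors the ternary above: use standard output when it has content, otherwise fall back to standard error. list_langs_raw only shows how the two streams could be captured separately and is not called here, since it needs tesseract installed.

```python
import subprocess


def choose_listing(out, err):
    """Prefer stdout; fall back to stderr only when stdout is empty."""
    return (out if out else err).splitlines()


def list_langs_raw(cmd=("tesseract", "--list-langs")):
    """Run the command and return (stdout, stderr) as separate strings."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    return proc.stdout, proc.stderr


# Simulated streams, based on the behaviour described in this comment:
# the language list on stdout, OpenCL diagnostics on stderr.
out = "List of available languages (3):\neng\nenm\nosd\n"
err = '[DS] Selected Device[1]: "(null)" (Native)\n'

print(choose_listing(out, err))
# ['List of available languages (3):', 'eng', 'enm', 'osd']

print(choose_listing("", err))
# ['[DS] Selected Device[1]: "(null)" (Native)']
```

With the original order (stderr first), any diagnostic text on standard error would be returned instead of the language list, which matches the symptom reported at the top of this ticket.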

Last edit: Uwe Brinkhoff 2021-05-06


Jeffrey Ratcliffe - 2021-05-06

The fix is in the attachment above.

In essence, though, it was the same as your suggestion. Thanks!
