#229 Tesseract's recognized text is not recovered

Milestone: v1.0_(example)
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2017-04-24
Created: 2016-08-28
Creator: papoteur
Private: No

Version : 1.2.5

Open a PDF file
Run OCR with tesseract.
The page is processed, but the "OCR tab" remains empty.
In the log, I get the corresponding lines (I replaced the source file name with <myfile.pdf>):

INFO - 1 pages
INFO - pdfimages -f 1 -l 1 "<myfile.pdf>" x
INFO - New page filename x-000.ppm, format Portable pixmap format (color)
INFO - New page filename /tmp/gscan2pdf-Lq1D/g9C4aGQW6w.png, format Portable Network Graphics
INFO - Added /tmp/gscan2pdf-Lq1D/pErqBf8Gs0.png at page 1 with resolution 199.950168350168
DEBUG - Started setting page_number_start from 1 to 2
DEBUG - Finished setting page_number_start from 1 to 2
INFO - Found tesseract version 3.02.02.
INFO - echo tessedit_create_hocr 1 > hocr.config;tesseract /tmp/gscan2pdf-Lq1D/pErqBf8Gs0.png /tmp/2hXGRZlhDI -l fra +hocr.config;rm hocr.config
DEBUG - Warnings from Tesseract: Tesseract Open Source OCR Engine v3.02.02 with Leptonica

INFO - Replaced /tmp/gscan2pdf-Lq1D/pErqBf8Gs0.png at page 1 with /tmp/gscan2pdf-Lq1D/wFB4NSdVrc.png, resolution 199.950168350168

When I run
echo tessedit_create_hocr 1 > hocr.config;tesseract /tmp/gscan2pdf-Lq1D/wFB4NSdVrc.png /tmp/2hXGRZlhDI -l fra +hocr.config
manually, I get the file /tmp/2hXGRZlhDI.html with the correct content.

The output file location seems to be wrong, perhaps related to:
https://sourceforge.net/p/gscan2pdf/bugs/202/
Is the .html extension at the end expected?
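
To check which file tesseract actually wrote for a given output base, something like the sketch below may help. It is only an illustration: the helper name find_hocr_output is hypothetical, and it assumes the hOCR output is named either <base>.html (as tesseract 3.02.02 does in the run above) or <base>.hocr (which later tesseract releases are understood to use). It says nothing about how gscan2pdf itself locates the file.

```shell
# Hypothetical helper, not part of gscan2pdf or tesseract.
# Prints the hOCR file found for the given output base and returns 0,
# or returns 1 if neither candidate exists.
# Assumption: the output is named <base>.hocr or <base>.html.
find_hocr_output() {
    base="$1"
    for ext in hocr html; do
        if [ -f "$base.$ext" ]; then
            printf '%s\n' "$base.$ext"
            return 0
        fi
    done
    return 1
}
```

For the run above, calling find_hocr_output /tmp/2hXGRZlhDI after the tesseract command would show whether the result ended up as .html or .hocr.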

Papoteur

Discussion

  • papoteur

    papoteur - 2016-08-28

    Still valid in 1.5.2

     
  • Jeffrey Ratcliffe

    Apologies for the late response. Tesseract works with gscan2pdf for me. Can you post an example PDF that reproduces the problem?

     
  • papoteur

    papoteur - 2017-04-21

    Hello,
    This is no longer reproducible in Mageia 6/cauldron with the 1.7.2 release.
    Thus, we can close this, although Mageia 5 is still a maintained release.

     
  • Jeffrey Ratcliffe

    • status: open --> closed-fixed
     
