Version: 1.2.5
Open a PDF file.
Run OCR on it with tesseract.
The page is processed, but the "OCR tab" remains empty.
In the log, I get the corresponding lines (I replaced the source file name with <myfile.pdf>):
INFO - 1 pages
INFO - pdfimages -f 1 -l 1 "<myfile.pdf>" x
INFO - New page filename x-000.ppm, format Portable pixmap format (color)
INFO - New page filename /tmp/gscan2pdf-Lq1D/g9C4aGQW6w.png, format Portable Network Graphics
INFO - Added /tmp/gscan2pdf-Lq1D/pErqBf8Gs0.png at page 1 with resolution 199.950168350168
DEBUG - Started setting page_number_start from 1 to 2
DEBUG - Finished setting page_number_start from 1 to 2
INFO - Found tesseract version 3.02.02.
INFO - echo tessedit_create_hocr 1 > hocr.config;tesseract /tmp/gscan2pdf-Lq1D/pErqBf8Gs0.png /tmp/2hXGRZlhDI -l fra +hocr.config;rm hocr.config
DEBUG - Warnings from Tesseract: Tesseract Open Source OCR Engine v3.02.02 with Leptonica
INFO - Replaced /tmp/gscan2pdf-Lq1D/pErqBf8Gs0.png at page 1 with /tmp/gscan2pdf-Lq1D/wFB4NSdVrc.png, resolution 199.950168350168
When I run the same command by hand:
echo tessedit_create_hocr 1 > hocr.config;tesseract /tmp/gscan2pdf-Lq1D/wFB4NSdVrc.png /tmp/2hXGRZlhDI -l fra +hocr.config
I get a /tmp/2hXGRZlhDI.html file with the correct content.
The output location does not seem right, perhaps related to:
https://sourceforge.net/p/gscan2pdf/bugs/202/
Is the .html extension at the end expected?
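To check which file tesseract actually wrote for a given output base, something like the following might help. It is only a sketch: find_hocr_output is a hypothetical helper name, and the .hocr extension is an assumption about newer tesseract releases (3.02.02 produced .html in the run above).

```shell
# Hypothetical helper: print the first hOCR output file that exists for a
# given tesseract output base (the second argument on the tesseract command line).
# Assumption: the file is named <base>.html (as seen with 3.02.02 above) or
# <base>.hocr (believed to be used by later tesseract versions).
find_hocr_output() {
    base=$1
    for ext in html hocr; do
        if [ -f "$base.$ext" ]; then
            printf '%s\n' "$base.$ext"
            return 0
        fi
    done
    # Neither candidate exists.
    return 1
}

# Example, using the output base from the log above:
# find_hocr_output /tmp/2hXGRZlhDI || echo "no hOCR output found"
```

If the helper prints a path but the OCR tab stays empty, the file is being produced and the problem is more likely in how it is picked up afterwards.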
Papoteur
Still reproducible in 1.5.2.
Apologies for the late response. Tesseract works with gscan2pdf for me. Can you post an example PDF that reproduces the problem?
Hello,
This is no longer reproducible on Mageia 6/cauldron with the 1.7.2 release.
So we can close this, although Mageia 5 is still a maintained release.