No text visible when selecting text in Evince
When running pdfsandwich on a PDF file, everything seems to work well, except for hundreds of warnings like
GPL Ghostscript 9.10: Missing glyph CID=0, glyph=0077 in the font ZDWFVB+GlyphLessFont . The output PDF may fail with some viewers.
However, when I open the resulting file in Evince, and I select some text, I only see the green box showing my selection; there is no text visible inside this green box (see attachment). Usually, I would expect to see the selected text.
I'm not sure if this is a problem of pdfsandwich at all ...
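With hundreds of these warnings, it may help to tally them per font before deciding where to report the problem. The following is only a rough sketch: the regular expression is modeled on the single warning line quoted above, and the names `WARNING_RE` and `tally_missing_glyphs` are made up for illustration. Other Ghostscript versions may word the message differently.

```python
import re
from collections import Counter

# Pattern modeled on the warning line quoted in this report; this is an
# assumption, since other Ghostscript versions may phrase it differently.
WARNING_RE = re.compile(
    r"Missing glyph CID=(?P<cid>\d+), glyph=(?P<glyph>[0-9A-Fa-f]+) "
    r"in the font (?P<font>\S+)"
)


def tally_missing_glyphs(log_text):
    """Count 'Missing glyph' warnings per font name in captured output."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            counts[match.group("font")] += 1
    return counts


# The sample is the warning quoted above, repeated verbatim.
sample = (
    "GPL Ghostscript 9.10: Missing glyph CID=0, glyph=0077 in the font "
    "ZDWFVB+GlyphLessFont . The output PDF may fail with some viewers.\n"
)
print(tally_missing_glyphs(sample))
```

To use it on a real run, the output of pdfsandwich (including stderr) would have to be saved to a file first and its contents passed to `tally_missing_glyphs`.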
Hi Andreas,
I'm not sure what happens there. Is the text properly extracted? And does this happen in all pdf viewers or only in evince?
Text is properly extracted: When I hit C-c to copy to clipboard, and
then insert in some other document, the text is there.
With acroread (under Linux) and SumatraPDF (Windows version under Linux,
using Wine), I can see the text. BUT, both programs put a transparent
overlay over the PDF image, so the original (from the image) text is
visible. Evince works differently; there, the selected text is not
visible through the solid-color selection box, but rather rendered in a
different font (usually).
So I guess strictly speaking this is not a problem of pdfsandwich; but
maybe it would be nice to have an option to have pdfsandwich produce
output which is compatible with evince.
Or maybe it's a problem with the GTK themes I have installed on my machine;
I'm using Linux Mint 17 with Cinnamon 2.4.6.
I see. This really does not sound like a pdfsandwich bug. I'm afraid I can't help here. You might want to use another pdf viewer? There are numerous other options under Linux apart from evince.
Tobias
I agree it definitely isn't a bug in pdfsandwich.
I've come across some PDFs consisting of scanned and OCRed books, and
there Evince showed selected text in some system font on top of the
opaque selection marker. I thought that pdfsandwich could somehow
support this. But I'm not even sure if it should.
I installed okular on my machine, and with it, it works nicely
(transparent selection marker, original scanned image visible under
selection).
Thanks!
I'm having this problem as well. I just tried reporting it as a bug in ghostscript, since that's where the error seems to be coming from.
http://bugs.ghostscript.com/show_bug.cgi?id=695869
I guess Ghostscript is actually doing what it's supposed to here. I have also noticed that any PDF I make with tesseract has the same problem, both in evince and in PDFViewer for Emacs. So I reported this as a bug in tesseract here: https://code.google.com/p/tesseract-ocr/issues/detail?id=1434
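To check quickly whether a given PDF is affected, one can look for the GlyphLessFont name that appears in the Ghostscript warning. This is a heuristic sketch, not a proper PDF parser: it scans the raw bytes only, so it will miss font names stored inside compressed object streams. The function names are hypothetical.

```python
import re

# The font in the warning is "ZDWFVB+GlyphLessFont"; the six-letter prefix
# and "+" mark a subset font, so the prefix is treated as optional here.
FONT_RE = re.compile(rb"/(?:[A-Z]{6}\+)?GlyphLessFont\b")


def mentions_glyphless_font(pdf_bytes):
    """Return True if the raw PDF bytes name a GlyphLessFont font."""
    return FONT_RE.search(pdf_bytes) is not None


def file_mentions_glyphless_font(path):
    """Read a PDF from disk and apply the raw-bytes check to it."""
    with open(path, "rb") as handle:
        return mentions_glyphless_font(handle.read())
```

A `True` result only says the font is referenced somewhere in the file; it does not say how a particular viewer will render the selection.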
I'm having more problems with pdfsandwich's output which are apparently due to this font problem in Tesseract (https://github.com/tabulapdf/tabula/issues/309). I tried
pdfsandwich -enforcehocr2pdf
as a workaround, and it solves the font issue, but the text it gives is pretty garbled. I guess there is currently no free OCR engine that gives decent results without broken fonts?
It's a design feature of Tesseract.
https://github.com/tesseract-ocr/tesseract/issues/1769
Since most PDF viewers draw the selection transparently, so that the bitmap data under the GlyphLessFont text stays visible, the better solution would be to fix the bug in Evince.