#6 No text visible when selecting text in Evince

Milestone: v1.0 (example)
Status: wont-fix
Owner: nobody
Labels: None
Priority: 5
Updated: 2019-11-27
Created: 2015-02-17
Creator: Andreas H.
Private: No

When running pdfsandwich on a PDF file, everything seems to work well, except for hundreds of warnings like

GPL Ghostscript 9.10: Missing glyph CID=0, glyph=0077 in the font ZDWFVB+GlyphLessFont . The output PDF may fail with some viewers.

However, when I open the resulting file in Evince, and I select some text, I only see the green box showing my selection; there is no text visible inside this green box (see attachment). Usually, I would expect to see the selected text.

I'm not sure if this is a problem of pdfsandwich at all ...
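
With hundreds of such warnings, it can help to see which fonts they refer to. The following is a minimal sketch, assuming the Ghostscript output was captured as text and that every warning follows the format of the line quoted above. The helper name `summarize_missing_glyphs` and the sample string are illustrative only; they are not part of pdfsandwich or Ghostscript.

```python
import re
from collections import Counter

# Pattern modeled on the single warning line quoted in this report.
# Assumption: other Ghostscript versions may word the message differently.
WARNING_RE = re.compile(
    r"Missing glyph CID=(?P<cid>\d+), glyph=(?P<glyph>[0-9A-Fa-f]+) "
    r"in the font (?P<font>\S+)"
)


def summarize_missing_glyphs(log_text):
    """Count missing-glyph warnings per font name (hypothetical helper)."""
    counts = Counter()
    for match in WARNING_RE.finditer(log_text):
        # Drop the subset prefix (e.g. "ZDWFVB+") if one is present.
        font = match.group("font").split("+", 1)[-1]
        counts[font] += 1
    return counts


# Sample input: the warning line from this report, verbatim.
sample = (
    "GPL Ghostscript 9.10: Missing glyph CID=0, glyph=0077 in the font "
    "ZDWFVB+GlyphLessFont . The output PDF may fail with some viewers.\n"
)
print(summarize_missing_glyphs(sample))
```

If every warning turns out to name GlyphLessFont, that points at the invisible text layer rather than at the scanned image.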

1 Attachments

Discussion

  • Tobias Elze

    Tobias Elze - 2015-02-17

    Hi Andreas,

    I'm not sure what happens there. Is the text properly extracted? And does this happen in all pdf viewers or only in evince?

     
    • Andreas H.

      Andreas H. - 2015-02-17

      Text is properly extracted: When I hit C-c to copy to clipboard, and
      then insert in some other document, the text is there.

      With acroread (under Linux) and SumatraPDF (Windows version under Linux,
      using Wine), I can see the text. BUT, both programs put a transparent
      overlay over the PDF image, so the original (from the image) text is
      visible. Evince works differently; there, the selected text is not
      visible through the solid-color selection box, but rather rendered in a
      different font (usually).

      So I guess strictly speaking this is not a problem of pdfsandwich; but
      maybe it would be nice to have an option to have pdfsandwich produce
      output which is compatible with evince.

      Or maybe it's a problem in the GTK themes I have installed on my machine;
      I'm using Linuxmint 17 with Cinnamon 2.4.6.

       
  • Tobias Elze

    Tobias Elze - 2015-02-18

    I see. This really does not sound like a pdfsandwich bug. I'm afraid I can't help here. You might want to use another pdf viewer? There are numerous other options under Linux apart from evince.

    Tobias

     
    • Andreas H.

      Andreas H. - 2015-02-18

      I agree it definitely isn't a bug in pdfsandwich.

      I've come across some PDFs consisting of scanned and OCRed books, and
      there Evince showed selected text in some system font on top of the
      opaque selection marker. I thought that pdfsandwich could somehow
      support this. But I'm not even sure if it should.

      I installed okular on my machine, and with it, it works nicely
      (transparent selection marker, original scanned image visible under
      selection).

      Thanks!

       
  • Dowcet

    Dowcet - 2015-03-15

    I'm having this problem as well. I just tried reporting it as a bug in ghostscript, since that's where the error seems to be coming from.

    http://bugs.ghostscript.com/show_bug.cgi?id=695869

     
  • Dowcet

    Dowcet - 2015-03-16

    I guess Ghostscript is actually doing what it's supposed to here. I have also noticed that any PDF I make with tesseract has the same problem, both in evince and in PDFViewer for Emacs. So I reported this as a bug in tesseract here: https://code.google.com/p/tesseract-ocr/issues/detail?id=1434

     
  • Dowcet

    Dowcet - 2015-04-24

    I'm having more problems with pdfsandwich's output, which are apparently due to this font problem in Tesseract (https://github.com/tabulapdf/tabula/issues/309). I tried pdfsandwich -enforcehocr2pdf as a workaround; it solves the font issue, but the text it gives is pretty garbled.

    I guess there is currently no free OCR engine that gives decent results without broken fonts?

     
  • Tobias Elze

    Tobias Elze - 2015-07-09
    • status: open --> wont-fix
     
  • Marek Kotas

    Marek Kotas - 2019-11-27

    It's a design feature of Tesseract.
    https://github.com/tesseract-ocr/tesseract/issues/1769

    Most PDF viewers use a transparent selection, so the bitmap data underneath the GlyphLessFont text stays visible. The solution would therefore rather be to fix the bug in Evince.

     
