Re: [gscan2pdf-help] Unicode in PDF
From: Jeffrey R. <jef...@gm...> - 2010-08-29 21:47:32
On 29 August 2010 21:11, John Fingerhut <and...@gm...> wrote:
> Did my earlier attempt at sending an email get through, with an attached
> slightly modified version of your Perl script, with a few Greek characters
> added to the string, and my comments about how the text is visible, but not
> searchable or pdftotext-able?

To be honest, I didn't try it, because I had already done something similar myself, with identical results. The only additional information I gleaned was that evince (or more probably poppler) complains on the command line that the PDFs are corrupt.

> Are you thinking of trying to fix whatever limitations exist in PDF::API2
> that make the text unable to be searched? Without that capability, there
> isn't much point in using that method in gscan2pdf.

Given that the Unicode text is displayed correctly, I am hoping that it won't require too much work to patch PDF::API2 to create a valid PDF that pdftotext can read.

Note that it doesn't seem to be a problem with Unicode itself, but with the handling of the DejaVu font (or maybe all TTF fonts). When I tried standard ASCII in the same manner, I also got a corrupt PDF.

Regards

Jeff