Fonts with difference encoding can have ToUnicode, as well
PdfDifferenceEncoding needs to support explicit ToUnicode tables, too. PDF examples requiring this can be created, for example, at https://www.canva.com/design/play?category=tACFat6uXco
I'm sorry, but I hate opening random sites with their cookie consents and whatnot. Would it be possible to attach such a file and state precisely what your patch fixes, please? Like: without it, PoDoFo cannot..., but with it PoDoFo can...
Use NULL instead, please, the same as the other code in PoDoFo.
The m_toUnicode can be empty (you may check m_toUnicode.empty()).
When there are both differences and ToUnicode, then the latter overwrites the value of the former. Can it happen? Might there be an else clause?
The attached file should extract text like “Wear proper.” Without the patch, I am getting random-looking character substitutions, text like “Wlaeba cr lpot.”
GetUnicodeValue should be able to handle requests for glyphs not in m_toUnicode, which includes the case where it is empty.
When there are both differences and ToUnicode, I expect ToUnicode to take precedence, yes.
Thanks for the file. I checked the PDF ISO standard, and according to "9.10.2 Mapping Character Codes to Unicode Values" the ToUnicode table takes precedence over the differences. Your patch does both, but I think in a good way.
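For readers following along, the lookup order discussed above can be sketched roughly as below. This is only an illustration, not the code committed to PoDoFo: the struct SketchEncoding, its members, and the BaseLookup helper are hypothetical names, the glyph names from the differences array are assumed to be already resolved to code points, and the base-encoding fallback is simplified to printable ASCII.

```cpp
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical stand-in for a difference encoding; NOT PoDoFo's actual class.
struct SketchEncoding {
    // Entries parsed from the font's /ToUnicode CMap; may be empty.
    std::map<std::uint8_t, char32_t> toUnicode;
    // Entries from /Differences, with glyph names already resolved (assumption).
    std::map<std::uint8_t, char32_t> differences;

    // Simplified base-encoding fallback (assumption): identity for printable ASCII.
    static std::optional<char32_t> BaseLookup(std::uint8_t code) {
        if (code >= 0x20 && code <= 0x7E)
            return static_cast<char32_t>(code);
        return std::nullopt;
    }

    // ToUnicode wins when it has an entry; otherwise fall through to the
    // differences, then to the base encoding. An empty toUnicode map simply
    // never matches, so no special case is needed for it.
    std::optional<char32_t> GetUnicodeValue(std::uint8_t code) const {
        if (auto it = toUnicode.find(code); it != toUnicode.end())
            return it->second;
        if (auto it = differences.find(code); it != differences.end())
            return it->second;
        return BaseLookup(code);
    }
};
```

With this ordering, a code that appears in both tables resolves through ToUnicode, while a code missing from ToUnicode still gets its value from the differences or the base encoding.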
I committed your patch (slightly modified) as [r2044].