#7 Incorrectly extracted special characters

Status: closed-rejected
Owner: mtraut
Labels: None
Priority: 5
Updated: 2010-11-06
Created: 2010-11-05
Creator: Piotr Praczyk
Private: No

When extracting text using the example, none of the fi and ff ligatures (nor special characters such as Greek letters) are extracted correctly.

Might this be connected with the fact that CSString has a null value in its encoding attribute?

Discussion

  • Piotr Praczyk
    2010-11-05

    Example of incorrectly extracted ligatures

     
    Attachments
  • mtraut
    2010-11-06

    The result seems "correct" to me. The font used is an embedded Type1 font with an internal mapping (in the font program itself) from /014 to the ligature "fi", for example. Neither an encoding nor a ToUnicode map gives any hint as to what this code point means, so there is no way to assign a meaning to /014 other than /014 itself. Any spec-conformant tool (even Adobe Reader) should return the same result.
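
To illustrate the point above, here is a minimal Python sketch of how a text extractor might resolve a single-byte character code. It is not jPod's actual code: the function `decode_code`, the table `GLYPH_TO_UNICODE`, and the lookup order (ToUnicode entry first, then glyph name from the encoding, then the raw code) are assumptions made for illustration, and real extractors handle many more cases.

```python
from typing import Dict, Optional

# Tiny excerpt of a glyph-name-to-Unicode table (hypothetical helper data;
# a real tool would use a complete glyph list).
GLYPH_TO_UNICODE: Dict[str, str] = {
    "fi": "\ufb01",     # LATIN SMALL LIGATURE FI
    "ff": "\ufb00",     # LATIN SMALL LIGATURE FF
    "alpha": "\u03b1",  # GREEK SMALL LETTER ALPHA
}


def decode_code(
    code: int,
    to_unicode: Optional[Dict[int, str]] = None,
    encoding: Optional[Dict[int, str]] = None,
) -> str:
    """Map one single-byte character code to text.

    to_unicode: code -> Unicode string, as a ToUnicode map would supply.
    encoding:   code -> glyph name, as an encoding dictionary would supply.
    """
    # 1. An explicit ToUnicode entry wins.
    if to_unicode and code in to_unicode:
        return to_unicode[code]
    # 2. Otherwise try the glyph name given by the encoding.
    if encoding and code in encoding:
        name = encoding[code]
        if name in GLYPH_TO_UNICODE:
            return GLYPH_TO_UNICODE[name]
    # 3. No hint at all: the raw code is the only thing left to return.
    return chr(code)


FI_CODE = 0o14  # octal 014, the code discussed in this ticket

no_hints = decode_code(FI_CODE)
with_to_unicode = decode_code(FI_CODE, to_unicode={FI_CODE: "fi"})
with_encoding = decode_code(FI_CODE, encoding={FI_CODE: "fi"})
```

With neither map present, as in the attached document, the sketch can only pass the raw code through, which matches the behaviour described in this comment. Supplying either hypothetical map is what would let the ligature be recovered.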

     
  • mtraut
    2010-11-06

    • status: open --> closed-rejected
     
  • mtraut
    2010-11-06

    • assigned_to: nobody --> mtraut