#7 Incorrectly extracted special characters

closed-rejected
mtraut
None
5
2010-11-06
2010-11-05
No

When extracting text using the example, none of the "fi" and "ff" ligatures (nor special characters such as Greek letters) are extracted correctly.

Might this be connected with the fact that CSString has a null value in its encoding attribute?

Discussion

  • Piotr Praczyk

    Piotr Praczyk - 2010-11-05

    Example of incorrectly extracted ligatures

     
  • mtraut

    mtraut - 2010-11-06

    The result seems "correct" to me. The font used is an embedded Type1 font with an internal mapping (in the font program itself) from /014 to the "fi" ligature, for example. Neither an encoding nor a ToUnicode map provides any hint about what this code point means, so there is no way to give /014 a meaning other than /014 itself. Any spec-conformant tool (even Adobe Reader) should return this.

     
  • mtraut

    mtraut - 2010-11-06
    • status: open --> closed-rejected
     
  • mtraut

    mtraut - 2010-11-06
    • assigned_to: nobody --> mtraut
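
The reasoning in the rejection comment can be illustrated with a minimal Python sketch. It is not taken from the library discussed in this ticket: `decode_code`, `GLYPH_TO_UNICODE`, and the dictionary-based `to_unicode` and `differences` parameters are hypothetical simplifications of a ToUnicode map and an encoding that names glyphs. It only shows why a code such as octal 014 comes back unchanged when the PDF supplies neither hint.

```python
# Hypothetical, simplified model of how a text extractor might resolve
# a single character code. Not the API of any real PDF library.

# Tiny subset of the Adobe Glyph List: glyph name -> Unicode string.
GLYPH_TO_UNICODE = {
    "ff": "\uFB00",  # LATIN SMALL LIGATURE FF
    "fi": "\uFB01",  # LATIN SMALL LIGATURE FI
    "fl": "\uFB02",  # LATIN SMALL LIGATURE FL
}


def decode_code(code, to_unicode=None, differences=None):
    """Return the text for one character code.

    to_unicode:  assumed dict {code: text}, standing in for a ToUnicode map.
    differences: assumed dict {code: glyph name}, standing in for an encoding.
    """
    # 1. An explicit ToUnicode entry takes priority.
    if to_unicode and code in to_unicode:
        return to_unicode[code]
    # 2. Otherwise try the glyph name supplied by the encoding.
    if differences and code in differences:
        glyph_name = differences[code]
        if glyph_name in GLYPH_TO_UNICODE:
            return GLYPH_TO_UNICODE[glyph_name]
    # 3. No hint at all: the only defensible answer is the code itself.
    return chr(code)


FI_CODE = 0o14  # the code the embedded font program draws as "fi"

no_hints = decode_code(FI_CODE)
with_to_unicode = decode_code(FI_CODE, to_unicode={FI_CODE: "fi"})
with_encoding = decode_code(FI_CODE, differences={FI_CODE: "fi"})
```

In this sketch, `no_hints` is the raw control character with code 014, which matches the behaviour reported in the ticket, while the other two calls recover the ligature only because the (assumed) hints are present.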
     
