When extracting text using the example, none of the fi and ff ligatures (or special characters such as Greek letters) are extracted correctly.
Could this be connected with the fact that CSString has a null value in its encoding attribute?
Example of incorrectly extracted ligatures
The result seems "correct" to me. The font used is an embedded Type1 font with an internal mapping (in the font program itself) from, for example, /014 to the ligature "fi". Neither an encoding nor a ToUnicode map gives any hint as to what this code point means, so there is no way to give /014 any meaning other than /014 itself. Any spec-conformant tool (even Adobe Reader) should return this.
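To illustrate the point, here is a minimal Python sketch of the lookup order an extractor might follow. It is not the library's code: `decode_code`, `extract_text`, and the dictionary-based `to_unicode` and `encoding` maps are hypothetical simplifications, and a real extractor deals with CMaps and glyph names rather than plain dictionaries.

```python
from typing import Mapping, Optional


def decode_code(code: int,
                to_unicode: Optional[Mapping[int, str]] = None,
                encoding: Optional[Mapping[int, str]] = None) -> str:
    """Map one character code to text (hypothetical, simplified lookup)."""
    # 1. A ToUnicode map, if present, states the meaning explicitly.
    if to_unicode and code in to_unicode:
        return to_unicode[code]
    # 2. Otherwise a font encoding may still identify the character.
    if encoding and code in encoding:
        return encoding[code]
    # 3. With neither hint available, all that is left is the raw code.
    return chr(code)


def extract_text(data: bytes,
                 to_unicode: Optional[Mapping[int, str]] = None,
                 encoding: Optional[Mapping[int, str]] = None) -> str:
    """Decode a single-byte string operand one code at a time."""
    return "".join(decode_code(b, to_unicode, encoding) for b in data)


# The word "find" as shown with the "fi" ligature stored at code /014 (octal).
shown = bytes([0o014]) + b"nd"

# No ToUnicode map and no encoding, as in the reported file.
without_hints = extract_text(shown)

# Assumption: the same string if the producer had supplied a ToUnicode entry.
with_hints = extract_text(shown, to_unicode={0o014: "\ufb01"})
```

In the first call the ligature comes back as the raw code 0x0C, which matches what is described above. Only the second call, with the assumed ToUnicode entry, yields U+FB01 (the "fi" ligature character).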