
#88 NCR in Extraction to RTF

Status: open
Owner: nobody
Priority: 5
Updated: 2008-07-15
Created: 2008-07-15
Creator: Anonymous
Private: No

When the text extraction utility converts from XML to Original format + RTF Layer, it converts characters not supported by the system default locale (windows-1250 in my case) into numeric character references (NCRs) of the form &#xnnnn;
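To make the reported behaviour concrete, here is a minimal Python sketch that reproduces it. It assumes the tool simply encodes its output in the system code page and replaces every unencodable character with a hexadecimal NCR; it is not the extraction utility's actual code. Python's built-in `xmlcharrefreplace` handler writes decimal references (`&#945;`), so a small custom handler is registered to get the `&#xnnnn;` form. The handler name `hexncr` and the function `encode_with_ncrs` are hypothetical.

```python
import codecs


def _hex_ncr_handler(exc):
    """Replace each character the codec cannot encode with a hex NCR (&#xNNNN;)."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join(f"&#x{ord(ch):04X};" for ch in bad)
    # Tell the codec to resume encoding right after the replaced characters.
    return replacement, exc.end


# "hexncr" is a name invented for this sketch; it is not a built-in handler.
codecs.register_error("hexncr", _hex_ncr_handler)


def encode_with_ncrs(text: str, codepage: str = "cp1250") -> bytes:
    """Encode text in a Windows code page, turning unsupported characters into NCRs."""
    return text.encode(codepage, errors="hexncr")


if __name__ == "__main__":
    # 'č' exists in windows-1250, Greek 'α' does not.
    print(encode_with_ncrs("čα").decode("cp1250"))  # č&#x03B1;
```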

As far as I know, most Unicode characters can be represented in RTF with \uc1\unnnn.
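A minimal Python sketch of the \uc1\unnnn form, assuming a plain `?` as the one-character fallback (the function names are hypothetical). One detail worth knowing: the RTF specification defines the value after \u as a signed 16-bit decimal number, not hex, so UTF-16 code units above 0x7FFF are written as negative values, and a character outside the Basic Multilingual Plane needs two \u control words (a surrogate pair).

```python
def rtf_unicode_controls(ch: str, fallback: str = "?") -> str:
    r"""Return the \uN control word(s) for a single character.

    N is a signed 16-bit decimal number, so UTF-16 code units above 0x7FFF
    are written as negative values. A character outside the BMP becomes a
    surrogate pair, i.e. two \uN control words. Each \uN is followed by a
    one-character fallback, which is what \uc1 announces to the reader.
    """
    data = ch.encode("utf-16-be")
    out = []
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        signed = unit - 0x10000 if unit > 0x7FFF else unit
        out.append(f"\\u{signed}{fallback}")
    return "".join(out)


def rtf_unicode_escape(text: str) -> str:
    """Escape a string for an RTF body, wrapping it in a group that sets \\uc1."""
    parts = []
    for ch in text:
        if ch in "\\{}":
            parts.append("\\" + ch)  # RTF special characters
        elif ord(ch) < 0x80:
            parts.append(ch)  # 7-bit ASCII is written as-is
        else:
            parts.append(rtf_unicode_controls(ch))
    return "{\\uc1 " + "".join(parts) + "}"


if __name__ == "__main__":
    print(rtf_unicode_escape("č"))  # {\uc1 \u269?}
```

With a `?` fallback, readers that understand \u show the real character and skip the `?`; older readers ignore \u and show the `?` instead.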

The translators would prefer to see the actual characters instead of entities, because these characters can be displayed in Word.

Discussion

  • Yves Savourel

    Yves Savourel - 2008-07-15

    Logged In: YES
    user_id=366561
    Originator: NO

    This issue is related to the lack of UTF-8 support in the RTF extended-character mechanism. While any Unicode character can be written as \ucN\uNNNN (where NNNN is a signed 16-bit decimal value), RTF also requires a fallback 'normal' character after it, and it seems only Windows encodings are supported for that fallback. Fixing this will require a change in the RTF-escape function to handle the UTF-N-specific case.

  • Yves Savourel

    Yves Savourel - 2008-07-15
    • summary: Extraction to RTF --> NCR in Extraction to RTF
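The RTF-escape change Yves Savourel describes could look roughly like the following Python sketch. The names `rtf_fallback` and `rtf_escape` are hypothetical and this is not the project's actual function. It assumes the fallback code page is the one the RTF header declares with \ansicpg (windows-1250, as in the report): every non-ASCII character is written as \uN regardless of the output encoding, UTF-8 included, and its fallback is the matching single code-page byte written as \'hh, or `?` when the code page has no such character.

```python
def rtf_fallback(ch: str, codepage: str) -> str:
    r"""Fallback for one character, taken from a Windows (ANSI) code page.

    Returns a single \'hh escape when the code page maps the character to one
    byte, otherwise "?". Either way it counts as exactly one skipped character,
    which keeps it consistent with \uc1. (Multi-byte code pages would need
    \uc2 or more; this sketch just uses "?" for them.)
    """
    try:
        encoded = ch.encode(codepage)
    except UnicodeEncodeError:
        return "?"
    if len(encoded) != 1:
        return "?"
    return f"\\'{encoded[0]:02x}"


def rtf_escape(text: str, codepage: str = "cp1250") -> str:
    r"""Escape text for RTF regardless of the target file encoding (UTF-8 included).

    Every non-ASCII character becomes \uN (signed 16-bit decimal) followed by a
    fallback from `codepage`, which is assumed to match the document's \ansicpg.
    """
    parts = []
    for ch in text:
        if ch in "\\{}":
            parts.append("\\" + ch)
            continue
        if ord(ch) < 0x80:
            parts.append(ch)
            continue
        data = ch.encode("utf-16-be")
        units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
        # A surrogate pair has no single-byte equivalent, so use "?" for each half.
        fallback = rtf_fallback(ch, codepage) if len(units) == 1 else "?"
        for unit in units:
            signed = unit - 0x10000 if unit > 0x7FFF else unit
            parts.append(f"\\u{signed}{fallback}")
    return "{\\uc1 " + "".join(parts) + "}"


if __name__ == "__main__":
    print(rtf_escape("čα"))  # {\uc1 \u269\'e8\u945?}
```

Because the RTF layer then contains only 7-bit ASCII, the same output is valid whatever encoding the surrounding file uses; only readers that ignore \uN ever display the code-page fallback.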