#111 entity class brings mis-translated characters

Milestone: 3.64
Status: closed-out-of-date
Owner: Mocchi
Priority: 9
Updated: 2013-04-03
Created: 2012-02-15
Creator: Mocchi
Private: No

The entity class is built on entity::$_entities['cp1251']. The key is
named cp1251 (Windows-1251), but the table it holds is actually the
Windows-1252 mapping. For example, the map sends byte 0x80 to code
point U+20AC (the Euro sign). That is the Windows-1252 assignment; in
Windows-1251, 0x80 is U+0402 (Cyrillic capital letter DJE).
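
The mismatch can be checked independently of Nucleus. The following is a minimal Python sketch (not code from the project) that decodes byte 0x80 with the standard cp1252 and cp1251 codecs; the helper name codepoint() is only for illustration.

```python
# Decode the same byte under both code pages to see which one
# a "0x80 -> Euro sign" table actually corresponds to.
BYTE = b"\x80"

as_1252 = BYTE.decode("cp1252")  # Windows-1252
as_1251 = BYTE.decode("cp1251")  # Windows-1251


def codepoint(ch: str) -> str:
    """Format a single character as U+XXXX (illustrative helper)."""
    return f"U+{ord(ch):04X}"


print("cp1252:", codepoint(as_1252))  # U+20AC, the Euro sign
print("cp1251:", codepoint(as_1251))  # U+0402, Cyrillic capital letter DJE
```

A table that maps 0x80 to U+20AC therefore agrees with Windows-1252, not with the code page its key is named after.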

For details, please refer to these URLs.

Windows 1251
http://www.iana.org/assignments/charset-reg/windows-1251
http://msdn.microsoft.com/en-us/goglobal/cc305144

Windows 1252
http://www.iana.org/assignments/charset-reg/windows-1252
http://msdn.microsoft.com/ja-jp/goglobal/cc305145.aspx

In the core scripts, the entity class is used by stringToAttribute() and
stringToXML() in globalfunctions.php. These two functions are called
by the ACTIONS, ITEMACTIONS and COMMENTACTIONS classes. Their purpose is
to convert entities for output.

I read the W3C Recommendations for HTML 4.01, XML (Fifth Edition) and
XHTML 1.0, and as far as I understand the differences are:
1. XML/XHTML cannot include decimal numeric character references, but
HTML accepts both decimal and hexadecimal ones.
2. For the apostrophe, use '&#39;' instead of '&apos;' in HTML, whereas
XML/XHTML can use '&apos;'.
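
To illustrate the apostrophe point only, here is a small Python sketch (not Nucleus code), assuming the distinction meant is between the numeric reference &#39; and the named entity &apos;, which HTML 4.01 does not define. The function name escape_attr() and its xml flag are hypothetical.

```python
def escape_attr(text: str, xml: bool) -> str:
    """Escape text for use inside a quoted attribute value.

    xml=False: write the apostrophe as &#39;, since HTML 4.01
               defines no &apos; entity.
    xml=True:  write the apostrophe as &apos;, which XML predefines.
    """
    apos = "&apos;" if xml else "&#39;"
    table = {
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        '"': "&quot;",
        "'": apos,
    }
    # Replace character by character so "&" is never escaped twice.
    return "".join(table.get(ch, ch) for ch in text)


print(escape_attr("it's", xml=False))
print(escape_attr("it's", xml=True))
```

Since &#39; is also valid in XML/XHTML, a single routine that always emits the numeric form would be one way to serve both output types.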

I think we can use i18n::hen() and i18n::hsc(), with slight
modification, instead of stringToAttribute() and stringToXML(). These
two functions and the entity class would then be unused and could be
deprecated.

Discussion

  • Mocchi

    Mocchi - 2012-02-15
    • summary: entity class cause mis-translationg characters --> entity class brings mis-translated characters
     
  • Mocchi

    Mocchi - 2013-04-03

    This is closed because the 3.6 branch ended with the release of 3.65. This request is carried over to the 4.0 release.

     
  • Mocchi

    Mocchi - 2013-04-03
    • status: open --> closed-out-of-date
     
