From: Shawn P. <sha...@gm...> - 2012-06-11 12:49:09
Having a file called std/ucstypes.e makes one expect std/ascii.e, std/cyrillic.e, and so on to exist as well. Between text.e and ucstypes.e we have two sets of routines for manipulating text: one file for Unicode and another for everything else. Although many text encodings exist, we normally want the same routines for manipulating text whatever the encoding. Internally, utf8, utf16 and Unicode are neither manipulated nor represented the way the code pages are, but at an external level they could work the same way.

In text.e we have set_encoding_properties(). Suppose we allow it to accept "utf8" or "utf32" and then use the routines from ucstypes.e for subsequent calls to lower() and upper(). We could likewise allow all the public routines in ucstypes.e to work with encodings other than Unicode. Since the ecp data contains the Unicode equivalents, all of this is possible with the data already provided.

In text.e, user-defined types (UDTs) for ascii, utf8, utf16 and utf32 can give users type safety. We can implement proper() for Unicode with calls to title() and is_space().

One problem is get_encoding_properties(): the sequence it returns would be enormous for the double-wide character encoding of Unicode unless a special case is provided, and it doesn't really make sense for utf8 or utf16.

Shawn Pringle -- PGP public key available at pool.sks-keyservers.net