From: Markus S. <mar...@jt...> - 2003-07-30 00:08:31
ICU converters do not perform any normalization; only the unorm_* functions (and the Normalizer class) do. If you convert any Unicode text between any Unicode charsets (UTF-8/16/32/7, CESU-8, SCSU, BOCU-1, ...), you will get the very same text, just in a different form. (This holds for Unicode text consisting of code points that are legal in all encoding forms/schemes, i.e., 0000..d7ff and e000..10ffff.)

markus

Ostermueller, Erik wrote:
> Using the ICU converter API, if I do a round-trip conversion from UTF-8, to 16
> and then back to 8, are there any guarantees whether composed characters
> will stay composed? Will decomposed characters stay decomposed?

Yes. Both stay exactly as they were, because the converters never compose or decompose anything.

> My overall question is this: How likely is it that the input
> length (original UTF-8 data) will equal the output length
> (length of the data converted from 16 to 8)?

100%, given legal UTF-8 data. Conversion between legal UTF-8 and other forms of Unicode is unambiguous.
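
The round-trip property described above can be checked with a small sketch. Note the assumptions: this uses Python's built-in codecs and the standard `unicodedata` module rather than the ICU converter API, and `round_trip` is a hypothetical helper name chosen for illustration. It only demonstrates the general point that converting between Unicode encoding forms leaves composed and decomposed sequences untouched, and that normalization is a separate, explicit step.

```python
import unicodedata


def round_trip(utf8_bytes: bytes) -> bytes:
    """Convert UTF-8 -> UTF-16 -> UTF-8 without any normalization."""
    # Strict decoding raises UnicodeDecodeError on illegal UTF-8 input.
    text = utf8_bytes.decode("utf-8")
    # "utf-16-le" is used so that no byte order mark is added.
    utf16 = text.encode("utf-16-le")
    return utf16.decode("utf-16-le").encode("utf-8")


# "e with acute" as a single precomposed code point (U+00E9).
composed = "\u00e9".encode("utf-8")
# The same character as "e" plus a combining acute accent (U+0065 U+0301).
decomposed = "e\u0301".encode("utf-8")

for original in (composed, decomposed):
    result = round_trip(original)
    # Same bytes, and therefore the same length, after the round trip.
    assert result == original
    assert len(result) == len(original)

# The two forms only become equal after an explicit normalization call.
assert composed != decomposed
assert unicodedata.normalize("NFC", "e\u0301") == "\u00e9"
```

The composed and decomposed inputs differ in length before the round trip and keep those lengths afterwards; only the explicit `normalize` call changes one form into the other.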