From: Peter J. <pj...@wa...> - 2004-05-06 09:45:34
Hi Pierre,

> > It's 3 bytes per character.
>
> ...because the internal coding is UTF-8, not UTF-16, so yes, basically, most
> of the simple Unicode characters map to at most 3 bytes using UTF-8. Some
> more exotic characters won't make it, however.

In fact, UNICODE_FSS is more like CESU-8, and non-BMP characters will be
represented as two "characters" (high and low surrogate), each taking 3 bytes,
for a total of 6 bytes.

Regards,
Peter Jacobi
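To make the 6-byte figure concrete, here is a minimal Python sketch of CESU-8 encoding. It assumes that UNICODE_FSS stores supplementary characters the way CESU-8 does, which is the claim made above and has not been checked against the Firebird source. The helper name `cesu8_encode` is hypothetical. The sketch relies on Python's standard `surrogatepass` error handler, which lets the UTF-8 codec emit a lone surrogate as a 3-byte sequence.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8.

    BMP characters are encoded exactly as in UTF-8 (1 to 3 bytes).
    A supplementary (non-BMP) code point is first split into a UTF-16
    surrogate pair, and each surrogate is then written as its own
    3-byte UTF-8-style sequence, giving 6 bytes instead of UTF-8's 4.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Compute the UTF-16 surrogate pair (high surrogate first).
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)
            low = 0xDC00 + (v & 0x3FF)
            # 'surrogatepass' makes the UTF-8 codec accept lone surrogates.
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


# Compare byte counts for a few characters, ending with U+1D11E
# (MUSICAL SYMBOL G CLEF), which lies outside the BMP.
for ch in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
    print(f"U+{ord(ch):04X}: UTF-8 {len(ch.encode('utf-8'))} bytes, "
          f"CESU-8 {len(cesu8_encode(ch))} bytes")
```

For the BMP characters both counts agree (1, 2 and 3 bytes). For U+1D11E, UTF-8 needs 4 bytes, while the CESU-8 form is the two 3-byte sequences `ED A0 B4` (high surrogate D834) and `ED B4 9E` (low surrogate DD1E).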