Re: [GD-General] Unicode
From: <cas...@ya...> - 2003-11-20 23:23:42
Hi,

there seems to be some confusion around Unicode that I will try to clear up.

Unicode is a character set standard that defines several encoding forms. Encodings fall into two kinds: some can represent the full Unicode character set and others cannot. The "lossless" ones are UTF-8, UTF-16 and UTF-32.

UTF-8 and UTF-16 are variable-length encodings, which means that a single character can take more than one code unit. For example, in UTF-8 a character takes 1 to 4 bytes (the original definition allowed sequences of up to 6 bytes). The nice thing about UTF-8 is that a zero byte never appears inside the encoding of any character other than U+0000 itself, so you can still use strlen, strcpy, strdup, etc. However, in this case strlen returns the size of the string in bytes, not its length in characters.

UTF-16 usually takes one 16-bit word per character, but some characters need two. Those two words are called a surrogate pair and are only needed for characters outside the Basic Multilingual Plane, mostly rare ones such as historic scripts that are no longer in use. Windows NT and Java only support a subset of Unicode called UCS-2, which is UTF-16 without surrogates. Windows XP, on the other hand, is supposed to support surrogates.

Finally, the last encoding is UTF-32 (or UCS-4), which uses one 32-bit word per character and represents the full Unicode character set.

Which representation you choose mainly depends on your application. Web applications usually use UTF-8, because existing byte-oriented code can be reused and most of the net is written using ASCII characters, which UTF-8 stores in one byte each, so it turns out to be the most compact. I currently use UCS-2 internally in my applications, and that's probably what most games will need.

This is an oversimplification, so check this out for more info:

http://www.unicode.org/faq/

Hope that helps,

--
Ignacio Castaño
cas...@ya...
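
To make the size-versus-length point concrete, here is a minimal sketch in Python 3. The sample string and the helper name surrogate_pair are made up for illustration; the sketch only assumes that Python's built-in "utf-8", "utf-16-le" and "utf-32-le" codecs behave as documented. It counts the code units each encoding needs for the same four characters and splits one character from outside the Basic Multilingual Plane into its two UTF-16 words.

```python
# Hypothetical sample: 'a' (U+0061), e-acute (U+00E9), euro sign (U+20AC)
# and a musical G clef (U+1D11E), which lies outside the BMP.
text = "a\u00e9\u20ac\U0001D11E"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # little-endian, no byte order mark
utf32 = text.encode("utf-32-le")

code_points = len(text)            # a Python 3 str counts code points
utf8_bytes = len(utf8)             # what C strlen() would report
utf16_words = len(utf16) // 2      # 16-bit code units
utf32_words = len(utf32) // 4      # 32-bit code units

# No zero byte appears in the UTF-8 form, so C string functions still work.
has_embedded_zero = 0 in utf8


def surrogate_pair(cp):
    """Split a code point above U+FFFF into its two UTF-16 words."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point does not need surrogates")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


print(code_points, utf8_bytes, utf16_words, utf32_words)  # 4 10 5 4
print([hex(w) for w in surrogate_pair(0x1D11E)])          # ['0xd834', '0xdd1e']
```

The same four characters take 10 bytes in UTF-8, five words in UTF-16 and four words in UTF-32. Only the last character costs UTF-16 a second word, which is the case UCS-2 cannot represent.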