Re: [GD-General] Unicode

Hi,

there seems to be some confusion around Unicode that I will try to clear up.
Unicode is a standard that defines a character set together with several
encoding formats for it.

There are two different kinds of encodings: some can represent the full
Unicode character set and others can't. The "lossless" formats are utf8,
utf16 and utf32. utf8 and utf16 are variable-length encodings, which means
that a character can be represented by more than one code unit. For example,
in utf8 a character may take 1 to 4 bytes (the original design allowed
sequences of up to 6). The nice thing about utf8 is that no byte of a
multibyte sequence is zero, so you can still use strlen, strcpy, strdup,
etc. However, in this case strlen does not return the length in characters
but the size of the string in bytes.
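
To make the strlen point concrete, here is a minimal sketch in C. The
function name utf8_count is my own, not a standard library call, and the
code assumes the input is valid, NUL-terminated utf8. It counts characters
by skipping continuation bytes, which always have the bit pattern 10xxxxxx.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the code points in a NUL-terminated utf8
   string. Every byte that is NOT a continuation byte (10xxxxxx) starts
   a new character. Assumes the input is valid utf8. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the string "Casta\xC3\xB1o" (my surname encoded in utf8), strlen
reports 8 bytes while utf8_count reports 7 characters, because the n with
tilde takes two bytes.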

utf16 usually takes one 16-bit word per character, but some characters need
two. Such a pair of words is called a surrogate pair, and it is only needed
for characters outside the Basic Multilingual Plane, mostly rare ones such as
historic scripts that are no longer in use. Windows NT and Java only support
a subset of Unicode called UCS2, that is, utf16 without surrogates. Windows
XP, on the other hand, is supposed to support surrogates.
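
As a rough illustration of how two utf16 words combine into one character,
here is a small C sketch. The function names are my own invention; the
ranges and the arithmetic follow the utf16 definition (first word in
0xD800..0xDBFF, second word in 0xDC00..0xDFFF, each carrying 10 bits of the
value above 0x10000). The caller is assumed to have checked both words
before combining them.

```c
#include <stdint.h>

/* Hypothetical helpers: classify a utf16 word. */
int utf16_is_high_surrogate(uint16_t w) { return w >= 0xD800 && w <= 0xDBFF; }
int utf16_is_low_surrogate(uint16_t w)  { return w >= 0xDC00 && w <= 0xDFFF; }

/* Combine a valid high/low surrogate pair into a single code point.
   Each word contributes 10 bits; the result is offset by 0x10000. */
uint32_t utf16_combine(uint16_t hi, uint16_t lo)
{
    return 0x10000u
         + (((uint32_t)hi - 0xD800u) << 10)
         + ((uint32_t)lo - 0xDC00u);
}
```

For example, the pair 0xD834 0xDD1E combines to U+1D11E. Code that only
handles UCS2 would see those two words as two separate, meaningless
characters.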

Finally, the last encoding is utf32 (or ucs4), which uses 32 bits per
character and represents the full Unicode character set.

Which representation you choose mainly depends on your application. Web
applications usually use utf8, because you can reuse the existing code and
most of the net is written using ASCII characters, so utf8 turns out to be
the most efficient. I currently use ucs2 internally in my applications, and
that's probably what most games will need.

This is an oversimplification, so check this out for more info:
http://www.unicode.org/faq/

Hope that helps,

--
Ignacio Castaño
cas...@ya...
