RE: [GD-General] Unicode
From: Nicolas R. <nic...@fr...> - 2003-11-19 18:58:47
Hmmmm, looks like you misunderstood something...

There are three ways of storing strings:

- SBCS: "Single Byte Character Sets", using an 8-bit character
  encoding. That's the easiest one... Note that many kinds of SBCS
  exist and they are only compatible with each other on the 0-127 part.
- DBCS: "Double Byte Character Sets", using a fixed 16-bit character
  encoding. UNICODE stored as UCS-2 is one of those... (Careful:
  Microsoft's documentation uses "DBCS" for the mixed one/two-byte
  code pages, which belong under MBCS below.)
- MBCS: "Multi Byte Character Sets", using a variable number of bytes
  per character, depending on the first byte. UTF-8 is one of those.
  That's exactly the kind of thing that drives me nuts: inventing a
  stupid thing so that badly engineered older things continue working.
  But hey, that's life.

With MBCS you cannot tell the size of a character up front; however,
the system provides functions for that. Basically you are ALWAYS
pointing to the first byte of a character (otherwise everything is
broken). Given that byte, you can tell the size of the character
(mblen() in standard C, _mbclen() in the Microsoft CRT); advancing the
pointer by that size gives you the next character. The string ends
with a 0 byte.

Note that with the legacy code pages it is IMPOSSIBLE to go backward
reliably unless you know the address of the string's first character.
Note also that this is how the Windows "ANSI" (char*) API handles UI
and file-system strings.

So basically:

    #include <stdlib.h>  // mblen, MB_CUR_MAX

    // Both functions assume setlocale() has selected the string's encoding.

    // length (number of characters) of a string:
    unsigned int mb_strlen(const char* mbstr)
    {
        unsigned int ret = 0;
        while (*mbstr)
        {
            int t = mblen(mbstr, MB_CUR_MAX);
            if (t <= 0) break;  // invalid sequence: stop counting
            ++ret;
            mbstr += t;
        }
        return ret;
    }

    // size (in bytes) of a string (not including ending null char):
    unsigned int mb_strsize(const char* mbstr)
    {
        unsigned int ret = 0;
        while (*mbstr)
        {
            int t = mblen(mbstr, MB_CUR_MAX);
            if (t <= 0) break;  // invalid sequence: stop counting
            ret += (unsigned int)t;
            mbstr += t;
        }
        return ret;
    }

> -----Original Message-----
> From: gam...@li...
> [mailto:gam...@li...] On
> Behalf Of Garett Bass
> Sent: Wednesday, November 19, 2003 6:58 PM
> To: gam...@li...
> Subject: RE: [GD-General] Unicode
>
>
> Paul,
>
> It was after reading Joel's article that I understood
> Unicode to use an indeterminate number of bytes per
> character.
> Specifically:
>
> "In UTF-8, every code point from 0-127 is stored in a single
> byte. Only code points 128 and above are stored using 2, 3,
> in fact, up to 6 bytes."
>
> Which leaves me wondering, how do you figure out where one
> character ends and the next begins?
>
> Thanks in advance,
> Garett
>
>
> -----Original Message-----
> From: gam...@li...
> [mailto:gam...@li...]On
> Behalf Of Paul Reynolds
> Sent: Wednesday, November 19, 2003 11:31 AM
> To: gam...@li...
> Subject: RE: [GD-General] Feedback wanted on POSH
>
>
> This is a pretty good overview of text encoding*:
> http://www.joelonsoftware.com/articles/Unicode.html
>
> I'd say everyone working on a shipping game should really
> evaluate if raw char* strings are really a good idea. If
> you've ever had to localize a 7-bit ascii game, you'll know
> what I'm talking about. Other software industries have been
> embracing unicode for quite some time.
>
> * - For the record, I'm not a Joel Spolsky fanboy. I can
> usually take him or leave him. ;o)
>
> -----Original Message-----
> From: gam...@li...
> [mailto:gam...@li...]On
> Behalf Of Garett Bass
> Sent: Wednesday, November 19, 2003 9:13 AM
> To: gam...@li...
> Subject: RE: [GD-General] Feedback wanted on POSH
>
>
> // Crosbie Fitch wrote:
> // Hmmn maybe the chars should be like this:
>
> You will notice that POSH doesn't provide a char typedef,
> presumably because sizeof(char) == 1 in ANSI C, as mentioned
> in another post. I imagine that defining your own integer
> character type will require an explicit cast anytime you want
> to use a string manipulation function, which seems a little
> awkward. Of course, if you use C++ and STL, then you can
> always create a std::basic_string<char_utf8>, or whatever.
>
> // typedef char8 char_ascii; // Unsized char able to contain 7bit ASCII
> // typedef char8 char_utf8; // Unsized char able to contain...
> // typedef char16 char_ucs2; // Unsized char able to contain...
>
> I'm not sure I understand what you mean by "Unsized" here.
> If you're defining char8 to be uint8, then its size is 8 bits.
>
> // typedef char_utf8 char_unicode; // Unsized char suitable for Unicode
> // typedef char_unicode character; // Unsized char suitable for any text
>
> Not being too familiar with unicode, I find this confusing.
> I thought that "Unicode" was a multibyte format with no set
> number of bytes per character, ie. a single asian character
> may be represented by four bytes while the subsequent
> character is represented by two.
>
> Regards,
> Garett
>
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: SF.net Giveback Program.
> Does SourceForge.net help you be more productive? Does it
> help you create better code? SHARE THE LOVE, and help us
> help YOU! Click Here: http://sourceforge.net/donate/
> _______________________________________________
> Gamedevlists-general mailing list
> Gam...@li...
> https://lists.sourceforge.net/lists/listinfo/gamedevlists-general
> Archives: http://sourceforge.net/mailarchive/forum.php?forum_id=557
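
On the question quoted above ("how do you figure out where one character ends and the next begins?"): in UTF-8 specifically, the answer is in the bytes themselves. The top bits of the first byte give the length of the sequence, and every following byte has the form 10xxxxxx. Here is a minimal sketch in C, not a definitive implementation. It assumes the 4-byte limit of RFC 3629 (the article quoted above describes the older definition with up to 6 bytes) and it only checks structure, so overlong or out-of-range sequences are not rejected. The names utf8_seq_len, utf8_count and utf8_prev_index are hypothetical helpers made up for this example, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte length of the UTF-8 sequence whose first byte
   is b, or 0 if b cannot start a sequence (continuation or invalid byte).
   Structural check only: overlong leads such as 0xC0 are not rejected. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: plain ASCII   */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx: 2-byte lead   */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx: 3-byte lead   */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx: 4-byte lead   */
    return 0;                          /* 10xxxxxx or invalid     */
}

/* Hypothetical helper: number of code points in a NUL-terminated UTF-8
   string. Counting stops at the first malformed or truncated sequence. */
static size_t utf8_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (*s) {
        size_t len = utf8_seq_len((unsigned char)*s);
        size_t i;
        if (len == 0) break;
        /* Every byte after the lead must be a 10xxxxxx continuation byte.
           The NUL terminator fails this test, so we never read past it. */
        for (i = 1; i < len; ++i)
            if (((unsigned char)s[i] & 0xC0) != 0x80) return n;
        s += len;
        ++n;
    }
    return n;
}

/* Hypothetical helper: byte offset of the character that starts before
   offset i. Unlike the legacy code pages, UTF-8 lets you walk backward:
   skip continuation bytes until a byte that is not 10xxxxxx. */
static size_t utf8_prev_index(const char *s, size_t i)
{
    assert(s != NULL);
    while (i > 0) {
        --i;
        if (((unsigned char)s[i] & 0xC0) != 0x80) break;
    }
    return i;
}
```

For example, the six bytes "a\xC3\xA9\xE2\x82\xAC" hold three characters of 1, 2 and 3 bytes, so utf8_count returns 3 on them, and utf8_prev_index from offset 6 (the terminator) lands on offset 3, the first byte of the last character.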