RE: [GD-General] Unicode
From: Garett B. <gt...@st...> - 2003-11-19 17:57:37
Paul,

It was after reading Joel's article that I understood Unicode to use an indeterminate number of bytes per character. Specifically:

"In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 6 bytes."

That leaves me wondering: how do you figure out where one character ends and the next begins?

Thanks in advance,
Garett

-----Original Message-----
From: gam...@li... [mailto:gam...@li...]On Behalf Of Paul Reynolds
Sent: Wednesday, November 19, 2003 11:31 AM
To: gam...@li...
Subject: RE: [GD-General] Feedback wanted on POSH

This is a pretty good overview of text encoding*:

http://www.joelonsoftware.com/articles/Unicode.html

I'd say everyone working on a shipping game should seriously evaluate whether raw char* strings are a good idea. If you've ever had to localize a 7-bit ASCII game, you'll know what I'm talking about. Other software industries have been embracing Unicode for quite some time.

* - For the record, I'm not a Joel Spolsky fanboy. I can usually take him or leave him. ;o)

-----Original Message-----
From: gam...@li... [mailto:gam...@li...]On Behalf Of Garett Bass
Sent: Wednesday, November 19, 2003 9:13 AM
To: gam...@li...
Subject: RE: [GD-General] Feedback wanted on POSH

// Crosbie Fitch wrote:
// Hmmn maybe the chars should be like this:

You will notice that POSH doesn't provide a char typedef, presumably because sizeof(char) == 1 in ANSI C, as mentioned in another post. I imagine that defining your own integer character type will require an explicit cast any time you want to use a string manipulation function, which seems a little awkward. Of course, if you use C++ and the STL, then you can always create a std::basic_string<char_utf8>, or whatever.

// typedef char8 char_ascii;   // Unsized char able to contain 7bit ASCII
// typedef char8 char_utf8;    // Unsized char able to contain...
// typedef char16 char_ucs2;   // Unsized char able to contain...
I'm not sure I understand what you mean by "Unsized" here. If you're defining char8 to be uint8, then its size is 8 bits.

// typedef char_utf8 char_unicode;   // Unsized char suitable for Unicode
// typedef char_unicode character;   // Unsized char suitable for any text

Not being too familiar with Unicode, I find this confusing. I thought that "Unicode" was a multibyte format with no set number of bytes per character, i.e. a single Asian character may be represented by four bytes while the subsequent character is represented by two.

Regards,
Garett

_______________________________________________
Gamedevlists-general mailing list
Gam...@li...
https://lists.sourceforge.net/lists/listinfo/gamedevlists-general
Archives: http://sourceforge.net/mailarchive/forum.php?forum_id=557
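The question of where one UTF-8 character ends and the next begins has a mechanical answer. The first (lead) byte of each character encodes the sequence length in its count of leading 1 bits, and every following byte has the bit pattern 10xxxxxx, so it can never be mistaken for the start of a character. The C sketch below illustrates this. The function names (utf8_seq_len, utf8_count, utf8_char_start) are hypothetical helpers, not part of POSH or any library. It assumes RFC 3629 (November 2003), which caps UTF-8 at 4 bytes per code point. The 5- and 6-byte forms mentioned in the quoted article follow the same lead-byte scheme (111110xx, 1111110x) but are no longer valid. It checks bit patterns only and does not reject overlong encodings or out-of-range code points.

```c
#include <stddef.h>

/* Hypothetical helper: length of the UTF-8 sequence that starts with
   lead byte b, read from its leading 1 bits. Returns 0 for a
   continuation byte (10xxxxxx) or for 0xF8..0xFF. Only the bit pattern
   is checked; overlong forms such as 0xC0/0xC1 are not rejected. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII, one byte */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx + 1 continuation */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx + 2 continuations */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx + 3 continuations */
    return 0;                          /* 10xxxxxx or 11111xxx */
}

/* Continuation bytes always look like 10xxxxxx. */
int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: count characters (code points) in a
   NUL-terminated UTF-8 string. Returns -1 on a bad lead byte or a
   truncated sequence. The NUL terminator is not a continuation byte,
   so the loop never reads past the end of the string. */
long utf8_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    long count = 0;
    while (*p) {
        int len = utf8_seq_len(*p);
        if (len == 0) return -1;
        for (int i = 1; i < len; i++) {
            if (!utf8_is_continuation(p[i])) return -1;
        }
        p += len;  /* jump straight to the next lead byte */
        count++;
    }
    return count;
}

/* Hypothetical helper: given any byte index into a UTF-8 string, back
   up to the lead byte of the character that contains it by skipping
   continuation bytes. */
size_t utf8_char_start(const char *s, size_t i)
{
    while (i > 0 && utf8_is_continuation((unsigned char)s[i])) i--;
    return i;
}
```

For example, "h\xC3\xA9llo \xE2\x82\xAC" ("héllo €") is 10 bytes long, but utf8_count reports 7 characters, and utf8_char_start(text, 9) backs up from the last byte of the euro sign to its lead byte at index 7. Because continuation bytes identify themselves, code that lands in the middle of a character can always resynchronize at the next lead byte without rescanning from the start of the string.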