Re: [Cppunit-devel] Toward True Unicode Code... Help requested
From: Duane M. <dua...@ma...> - 2002-04-15 23:37:18
--- At Mon, 15 Apr 2002 23:32:29 +0200, Baptiste Lepilleur wrote:

>----- Original Message -----
>From: "Duane Murphy" <dua...@ma...>
>To: "CppUnit Developers" <cpp...@li...>
>Sent: Sunday, April 14, 2002 7:35 PM
>Subject: Re: [Cppunit-devel] Toward True Unicode Code... Help requested
>
>
>> My two cents, for what it's worth, is to do nothing.
>>
>> There is no "True" Unicode. There are several Unicode formats: UTF-8,
>> UTF-16, and the upcoming UTF-32 among others. I have been told by some
>> associates that keep up with such things that UTF-16, while currently
>> being used, is on its way out in favor of UTF-32. It is most often
>> recommended to use the simple UTF-8 for most applications.
>>
>> UTF-8 will likely satisfy most of us and require absolutely no changes.
>> UTF-8 is completely compatible with ASCII (for characters < 128). UTF-8
>> fits nicely in a standard string. Anyone that is concerned about such
>> things has already worked around any problems involved in using UTF-8
>> with std::string. This mostly involves parsing and locating character
>> separations, which is of little concern to CppUnit.
>
>Just a question on the side, does that mean that if you split a string into
>many lines using the '\n' character, you can use the same algorithm in ANSI
>and UTF8 ? (=> even two or three byte character encodings don't use '\n')

I hope I understand the question. If I output a string that includes a
'\n' in a stream, and some other process is parsing that stream, will
'\n' be unique?

The answer is yes! I was equally stunned to hear this. Once a "shift"
(lead) byte is seen that identifies the following bytes as part of a
multi-byte Unicode character, none of the bytes that make up that
"character", the lead byte included, will be less than 128! This is what
makes UTF-8 work. All bytes < 128 are always ASCII!

>> Another reason to do nothing is that I would hope that the C++ standards
>> committee at least makes some statement about Unicode or
>> internationalization. They have done lots of work to put in
>> infrastructure that very few people really understand. I believe that
>> they need to make some statement or show some examples of how to truly
>> deal with Unicode.
>>
>> My recommendation is to do nothing.
>>
>> Is there some other driving factor behind this decision?
>
>My original thought was that it makes it easier for outputter: AFAIK you can
>not set a code page saying that you're working in UTF8 (let me know if it is
>possible).

I'm not sure where you want to specify a code page, and I'm not always
clear as to what a code page means in some contexts. I think (and this is
very old memory) that the encoding of an XML file can be UTF-8. Beyond
that, I don't know.

>Since you have API such as fwprintf, wcerr... it wouldn't be a problem to
>display the output in Unicode. So I did some testing: trying to display a
>few hiragana in VC++ output window. I tried two different ways:
>- running the test application in post-build test, and printing with
>fwprintf
>- from a VC++ add-in, using IApplication::PrintToOutputWindow, which takes a
>unicode string as argument.
>
>Same result for both, a few '?' characters, meaning that a conversion
>occurred from unicode to multi-byte character, and failed to find a match for
>the unicode character (the font used for the output window supports those
>unicode characters).
>
>Basically, that means using unicode doesn't make anything easier: even if
>you have unicode, you need to write a special application to display the
>result. The same applies to UTF8, but...
>
>For UTF8, we already have the XmlOutputter (thanks to Fumiki's suggestions, we
>can now specify the encoding).
>
>So I agree, let's not change CppUnit. It already supports UTF8 and that's
>enough. If anything needs to be changed, it would be the GUI TestRunner to
>support UTF8 and font selection.

...Duane

--
"If tyranny and oppression come to this land, it will be in the guise of
fighting a foreign enemy." - James Madison
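
P.S. To make the '\n' point above concrete, here is a minimal sketch of a byte-wise line splitter. The names splitLines and hasNonAscii are made up for illustration and are not part of CppUnit. The sketch assumes the input is well-formed UTF-8 (or plain ASCII) held in a std::string; it relies only on the fact that every byte of a multi-byte UTF-8 character is 128 or above, so the byte 0x0A can only ever be a real line feed.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of CppUnit): split a byte string on '\n'.
// The same loop works for ASCII and for well-formed UTF-8, because no byte
// of a multi-byte UTF-8 character is ever below 0x80.
std::vector<std::string> splitLines(const std::string &text)
{
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type end = text.find('\n', start);
        if (end == std::string::npos) {
            // No more line feeds: the rest of the string is the last line.
            lines.push_back(text.substr(start));
            return lines;
        }
        lines.push_back(text.substr(start, end - start));
        start = end + 1;
    }
}

// Hypothetical helper: true if any byte has its high bit set, i.e. the
// string contains at least one non-ASCII character when read as UTF-8.
bool hasNonAscii(const std::string &text)
{
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (static_cast<unsigned char>(text[i]) >= 0x80)
            return true;
    }
    return false;
}
```

Calling splitLines on two hiragana separated by a line feed, written as the bytes "\xE3\x81\x82\n\xE3\x81\x84", should give two lines of three bytes each, with neither character cut in half. Note that this only covers splitting on ASCII delimiters; counting or indexing characters still needs UTF-8 aware code.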