Re: [Cppunit-devel] Toward True Unicode Code... Help requested


--- At Tue, 16 Apr 2002 20:28:56 +0200, Baptiste Lepilleur wrote:

>----- Original Message -----
>From: "Duane Murphy" <dua...@ma...>
>To: "Baptiste Lepilleur" <gai...@fr...>; "CppUnit Developers"
><cpp...@li...>
>Sent: Tuesday, April 16, 2002 5:28 PM
>Subject: Re: [Cppunit-devel] Toward True Unicode Code... Help requested
>
>
>> --- At Tue, 16 Apr 2002 13:51:54 +0200, Baptiste Lepilleur wrote:
>> >----- Original Message -----
>> >From: "Duane Murphy" <dua...@ma...>
>> >To: "Baptiste Lepilleur" <gai...@fr...>; "CppUnit Developers"
>> ><cpp...@li...>
>> >Sent: Tuesday, April 16, 2002 1:37 AM
>> >Subject: Re: [Cppunit-devel] Toward True Unicode Code... Help requested
>> >
>> >
>> >> --- At Mon, 15 Apr 2002 23:32:29 +0200, Baptiste Lepilleur wrote:
>> >>
>> >> >----- Original Message -----
>> >> >From: "Duane Murphy" <dua...@ma...>
>> >> >To: "CppUnit Developers" <cpp...@li...>
>> >> >Sent: Sunday, April 14, 2002 7:35 PM
>> >> >Subject: Re: [Cppunit-devel] Toward True Unicode Code... Help requested
>> >> >
>> >[...]
>> >> I hope I understand the question. If I output a string that includes a
>> >> '\n' in a stream, and some other process is parsing that stream, will
>> >> '\n' be unique?
>> >>
>> >> The answer is yes! I was equally stunned to hear this. Once a shift
>> >> character is seen that identifies the following characters as Unicode,
>> >> then none of the bytes that are part of that Unicode "character" will be
>> >> less than 128! This is what makes UTF-8 work. All characters <128 are
>> >> always ASCII!
>> >
>> >Great, that means even outputters relying on that are compatible with UTF-8
>> >(CompilerOutputter, which has some line-wrapping code).
>>
>> I want to make sure that this question is properly understood. If you are
>> just searching for '\n' in a stream or string then that will work fine.
>> If you are looking to insert '\n' (or any other characters) at some
>> position then things get complicated.
>
>You understood the question well. I was not thinking clearly when I
>answered. Indeed, I insert '\n', which makes it not UTF-8 compatible. This is
>an issue that will need to be addressed in the future.

This is where I have hopes of the standards committee adding something to
the standard to address Unicode support. Presently there is no standard
API for identifying character boundaries. I suspect it's not that hard to
do by hand, but it's also something that most OSes provide an interface for.
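
For what it's worth, here is a rough sketch of the "by hand" approach. It
relies on the fact that every UTF-8 continuation byte has the bit pattern
10xxxxxx, so a position is a character boundary exactly when the byte there
is not a continuation byte. The names isUtf8Continuation and
utf8BoundaryAtOrBefore are made up for illustration (they are not part of
CppUnit), and the code assumes the string is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool isUtf8Continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

// Hypothetical helper: returns the nearest byte position <= pos that falls
// on a character boundary. Assumes 'text' is well-formed UTF-8.
inline std::size_t utf8BoundaryAtOrBefore(const std::string &text,
                                          std::size_t pos)
{
    if (pos >= text.size())
        return text.size();   // the end of the string is always a boundary
    while (pos > 0 &&
           isUtf8Continuation(static_cast<unsigned char>(text[pos])))
        --pos;                // step back out of the multi-byte sequence
    return pos;
}
```

Note that this only finds code point boundaries. It knows nothing about
combining marks, so an accent stored as a separate code point could still be
split from its base letter.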

Maybe some kind of abstraction would help. All you need to do is find a
place that's safe to insert characters; that's between characters, not in
the middle of one. I think most OSes provide that capability, so this would
be an OS-dependent abstraction.
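
As a portable stand-in for such an abstraction, line wrapping could look
roughly like the sketch below. wrapUtf8 is a hypothetical function, not the
actual CompilerOutputter code. It assumes well-formed UTF-8, counts bytes
rather than display columns, and ignores any line breaks already in the
text.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: breaks 'text' into lines of at most maxBytes bytes,
// inserting '\n' only between characters, never inside a multi-byte
// UTF-8 sequence. Assumes well-formed UTF-8 and maxBytes >= 4.
std::string wrapUtf8(const std::string &text, std::size_t maxBytes)
{
    std::string wrapped;
    std::size_t start = 0;
    while (text.size() - start > maxBytes)
    {
        std::size_t end = start + maxBytes;
        // Back up while 'end' points at a continuation byte (10xxxxxx).
        while (end > start &&
               (static_cast<unsigned char>(text[end]) & 0xC0) == 0x80)
            --end;
        if (end == start)          // malformed input: no boundary found,
            end = start + maxBytes; // so fall back to a hard break
        wrapped.append(text, start, end - start);
        wrapped += '\n';
        start = end;
    }
    wrapped.append(text, start, std::string::npos);
    return wrapped;
}
```

A real outputter would probably want to break at whitespace first and only
fall back to this when a single word is longer than the line.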

 ...Duane

-- 
"If tyranny and oppression come to this land, it will be in the
guise of fighting a foreign enemy."              - James Madison