From: Alexandre P. N. <ale...@gm...> - 2015-06-30 17:36:45
Grasp this sentence: implementation dependent. wchar_t is a wide character type: it exists so that a single character unit can be wider than one byte. It was created before UTF-8 went mainstream, and because there were competing encodings at the time (fixed-length UCS-2 versus what ended up being UTF-16, with its variable-length encoding), the standardization committees didn't pick one.

UTF-8 is compatible with C's encoding of strings (NUL termination), but it uses a variable-length encoding, so strlen(buffer) doesn't work for UTF-8 the way it does for ASCII: it counts bytes, not characters. Otherwise UTF-8 is a superset of ASCII, like most of the "national" encodings Microsoft uses for its routines ending in "A" (as opposed to "W", for "wide"). Only UTF-32 has a fixed-length encoding (I overheard people saying that's not exactly true even there, because it has unrepresentable code points, but I never confirmed it).

Microsoft started with a fixed-length 2-byte encoding (UCS-2, IIRC), which was more or less a subset of UTF-16, but replaced it with UTF-16 (with bugs being slowly fixed over the years), because by the time UTF-16 was standardized people had already realized that 16 bits don't cover every symbol in every language; thus in UTF-16 a single visible character (a symbol) can take more than one 16-bit code unit. It was natural for Microsoft to define MSVC's wchar_t as 16 bits, since its revamped UCS-2 (now UTF-16) API supported that natively, but many other platforms define wchar_t as UTF-32.

Nowadays you have standardized converters between these encodings even in the C++ library, but conversion is expensive, and so is storage, so whether you convert sometimes depends on the data: long English plain text uses the same amount of space in UTF-8 as it would in plain ASCII, and in fact it would be byte-for-byte identical, since UTF-8 is an ASCII superset.
I feel the pain, trust me: I have a program written in C++ that interfaces with Windows (via 16-bit UTF-16), a huge third-party code base using UTF-32, and GTK (which uses UTF-8 for everything). If it weren't so easy (well, it isn't, but you get there over time) to use type safety in C++, I would be passing the wrong string type around more times than I could count. I ended up using a pre-C++11 converter to adapt the strings on demand, but today it would be easier.

Btw, I always swim against the mainstream, and when programming Windows I normally follow this advice (among other things): http://utf8everywhere.org/ But I *can't* and I *won't* suggest that you or anyone else do the same blindly; there are reasons for and against these tips. For me it was a win, but I can see why some people/organizations would pay a price that is too high for an unworthy return.

On Tue, 30 Jun 2015 at 14:04, LRN <lr...@gm...> wrote:

> On 30.06.2015 19:44, pa...@ar... wrote:
> > I have been reading that wchar_t, and therefore wstring, is neither
> > UTF-8 nor a UTF-16 character set. So, what is wstring good for then?
>
> Whether it's UTF-16 or UCS-2 depends on the implementation of the library
> that handles wstring.
>
> Sources, which I can't remember right now, claim that MS libraries were
> UCS-2 initially, then later quietly converted to UTF-16 under the hood.