> As tiny as possible: testenc.cpp
Why C++? For sample code like this, plain C would be even simpler.
> ::MessageBox(NULL, "Adiós", "MyTestBox", MB_OK);
> return 0;
> the second parameter
> to MessageBox is a five chars
What do you mean by "chars"? Apparently not what C means by it. It is
five *characters*, but six C chars (bytes) in UTF-8.
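
To make the difference concrete, here is a minimal sketch in plain C.
The names utf8_codepoints and adios_utf8 are made up for illustration,
and the counting function assumes its input is valid UTF-8. The string
is spelled with explicit byte escapes so that the result does not
depend on how the compiler reads the source file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the code points in a NUL-terminated UTF-8 string by counting
   every byte that is not a continuation byte (bit pattern 10xxxxxx).
   Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* "Adiós" as UTF-8 bytes: the ó (U+00F3) takes two bytes, C3 B3. */
static const char adios_utf8[] = "Adi\xc3\xb3s";
```

With these definitions strlen(adios_utf8) counts bytes, while
utf8_codepoints(adios_utf8) counts characters.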
> On my XP I see six [chars], not five.
(You mean *characters*.) Which is as expected, as the MessageBoxA()
function interprets the char string in the system codepage, not UTF-8. If
you want to use non-ASCII in string literals passed to functions not
under your control, like MessageBoxA(), their encoding should
obviously match what these functions expect.
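
A rough sketch of why six characters show up, under the assumption
that the system codepage is a Latin-1-like one such as Windows-1252
(I don't know what yours actually is). The hypothetical helper
latin1_to_utf8 reads each input byte as one ISO-8859-1 character,
which matches Windows-1252 except in the range 0x80..0x9F, and
re-encodes the result as UTF-8 so that it can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Reinterpret each byte of `in` as one ISO-8859-1 character and write
   the result to `out` as NUL-terminated UTF-8. Returns the number of
   characters converted, or (size_t)-1 if `out` is too small. */
size_t latin1_to_utf8(const char *in, char *out, size_t outsz)
{
    size_t chars = 0, o = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++, chars++) {
        unsigned char b = (unsigned char)*in;

        if (b < 0x80) {
            /* ASCII: one byte, keep room for the terminator */
            if (o + 1 >= outsz)
                return (size_t)-1;
            out[o++] = (char)b;
        } else {
            /* U+0080..U+00FF: two bytes in UTF-8 */
            if (o + 2 >= outsz)
                return (size_t)-1;
            out[o++] = (char)(0xC0 | (b >> 6));
            out[o++] = (char)(0x80 | (b & 0x3F));
        }
    }
    if (o >= outsz)
        return (size_t)-1;
    out[o] = '\0';
    return chars;
}
```

Feeding it the six UTF-8 bytes of "Adiós" ("Adi\xc3\xb3s") yields six
characters: the two bytes of the ó come out as U+00C3 and U+00B3.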
What I asked for was a concrete example of what you claimed: that with
an earlier version of MinGW, UTF-8 char strings were magically
converted to UTF-16 wchar_t strings at some stage.
For minimum fuss, it's simplest to just not use non-ASCII in string
literals in source files, as the way different compilers interpret
them might vary. Also, passing hardcoded non-ASCII strings to "A"
system codepage APIs means that if your program is intended to
eventually have other users than yourself, the program will use the
wrong strings when somebody runs it on a machine with a different
system codepage than yours. For instance, I hope it is not too
far-fetched to imagine somebody might run your program in Greece but
still want to see the UI hardcoded in Spanish.
In this case I would suggest using the wide character API, with a hex
code for the non-ASCII code point:
#include <windows.h>
int main (int argc, char **argv)
{
  MessageBoxW (NULL, L"Adi\xf3s", L"MyTextBox", MB_OK);
  return 0;
}
Or even L"Adi\u00f3s", but that requires C++ or C99 in MinGW's case.
(It also seems to work with MSVC9.)
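
One caveat with the hex form, sketched here in C99: a \x escape
swallows every hex digit that follows it. L"Adi\xf3s" is fine because
's' is not a hex digit, but a word where the accented letter is
followed by a..f or a digit needs the literal split in two. The names
below (adios_hex, adios_ucn, split, check_literals) are only for
illustration.

```c
#include <assert.h>
#include <wchar.h>

/* The same five-character string, spelled two ways. */
static const wchar_t adios_hex[] = L"Adi\xf3s";
static const wchar_t adios_ucn[] = L"Adi\u00f3s";

/* Writing the escape and the 'a' in one literal would be read as the
   single escape \xf3a. Adjacent literals are concatenated only after
   escapes are processed, so splitting ends the escape where intended. */
static const wchar_t split[] = L"\xf3" L"a";

void check_literals(void)
{
    assert(wcscmp(adios_hex, adios_ucn) == 0);
    assert(wcslen(split) == 2);
}
```

The \u form has a fixed length of four hex digits, so it does not have
this problem.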
Yeah, clearly L"Adi\xf3s" is less readable than "Adiós". So if you
have a lot of such non-ASCII string literals, and you insist on using
UTF-8 in your source code, you need some helper functions that convert
from UTF-8 to UTF-16, and then pass the result to the "W" APIs.
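
On Windows the usual way to write such a helper is
MultiByteToWideChar() with CP_UTF8. Just to show what the conversion
involves, here is a portable sketch in plain C. The function name
utf8_to_utf16 is hypothetical, it uses uint16_t instead of wchar_t so
that it behaves the same on any platform, and as a simplification it
does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert NUL-terminated UTF-8 to NUL-terminated UTF-16 in native byte
   order. Returns the number of UTF-16 code units written, not counting
   the terminator, or (size_t)-1 on malformed input or if `out` (which
   holds `outlen` code units) is too small. */
size_t utf8_to_utf16(const char *in, uint16_t *out, size_t outlen)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t o = 0;

    assert(in != NULL && out != NULL);
    while (*p != 0) {
        uint32_t cp;
        int extra;

        /* The lead byte tells how many continuation bytes follow. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;       /* invalid lead byte */
        p++;
        while (extra-- > 0) {
            /* Also catches a string that ends mid-sequence. */
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;
        if (cp < 0x10000) {
            if (o + 1 >= outlen)
                return (size_t)-1;
            out[o++] = (uint16_t)cp;
        } else {
            /* Outside the BMP: encode as a surrogate pair. */
            if (o + 2 >= outlen)
                return (size_t)-1;
            cp -= 0x10000;
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (o >= outlen)
        return (size_t)-1;
    out[o] = 0;
    return o;
}
```

For "Adiós" in UTF-8 (six bytes) this produces five UTF-16 code units,
with 0x00F3 in the fourth position. On Windows, where wchar_t is 16
bits, a buffer filled this way could be handed to MessageBoxW().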