From: Tor L. <tm...@ik...> - 2010-04-19 15:38:28
> As tiny as possible: testenc.cpp

Why C++? For sample code like this, plain C would be even simpler.

> ::MessageBox(NULL, "Adiós", "MyTestBox", MB_OK);
> return 0;

> the second parameter
> to MessageBox is a five chars

What do you mean by "chars"? Not what C means by it. It is five
*characters*, but six C chars (bytes) in UTF-8.

> On my XP I see six [chars], not five.

(You mean *characters*.) Which is as expected, as the MessageBoxA()
function interprets the char string in the system codepage, not as UTF-8.

If you want to use non-ASCII in string literals passed to functions not
under your control, like MessageBoxA(), their encoding should obviously
match what these functions expect.

What I asked you to give a concrete example of was what you claimed
about UTF-8 char strings being magically converted to UTF-16 wchar_t
strings at some stage, with an earlier version of MinGW.

For minimum fuss, it's simplest just not to use non-ASCII in string
literals in source files, as the way different compilers interpret them
might vary.

Also, passing hardcoded non-ASCII strings to the "A" system codepage APIs
means that if your program is intended to eventually have users other
than yourself, the program will use the wrong strings when somebody runs
it on a machine with a different system codepage than yours. For
instance, I hope it is not too far-fetched to imagine somebody might run
your program in Greece but still want to see the UI hardcoded in
Spanish.

In this case I would suggest using the wide character API, with a hex
code for the non-ASCII code point:

    #include <windows.h>

    int
    main (int argc, char **argv)
    {
      MessageBoxW (NULL, L"Adi\xf3s", L"MyTextBox", MB_OK);
      return 0;
    }

Or even L"Adi\u00f3s", but that requires C++ or C99 in MinGW's case. (It
also works with MSVC9, it seems.)

Yeah, clearly L"Adi\xf3s" is less readable than "Adiós".
So if you have a lot of such non-ASCII string literals, and you insist
on using UTF-8 in your source code, you need some helper functions that
convert from UTF-8 to UTF-16, and then pass the converted strings to the
"W" APIs.

--tml
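
A minimal sketch of such a helper, under stated assumptions: the name
utf8_to_utf16 and its signature are made up for illustration and are not
part of MinGW or the Windows API. It is written in portable C with
uint16_t for the output, so it does not depend on wchar_t being 16 bits.
On Windows one would more likely call MultiByteToWideChar() with CP_UTF8
and skip the hand-written decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into UTF-16
 * code units. At most cap units are written, including the terminating 0.
 * Returns the number of units written (not counting the terminator), or
 * (size_t)-1 if the input is malformed or dst is too small. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    while (*s != 0) {
        uint32_t cp;   /* code point being assembled */
        uint32_t min;  /* smallest code point allowed for this length */
        int extra;     /* number of continuation bytes expected */

        if (*s < 0x80)                { cp = *s;        extra = 0; min = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; min = 0x80; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; min = 0x800; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; min = 0x10000; }
        else return (size_t)-1;       /* stray continuation or invalid lead byte */
        s++;

        while (extra-- > 0) {
            /* A NUL here also fails the check, so we never read past the end. */
            if ((*s & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
            s++;
        }

        /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            if (n + 1 >= cap) return (size_t)-1;  /* keep room for the 0 */
            dst[n++] = (uint16_t)cp;
        } else {
            if (n + 2 >= cap) return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
        }
    }

    if (n >= cap) return (size_t)-1;
    dst[n] = 0;
    return n;
}
```

With this, the six-byte UTF-8 literal "Adi\xc3\xb3s" decodes to five
UTF-16 code units, the fourth being 0x00F3. On Windows, where wchar_t is
16 bits wide, the resulting buffer could then be handed to MessageBoxW()
after a cast to const wchar_t *.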