From: Robert H. <Rob...@gm...> - 2013-10-01 20:45:00
Hi Yongwei Wu,

Yongwei Wu <wuyongwei@...> writes:
>
> Hi gurus,
>
> I think it started when I used GCC 4. When I input gcc on the command
> line, I got something very weird:
>
>   gcc: ??óDê?è????t
>
> My initial response was that GCC tried to output UTF-8 to a Chinese
> command prompt. I simply dismissed the issue with "set LANG=en".
>
> When I revisited this issue today, I found it was really very
> different. I could not correct the output with iconv. I tried this and
> wanted to check more carefully:
>
>   gcc 2> a.txt
>
> To my great surprise, it contains the correct Chinese message that is
> in the RIGHT encoding (CP936):
>
>   gcc: 没有输入文件 (the Chinese equivalent for "no input files")
>
> I made some experiments. I found that if I treated it as Latin-1, it
> was:
>
>   gcc: Ã»ÓÐÊäÈëÎÄ¼þ
>
> Now let us compare with the beginning error message again:
>
>   gcc: ??óDê?è????t
>
> I noticed something in common:
>
>   Ã --> ?
>   » --> ?
>   Ó --> ó
>   Ð --> D
>   Ê --> ê
>   ä --> ?
>   È --> è
>   ë --> ?
>   Î --> ?
>   Ä --> ?
>   ¼ --> ?
>   þ --> t
>
> Apparently the text is good somewhere, but treated as Latin-1 and
> downgraded to Chinese-compatible characters when output to the screen
> (but not when redirected to a file).
>
> It did not take me long to reproduce the wrong message with this test
> program:
>
>   #include <locale.h>
>   #include <stdio.h>
>
>   char msg[] = "\303\273\323\320\312\344\310\353\316\304\274\376";
>
>   void Test1()
>   {
>       printf("Test1: ");
>       char* ptr = msg;
>       while (*ptr) {
>           putchar(*ptr);
>           ++ptr;
>       }
>       putchar('\n');
>   }
>
>   void Test2()
>   {
>       printf("Test2: ");
>       puts(msg);
>   }
>
>   int main()
>   {
>       Test1();
>       Test2();
>       setlocale(LC_CTYPE, "Chinese_China.936");
>       Test1();
>       Test2();
>       setlocale(LC_CTYPE, "Chinese_China.1252");
>       Test1();
>       Test2();
>   }
>
> I got:
>
>   Test1: 没有输入文件
>   Test2: 没有输入文件
>   Test1:
>   Test2: 没有输入文件
>   Test1: ??óDê?è????t
>   Test2: ??óDê?è????t
>
> Since it was a surprise, I copied the resulting exe to an XP machine,
> and it output the correct Chinese text on all six lines!
>
> My regional settings have a few quirks, and basically
> setlocale(LC_CTYPE, "") is equivalent to setlocale(LC_CTYPE,
> "Chinese_China.1252"). Normal Windows 7 users are more likely to see
> the setlocale(LC_CTYPE, "Chinese_China.936") result. I.e., they cannot
> see the messages that are supposed to be Chinese at all.
>
> I feel bad for my Chinese friends, and wish they all used Windows XP
> or knew how to set LANG=en.
>
> I am not familiar with the GCC source, and do not know how easy it is
> to make a fix. Apparently, one of the following should be possible:
>
> * Remove any use of setlocale(LC_ALL, "") or setlocale(LC_CTYPE, "").
> * Add a setlocale(LC_CTYPE, "C") after setlocale(LC_ALL, ""). (BTW,
>   this was actually a patch for Vim on Windows, to fix a related but
>   different locale issue about multibyte characters.)
> * Use printf or puts, but NOT putchar/putc/fputc. (With the last fix,
>   I would still have the same problem, but most Chinese users would be
>   luckier.)
>
> Any comments, gurus?
>
> Oh, my GCC version is:
>
>   Using built-in specs.
>   COLLECT_GCC=gcc
>   COLLECT_LTO_WRAPPER=c:/mingw/bin/../libexec/gcc/mingw32/4.5.2/lto-wrapper.exe
>   Target: mingw32
>   Configured with: ../gcc-4.5.2/configure
>   --enable-languages=c,c++,ada,fortran,objc,obj-c++
>   --disable-sjlj-exceptions --with-dwarf2 --enable-shared
>   --enable-libgomp --disable-win32-registry --enable-libstdcxx-debug
>   --enable-version-specific-runtime-libs --disable-werror
>   --build=mingw32 --prefix=/mingw
>   Thread model: win32
>   gcc version 4.5.2 (GCC)
>
> My OS version is Windows 7 Enterprise x64 Edition 6.1.7600.
>
> Best regards,
>
> Yongwei
>

Well, just some thoughts.

1) All code pages supported by Windows are listed in the "Go Global
   Developer Center" [1]. As you can see there, UTF-8 is not supported
   as a code page for input/output in CMD.exe.

2) If you want to print international characters to the Windows
   console, then you have these options:

   a) Write a Unicode version of your program. That means: define both
      symbols UNICODE and _UNICODE, use the w* functions (wprintf,
      std::wcout, etc.), and use _wsetlocale. (There is a Unicode layer
      for Win9x/ME [2], [3].)

   b) Write a multibyte version of your program. That means: define the
      symbol _MBCS, use the "normal" single-byte character functions
      (printf and friends), and call setlocale followed by
      _setmbcp(_MB_CP_LOCALE); [4]

   c) Write a single-byte version of your program. That means: define
      neither the Unicode symbols (UNICODE and _UNICODE) nor _MBCS. Use
      the "normal" single-byte character functions (printf and friends)
      and setlocale. But you can't create any valid output for a
      double-byte character string in such a program.

   d) Use TCHAR and the _t* functions (_tprintf and friends) with the
      symbols defined correctly.

3) On systems where the OEM code page (OCP) and the ANSI code page (ACP)
   differ, the console output of hardcoded strings depends on the
   encoding of the source file. If your editor saves in the ACP, then
   you must convert the string to the OCP first, and only after that can
   you print it to the console.

4) Assume that your program changes the user's current OEM code page to
   the one you want. Whether the user then gets the output you expect
   still depends on the user's console font.

5) To be perfect, I think your program must
   1: define the correct symbols in the source code,
   2: load a corresponding console font,
   3: change the console's current OEM code page, e.g. to 936,
   4: print the output.
   (5: scan keyboard input: well, on a German localized Windows version
   I couldn't find any possibility to enter Chinese symbols into the
   console via the keyboard at all.)

Best regards,
Robert Hartmann (Germany)

[1] Codepages
    http://msdn.microsoft.com/en-us/goglobal/bb964653
[2] MS Layer for Unicode (MSLU) Win9xME
    http://msdn.microsoft.com/en-us/goglobal/bb688166
[3] Download MSLU
    http://www.microsoft.com/en-us/download/details.aspx?id=4237
[4] _setmbcp
    http://msdn.microsoft.com/en-us/library/883tf19a