From: Giel v. S. <me...@mo...> - 2008-04-18 13:23:34
Yongwei Wu wrote:
> 2008/4/17 Немос <nem...@ya...>:
>> Yongwei Wu wrote:
>>>> In VS this code work fine.
>>> I do not understand why you said "In VS this code work fine". On the
>>> contrary, my test with Visual Studio 2003/2005 shows that the output
>>> is exactly the same.
>> May be not install needed fonts for russian. I test code
>> with VS 2008, source file in cp1251.
>
> I saw the expected output, in Cyrillic (with GCC, MSVC 7.1/8.0). I am
> not sure about 2008. It is a surprising result. I do not understand
> why setlocale should change the output. ANSI strings should not
> change when the locale changes.

AFAIK the strings themselves should indeed not change when using
different locale settings, though the way they're interpreted may
validly change. To my knowledge this _should_ affect how the standard
I/O functions handle non-binary output. I haven't bothered to check the
ISO C89 or C99 standards, but according to "man fputws" (a manpage for
"putws" was unavailable, though they are essentially the same function,
except that putws implicitly uses stdout):

> int fputws(const wchar_t *ws, FILE *stream);
> ...
> The behavior of fputws() depends on the LC_CTYPE category of the current locale.

This leads me to believe that it's actually quite valid that the
behaviour of putws() changes when LC_CTYPE changes.

>>> I know one way that will output different strings in VS. If you use
>>> wchar_t and _putws, things will be different, since _putws will
>>> convert the encoding from UTF-16 to the console code page. You should
>>> save the source file in UTF-16, in this case; otherwise the result is
>>> unreliable, since you cannot tell MSVC the source encoding, and the
>>> system legacy encoding is always used, which may or may not be what
>>> you want.
>> I need that code work in Linux and Windows, and source in
>> utf-8. But VS (compiler) don`t understand source in utf-8
>> without bom, and gcc don`t like bom.
>> So I use mingw for win32 build.
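
To make the LC_CTYPE dependence from the fputws manpage concrete, here
is a minimal sketch of how one could observe it. It assumes a POSIX-like
system; the helper name bytes_written is made up for illustration, and
the locale name "C.UTF-8" is an assumption that may not exist on every
machine. Each call uses a fresh stream, because as far as I know a
stream picks up its wide-character conversion when it becomes
wide-oriented.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from any library): switch LC_CTYPE to
 * locale_name, write ws to a fresh temporary file with fputws(), and
 * return the number of bytes that ended up in the file.
 * Returns -1 if the locale is unavailable or the write fails, e.g.
 * because a character cannot be represented in that locale.
 * Note: LC_CTYPE is left set to locale_name afterwards.
 */
static long bytes_written(const char *locale_name, const wchar_t *ws)
{
    long n = -1;
    FILE *f;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;              /* locale not installed */

    f = tmpfile();
    if (f == NULL)
        return -1;

    /* The wide-to-multibyte conversion may only happen on flush,
     * so check both calls before trusting the file position. */
    if (fputws(ws, f) >= 0 && fflush(f) == 0)
        n = ftell(f);

    fclose(f);
    return n;
}
```

Called as bytes_written("C", L"hello") this should report 5 bytes. The
same wide string L"\u041f\u0440\u0438\u0432\u0435\u0442" (six Cyrillic
letters) would come out as 12 bytes under a UTF-8 locale, and I would
expect the write to fail under the plain "C" locale. The wchar_t data
is identical in both cases; only the bytes produced by the I/O function
differ.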
> Are you sure GCC works with UTF-8? My test shows that GCC always
> interpret the input in Latin-1. Of course BOM is invalid in this
> case.

AFAIK GCC doesn't concern itself with encodings at all. To my knowledge
a C (or C++) compiler shouldn't (and GCC doesn't) attempt to interpret
the bytes in a string, and as such it shouldn't attempt to work with
any specific encoding of the string. This basically means that you can
use whatever encoding you want, as long as the functions that work on
that string can work with that encoding. So AFAIK this isn't an issue
with the compiler but with the specific libraries you use (in this case
the standard C library).

--
Giel