From: Yongwei Wu <wuy...@gm...> - 2011-09-01 01:54:35
On 31 August 2011 21:41, Charles Wilson <cwi...@us...> wrote:
> On 8/31/2011 8:51 AM, Yongwei Wu wrote:
>> On 31 August 2011 12:40, Xiaofan Chen wrote:
>>> On Wed, Aug 31, 2011 at 9:17 AM, Yongwei Wu wrote:
>>>> Or we advocate the simple but probably objectionable fix in many of
>>>> the open-source projects:
>>>>
>>>> #ifdef _WIN32
>>>>     setlocale(LC_ALL, "C");
>>>> #endif
>>>
>>> I think this is the way to go. Just an example here.
>>
>> In fact, setting LC_CTYPE to "C" is sufficient.
>>
>> If there is not a better way, please spread the word :-):
>> setlocale(LC_ALL, "") or setlocale(LC_CTYPE, "") is bad on Windows
>> platforms. (Unless you have multibyte characters in the format
>> strings, which I assume no one does today.)
>
> So, in order to avoid a problem that occurs when Windows' Regional
> Settings specifies a language whose charset is multibyte, you suggest
> that we hardcode ALL applications to disallow ANY locale other than "C"?
>
> So; no German. No Italian. No Spanish. No French. No Russian.

As I later corrected, the key point is LC_CTYPE. It is more about the
default encoding for the program. "C" means No Conversion, and really
is expected by most programs. It definitely does not mean "no German",
etc. Following the same logic, I can add "no Chinese" too, which is
apparently not the case, but to the contrary.

Some functions *are* affected, like isalpha, tolower, etc. I think the
locale-specific behaviour is more likely to be required in some
special-locale programs, like German-only or Russian-only. In that
case, I think setting locale to "C" is bad. However, it seems the "C"
version functions are often good enough for programs that are designed
to work in different locales.

The following code piece is the beginning of Vim init_locale:

------------------------------------
    setlocale(LC_ALL, "");

# ifdef FEAT_GUI_GTK
    /* Tell Gtk not to change our locale settings. */
    gtk_disable_setlocale();
# endif
# if defined(FEAT_FLOAT) && defined(LC_NUMERIC)
    /* Make sure strtod() uses a decimal point, not a comma. */
    setlocale(LC_NUMERIC, "C");
# endif
# ifdef WIN32
    /* Apparently MS-Windows printf() may cause a crash when we give it 8-bit
     * text while it's expecting text in the current locale.  This call avoids
     * that. */
    setlocale(LC_CTYPE, "C");
# endif
------------------------------------

> I don't think that's the best solution.

Not always. But probably it works in most cases.

> Bruno Haible has posted an implementation of setlocale that can be used
> to avoid this problem. Instead of recompiling all MinGW programs to force:
>    setlocale(LC_ALL, "C");
> we should recompile all MinGW programs to link against Bruno's
> implementation of setlocale. This version, when called with
>    setlocale(whatever, "");
> will FIRST check the relevant environment variables, BEFORE delegating
> to msvcrt.dll's behavior (which is to check Window's Regional Settings).

That is irrelevant. As long as any non-"C" values are passed to the
MSVCRT setlocale, putchar will behave as I described.

> This way, if your Windows Regional Settings are set to some problematic
> multibyte locale, YOU can override that behavior for MinGW programs by
> setting
>    LANG=en
> (or C or de or it or es) in your environment. This way, fixing the
> problem for multibyte regions doesn't destroy the ability of italian- or
> spanish-speakers to operate in /their/ language.

Even when the interface language is English, people still want the
program to be able to handle multi-byte characters, right? Think about
the programs I mentioned: head, tail, ls, iconv, etc.

> All this solution requires is a new package that provides a library with
> Bruno's setlocale (say, libposix [*]) and for each existing package,
> when updated, to add -lposix to its LDFLAGS.

I would like to see it, if it fixes the issue. However, I do not see
it can help, if the putchar behaviour remains the same.
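
For anyone who wants to try the approach outside Vim, here is a minimal standalone sketch of the same pattern: take the user's locale for everything, then reset only LC_NUMERIC and, on Windows, LC_CTYPE to "C". The function name init_locale_sketch is made up for illustration and is not part of Vim, MinGW, or Bruno's code; only the standard setlocale calls are real. It is untested against the MSVCRT putchar issue itself, so treat it as a starting point, not a verified fix.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (name invented for this example): adopt the user's
 * locale, then pin the categories discussed in this thread back to "C".
 * Returns the name of the LC_CTYPE locale in effect afterwards. */
static const char *init_locale_sketch(void)
{
    /* Adopt the user's environment/regional settings for all categories.
     * setlocale() returns NULL on failure, in which case the previous
     * locale (initially "C") stays in effect. */
    if (setlocale(LC_ALL, "") == NULL)
        fputs("warning: could not set the user's locale\n", stderr);

    /* Keep strtod() and printf("%f") on a decimal point, not a comma. */
    setlocale(LC_NUMERIC, "C");

#ifdef _WIN32
    /* Only the character-handling category is reset, so the other
     * categories (LC_TIME, LC_COLLATE, ...) still follow the user. */
    setlocale(LC_CTYPE, "C");
#endif

    /* A NULL second argument queries the current setting. */
    return setlocale(LC_CTYPE, NULL);
}
```

A program would call init_locale_sketch() once at the top of main(), before any output. Note that with LC_CTYPE set to "C", isalpha, tolower and friends only classify ASCII letters, which is the trade-off mentioned above.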

Best regards,

Yongwei

--
Wu Yongwei
URL: http://wyw.dcweb.cn:8001/ (Temporary)