From: <no...@so...> - 2001-04-24 20:28:55
Bugs item #418645, was updated on 2001-04-24 13:28
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=110894&aid=418645&group_id=10894

Category: Environment Variables
Group: 8.3.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Markus Kuhn (mkuhn)
Assigned to: Nobody/Anonymous (nobody)
Summary: Initial encoding selected incorrectly

Initial Comment:
The function TclpSetInitialEncodings in unix/tclUnixInit.c contains an ugly hack that guesses, from the locale name, which multibyte encoding is currently used on a Unix system. This might work in the few special cases listed in the provided table, but it fails badly in general. For example, under Linux (glibc 2.2) the locale de_DE uses ISO 8859-1, the locale de_DE@euro uses ISO 8859-15, and the locale vi_VN uses UTF-8. None of these is covered by your table. Merely extending localeTable[] is not the solution, because vendors sometimes change the encodings of their locales.

Unix has an X/Open-standardized API function that determines the character set of the current locale! I suggest that you drop the entire environment-variable parsing and table machinery in TclpSetInitialEncodings. Instead, first call setlocale(LC_ALL, "") so that the C library sets the locale from the environment (followed by setlocale(LC_NUMERIC, "C") if numbers must keep the C formatting conventions). Then call nl_langinfo(CODESET) (on all platforms where <langinfo.h> is available), which returns the name of the encoding now in use. This will be a string such as:

    ISO-8859-1
    ISO-8859-15
    UTF-8
    EUC-JP
    KOI8-R
    SJIS

The command "locale -m" prints a list of all encodings available on a system. These names are unfortunately not strictly standardized, so you will still need a table to map them to the encoding names used by Tcl, but the return value of nl_langinfo(CODESET) is a far better starting point for finding the current encoding than the locale name.

On some systems (including all with glibc 2.2, for instance), you do not even need to map the output of nl_langinfo(CODESET) to your own encoding names.
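The detection step suggested above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not Tcl code: the helper names detect_codeset() and codeset_to_tcl(), the small mapping table, and the Tcl encoding names on its right-hand side are all hypothetical and only cover the examples listed in this report.

```c
#define _XOPEN_SOURCE 700   /* expose nl_langinfo() and strcasecmp() */
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical table: codeset names as reported by nl_langinfo(CODESET)
   mapped to (assumed) Tcl encoding names. Only the report's examples. */
static const struct {
    const char *codeset;
    const char *tcl_name;
} codeset_table[] = {
    {"ISO-8859-1", "iso8859-1"},
    {"ISO-8859-15", "iso8859-15"},
    {"UTF-8", "utf-8"},
    {"EUC-JP", "euc-jp"},
    {"KOI8-R", "koi8-r"},
    {"SJIS", "shiftjis"},
};

/* Let the C library set the locale from the environment (LC_ALL,
   LC_CTYPE, LANG) and return the name of the locale's character set. */
const char *detect_codeset(void)
{
    setlocale(LC_ALL, "");       /* on failure the locale is left unchanged */
    setlocale(LC_NUMERIC, "C");  /* keep '.' as decimal point; CODESET
                                    depends on LC_CTYPE only */
    return nl_langinfo(CODESET);
}

/* Case-insensitive lookup; returns NULL for codesets not in the table. */
const char *codeset_to_tcl(const char *codeset)
{
    size_t i;
    for (i = 0; i < sizeof codeset_table / sizeof codeset_table[0]; i++) {
        if (strcasecmp(codeset, codeset_table[i].codeset) == 0) {
            return codeset_table[i].tcl_name;
        }
    }
    return NULL;
}
```

A caller would try codeset_to_tcl(detect_codeset()) and fall back to a default encoding only when it returns NULL, instead of guessing from the locale name.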
On those systems, the iconv() function provides a comprehensive conversion service from whatever encoding nl_langinfo(CODESET) reports into UTF-8.

The matter is somewhat urgent: SuSE Linux will soon switch the default locales of most European Union countries to ISO 8859-15 (to support the Euro symbol), and your assumption that ISO 8859-1 is a good default will then fail for millions of Linux users.

X/Open spec for nl_langinfo: http://www.opengroup.org/onlinepubs/7908799/xsh/langinfo.h.html
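A rough sketch of the iconv() route follows, assuming a POSIX iconv (such as the one in glibc 2.2) that accepts the nl_langinfo(CODESET) name directly as the source encoding; other iconv implementations may not. The helper name to_utf8() is hypothetical, and growing the output buffer on E2BIG is one design choice among several.

```c
#define _XOPEN_SOURCE 700   /* expose iconv() */
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: convert len bytes of text in encoding `from`
   (e.g. the string returned by nl_langinfo(CODESET)) into a newly
   allocated, NUL-terminated UTF-8 string. Returns NULL if iconv does
   not know `from`, the input is invalid, or memory runs out. */
char *to_utf8(const char *from, const char *in, size_t len)
{
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1) {
        return NULL;
    }

    size_t cap = len + 16;           /* first guess; grown on E2BIG */
    char *out = malloc(cap);
    char *inp = (char *)in;          /* POSIX iconv() takes char **; the
                                        input bytes are not modified */
    size_t inleft = len;
    size_t used = 0;

    while (out != NULL) {
        char *outp = out + used;
        size_t outleft = cap - used - 1;     /* reserve room for the NUL */
        size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
        used = (size_t)(outp - out);
        if (rc != (size_t)-1) {
            break;                           /* all input converted */
        }
        if (errno != E2BIG) {                /* EILSEQ or EINVAL: bad input */
            free(out);
            out = NULL;
            break;
        }
        char *bigger = realloc(out, cap * 2);   /* output buffer full */
        if (bigger == NULL) {
            free(out);
        }
        out = bigger;
        cap *= 2;
    }

    iconv_close(cd);
    if (out != NULL) {
        out[used] = '\0';
    }
    return out;
}
```

With the locale set from the environment, to_utf8(nl_langinfo(CODESET), buf, len) would convert locale-encoded bytes; a NULL result for an unknown codeset marks the case where a mapping table is still needed.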