Re: [mined-editor] Re: uterm script
From: Mike F. <mf...@su...> - 2005-10-05 23:02:39
Thomas Wolff <mi...@to...> wrote:

>> > * It cannot distinguish between terminal encoding and preferred data
>> >   encoding. For mined, I decided to slightly interpret (or misuse,
>> >   if you want)
>
>> misuse, certainly misuse! This is against the POSIX standard.
> Not quite, see below.
>
>> > the locale mechanism by allowing the following:
>> >   LC_CTYPE=something.UTF-8
>> >   LANG=something.gb18030
>> > This would tell mined that the preferred encoding when editing text
>> > is GB18030 while leaving the LC_CTYPE category indicating a UTF-8
>> > terminal, so other applications are not confused, and CJK files
>> > can easily be worked on in a UTF-8 terminal (there are options for
>> > this in mined, too, and of course auto-detection...).
>
>> Having options for this in mined is the only possibility here. You
>> cannot use the locale environment variables against the POSIX
>> standard.
> I'm actually not doing that.
> The POSIX requirement you quoted says that different character sets
> must not be used by the locale categories.
> But by rules of priority of the variables affecting each category,
> LANG doesn't affect anything if all other categories are set.
> So, when the example is modified slightly:
>   LC_ALL=something.UTF-8
>   LANG=something.gb18030
> this does not violate the POSIX locale standard. (For the POSIX
> locale mechanism, the value of LANG would have no effect here.)
>
> Maybe a little picky here, but this way at least one important
> missing configuration feature (distinguishing terminal encoding from
> data encoding) can be achieved this way if an application likes.

But this is completely nonstandard; no other application but mined is
doing something like this. And it is very misleading as it makes the
user believe that using different encodings in these variables is
OK. But it is not.
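
For illustration, the precedence rule both sides refer to (LC_ALL
overrides the individual LC_* variables, which in turn override LANG)
can be written down as a small shell function. This is only a sketch:
the name effective_locale is hypothetical, and the final "POSIX"
fallback is an assumption, since that default is implementation-defined.

```shell
# effective_locale CATEGORY
# Print the locale name that the usual precedence selects for CATEGORY
# (for example LC_CTYPE): LC_ALL first, then the category variable
# itself, then LANG. Empty values count as unset.
# (effective_locale is a hypothetical helper name.)
effective_locale() {
    category=$1
    # Read the variable whose name is stored in $category.
    eval "category_value=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: the real default is implementation-defined.
        printf '%s\n' "POSIX"
    fi
}
```

With LC_ALL=something.UTF-8 and LANG=something.gb18030 set, calling
"effective_locale LC_CTYPE" prints something.UTF-8: the value of LANG
never reaches the locale mechanism, which is exactly why only an
application that reads LANG on its own would notice it.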
You are right that LANG is ignored if LC_ALL is set, but it is still
misleading. MINED_PREFERRED_FILE_ENCODING would be ignored by the
POSIX locale mechanism as well and not only if LC_ALL is set. I still
think a special mined option, be it via a mined specific environment
variable or a config file option is much clearer than abusing the
POSIX locale system in weird and funny ways.

>> And most encodings wouldn't work for German
>> anyway, i.e. de_DE.SJIS could not work because SJIS has no umlauts.
> Actually, Shift-JIS X0213 does maintain umlauts. Try it with mined,
> enter them in the terminal, ESC u will reveal the encoding. Maybe a
> later extension of Shift-JIS; I took the table from libiconv.

Yes, this is a later extension of shift_jis. The old shift_jis has no
umlauts:

mfabian@magellan:~$ echo -n ö | iconv -f utf-8 -t sjis >/dev/null
iconv: illegal input sequence at position 0
mfabian@magellan:~$

only shift_jisx0213 has:

mfabian@magellan:~$ echo -n ö | iconv -f utf-8 -t shift_jisx0213 >/dev/null
mfabian@magellan:~$

>> So it is much less than n*m.
> Sure, but locale installation is far too complicated for users
> (I don't know myself how it works because there is no decent documentation)
> - and it seems it's not even possible if you're not root -

It is possible if you are not root. You can install additional locales
in your home directory using the command "localedef". You are right
though that it is not commonly known by ordinary users how to do that.

>> >> The updated version is attached. Thank you very much for the review.
>> mfabian@magellan:/tmp$ locale
>> LANG=ja_JP.UTF-8
>> LC_CTYPE="ja_JP.UTF-8"
>> ...
>> LC_PAPER="ja_JP.UTF-8"
>> LC_ALL=
>> mfabian@magellan:/tmp$ LANG=en_US.ISO-8859-1 LC_PAPER=de_DE@euro ./uterm
>> 20x20 font not found, using 9x18 with 18x18.
>
>> Now in the xterm which started:
>
>> mfabian@magellan:/tmp$ locale
>> LANG=en_US.UTF-8
>> LC_CTYPE="en_US.UTF-8"
>> ...
>> LC_PAPER=de_DE@euro
>> LC_ALL=
>> mfabian@magellan:/tmp$
>
>> Illegal combination of UTF-8 and ISO-8859-1 encoding because LC_PAPER
>> is still set to de_DE@euro -> trouble ahead.
> Well, in this case you explicitly caused the "illegal combination"
> yourself with the above quoted command line settings of LANG and LC_PAPER
> in a UTF-8 locale environment, so any trouble caused would not be
> caused by uterm.

Sorry, I wanted to write

LANG=en_US.ISO-8859-1 LC_PAPER=de_DE.ISO-8859-1 ./uterm

which is perfectly fine because the encodings are the same. But after
starting ./uterm there is a conflict because uterm changes only one
environment variable.

To avoid such conflicts, uterm should try to change all of the LANG
and LC_* variables which were set to non-UTF-8 locales before to UTF-8
locales, not only one. I.e. in the above case the uterm script should
change LANG to en_US.UTF-8 and LC_PAPER to de_DE.UTF-8.

On SuSE Linux you can be sure that all locales also exist in
UTF-8. I.e. if there is a xx_YY.something then there is also a
xx_YY.UTF-8. On other systems you cannot be sure. You wrote yourself
that on some systems only en_US.UTF-8 exists. That means in the above
case you should try to set LC_PAPER to de_DE.UTF-8 if possible and, if
not, fall back to en_US.UTF-8.

You could improve your script by checking LANG and all LC_* variables
and trying to set all of them which were set when uterm was started to
the respective UTF-8 locales if possible and, if not, using another
UTF-8 locale which exists as a fallback.

On glibc systems, you can easily check whether a certain UTF-8 locale
exists with the "locale" program:

mfabian@magellan:~$ LC_CTYPE=en_US.UTF-8 locale charmap
UTF-8
mfabian@magellan:~$

exists because it returns UTF-8.

mfabian@magellan:~$ LC_CTYPE=aa_BB.UTF-8 locale charmap
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968
mfabian@magellan:~$

does not exist because it does *not* return UTF-8.

I'm not sure how this can be done in a portable way. Maybe the "locale
charmap" command is standardized, but I'm not sure at the moment.

> I think the uterm script has the following properties:
> * It gives the user the opportunity to start an xterm which can display
>   a maximum of Unicode "out-of-the-box", in two respects:
>   * Enforcing UTF-8 even if the user's environment is misconfigured.
>     (not needed for SuSE)
>   * Choosing a most suitable font.
>     (also useful for SuSE default configuration, or with user's own
>     configuration active that might address traditional non-sufficient
>     fonts)

yes, maybe.

-- 
Mike FABIAN <mf...@su...> http://www.suse.de/~mfabian
Lack of sleep is the enemy of good work.