mined-editor Mailing List for MinEd Text Editor (Page 2)

> >> Weird problems can also occur when the settings of LANG, and the LC_*
> >> variables are inconsistent. For example something like
> >> 
> >>      export LANG=de_DE@euro
> >>      export LC_CTYPE=de_DE.UTF-8
> >> 
> >> (and all other LC_* variables unset) is not allowed.  I often had
> >> reports from users who did run into problems because of this.  And
> >> often they didn't set these illegal combinations manually on their
> >> own, rather they were caused by such scripts trying to be helpful by
> >> fiddling with the LC_* variables but unfortunately didn't get it
> >> right.
> > I wonder what sort of problems would arise and why.

> Anything can happen. The POSIX standard (``The Open Group Base
> Specifications Issue 6'') says about this:

>     If different character sets are used by the locale categories, the
>     results achieved by an application utilizing these categories are
>     undefined.

>     (see http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap07.html) 

> And weird things really *do* happen. For example, I remember a
> bug-report against sort because of this.

>      https://bugzilla.novell.com/show_bug.cgi?id=41506
Interesting bug that demonstrates the poor design of the locale 
mechanism, which should rather work transparently whenever possible, 
without every tool having to adapt its code to special cases.
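
To make the kind of mismatch discussed above concrete, here is a minimal 
sketch in POSIX shell. The helper names codeset_of and same_codeset are 
made up for illustration and are not part of mined or uterm. The sketch 
only compares the explicit codeset suffix of two locale names; it 
conservatively treats a name without a suffix (such as de_DE@euro) as 
different from a name with one, and it does not normalize spellings 
such as "utf8" versus "UTF-8".

```shell
# Print the explicit codeset of a locale name, e.g. "UTF-8" for
# "de_DE.UTF-8"; print an empty line if the name has no codeset suffix.
codeset_of() {
    name=${1%%@*}                 # drop a modifier such as "@euro"
    case $name in
        *.*) printf '%s\n' "${name#*.}" ;;
        *)   printf '\n' ;;
    esac
}

# Succeed if both locale names carry the same explicit codeset.
# (Assumption: a missing suffix counts as a different codeset.)
same_codeset() {
    [ "$(codeset_of "$1")" = "$(codeset_of "$2")" ]
}

# The combination quoted at the top of this mail:
if same_codeset "de_DE@euro" "de_DE.UTF-8"; then
    echo "codesets agree"
else
    echo "codeset mismatch"
fi
```

For the quoted pair LANG=de_DE@euro and LC_CTYPE=de_DE.UTF-8 this 
reports a mismatch.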

> > Isn't the LC_CTYPE locale category (set by either LC_ALL, LC_CTYPE, or LANG) 
> > the only one which makes sense to have an encoding suffix according to 
> > the locale documentation?

> I won't argue "makes sense". I agree with you here, I also have the
> feeling that the system is slightly weird here. But we cannot help it,
> it is a standard and we have to obey.

> Although LC_CTYPE is the variable responsible for the encoding, you
> *must* use the same encoding in all other LC_* variables you set.
This will have to be accepted but it leaves room for some extension...

> > Actually, however, the locale concept does not provide sufficient 
> > encoding configuration features:

> > * It cannot distinguish between terminal encoding and preferred data 
> >   encoding. For mined, I decided to slightly interpret (or misuse, 
> >   if you want)

> misuse, certainly misuse! This is against the POSIX standard.
Not quite, see below.

> >   the locale mechanism by allowing the following:
> >   	LC_CTYPE=something.UTF-8
> >   	LANG=something.gb18030
> >   This would tell mined that the preferred encoding when editing text 
> >   is GB18030 while leaving the LC_CTYPE category indicating a UTF-8 
> >   terminal, so other applications are not confused, and CJK files 
> >   can easily be worked on in a UTF-8 terminal (there are options for 
> >   this in mined, too, and of course auto-detection...).

> Having options for this in mined is the only possibility here.  You
> cannot use the locale environment variables against the POSIX
> standard.
I'm actually not doing that.
The POSIX requirement you quoted says that different character sets 
must not be used by the locale categories.
But by the priority rules for the variables affecting each category, 
LANG has no effect on a category that is already determined by LC_ALL 
or by its own LC_* variable.
So, when the example is modified slightly:
	LC_ALL=something.UTF-8
	LANG=something.gb18030
this does not violate the POSIX locale standard. (For the POSIX 
locale mechanism, the value of LANG would have no effect here.)

Maybe a little picky here, but this way at least one important 
missing configuration feature (distinguishing terminal encoding from 
data encoding) can be achieved if an application wants it.
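
The priority rule I am relying on can be sketched as follows. This is 
only an illustration: the function name effective_locale and its 
three-argument calling convention are invented here, and the final 
fallback is implementation-defined in POSIX ("POSIX" is assumed).

```shell
# Resolve the value in effect for one locale category, following the
# precedence LC_ALL, then the category's own variable, then LANG.
# Arguments: <LC_ALL value> <category value> <LANG value>
# An empty argument stands for an unset or empty variable.
effective_locale() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"        # LC_ALL overrides everything
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"        # the category's own LC_* variable
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"        # LANG is only the fallback
    else
        printf '%s\n' "POSIX"     # assumed default; implementation-defined
    fi
}

# LC_ALL=something.UTF-8, LC_CTYPE unset, LANG=something.gb18030
effective_locale "something.UTF-8" "" "something.gb18030"
```

With LC_ALL set, the call prints something.UTF-8 for every category, so 
the value of LANG stays free for an application to interpret on its own.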

> > * Even worse (of the locale mechanism), it cannot specify an encoding 
> >   independently of the language settings.
> >   It would actually be sufficient to set LC_CTYPE=".UTF-8" in order 
> >   to indicate terminal encoding, and e.g. set LANG=whatever for the 
> >   preferred language setting.

> I agree that this would be more logical. Several different encodings
> in LC_* are forbidden anyway, why do we have to repeat the same
> encoding in all LC_* variables? It would be logical if all
> other LC_* variables use the encoding of LC_CTYPE automatically.
> ...

> >   Then with n languages and m encodings supported on a system, it 
> >   would not be necessary to install n*m sets of locale data (in 
> >   theory, practically many are obviously left out) but only n+m.

> Theoretically one could use other encodings for German as well,
> i.e. de_DE.GB18030 would be theoretically possible. But such locales
> are never installed.
Sure, I said "many are obviously left out".
> And most encodings wouldn't work for German
> anyway, i.e. de_DE.SJIS could not work because SJIS has no umlauts.
Actually, Shift-JIS X0213 does include umlauts. Try it with mined: 
enter them in the terminal, and ESC u will reveal the encoding. This may 
be a later extension of Shift-JIS; I took the table from libiconv.
> So it is much less than n*m.
Sure, but locale installation is far too complicated for users 
(I don't know myself how it works because there is no decent documentation) 
- and it seems it's not even possible if you're not root - 
so why should there be any arbitrary restrictions at all that only 
cause trouble?

> >> Maybe I will add that font to the SuSE xterm package.
[meaning the 20x20 fonts]
Yes, please do that.

> >> > I hope some people will find it useful to have this script available 
> >> > in /usr/bin.
> >> 
> >> It might be useful on legacy systems, but probably I should just omit
> >> it in mined .rpm-packages for SuSE Linux >= 9.1 because it is not
> >> really helpful there.
> > Then at least you should also filter out uxterm from the xterm
> > package.

> I'm already thinking about that ...

> > But what about the automatic font setup feature? I have not seen the 
> > SuSE 10 default font configuration yet, but isn't it useful to have 
> > a script that tries to achieve the best Unicode benefit?

> I think xterm should already be set up by default to use the best
> Unicode fonts available.

> We already changed our xterm font setup in that direction.
> Unfortunately we could not go all the way because some backwards
> oriented users insisted on keeping the old default fonts.  They would
> not accept the slightest change in font style even if it is a huge
> improvement towards better Unicode support by default.
There we have a good reason to provide a script that overcomes 
the restrictions of traditional default fonts (probably the tiny ones) 
that have not been extended to cover Unicode.

> > It's only an option, anyway, everyone may call xterm directly if
> > that works sufficiently.

> I'm just afraid it will generate extra bugs.
I'm doing my best to avoid this, see below.

> >> The updated version is attached. Thank you very much for the review.
>     mfabian@magellan:/tmp$ locale
>     LANG=ja_JP.UTF-8
>     LC_CTYPE="ja_JP.UTF-8"
>     ...
>     LC_PAPER="ja_JP.UTF-8"
>     LC_ALL=
>     mfabian@magellan:/tmp$ LANG=en_US.ISO-8859-1 LC_PAPER=de_DE@euro ./uterm
>     20x20 font not found, using 9x18 with 18x18.

> Now in the xterm which started:

>     mfabian@magellan:/tmp$ locale
>     LANG=en_US.UTF-8
>     LC_CTYPE="en_US.UTF-8"
>     ...
>     LC_PAPER=de_DE@euro
>     LC_ALL=
>     mfabian@magellan:/tmp$ 

> Illegal combination of UTF-8 and ISO-8859-1 encoding because LC_PAPER
> is still set to de_DE@euro -> trouble ahead.
Well, in this case you explicitly caused the "illegal combination" 
yourself with the above quoted command line settings of LANG and LC_PAPER 
in a UTF-8 locale environment, so any trouble caused would not be 
caused by uterm.

I think the uterm script has the following properties:
* It gives the user the opportunity to start an xterm which can display 
  a maximum of Unicode "out-of-the-box", in two respects:
  * Enforcing UTF-8 even if the user's environment is misconfigured.
    (not needed for SuSE)
  * Choosing the most suitable font.
    (also useful for the SuSE default configuration, or when the user's 
    own configuration is active and still refers to traditional fonts 
    with insufficient Unicode coverage)
* It avoids locale mismatch trouble (being debugged thanks to your review).
* It serves as an auxiliary script for umined (which shall be installed 
  in /usr/bin) or other scripts (:) that invoke an application in a 
  separate terminal window.
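
As an illustration of the first and third points, here is a minimal 
sketch of rewriting every set locale variable to the UTF-8 codeset, so 
that no category (like LC_PAPER in your test) is left behind. This is 
not the actual uterm code; the function names to_utf8 and 
force_utf8_env are hypothetical. Dropping the modifier is an assumption 
that suits "@euro" but not necessarily other modifiers, and the "C" and 
"POSIX" locales are not given any special treatment.

```shell
# Rewrite a locale name so that it names the UTF-8 codeset, e.g.
# "en_US.ISO-8859-1" -> "en_US.UTF-8", "de_DE@euro" -> "de_DE.UTF-8".
# Assumption: the modifier (the part after "@") can be dropped.
to_utf8() {
    base=${1%%@*}                 # strip modifier
    base=${base%%.*}              # strip any existing codeset
    printf '%s.UTF-8\n' "$base"
}

# Apply the rewrite to LANG and to every LC_* variable that is set and
# non-empty; unset variables are left unset.
force_utf8_env() {
    for var in LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY \
               LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE \
               LC_MEASUREMENT LC_IDENTIFICATION LC_ALL; do
        eval "val=\${$var-}"      # indirect read of the variable named by $var
        if [ -n "$val" ]; then
            export "$var=$(to_utf8 "$val")"
        fi
    done
}
```

A wrapper script would call force_utf8_env before starting the 
terminal; with LANG=en_US.ISO-8859-1 and LC_PAPER=de_DE@euro as in the 
test above, both variables would then end in .UTF-8.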

Kind regards,
Thomas Wolff


mined-editor — Discussion, questions and requests for the text editor mined