mined-editor Mailing List for MinEd Text Editor (Page 2)
                
Brought to you by: thomaswolff
                    
                
            
            
        
        
        
| 
     
      
      
      From: Thomas W. <mi...@to...> - 2005-10-05 16:38:28
      
     
   | 
> >> Weird problems can also occur when the settings of LANG, and the LC_*
> >> variables are inconsistent. For example something like
> >>
> >>      export LANG=de_DE@euro
> >>      export LC_CTYPE=de_DE.UTF-8
> >>
> >> (and all other LC_* variables unset) is not allowed. I often had
> >> reports from users who did run into problems because of this. And
> >> often they didn't set these illegal combinations manually on their
> >> own, rather they were caused by such scripts trying to be helpful by
> >> fiddling with the LC_* variables but unfortunately didn't get it
> >> right.
> > I wonder what sort of problems would arise and why.
> Anything can happen. The POSIX standard (``The Open Group Base
> Specifications Issue 6'') says about this:
>     If different character sets are used by the locale categories, the
>     results achieved by an application utilizing these categories are
>     undefined.
>     (see http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap07.html)
> And weird things really *do* happen. For example, I remember a
> bug-report against sort because of this.
>      https://bugzilla.novell.com/show_bug.cgi?id=41506
Interesting bug that demonstrates the poor design of the locale mechanism,
which should rather work transparently whenever possible, without every
tool having to adapt its code to special cases.
> > Isn't the LC_CTYPE locale category (set by either LC_ALL, LC_CTYPE, or LANG)
> > the only one which makes sense to have an encoding suffix according to
> > the locale documentation?
> I won't argue "makes sense". I agree with you here, I also have the
> feeling that the system is slightly weird here. But we cannot help it,
> it is a standard and we have to obey.
> Although LC_CTYPE is the variable responsible for the encoding, you
> *must* use the same encoding in all other LC_* variables you set.
This will have to be accepted but it leaves room for some extension...
> > Actually, however, the locale concept does not provide sufficient
> > encoding configuration features:
> > * It cannot distinguish between terminal encoding and preferred data
> >   encoding. For mined, I decided to slightly interpret (or misuse,
> >   if you want)
> misuse, certainly misuse! This is against the POSIX standard.
Not quite, see below.
> >   the locale mechanism by allowing the following:
> >       LC_CTYPE=something.UTF-8
> >       LANG=something.gb18030
> >   This would tell mined that the preferred encoding when editing text
> >   is GB18030 while leaving the LC_CTYPE category indicating a UTF-8
> >   terminal, so other applications are not confused, and CJK files
> >   can easily be worked on in a UTF-8 terminal (there are options for
> >   this in mined, too, and of course auto-detection...).
> Having options for this in mined is the only possibility here. You
> cannot use the locale environment variables against the POSIX
> standard.
I'm actually not doing that. The POSIX requirement you quoted says that
different character sets must not be used by the locale categories.
But by the rules of priority of the variables affecting each category,
LANG doesn't affect anything if all other categories are set.
So, when the example is modified slightly:
    LC_ALL=something.UTF-8
    LANG=something.gb18030
this does not violate the POSIX locale standard. (For the POSIX locale
mechanism, the value of LANG would have no effect here.)
Maybe a little picky here, but this way at least one important missing
configuration feature (distinguishing terminal encoding from data
encoding) can be achieved if an application likes.
> > * Even worse (of the locale mechanism), it cannot specify an encoding
> >   independently of the language settings.
> >   It would actually be sufficient to set LC_CTYPE=".UTF-8" in order
> >   to indicate terminal encoding, and e.g. set LANG=whatever for the
> >   preferred language setting.
> I agree that this would be more logical. Several different encodings
> in LC_* are forbidden anyway, why do we have to repeat the same
> encoding in all LC_* variables? It would be logical if all
> other LC_* variables use the encoding of LC_CTYPE automatically.
> ...
> >   Then with n languages and m encodings supported on a system, it
> >   would not be necessary to install n*m sets of locale data (in
> >   theory, practically many are obviously left out) but only n+m.
> Theoretically one could use other encodings for German as well,
> i.e. de_DE.GB18030 would be theoretically possible. But such locales
> are never installed.
Sure, I said "many are obviously left out".
> And most encodings wouldn't work for German
> anyway, i.e. de_DE.SJIS could not work because SJIS has no umlauts.
Actually, Shift-JIS X0213 does maintain umlauts. Try it with mined,
enter them in the terminal, ESC u will reveal the encoding.
Maybe a later extension of Shift-JIS; I took the table from libiconv.
> So it is much less than n*m.
Sure, but locale installation is far too complicated for users (I don't
know myself how it works because there is no decent documentation) -
and it seems it's not even possible if you're not root - so why should
there be any arbitrary restriction at all that only raises trouble?
> >> Maybe I will add that font to the SuSE xterm package.
[meaning the 20x20 fonts]
Yes, please do that.
> >> > I hope some people will find it useful to have this script available
> >> > in /usr/bin.
> >>
> >> It might be useful on legacy systems, but probably I should just omit
> >> it in mined .rpm-packages for SuSE Linux >= 9.1 because it is not
> >> really helpful there.
> > Then at least you should also filter out uxterm from the xterm
> > package.
> I'm already thinking about that ...
> > But what about the automatic font setup feature? I have not seen the
> > SuSE 10 default font configuration yet, but isn't it useful to have
> > a script that tries to achieve the best Unicode benefit?
> I think xterm should already be setup by default to use the best
> Unicode fonts available.
> We already changed our xterm font setup in that direction.
> Unfortunately we could not go all the way because some backwards
> oriented users insisted on keeping the old default fonts. They would
> not accept the slightest change in font style even if it is a huge
> improvement towards better Unicode support by default.
There we have a good reason to provide a script that overcomes the
restrictions of traditional default fonts (probably the tiny ones) that
have not been extended to cover Unicode.
> > It's only an option, anyway, everyone may call xterm directly if
> > that works sufficiently.
> I'm just afraid it will generate extra bugs.
I'm doing my best to avoid this, see below.
> >> The updated version is attached. Thank you very much for the review.
>     mfabian@magellan:/tmp$ locale
>     LANG=ja_JP.UTF-8
>     LC_CTYPE="ja_JP.UTF-8"
>     ...
>     LC_PAPER="ja_JP.UTF-8"
>     LC_ALL=
>     mfabian@magellan:/tmp$ LANG=en_US.ISO-8859-1 LC_PAPER=de_DE@euro ./uterm
>     20x20 font not found, using 9x18 with 18x18.
> Now in the xterm which started:
>     mfabian@magellan:/tmp$ locale
>     LANG=en_US.UTF-8
>     LC_CTYPE="en_US.UTF-8"
>     ...
>     LC_PAPER=de_DE@euro
>     LC_ALL=
>     mfabian@magellan:/tmp$
> Illegal combination of UTF-8 and ISO-8859-1 encoding because LC_PAPER
> is still set to de_DE@euro -> trouble ahead.
Well, in this case you explicitly caused the "illegal combination"
yourself with the above quoted command line settings of LANG and LC_PAPER
in a UTF-8 locale environment, so any resulting trouble would not be
caused by uterm.
I think the uterm script has the following properties:
* It gives the user the opportunity to start an xterm which can display
  a maximum of Unicode "out-of-the-box", in two respects:
  * Enforcing UTF-8 even if the user's environment is misconfigured.
    (not needed for SuSE)
  * Choosing a most suitable font.
    (also useful for SuSE default configuration, or with user's own
    configuration active that might address traditional non-sufficient
    fonts)
* It avoids locale mismatch trouble (being debugged thanks to your review).
* It serves as an auxiliary script for umined (which shall be installed
  in /usr/bin) or other scripts (:) that invoke an application in a
  separate terminal window.
Kind regards,
Thomas Wolff
 | 
| 
     
      
      
      From: Mike F. <mf...@su...> - 2005-09-30 13:03:21
      
     
   | 
Thomas Wolff <mi...@to...> wrote:
>> The updated version is attached. Thank you very much for the review.
> Should have been attached, sorry. Attaching this time.
    mfabian@magellan:/tmp$ locale
    LANG=ja_JP.UTF-8
    LC_CTYPE="ja_JP.UTF-8"
    LC_NUMERIC="ja_JP.UTF-8"
    LC_TIME="ja_JP.UTF-8"
    LC_COLLATE="ja_JP.UTF-8"
    LC_MONETARY="ja_JP.UTF-8"
    LC_MESSAGES="ja_JP.UTF-8"
    LC_PAPER="ja_JP.UTF-8"
    LC_NAME="ja_JP.UTF-8"
    LC_ADDRESS="ja_JP.UTF-8"
    LC_TELEPHONE="ja_JP.UTF-8"
    LC_MEASUREMENT="ja_JP.UTF-8"
    LC_IDENTIFICATION="ja_JP.UTF-8"
    LC_ALL=
    mfabian@magellan:/tmp$ 
    mfabian@magellan:/tmp$ LANG=en_US.ISO-8859-1 LC_PAPER=de_DE@euro ./uterm
    20x20 font not found, using 9x18 with 18x18.
Now in the xterm which started:
    mfabian@magellan:/tmp$ locale
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER=de_DE@euro
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
    mfabian@magellan:/tmp$ 
Illegal combination of UTF-8 and ISO-8859-1 encoding because LC_PAPER
is still set to de_DE@euro -> trouble ahead.
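The mismatch pointed out here can also be found mechanically. The following is only a rough sketch, not part of uterm or any tool mentioned in this thread: it parses text in the format printed by `locale` and lists the entries whose explicit encoding suffix differs from that of LC_CTYPE. The function names are hypothetical, and treating a name without any suffix (such as de_DE@euro) as a mismatch is an assumption, because its real encoding depends on the installed locale definition.

```python
def parse_locale_output(text):
    """Parse lines like LC_CTYPE="en_US.UTF-8" into a dict; empty values are skipped."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        value = value.strip('"')
        if sep and value:
            settings[name] = value
    return settings


def codeset_of(locale_name):
    """Return the explicit codeset suffix (upper-cased), or None if there is none."""
    rest = locale_name.partition(".")[2]   # text after the first '.'
    codeset = rest.partition("@")[0]       # drop a modifier such as @euro
    return codeset.upper() or None


def mismatched_categories(settings):
    """List entries whose explicit codeset differs from that of LC_CTYPE."""
    reference = codeset_of(settings["LC_CTYPE"])
    return sorted(name for name, value in settings.items()
                  if codeset_of(value) != reference)


# Shortened version of the second listing above.
sample = '''LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_PAPER=de_DE@euro
LC_ALL='''

print(mismatched_categories(parse_locale_output(sample)))
```

With the shortened listing, only LC_PAPER is reported, which matches the diagnosis in the message.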
-- 
Mike FABIAN   <mf...@su...>   http://www.suse.de/~mfabian
Lack of sleep is the enemy of good work.
 | 
| 
     
      
      
      From: Mike F. <mf...@su...> - 2005-09-30 12:49:11
      
     
   | 
Thomas Wolff <mi...@to...> wrote:
>> Weird problems can also occur when the settings of LANG, and the LC_*
>> variables are inconsistent. For example something like
>> 
>>      export LANG=de_DE@euro
>>      export LC_CTYPE=de_DE.UTF-8
>> 
>> (and all other LC_* variables unset) is not allowed.  I often had
>> reports from users who did run into problems because of this.  And
>> often they didn't set these illegal combinations manually on their
>> own, rather they were caused by such scripts trying to be helpful by
>> fiddling with the LC_* variables but unfortunately didn't get it
>> right.
> I wonder what sort of problems would arise and why.
Anything can happen. The POSIX standard (``The Open Group Base
Specifications Issue 6'') says about this:
    If different character sets are used by the locale categories, the
    results achieved by an application utilizing these categories are
    undefined.
    (see http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap07.html) 
And weird things really *do* happen. For example, I remember a
bug-report against sort because of this.
     https://bugzilla.novell.com/show_bug.cgi?id=41506
This is an old SuSE 8.1 bug and therefore not public, only bugs for
SuSE >= 10.0 are public. But I added you to the CC: i.e. you can
access it now.
> Isn't the LC_CTYPE locale category (set by either LC_ALL, LC_CTYPE, or LANG) 
> the only one which makes sense to have an encoding suffix according to 
> the locale documentation?
I won't argue "makes sense". I agree with you here, I also have the
feeling that the system is slightly weird here. But we cannot help it,
it is a standard and we have to obey.
Although LC_CTYPE is the variable responsible for the encoding, you
*must* use the same encoding in all other LC_* variables you set.
It may be weird but that's the way it is.
> So if I say LC_MESSAGES=de, I would expect 
> to get German messages which are automatically encoded according to the 
> terminal encoding indicated by the LC_CTYPE category.
No, it doesn't work like this.
For example, if you have
    LC_CTYPE=en_US.UTF-8
but want German messages, you *have* to use
    LC_MESSAGES=de_DE.UTF-8
(By the way, LC_MESSAGES=de is even an illegal locale on Linux,
i.e. you will get an error message from glibc. As naming conventions
for locale names differ somewhat on different systems, "de"
*might* be a legal locale name on some systems, but most likely
it won't use UTF-8 encoding. Therefore the combination with
LC_CTYPE=en_US.UTF-8 is always illegal).
> Actually, however, the locale concept does not provide sufficient 
> encoding configuration features:
> * It cannot distinguish between terminal encoding and preferred data 
>   encoding. For mined, I decided to slightly interpret (or misuse, 
>   if you want)
misuse, certainly misuse! This is against the POSIX standard.
>   the locale mechanism by allowing the following:
>   	LC_CTYPE=something.UTF-8
>   	LANG=something.gb18030
>   This would tell mined that the preferred encoding when editing text 
>   is GB18030 while leaving the LC_CTYPE category indicating a UTF-8 
>   terminal, so other applications are not confused, and CJK files 
>   can easily be worked on in a UTF-8 terminal (there are options for 
>   this in mined, too, and of course auto-detection...).
Having options for this in mined is the only possibility here.  You
cannot use the locale environment variables against the POSIX
standard.
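To make the disputed convention concrete, here is a minimal sketch of how an application could read the two variables the way the quoted proposal describes: the terminal encoding comes from the effective LC_CTYPE category, the preferred text encoding from the suffix of LANG. This is not mined's actual code; the function names are hypothetical, and falling back to the terminal encoding when LANG carries no suffix is an assumption.

```python
def codeset(locale_name):
    """Return the encoding suffix of a locale name, or None if it has none."""
    rest = locale_name.partition(".")[2]
    return rest.partition("@")[0] or None


def terminal_and_text_encoding(env):
    """Return (terminal encoding, preferred text encoding) for an environment dict."""
    # Effective LC_CTYPE: LC_ALL wins, then LC_CTYPE, then LANG.
    ctype = env.get("LC_ALL") or env.get("LC_CTYPE") or env.get("LANG") or "C"
    terminal = codeset(ctype)
    # The convention under discussion: LANG's suffix names the data encoding.
    text = codeset(env.get("LANG") or "") or terminal
    return terminal, text


print(terminal_and_text_encoding(
    {"LC_CTYPE": "something.UTF-8", "LANG": "something.gb18030"}))
```

Whether such a setting is acceptable under POSIX is exactly the point argued in this thread; the sketch only shows what the application would read from it.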
> * Even worse (of the locale mechanism), it cannot specify an encoding 
>   independently of the language settings.
>   It would actually be sufficient to set LC_CTYPE=".UTF-8" in order 
>   to indicate terminal encoding, and e.g. set LANG=whatever for the 
>   preferred language setting.
I agree that this would be more logical. Several different encodings
in LC_* are forbidden anyway, why do we have to repeat the same
encoding in all LC_* variables? It would be logical if all
other LC_* variables use the encoding of LC_CTYPE automatically.
But we can't help this, POSIX did not design it like that,
we have to accept this. 
>   Then with n languages and m encodings supported on a system, it 
>   would not be necessary to install n*m sets of locale data (in 
>   theory, practically many are obviously left out) but only n+m.
It is not n*m because you cannot combine all encodings with
all languages. For example for German there is only
    de_DE.ISO-8859-15@euro
    de_DE.ISO-8859-1
    de_DE.UTF-8
Theoretically one could use other encodings for German as well,
i.e. de_DE.GB18030 would be theoretically possible. But such locales
are never installed. And most encodings wouldn't work for German
anyway, i.e. de_DE.SJIS could not work because SJIS has no umlauts.
So it is much less than n*m.
>   I have e.g. seen systems with only one UTF-8 locale installed 
>   (en_US.UTF-8). If the system does have a de_DE locale too, why 
>   should you not be allowed to set LC_ALL=de_DE.UTF-8? Why does 
>   software need to choke on such an obvious combination of available 
>   configuration information? I think this is absolutely silly.
Yes, but nevertheless we cannot change this.
>   Another problem in a heterogenous network is that some system may 
>   support de.UTF-8 but not de_DE, some other system may call it 
>   de_DE.utf8. It is virtually impossible for the innocent users 
>   to get this all right. The locale mechanism needs to add more 
>   tolerance to become user-friendly.
glibc "normalizes" the encoding part of the locales by ignoring case
and ignoring any '-' at the border between digits and letters.
I.e. as far as glibc is concerned,
   de_DE.UTF-8
   de_DE.UTF8
   de_DE.utf-8
   de_DE.utf8
   de_DE.uTf8
are all the same. Until very recently, X11 didn't accept all of these,
i.e. the only really correct version was de_DE.UTF-8 because all
others were not supported by X11. But Egbert Eich fixed this recently
in Xorg, i.e. for about a year now all these variant spellings have been
supported by X11 as well.
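The normalization described here can be approximated in a few lines. This is only a sketch of the behaviour as stated in the message (ignore case, ignore hyphens in the encoding part); glibc's real implementation is written in C and may differ in detail, and the function name is hypothetical.

```python
def normalize_codeset(locale_name):
    """Lower-case the encoding part of a locale name and drop non-alphanumerics."""
    base, dot, rest = locale_name.partition(".")
    if not dot:
        return locale_name              # no encoding part, nothing to do
    codeset, at, modifier = rest.partition("@")
    codeset = "".join(ch for ch in codeset.lower() if ch.isalnum())
    return base + "." + codeset + at + modifier


variants = ["de_DE.UTF-8", "de_DE.UTF8", "de_DE.utf-8", "de_DE.utf8", "de_DE.uTf8"]
print({normalize_codeset(name) for name in variants})
```

All five spellings listed above collapse to a single name, while the language and territory part is left untouched.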
> For example, unset any LC_ variable, set LANG=x and call
> 	xterm -u8 -class UXTerm
> Then UTF-8 mode was NOT selected, and luit was invoked (hanging xterm...).
I see. By the way, I believe I have fixed the hanging problem:
    http://bugzilla.novell.com/show_bug.cgi?id=117193
>> Maybe I will add that font to the SuSE xterm package. But that is
>> only a temporary workaround, this should really be fixed in xterm,
>> see 
>> 
>>     http://bugzilla.novell.com/show_bug.cgi?id=49305
> Yes, I have been nagging Thomas Dickey about this for years, he also 
> mentions me anonymously in this bug report :)
> It is an xterm workaround but maybe one for other programs, too.
I know of no other program which needs this.
> Shouldn't the font rather go into the fonts package?
I don't think so, because this workaround is (only!) for xterm.  So
one must make sure that the font is always installed when xterm is
installed.  The only other candidate apart from the xterm.rpm would be
the xorg-x11.rpm because xterm requires this and it already contains
the 10x20 and 18x18 fonts:
mfabian@magellan:~$ rpm -qf  /usr/X11R6/lib/X11/fonts/misc/10x20.pcf.gz 
xorg-x11-6.8.2-100
mfabian@magellan:~$ rpm -qf  /usr/X11R6/lib/X11/fonts/misc/18x18ja.pcf.gz 
xorg-x11-6.8.2-100
mfabian@magellan:~$
But it is always a lot more hassle to do any changes to the huge
xorg-x11 package. Changing the tiny xterm package is much easier and
as the workaround is only for xterm I think this is OK.
>> > I hope some people will find it useful to have this script available 
>> > in /usr/bin.
>> 
>> It might be useful on legacy systems, but probably I should just omit
>> it in mined .rpm-packages for SuSE Linux >= 9.1 because it is not
>> really helpful there.
> Then at least you should also filter out uxterm from the xterm
> package.
I'm already thinking about that ...
> But what about the automatic font setup feature? I have not seen the 
> SuSE 10 default font configuration yet, but isn't it useful to have 
> a script that tries to achieve the best Unicode benefit?
I think xterm should already be set up by default to use the best
Unicode fonts available.
We already changed our xterm font setup in that direction.
Unfortunately we could not go all the way because some backwards
oriented users insisted on keeping the old default fonts.  They would
not accept the slightest change in font style even if it is a huge
improvement towards better Unicode support by default.
> It's only an option, anyway, everyone may call xterm directly if
> that works sufficiently.
I'm just afraid it will generate extra bugs.
>> I tried it on SuSE Linux 10.0:
>> ...
>> I.e. the part
>> ...
>> is much too simple-minded and already gets it wrong with the default
>> locale setting of a SuSE Linux >= 9.1 installation. The default on
>> SuSE Linux >= 9.1 is LANG=xx_YY.UTF-8 and all LC_* unset, xx_YY
>> depending on the language/territory selected during the installation.
> From the history of locale mechanisms, I had the impression that 
> LANG was a kind of legacy configuration variable supported for 
> compatibility and LC_* is rather supposed to be used; but maybe I'm wrong.
No, LANG is not legacy in any way.  If you are happy with one locale,
it is just fine to set only LANG and nothing else.
You need to set more only if you need to mix some settings, i.e.
if you want to use US-English always except for the paper
format where you prefer "A4", then it is just fine to set only
    LANG=en_US.UTF-8
    LC_PAPER=en_GB.UTF-8
There is no requirement to set any of the LC_* variables,
those which are not set just inherit their effective value
from LANG.
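The inheritance rule can be written down as a short lookup. This sketch follows the POSIX precedence (LC_ALL first, then the category's own variable, then LANG); the function name is hypothetical, the fallback to "C" is an assumption because POSIX leaves the default implementation-defined, and LC_PAPER is a glibc extension rather than a POSIX category.

```python
CATEGORIES = ["LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_COLLATE",
              "LC_MONETARY", "LC_MESSAGES", "LC_PAPER"]


def effective_locale(env, category):
    """Return the locale name in effect for one category of an environment dict."""
    # Empty values count as unset, as they do for the real variables.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"   # assumed default; the real one is implementation-defined


env = {"LANG": "en_US.UTF-8", "LC_PAPER": "en_GB.UTF-8"}
for category in CATEGORIES:
    print(category, "=", effective_locale(env, category))
```

With the two settings from the example, every category resolves to en_US.UTF-8 except LC_PAPER, which keeps en_GB.UTF-8.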
> I have corrected handling of all these three variables, and I think 
> it's a working portable script now. I have also improved inline 
> documentation of the script. I will also produce a manual page.
>
> The updated version is attached. Thank you very much for the review.
I'll check it again.
> [Further discussion should perhaps be off-list as this is not so 
> closely related to mined.]
not directly, but nevertheless maybe interesting to mined users?
-- 
Mike FABIAN   <mf...@su...>   http://www.suse.de/~mfabian
Lack of sleep is the enemy of good work.
 | 
| 
     
      
      
      From: Thomas W. <mi...@to...> - 2005-09-29 16:57:40
      
     
   | 
> I have corrected handling of all these three variables, and I think
> it's a working portable script now. I have also improved inline
> documentation of the script. I will also produce a manual page.
>
> The updated version is attached. Thank you very much for the review.
Should have been attached, sorry. Attaching this time.
Thomas Wolff
 | 
| 
     
      
      
      From: Mike F. <mf...@su...> - 2005-09-29 16:52:33
      
     
   | 
Thomas Wolff <mi...@to...> wrote:
>> I.e. on recent SuSE or RedHat/Fedora systems, nothing needs to be
>> configured, calling just "xterm" is enough.
> That is good. What about font configuration?
The app-defaults for xterm on SuSE Linux use iso10646-1 always.
XTerm*locale is set to "checkfont". Then there are
the following 3 cases:
    1) the locale is a UTF-8 locale:
       no conversion needed, the fonts are used as is
    2) the locale encoding is ISO-8859-1 or ISO-8859-15:
       the "mini-luit" converter built into xterm is used
       to display the characters correctly with the iso10646-1
       fonts.
    3) all other locales: "luit" is used to display the characters
       correctly using the iso10646-1 font.
I.e. one doesn't have to set up anything in most cases.  Only if the
default fonts don't have the characters one needs, one has to set up
other fonts.
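The three cases can be restated as a small decision function. This is merely an illustration of the description in this message, not xterm's own logic; the function name and the returned labels are made up for the sketch.

```python
def xterm_conversion(codeset):
    """Name the converter used for a locale encoding, per the three cases above."""
    normalized = codeset.lower().replace("-", "").replace("_", "")
    if normalized == "utf8":
        return "none"        # case 1: iso10646-1 fonts are used as is
    if normalized in ("iso88591", "iso885915"):
        return "mini-luit"   # case 2: converter built into xterm
    return "luit"            # case 3: external luit for everything else


for name in ("UTF-8", "ISO-8859-15", "EUC-JP"):
    print(name, "->", xterm_conversion(name))
```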
-- 
Mike FABIAN   <mf...@su...>   http://www.suse.de/~mfabian
Lack of sleep is the enemy of good work.
 | 
| 
     
      
      
      From: Thomas W. <mi...@to...> - 2005-09-29 15:59:43
      
     
   | 
> >> For a long time already, xterm works just fine in UTF-8 without > >> setting any special options. > > Not quite, as still today many users are not well-advised how to > > easily configure UTF-8. > > On SuSE Linux, UTF-8 is the default since SuSE Linux 9.1, i.e. more > than a year now. RedHat switched to UTF-8 as the default about two > years earlier. > > I.e. on recent SuSE or RedHat/Fedora systems, nothing needs to be > configured, calling just "xterm" is enough. That is good. What about font configuration? But anyway, I am providing a package for not only modern Linux systems, but also older systems, SunOS, ... > >> Same with LESSCHARSET. LESSCHARSET shouldn't be set either, less also > >> detects this automatically from the environment. > > See above; for the benefit of supporting older systems, I think it's > > a good idea to maintain such compatibility settings for a couple of years > > still, especially as they do no harm. > > In my experience they do harm. I often had reports from users who had > LESSCHARSET, LANG, or some LC_* variables changed by some overly > helpful script and ran into problems because of this. > > For example, some script sets LESSCHARSET=utf-8. ... > ... > LANG=de_DE@euro xterm > > But somehow it doesn't work because LESSCHARSET=utf-8 is still in the > environment. The user is very confused because he didn't think about > LESSCHARSET at all and reports a bug. This is of course true. I agree on this variable and have removed it. (Any user needing this trick should do it in a shell profile.) > Weird problems can also occur when the settings of LANG, and the LC_* > variables are inconsistent. For example something like > > export LANG=de_DE@euro > export LC_CTYPE=de_DE.UTF-8 > > (and all other LC_* variables unset) is not allowed. I often had > reports from users who did run into problems because of this. 
> And often they didn't set these illegal combinations manually on their
> own, rather they were caused by such scripts trying to be helpful by
> fiddling with the LC_* variables but unfortunately didn't get it
> right.
I wonder what sort of problems would arise and why. Isn't the LC_CTYPE
locale category (set by either LC_ALL, LC_CTYPE, or LANG) the only one
which makes sense to have an encoding suffix according to the locale
documentation? So if I say LC_MESSAGES=de, I would expect to get German
messages which are automatically encoded according to the terminal
encoding indicated by the LC_CTYPE category.
Actually, however, the locale concept does not provide sufficient
encoding configuration features:
* It cannot distinguish between terminal encoding and preferred data
  encoding. For mined, I decided to slightly interpret (or misuse, if
  you want) the locale mechanism by allowing the following:
	LC_CTYPE=something.UTF-8
	LANG=something.gb18030
  This would tell mined that the preferred encoding when editing text
  is GB18030 while leaving the LC_CTYPE category indicating a UTF-8
  terminal, so other applications are not confused, and CJK files can
  easily be worked on in a UTF-8 terminal (there are options for this
  in mined, too, and of course auto-detection...).
* Even worse (of the locale mechanism), it cannot specify an encoding
  independently of the language settings. It would actually be
  sufficient to set LC_CTYPE=".UTF-8" in order to indicate terminal
  encoding, and e.g. set LANG=whatever for the preferred language
  setting. Then with n languages and m encodings supported on a system,
  it would not be necessary to install n*m sets of locale data (in
  theory, practically many are obviously left out) but only n+m.
Fortunately many programs do look at LC_CTYPE and recognise the suffix
properly to configure themselves for UTF-8 but some don't.
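The distinction drawn above between terminal encoding and preferred text encoding can be illustrated with a small shell sketch. It is not taken from mined: `codeset_of`, `terminal_codeset` and `text_codeset` are hypothetical helper names, and the sketch only assumes the convention described in the message (terminal encoding from LC_ALL/LC_CTYPE/LANG in that order, preferred text encoding from LANG).

```shell
#!/bin/sh
# Hypothetical helper (not part of mined): print the encoding suffix of a
# locale name, i.e. the part between "." and an optional "@modifier".
# Prints an empty line if the name carries no encoding suffix.
codeset_of() {
    case "$1" in
    *.*)  cs=${1#*.}                    # drop everything up to the first "."
          printf '%s\n' "${cs%%@*}" ;;  # drop an optional @modifier
    *)    printf '\n' ;;
    esac
}

# Terminal encoding: suffix of the first non-empty of LC_ALL, LC_CTYPE, LANG.
terminal_codeset() {
    codeset_of "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
}

# Preferred text encoding under the convention described in the message:
# taken from LANG alone (an assumption of this sketch).
text_codeset() {
    codeset_of "${LANG:-}"
}
```

With LC_CTYPE=zh_CN.UTF-8 and LANG=zh_CN.gb18030 (and LC_ALL unset), `terminal_codeset` yields UTF-8 and `text_codeset` yields gb18030. A name such as de_DE@euro has no explicit suffix, so string parsing alone cannot tell its encoding; that is one reason this remains a sketch.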
Software that is picky about locales often fails to adapt to UTF-8
because the user is unlucky not to have the exact needed locales
installed on some of the used systems. I have e.g. seen systems with
only one UTF-8 locale installed (en_US.UTF-8). If the system does have
a de_DE locale too, why should you not be allowed to set
LC_ALL=de_DE.UTF-8? Why does software need to choke on such an obvious
combination of available configuration information? I think this is
absolutely silly.
Another problem in a heterogeneous network is that some system may
support de.UTF-8 but not de_DE, some other system may call it
de_DE.utf8. It is virtually impossible for the innocent users to get
this all right. The locale mechanism needs to add more tolerance to
become user-friendly.
> > Also, in auto-detection mode, xterm obviously involves "luit" for
> > internal mediation of character encoding, and this even if it
> > then runs in UTF-8 mode!
>
> No, xterm never calls luit in UTF-8 locales. See the following
> extract from the xterm man-page:
> ...
> ...
> > I found that it is necessary to add the X resource
> > UXTerm*locale:false to the xterm invocation to avoid this problem.
>
> No, also with "true", "medium", and "checkfont", xterm will not run
> luit when in UTF-8 mode. It may run luit when *not* in UTF-8 mode,
> depending on the setting of the "XTerm*locale" X resource.
My description wasn't 100% correct, but just for the record that I
wasn't inventing plain nonsense, let me describe the details:
The problem I mentioned (involving luit) happened with xterm 179;
it does not occur with xterm 200 anymore. The problem was that
without that resource (UXTerm*locale:false), environment variables
used to override the -u8 option (which is not really the purpose of
that option), even if their values were invalid.
For example, unset any LC_ variable, set LANG=x and call
	xterm -u8 -class UXTerm
Then UTF-8 mode was NOT selected, and luit was invoked (hanging xterm...).
So to be on the safe side to enforce UTF-8 mode, and considering that
the locale mechanism is not reliable (as argued above), the option -u8
is still useful for a portable script, supplemented by additional
configuration as needed to make it effective in all cases.
> > Considering the arguments so far, I think it is clearly useful to have
> > a "uterm" script just like the "uxterm" script that was later introduced
> > into the xterm distribution. Having the latter, "uterm" didn't have
> > additional value, however.
>
> uxterm as well caused more problems than it solved in my opinion.
It's a little cryptic but it seems to do the right thing.
> > * The best Unicode terminal font in my opinion is the 10x20 font,
> >   which is much more legible than the spindly 9x18 font, and the
> >   smaller fonts are not suitable at all for a number of scripts.
> >   Unfortunately, the Unicode X fonts distribution does not include
> >   matching 20x20 CJK fonts, and xterm cannot handle single-width
> >   and double-width fonts that do not exactly match in size (like rxvt
> >   can do!).
>
> mlterm can pad fonts which don't match exactly as well.
>
> >   For that reason, I am providing a script that creates 20x20 CJK
> >   fonts from the 18x18 X fonts by padding all the glyphs.
> >   The uterm script checks if that font is installed and in this case
> >   invokes xterm with it.
>
> Maybe I will add that font to the SuSE xterm package. But that is
> only a temporary workaround, this should really be fixed in xterm,
> see
>
>     http://bugzilla.novell.com/show_bug.cgi?id=49305
Yes, I have been nagging Thomas Dickey about this for years, he also
mentions me anonymously in this bug report :)
It is an xterm workaround but maybe one for other programs, too.
Shouldn't the font rather go into the fonts package?
> > I hope some people will find it useful to have this script available
> > in /usr/bin.
>
> It might be useful on legacy systems, but probably I should just omit
> it in mined .rpm-packages for SuSE Linux >= 9.1 because it is not
> really helpful there.
Then at least you should also filter out uxterm from the xterm package.
But what about the automatic font setup feature? I have not seen the
SuSE 10 default font configuration yet, but isn't it useful to have a
script that tries to achieve the best Unicode benefit? It's only an
option, anyway, everyone may call xterm directly if that works
sufficiently.
> > I have attached this new "uterm" script to this mail for your kind
> > evaluation.
>
> I tried it on SuSE Linux 10.0:
> ...
> I.e. the part
> ...
> is much too simple-minded and already gets it wrong with the default
> locale setting of a SuSE Linux >= 9.1 installation. The default on
> SuSE Linux >= 9.1 is LANG=xx_YY.UTF-8 and all LC_* unset, xx_YY
> depending on the language/territory selected during the installation.
From the history of locale mechanisms, I had the impression that LANG
was a kind of legacy configuration variable supported for compatibility
and LC_* is rather supposed to be used; but maybe I'm wrong.
> Of course you can try to improve that, but it is quite hard to get
> this part to work right in all circumstances on all systems.
I have corrected handling of all these three variables, and I think
it's a working portable script now. I have also improved inline
documentation of the script. I will also produce a manual page.
The updated version is attached.
Thank you very much for the review.
[Further discussion should perhaps be off-list as this is not so
closely related to mined.]
Thomas Wolff
 | 
| 
     
      
      
      From: Mike F. <mf...@su...> - 2005-09-28 10:19:15
      
     
   | 
Thomas Wolff <mi...@to...> wrote:
> I have attached this new "uterm" script to this mail for your kind 
> evaluation.
I tried it on SuSE Linux 10.0:
    mfabian@magellan:~$ locale
    LANG=ja_JP.UTF-8
    LC_CTYPE="ja_JP.UTF-8"
    LC_NUMERIC="ja_JP.UTF-8"
    LC_TIME="ja_JP.UTF-8"
    LC_COLLATE="ja_JP.UTF-8"
    LC_MONETARY="ja_JP.UTF-8"
    LC_MESSAGES="ja_JP.UTF-8"
    LC_PAPER="ja_JP.UTF-8"
    LC_NAME="ja_JP.UTF-8"
    LC_ADDRESS="ja_JP.UTF-8"
    LC_TELEPHONE="ja_JP.UTF-8"
    LC_MEASUREMENT="ja_JP.UTF-8"
    LC_IDENTIFICATION="ja_JP.UTF-8"
    LC_ALL=
    mfabian@magellan:~$ /tmp/uterm 
    20x20 font not found, using 9x18 with 18x18.
    Warning: locale not supported by C library, locale unchanged
And in the xterm which was started:
    Warning: couldn't set locale.
    mfabian@magellan:~$ locale
    locale: Cannot set LC_CTYPE to default locale: No such file or directory
    locale: LC_ALL?????????????????????: ??????????????????????
    LANG=ja_JP.UTF-8
    LC_CTYPE=.UTF-8
    LC_NUMERIC="ja_JP.UTF-8"
    LC_TIME="ja_JP.UTF-8"
    LC_COLLATE="ja_JP.UTF-8"
    LC_MONETARY="ja_JP.UTF-8"
    LC_MESSAGES="ja_JP.UTF-8"
    LC_PAPER="ja_JP.UTF-8"
    LC_NAME="ja_JP.UTF-8"
    LC_ADDRESS="ja_JP.UTF-8"
    LC_TELEPHONE="ja_JP.UTF-8"
    LC_MEASUREMENT="ja_JP.UTF-8"
    LC_IDENTIFICATION="ja_JP.UTF-8"
    LC_ALL=
    mfabian@magellan:~$ 
I.e. the part
    #############################################################################
    # Modify locale environment to indicate UTF-8 mode:
    case "$LC_ALL" in
    ?*)	LC_ALL=`echo $LC_ALL | sed -e "s,\..*,,"`.UTF-8
            export LC_ALL;;
    *)	LC_CTYPE=`echo $LC_CTYPE | sed -e "s,\..*,,"`.UTF-8
            export LC_CTYPE;;
    esac
is much too simple-minded and already gets it wrong with the default
locale setting of a SuSE Linux >= 9.1 installation. The default on
SuSE Linux >= 9.1 is LANG=xx_YY.UTF-8 and all LC_* unset, xx_YY
depending on the language/territory selected during the installation.
Of course you can try to improve that, but it is quite hard to get
this part to work right in all circumstances on all systems.
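One way the quoted fragment could take LANG into account is sketched below. This is not the corrected uterm script, only an illustration under assumptions: `utf8_locale_name` and `force_utf8_locale` are hypothetical names, and falling back to en_US for an empty, C or POSIX locale is an arbitrary choice of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: derive a UTF-8 locale name from the current settings.
# Arguments: $1 = LC_ALL, $2 = LC_CTYPE, $3 = LANG (any of them may be empty).
utf8_locale_name() {
    # Effective LC_CTYPE locale: first non-empty of LC_ALL, LC_CTYPE, LANG.
    loc=${1:-${2:-$3}}
    # Strip everything from the first "." or "@" (encoding and modifier).
    base=${loc%%[.@]*}
    case "$base" in
    ""|C|POSIX) base=en_US ;;   # assumption of this sketch, not from the thread
    esac
    printf '%s.UTF-8\n' "$base"
}

# Hypothetical wrapper: adjust whichever variable currently takes precedence.
force_utf8_locale() {
    if [ -n "${LC_ALL:-}" ]; then
        LC_ALL=$(utf8_locale_name "$LC_ALL" "" "")
        export LC_ALL
    else
        LC_CTYPE=$(utf8_locale_name "" "${LC_CTYPE:-}" "${LANG:-}")
        export LC_CTYPE
    fi
}
```

With only LANG=ja_JP.UTF-8 set, this yields LC_CTYPE=ja_JP.UTF-8 rather than the invalid ".UTF-8" seen in the transcript. It still cannot guarantee that the resulting locale is installed, and it drops modifiers such as @euro, so the caveat about portability remains.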
-- 
Mike FABIAN   <mf...@su...>   http://www.suse.de/~mfabian
睡眠不足はいい仕事の敵だ。
 | 
| 
     
      
      
      From: Mike F. <mf...@su...> - 2005-09-28 10:10:28
      
     
   | 
Thomas Wolff <mi...@to...> wrote:
>> For a long time already, xterm works just fine in UTF-8 without
>> setting any special options.
> Not quite, as still today many users are not well-advised how to 
> easily configure UTF-8.
On SuSE Linux, UTF-8 is the default since SuSE Linux 9.1, i.e.  more
than a year now. RedHat switched to UTF-8 as the default about two
years earlier.
I.e. on recent SuSE or RedHat/Fedora systems, nothing needs to be
configured, calling just "xterm" is enough.
>> Same with LESSCHARSET. LESSCHARSET shouldn't be set either, less also
>> detects this automatically from the environment.
> See above; for the benefit of supporting older systems, I think it's 
> a good idea to maintain such compatibility settings for a couple of years 
> still, especially as they do no harm.
In my experience they do harm. I often had reports from users who had
LESSCHARSET, LANG, or some LC_* variables changed by some overly
helpful script and ran into problems because of this.
For example, some script sets LESSCHARSET=utf-8. The user doesn't
notice, but at first there is no problem, everything still works
because the default was UTF-8 anyway. Now the user wants to look
at one of his old ISO-8859-15 files and for that purpose starts
a new xterm or uses luit:
     LANG=de_DE@euro xterm
or
     LANG=de_DE@euro luit
But somehow it doesn't work because LESSCHARSET=utf-8 is still in the
environment. The user is very confused because he didn't think about
LESSCHARSET at all and reports a bug.
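The scenario above could be detected by a wrapper before it confuses anyone. The following is only a sketch; `lesscharset_conflicts` is a hypothetical helper, and it judges nothing beyond the UTF-8 versus non-UTF-8 case described here.

```shell
#!/bin/sh
# Hypothetical check: exit status 0 if LESSCHARSET and the locale disagree
# about UTF-8, non-zero otherwise.
# $1 = value of LESSCHARSET (may be empty)
# $2 = codeset of the current locale, e.g. the output of `locale charmap`
lesscharset_conflicts() {
    [ -n "$1" ] || return 1          # empty: less detects the charset itself
    # Normalise spelling: lower case, no "-" or "_" (UTF-8, utf8, utf-8 ...).
    lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '_-')
    cm=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' | tr -d '_-')
    case "$lc,$cm" in
    utf8,utf8)     return 1 ;;       # both UTF-8: consistent
    utf8,*|*,utf8) return 0 ;;       # exactly one side is UTF-8: conflict
    *)             return 1 ;;       # both non-UTF-8: not judged here
    esac
}

# Possible use in a start script (commented out so the sketch has no
# side effects):
#   if lesscharset_conflicts "${LESSCHARSET:-}" "$(locale charmap)"; then
#       unset LESSCHARSET
#   fi
```

Unsetting the variable in case of conflict, rather than overwriting it, leaves the decision to the auto-detection in less.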
Weird problems can also occur when the settings of LANG, and the LC_*
variables are inconsistent. For example something like
     export LANG=de_DE@euro
     export LC_CTYPE=de_DE.UTF-8
(and all other LC_* variables unset) is not allowed.  I often had
reports from users who did run into problems because of this.  And
often they didn't set these illegal combinations manually on their
own, rather they were caused by such scripts trying to be helpful by
fiddling with the LC_* variables but unfortunately didn't get it
right.
On modern systems, it is more difficult to get a non-UTF-8
terminal. UTF-8 is the default anyway, only when the user does *not*
want to use UTF-8 temporarily, special setup is needed.
> Also, in auto-detection mode, xterm obviously involves "luit" for 
> internal mediation of character encoding, and this even if it 
> then runs in UTF-8 mode!
No, xterm never calls luit in UTF-8 locales. See the following
extract from the xterm man-page:
man xterm> locale (class Locale)
man xterm>         Specifies how to use luit, an encoding converter between  UTF-8
man xterm>         and  locale  encodings.  The resource value (ignoring case) may
man xterm>         be:
man xterm>
man xterm>         true
man xterm>             xterm  will  use  the  encoding  specified  by  the  users'
man xterm>             LC_CTYPE locale (i.e., LC_ALL, LC_CTYPE, or LANG variables)
man xterm>             as far as possible.  This is realized  by  always  enabling
man xterm>             UTF-8 mode and invoking luit in non-UTF-8 locales.
man xterm>
man xterm>         medium
man xterm>             xterm  will  follow  users' LC_CTYPE locale only for UTF-8,
man xterm>             east Asian, and Thai locales, where the encodings were  not
man xterm>             supported  by  conventional  8bit mode with changing fonts.
man xterm>             For other locales, xterm will use conventional 8bit mode.
man xterm>
man xterm>         checkfont
man xterm>             If mini-luit is compiled-in, xterm will check if a  Unicode
man xterm>             font has been specified.  If so, it checks if the character
man xterm>             encoding for  the  current  locale  is  POSIX,  Latin-1  or
man xterm>             Latin-9, uses the appropriate mapping to support those with
man xterm>             the Unicode font.  For other encodings, xterm assumes  that
man xterm>             UTF-8 encoding is required.
man xterm>
man xterm>         false
man xterm>             xterm will use conventional 8bit mode or UTF-8 mode accord‐
man xterm>             ing to utf8 resource or -u8 option.
man xterm>
man xterm>         Any other value, e.g., ``UTF-8'' or ``ISO8859-2'',  is  assumed
man xterm>         to  be  an  encoding  name; luit will be invoked to support the
man xterm>         encoding.  The actual list of supported  encodings  depends  on
man xterm>         luit.  The default is ``medium''.
man xterm>
man xterm>         Regardless of your locale and encoding, you need an ISO-10646-1
man xterm>         font to display the result.  Your configuration may not include
man xterm>         this  font,  or  locale-support by xterm may not be needed.  At
man xterm>         startup, xterm uses a  mechanism  equivalent  to  the  load-vt-
man xterm>         fonts(utf8Fonts, Utf8Fonts)  action  to  load  font name subre‐
man xterm>         sources of the VT100 widget.  That is, resource  patterns  such
man xterm>         as   "*vt100.utf8Fonts.font"  will  be  loaded,  and  (if  this
man xterm>         resource is enabled), override the normal fonts.  If no  subre‐
man xterm>         sources  are  found,  the  normal  fonts such as "*vt100.font",
man xterm>         etc., are used.  The resource files distributed with xterm  use
man xterm>         ISO-10646-1 fonts, but do not rely on them unless you are using
man xterm>         the locale mechanism.
> This can unfortunately hang xterm (if combined with option -e) as
> one mined user reported.
Yes indeed, luit sometimes hangs, see
     http://bugzilla.novell.com/show_bug.cgi?id=117193
> I found that it is necessary to add the X resource
> UXTerm*locale:false to the xterm invocation to avoid this problem.
No, also with "true", "medium", and "checkfont", xterm will not run
luit when in UTF-8 mode. It may run luit when *not* in UTF-8 mode,
depending on the setting of the "XTerm*locale" X resource.
> Considering the arguments so far, I think it is clearly useful to have 
> a "uterm" script just like the "uxterm" script that was later introduced 
> into the xterm distribution. Having the latter, "uterm" didn't have 
> additional value, however.
uxterm as well caused more problems than it solved in my opinion.
> * The best Unicode terminal font in my opinion is the 10x20 font, 
>   which is much more legible than the spindly 9x18 font, and the 
>   smaller fonts are not suitable at all for a number of scripts.
>   Unfortunately, the Unicode X fonts distribution does not include 
>   matching 20x20 CJK fonts, and xterm cannot handle single-width 
>   and double-width fonts that do not exactly match in size (like rxvt 
>   can do!).
mlterm can pad fonts which don't match exactly as well.
>   For that reason, I am providing a script that creates 20x20 CJK 
>   fonts from the 18x18 X fonts by padding all the glyphs.
>   The uterm script checks if that font is installed and in this case 
>   invokes xterm with it.
Maybe I will add that font to the SuSE xterm package. But that is
only a temporary workaround, this should really be fixed in xterm,
see 
    http://bugzilla.novell.com/show_bug.cgi?id=49305
> I hope some people will find it useful to have this script available 
> in /usr/bin.
It might be useful on legacy systems, but probably I should just omit
it in mined .rpm-packages for SuSE Linux >= 9.1 because it is not
really helpful there.
-- 
Mike FABIAN   <mf...@su...>   http://www.suse.de/~mfabian
睡眠不足はいい仕事の敵だ。
 | 
| 
     
      
      
      From: Thomas W. <mi...@to...> - 2005-09-28 08:59:34
      
     
   | 
I asked:
>> Do you think an improved uterm script should be installed in a
>> common place too, provided it offers features not covered by the
>> uxterm script (which comes with xterm)?
Werner Lemberg wrote:
> I have no opinion. Provided there exists a proper man page, I don't
> object.
Mike Fabian wrote:
> I don't think such a script is needed.
Actually, I have just updated the uterm script to really provide
additional value, and I plan to get it installed to /usr/bin for the
next release. See below for the reasons, first I'd like to discuss the
"automatic" configuration issue.
> For a long time already, xterm works just fine in UTF-8 without
> setting any special options.
Not quite, as still today many users are not well-advised how to
easily configure UTF-8.
> The option "-u8" is not needed when running in a UTF-8 locale, xterm
> then uses UTF-8 mode automatically.
The configuration trouble for the users is made worse by the fact that
this "automatic" UTF-8 mode in newer xterm versions relies on the
locale mechanism (like many other programs do). While this common
mechanism is of course a good idea as a strategy, it does not yet work
in practice; there are no common values for the locale variables that
would work equally on all systems, so it's really a hassle to get a
proper .profile or .login script that works in a heterogeneous network.
Also users may well be stuck on a system without proper locales
configured, and may be faced with an administrator who does not care.
> This option was only a workaround
> for old systems which don't have decent UTF-8 support.
You might be surprised how many old and really old systems are still
in daily use in company labs and academic networks. I also think it's
one of the benefits of mined to provide seamless UTF-8 support in
legacy environments too, and also support other encodings and input
methods "out-of-the-box".
For the same reason, I think it's an advantage to have a script
available that starts a UTF-8 terminal regardless of proper
environment settings.
> Same with LESSCHARSET. LESSCHARSET shouldn't be set either, less also
> detects this automatically from the environment.
See above; for the benefit of supporting older systems, I think it's
a good idea to maintain such compatibility settings for a couple of years
still, especially as they do no harm.
> Usually it is enough nowadays just to start
>
>     LANG=en_GB.UTF-8 xterm
>
> to get an xterm in UTF-8 and
>
>     LANG=de_DE@euro xterm
>
> to get an xterm in ISO-8859-15 (this is only an example here where I
> assume that none of the LC_* variables was set).
"Usually", yes. But far from "always". And even if it were - why bother
the users with having to set LANG to some specific value (which may
even vary on the users' systems!) if it's much easier to have a unique
script for the purpose?
About the locale mechanism and its problems, see also the last chapter
in my paper and presentation about mined at the IUC 27 conference,
available for download on the overview page of http://towo.net/mined/.
> Setting LESSCHARSET and using options like "-u8" often defeats the
> auto-detection and this can be very confusing.
Rather the auto-detection itself may be confusing. If you definitely
want UTF-8 and can achieve that with a known option, why rely on
auto-detection that depends on fragile proper environment settings?
Also, in auto-detection mode, xterm obviously involves "luit" for
internal mediation of character encoding, and this even if it
then runs in UTF-8 mode!
This can unfortunately hang xterm (if combined with option -e) as
one mined user reported.
I found that it is necessary to add the X resource
UXTerm*locale:false to the xterm invocation to avoid this problem.
Considering the arguments so far, I think it is clearly useful to have
a "uterm" script just like the "uxterm" script that was later introduced
into the xterm distribution. Having the latter, "uterm" didn't have
additional value, however.
Now one big problem remains: font setup. Users often have the trouble
that even if they can run a UTF-8 terminal, they do not have proper
fonts configured to see the scripts they want. They do not know which
fonts would contain the desired glyphs and how to use such fonts.
That's where the new "uterm" script comes in handy; it tries to run
a terminal with a "best-buy font" approach:
* The best Unicode terminal font in my opinion is the 10x20 font,
  which is much more legible than the spindly 9x18 font, and the
  smaller fonts are not suitable at all for a number of scripts.
  Unfortunately, the Unicode X fonts distribution does not include
  matching 20x20 CJK fonts, and xterm cannot handle single-width
  and double-width fonts that do not exactly match in size (like rxvt
  can do!).
  For that reason, I am providing a script that creates 20x20 CJK
  fonts from the 18x18 X fonts by padding all the glyphs.
  The uterm script checks if that font is installed and in this case
  invokes xterm with it.
* If the first attempt fails, uterm checks if 9x18 and 18x18 fonts are
  available, and then starts xterm with them.
* The next smaller font with a reasonable Unicode character repertoire
  is the GNU unifont. Unfortunately, xterm is not capable of using it
  as it is designed to run with separate single-width and double-width
  fonts only. For that reason, uterm checks if the GNU unifont is
  installed and then runs rxvt with it (which can handle it).
* The last explicit choice of uterm is the 6x13 font which has a
  matching 12x13 CJK font in the Unicode X fonts distribution.
* If this all fails, uterm starts xterm just with its default
  configuration.
I have attached this new "uterm" script to this mail for your kind
evaluation.
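The fallback order listed above could be sketched roughly as follows. This is not the attached uterm script: the font patterns are placeholders that would need to match the fonts actually installed, the variable and function names are hypothetical, and probing with `xlsfonts -fn` is an assumption of this sketch.

```shell
#!/bin/sh
# Placeholder font patterns (assumptions; adjust to the installed fonts).
FONT20='-misc-fixed-medium-r-normal--20-*-c-100-iso10646-1'   # 10x20
WIDE20='-misc-fixed-medium-r-normal-*-20-*-c-200-iso10646-1'  # padded 20x20 CJK
FONT18='-misc-fixed-medium-r-normal--18-*-c-90-iso10646-1'    # 9x18
WIDE18='-misc-fixed-medium-r-normal-*-18-*-c-180-iso10646-1'  # 18x18 CJK
UNIFONT='-gnu-unifont-medium-r-normal--16-*-iso10646-1'       # GNU unifont
FONT13='-misc-fixed-medium-r-semicondensed--13-*-c-60-iso10646-1'  # 6x13
WIDE13='-misc-fixed-medium-r-normal-*-13-*-c-120-iso10646-1'  # 12x13 CJK

# Succeed if the X server lists at least one font matching the pattern.
font_available() {
    [ -n "$(xlsfonts -fn "$1" 2>/dev/null)" ]
}

# Print the terminal command line for the best available font setup.
choose_terminal() {
    if font_available "$FONT20" && font_available "$WIDE20"; then
        printf '%s\n' "xterm -fn $FONT20 -fw $WIDE20"
    elif font_available "$FONT18" && font_available "$WIDE18"; then
        printf '%s\n' "xterm -fn $FONT18 -fw $WIDE18"
    elif font_available "$UNIFONT"; then
        printf '%s\n' "rxvt -fn $UNIFONT"    # xterm cannot use this font
    elif font_available "$FONT13" && font_available "$WIDE13"; then
        printf '%s\n' "xterm -fn $FONT13 -fw $WIDE13"
    else
        printf '%s\n' "xterm"                # default configuration
    fi
}
```

The sketch only prints the chosen command so that the decision can be inspected; a real start script would run the terminal instead and would also add the UTF-8 options discussed earlier in the thread.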
I appreciate any comments and further suggestions and I hope some
people will find it useful to have this script available in /usr/bin.
Thomas Wolff
 | 
| 
     
      
      
      From: Mike F. <mf...@su...> - 2005-09-27 14:37:57
      
     
   | 
Werner LEMBERG <wl...@gn...> wrote:
>> Do you think an improved uterm script should be installed in a
>> common place too, provided it offers features not covered by the
>> uxterm script (which comes with xterm)?
>
> I have no opinion.  Provided there exists a proper man page, I don't
> object.
I don't think such a script is needed.
For a long time already, xterm works just fine in UTF-8 without
setting any special options.
The option "-u8" is not needed when running in a UTF-8 locale, xterm
then uses UTF-8 mode automatically. This option was only a workaround
for old systems which don't have decent UTF-8 support.
Same with LESSCHARSET. LESSCHARSET shouldn't be set either, less also
detects this automatically from the environment.
Usually it is enough nowadays just to start
    LANG=en_GB.UTF-8 xterm
to get an xterm in UTF-8 and
    LANG=de_DE@euro xterm
to get an xterm in ISO-8859-15 (this is only an example here where I
assume that none of the LC_* variables was set).
Setting LESSCHARSET and using options like "-u8" often defeats the
auto-detection and this can be very confusing.
-- 
Mike FABIAN   <mf...@su...>   http://www.suse.de/~mfabian
睡眠不足はいい仕事の敵だ。
 | 
| 
     
      
      
      From: Werner L. <wl...@gn...> - 2005-08-22 14:26:22
      
     
   | 
> Thanks for your hints. They are all about the package and the
> auxiliary files, not mined itself, so can I assume you are satisfied
> with the editor? :)
I haven't found time yet to further investigate, sorry.  It looks very
promising, though.
> >   It might be a good idea to introduce an INSTALL_DATA variable
> >   (as used, e.g., in autoconf), which is the same as `${INSTALL}
> >   -m 644'.
>
> The problem is that I use a generic install .../* command line
Hmm, in my packages I try to be as precise as possible, listing each
file to be installed explicitly -- at least on Unix platforms you
should be able to do that too because you don't have the ridiculous
length limitation of DOS and Windows shells.
> but install is too stupid to preserve the access rights as it would
> be reasonable.
???  You actually already use install's --mode argument in file
mkinclud.mak, so where's the problem?
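The two suggestions (an INSTALL_DATA setting and an explicit file list) can be combined in a short shell sketch. The variable names follow the autoconf convention mentioned above; `install_files`, the explicit `-m 755` for scripts, and the file names in the usage note are assumptions of this sketch and are not taken from the mined makefiles.

```shell
#!/bin/sh
# Installation commands in the style of autoconf; overridable from outside.
INSTALL=${INSTALL:-install}
INSTALL_DATA=${INSTALL_DATA:-"$INSTALL -m 644"}       # data: not executable
INSTALL_SCRIPT=${INSTALL_SCRIPT:-"$INSTALL -m 755"}   # scripts: executable

# Hypothetical helper: install_files COMMAND DESTDIR FILE...
# Installs each listed file into DESTDIR with the given install command.
install_files() {
    cmd=$1
    dest=$2
    shift 2
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        # $cmd is deliberately unquoted so "install -m 644" splits into words
        $cmd "$f" "$dest/" || return 1
    done
}
```

A call such as `install_files "$INSTALL_DATA" /usr/local/share/mined help/mined.hlp` (hypothetical file name) then installs data with mode 644 regardless of the source file's permissions, while the wrapper scripts would go through `$INSTALL_SCRIPT` into the bin directory.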
> Maybe I should rather use cp as no one could really explain me why
> install would be of advantage here.
Well, some administrators don't allow direct access to binaries.
Instead, they use wrapper scripts, together with a modified `install'
script which automatically creates such wrappers.
> > . Doing `make localinstall', the files `{x,u,w,l}mined' should be
> >   installed in /usr/local/bin, not /usr/local/share/mined.  IMHO
> >   this is a serious issue since mined.1 advertises those scripts.
>
> [...] I think I'll change this as you and also Mike Fabian (the SuSE
> package maintainer) have suggested.
Excellent.
> Do you think an improved uterm script should be installed in a
> common place too, provided it offers features not covered by the
> uxterm script (which comes with xterm)?
I have no opinion.  Provided there exists a proper man page, I don't
object.
> What about the uprint script (which is also useful to print a file
> from the command line) and maybe even the installfonts helper
> script?
Since I'm quite sure that I'll never use this script I really don't
have any opinion either :-)
> Please add the option -xrm "UXTerm*locale: false" to the xterm
> invocation in the umined script.
Thanks.  This works.
    Werner
 | 
| 
     
      
      
      From: Thomas W. <mi...@to...> - 2005-08-22 13:28:09
      
     
   | 
> [mined-2000.11]
> 
> A couple of bugs:
Thanks for your hints. They are all about the package and the 
auxiliary files, not mined itself, so can I assume you are satisfied 
with the editor? :)
> . Doing `make localinstall' on a GNU/Linux box sets the executable bit
>   on all files in /usr/local/share/mined, which is wrong.
Not nice, I know, though not really "wrong" as this is not harmful.
>   It might be a good idea to introduce an INSTALL_DATA variable (as
>   used, e.g., in autoconf), which is the same as `${INSTALL} -m 644'.
The problem is that I use a generic install .../* command line but 
install is too stupid to preserve the access rights as would be 
reasonable. Maybe I should rather use cp as no one could really explain 
to me why install would be of advantage here.
> . Doing `make localinstall', the files `{x,u,w,l}mined' should be
>   installed in /usr/local/bin, not /usr/local/share/mined.  IMHO this
>   is a serious issue since mined.1 advertises those scripts.
The reason is that I hesitated to install a larger number of auxiliary 
scripts in a common bin directory which other people might object to.
The manual also mentions where they can be found but I agree it's 
still not obvious. I think I'll change this as you and also Mike 
Fabian (the SuSE package maintainer) have suggested.
Do you think an improved uterm script should be installed in a common 
place too, provided it offers features not covered by the uxterm script 
(which comes with xterm)?
What about the uprint script (which is also useful to print a file 
from the command line) and maybe even the installfonts helper script?
> . I have xterm version 184, and it seems that the `umined' script
>   doesn't work with this version (this is, my xterm still can't use
>   `-class' and `-e' together): Doing
> 
>     xterm -u8 +sb -class UXTerm -e mined
> 
>   on the command line I just get a blank window which I can close with
>   Ctr-C.
I could reproduce this with xterm 179 and the problem seems to be 
luit which gets involved by xterm.
Calling luit -x -- mined
sometimes has the same effect, and sometimes it works.
Please add the option -xrm "UXTerm*locale: false"
to the xterm invocation in the umined script. In this case luit will 
not be involved and the problem does not occur. I'll change the script 
for the next release.
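Put together with the command line from the bug report, the suggested invocation might look like the sketch below. The wrapper function `run_umined` and the `DRY_RUN` switch are hypothetical additions made only so the command line can be inspected without an X display; the actual umined script may differ.

```shell
#!/bin/sh
# Sketch of the adjusted invocation: the command from the report plus the
# -xrm option suggested above. Extra arguments are passed on to mined.
run_umined() {
    set -- xterm -u8 +sb -class UXTerm -xrm 'UXTerm*locale: false' -e mined "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"      # hypothetical dry-run mode: only show it
    else
        exec "$@"               # replace the shell with xterm
    fi
}
```

Quoting the resource string keeps the shell from expanding the `*` and keeps the embedded space inside one argument.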
Kind regards,
Thomas Wolff
 | 
| 
     
      
      
      From: Werner L. <wl...@gn...> - 2005-08-19 13:36:27
      
     
   | 
[mined-2000.11]
A couple of bugs:
. Doing `make localinstall' on a GNU/Linux box sets the executable bit
  on all files in /usr/local/share/mined, which is wrong.
  It might be a good idea to introduce an INSTALL_DATA variable (as
  used, e.g., in autoconf), which is the same as `${INSTALL} -m 644'.
. Doing `make localinstall', the files `{x,u,w,l}mined' should be
  installed in /usr/local/bin, not /usr/local/share/mined.  IMHO this
  is a serious issue since mined.1 advertises those scripts.
. I have xterm version 184, and it seems that the `umined' script
  doesn't work with this version (that is, my xterm still can't use
  `-class' and `-e' together): Doing
    xterm -u8 +sb -class UXTerm -e mined
  on the command line I just get a blank window which I can close with
  Ctrl-C.
  On the other hand, executing first
    xterm -u8 +sb -class UXTerm
  then
    mined
  works just fine.  Similarly,
    xterm -u8 +sb -e mined
  is fine too.
     Werner
 | 
| 
     
      
      
      From: Mike F. <mf...@su...> - 2005-08-06 19:01:11
      
     
   | 
-- 
Mike FABIAN   <mf...@su...>   http://www.suse.de/~mfabian
睡眠不足はいい仕事の敵だ。
 | 
| 
     
      
      
      From: <mi...@to...> - 2005-08-03 01:03:26
      
     
   | 
                             ANNOUNCEMENT
                         mined 2000 release 11
                             (July 2005)
Mined is a powerful text editor with a comprehensive and easy-to-use 
user interface and fast, small-footprint behaviour.
Mined provides both extensive Unicode and CJK support offering many 
specific features and covering special cases that other editors 
are not aware of (like auto-detection features and automatic handling 
of terminal variations, or Han character information).
It was the first editor that supported Unicode in a plain-text terminal.
Basically, mined is an editor tailored to reliable and efficient 
editing of plain text documents and programs, with features and 
interactive behaviour designed for this purpose.
------------------------------------------------------------------------
More information (with screenshots, feature overview and change log) 
and download are available from the mined web site at
	http://towo.net/mined/
Mined is co-hosted at sourceforge and has a mailing list 
to which you can subscribe at
<https://lists.sourceforge.net/lists/listinfo/mined-editor>
------------------------------------------------------------------------
Major enhancements in this release:
Unicode support enhancements:
* Updated to Unicode 4.1.0:
  * Case conversion, Script information.
  * Combining character width properties.
  * Han information (from Unihan database) for CJK characters.
  * Radical/Stroke input method (to include new CJK characters).
  * Added Hanyu Pinlu and Tang pronunciation information 
    (from Unihan database) to Han information options.
  * Added generic and supplemental character input mnemonics 
    for new LATIN characters.
* Indication and character information of Unicode combining characters 
  now refer to the most recent Unicode version, not the actual 
  terminal capabilities.
Interactive enhancements:
* Reconciled the keypad assignment preference conflict between Cut/Paste 
  functions (as promoted by mined) and character deletion / line 
  positioning functions (as commonly expected):
  * The more common Home/End/Delete function assignments to the 
    respective keypad keys are also easily accessible (e.g. Alt-Del).
  * Documentation for alternative assignment option improved.
  * Using Del without a paste buffer gives an additional hint on 
    alternative usage.
* Pull-down menus are now scrollable so they are always displayed 
  (also the large menus in small terminal windows).
* Additional assignment of "Delete single" function (to delete without 
  auto-undent, or to delete the last combining accent only) to F5 
  Backarrow.
* Additional commands (HOP) F1 F1 / Shift-F1 / Control-F1 / Alt-F1 to 
  display a help status line of (shifted) function key assignments.
* Slight revision of function key assignments to improve intuitive 
  usage and compliance with common usage.
  Unification of DOS version function key assignments.
Interoperability enhancements:
* Improved detection of shifted function keys on various kinds and 
  modes of terminals.
* Added keyboard configuration examples for Control-function key 
  detection for rxvt and mlterm to the runtime support library.
* Added script to support Unicode X font installation to the runtime 
  support library.
* Modified xterm start script "uterm" so that newer xterm versions 
  (from 201) use the xterm built-in, most recent version of Unicode 
  width data (which is often more current than the system-provided 
  locale version).
* Provided makefile for Interix.
Feature enhancements:
* Smart arrows added to optional smart input text replacements.
* New word case toggle function Shift-F3 cycling word casing between 
  all small, beginning capital, and all capitals.
* The "search corresponding bracket" commands ESC ( or ESC ) now also 
  match /* */ pairs and #if #else/#elif #endif structures.
* New TAB expansion option (-+4 or -+8) that expands TAB key input to 
  an appropriate number of Space characters.
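To illustrate what "an appropriate number of Space characters" could mean, here is a minimal Python sketch. It assumes that the expansion pads to the next tab stop and that every character occupies one column (double-width and combining characters are ignored). The function names are hypothetical, and this is not mined's actual code.

```python
def spaces_for_tab(column: int, tab_width: int = 8) -> int:
    """Number of spaces that moves a 0-based column to the next tab stop."""
    return tab_width - (column % tab_width)


def expand_tab_input(line: str, tab_width: int = 8) -> str:
    """Replace each TAB in line by spaces up to the next tab stop.

    Assumption: one character equals one screen column.
    """
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            n = spaces_for_tab(column, tab_width)
            out.append(" " * n)
            column += n
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

With a tab width of 4, a TAB typed in column 2 would become two spaces under this assumption; Python's built-in str.expandtabs follows the same rule.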
Further enhancements:
* Using paps (a Pango printing script) for printing if available.
* Added PC DOS encoding ("codepage 437") to available encodings.
------------------------------------------------------------------------
Mined Overview
Good interactive features
* Intuitive user interface
* Logical and consistent concept of navigating and editing text 
  (without ancient line-end handling limitations or insert/append confusion)
* Supports various control styles:
  Editing with command control, function key control, or menu control
  Navigation by cursor keys, control keys, mouse or scrollbar
* Comprehensive menus (driven by keyboard or mouse)
* "HOP" key paradigm doubles the number of navigation functions 
  that can be most easily reached and remembered by 
  intuitively amplifying the associated function
* Immediate adjustment if the window size is changed, in any 
  state of interaction
Versatile character encoding support
* Extensive Unicode support, including double-width and combining characters,
  script highlighting, 
  various methods of character input support 
  (mapped keyboard input methods, mnemonic and numeric input),
  supporting CJK, Vietnamese, Hebrew, Arabic, and other scripts
* Support of bidirectional terminals, Arabic ligature joining
* East Asian character set support: handling of major CJK encodings 
  (including GB18030 and full EUC-JP with combining characters) 
  in either Unicode terminal or CJK terminal
* Support for a variety of 8 bit encodings (mapped to Unicode) 
  (with combining characters for Vietnamese and Thai)
* Support of CJK input methods by enhanced keyboard 
  mapping including multiple choice mappings (handled by a pick list menu);
  characters in the pick list being sorted by relevance of Unicode ranges
* Han character information with description and pronunciation
* Auto-detection of text character encoding, edits files with 
  mixed character encoding sections (e.g. mailboxes),
  transparent handling of UTF-16 encoded files
* Auto-detection of UTF-8 / CJK terminal mode and detailed features
  (like different Unicode width and combining data versions)
* Encoding support tested with:
  xterm, mlterm, hanterm, cxterm, rxvt, 
  kde konsole, linux console
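As a rough illustration of the idea behind auto-detecting a text encoding, here is a minimal Python sketch. It only checks for a byte order mark and for valid UTF-8, then falls back to Latin-1. The function name and the fallback choice are assumptions made for this example; mined's own detection (which also covers CJK encodings and mixed-encoding files) is not described here and is certainly more elaborate.

```python
def guess_encoding(data: bytes) -> str:
    """Guess a text encoding from raw bytes (simplified heuristic)."""
    # A UTF-8 byte order mark is an explicit hint.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    # UTF-16 byte order marks (little-endian or big-endian).
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    # Bytes that decode strictly as UTF-8 are treated as UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: an 8-bit encoding (Latin-1 accepts any byte).
        return "latin-1"
    return "utf-8"
```

Text that decodes with the returned name can then be handled as Unicode internally, which matches the "mapped to Unicode" approach mentioned above for 8 bit encodings.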
Many useful text editing capabilities
* Many text editing features, e.g. paragraph wrapping, 
  auto-indentation and back-tab, smart quotes (with 
  quotation marks style selection and auto-detection) 
  and smart dashes
* Search and replacement patterns can have multiple lines
* Cross-session paste buffer (copy/paste between multiple 
  - even subsequent or remote - invocations of mined)
* Marker stack for quick return to previous text positions
* Multiple paste buffers (emacs-style)
* Program editing features, HTML support and syntax highlighting, 
  identifier and function definition search, also across files; 
  structure input support
* Text and program layout features; auto-indentation and 
  undent function (back-tab), numbered item justification
* Systematic text and file handling safety, avoiding loss of data
* Visible indications of special text contents 
  (TAB characters, different line-end types, character 
  codes that cannot be displayed in the current mode)
* Full binary transparent editing with visible indications 
  (illegal UTF-8 or CJK, mixed line end types, NUL characters, ...)
* Print function that works in all text encodings
* Optional emacs command mode
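The marker stack mentioned above can be pictured with a small Python sketch. The class name and the (line, column) position format are hypothetical; the sketch only shows the push/return idea, not how mined stores its markers.

```python
from typing import List, Optional, Tuple

# Hypothetical position format: (line, column), both 0-based.
Position = Tuple[int, int]


class MarkerStack:
    """Remember text positions and return to the most recent one first."""

    def __init__(self) -> None:
        self._marks: List[Position] = []

    def push(self, pos: Position) -> None:
        """Remember the current position before jumping elsewhere."""
        self._marks.append(pos)

    def pop(self) -> Optional[Position]:
        """Return the most recently remembered position, or None if empty."""
        return self._marks.pop() if self._marks else None
```

Pushing a position before each jump and popping afterwards returns to earlier positions in reverse order of visiting.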
Small-footprint operation and portability
* Plain text mode (terminal) operation, supporting a wide range of terminals
* Instant start-up
* Runs on many platforms: Unix (Linux/Sun/HP/BSD/Mac and more),
  DOS (djgpp), Windows (cygwin, Interix)
* Makefiles also support legacy systems
------------------------------------------------------------------------
Thomas Wolff
mi...@to...
 | 
| 
     
      
      
      From: Thomas W. <mi...@to...> - 2005-04-15 23:21:32
      
     
   | 
vangsy J. wrote:
> i have corrected the keymaps.cfg as advised and I obtain now amharic (or
> ethiopic) however when I right-click on " " I get a sound but nothing
> else!
The reason that the menu does not open is that your screen is not high
enough for the menu to fit. In that case there is only a beep.
This isn't documented well, sorry, and I'll also consider a solution for
menus in smaller screen height for the next version.
Meanwhile, please drag the window large (esp. taller) in order to make
space for the keyboard mapping menu to display.
Kind regards,
Thomas Wolff  | 
| 
     
      
      
      From: vangsy J. <jva...@wa...> - 2005-04-12 14:36:01
      
     
   | 
dear sir,
i have corrected the keymaps.cfg as advised and I obtain now amharic (or
ethiopic) however when I right-click on " " I get a sound but nothing else!
If i left-click I get the different options but not the whole list, and at
each click a new line is written on the screen. I have to erase some lines!
please advise!
Moreover if I start on the desktop using an icon with the command
"/usr/share/mined/mined", the program does not start, and the situation is
the same with the option "mined.desktop". But with "wmined" the program
does start but without " "!
From the console though the program start with "mined"!
There may be some bugs.
yours sincerely,
j.v  | 
| 
     
      
      
      From: Thomas W. <mi...@to...> - 2005-04-11 14:51:00
      
     
   | 
> How to set a keymap properly? for example for greek or for amharic?
For Greek, there is a keyboard mapping preconfigured in the set of
default input methods. Open the input method menu (Alt-K, ESC K, or
right-click on the keyboard mapping flag, "--" by default for none).
Then select the Greek mapping.
To add an external keyboard mapping to the menu, use the script "mkkbmap"
as described in the manual.
(For yudit keyboard mapping files, there is actually a bug in the script;
to fix it, edit mkkmyudt.sed, in line 17 remove one of the 3 last
backslash characters - sorry for that, to be fixed in mined 2000.11)
Kind regards,
Thomas Wolff  | 
| 
     
      
      
      From: Thomas W. <mi...@to...> - 2005-04-11 13:47:58
      
     
   | 
> "emacs" bindings don't quite match the emacs behaviour for the sequence ^X^C;
>     emacs: (prompt for save) & exit
>     mined: automatic save & exit
I have fixed this for the next release, thank you.
(Sorry for the late response, I was not aware that I had to subscribe to
my own mailing list in order to receive the messages...)
Kind regards,
Thomas Wolff  | 
| 
     
      
      
      From: vangsy J. <jva...@wa...> - 2005-04-10 13:14:03
      
     
   | 
I can't seem to be using the program properly.
How to set a keymap properly? for example for greek or for amharic?
please advise
j.v  | 
| 
     
      
      
      From: Alex P. <A.P...@sm...> - 2005-03-08 01:51:27
      
     
   | 
Don't know if it's been pointed out and fixed yet, but the 'emacs'
bindings don't quite match the emacs behaviour for the sequence ^X^C;
    emacs: (prompt for save) & exit
    mined: automatic save & exit
Obviously this is rather dangerous to somebody familiar with emacs!
I don't know what version of mined I'm using (is there really no --version 
command line option?!), but it's one released with debian sarge.
-- 
<<<Alexspudros Potatopoulos>>>
Defender of Spudkind
 |