#4077 group names with umlauts are not properly decoded in group_c

Milestone: 1.580
Status: closed-fixed
Owner: Jamie Cameron
Priority: 5
Updated: 2012-04-12
Created: 2012-03-31
Creator: Robert Kehl
Private: No

I have Webmin 1.580 set up on a Debian 6 box, configured to use the German language. Group enumeration is done via "compat winbind" in /etc/nsswitch.conf, with winbindd querying a German Windows Server 2003 R2. A call to "getent group domänen-admins" yields:

# getent group domänen-admins
domänen-admins:x:10008:administrator

So, we do have group names with umlauts.

Unfortunately, Webmin's default encoding for language 'de' is iso-8859-1, while winbindd returns utf-8 encoded strings, so the output of group_chooser.cgi is not really usable. Groups without umlauts are recognized and can be saved, but choosing a group with umlauts results in unusable entries in, for example, the Samba module, and the members of such groups are not enumerated in group_chooser.cgi either.
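A minimal sketch of the mismatch described above, in Python rather than Webmin's Perl and purely for illustration: it takes the group name as utf-8 bytes (as winbindd is reported to return them) and interprets them the way an iso-8859-1 page would. The variable names are hypothetical.

```python
# Hypothetical illustration: the group name as utf-8 bytes.
name = "domänen-admins"
utf8_bytes = name.encode("utf-8")

# An iso-8859-1 reader treats every byte as one character, so the
# two-byte utf-8 sequence for "ä" turns into two unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")

print(len(name), len(utf8_bytes))  # character count vs. byte count
print(misread)
```

The garbled string no longer matches the real group name, which would explain why entries chosen this way are unusable elsewhere.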

Most probably this happens in user_chooser.cgi, too, but fortunately we have no usernames with umlauts, so I did not check that.

I attached a fairly ugly patch that fixes our particular situation. It converts the group names from utf-8 to iso-8859-1 using the "Encode" module, regardless of which language is set in Webmin. As we only use Webmin in German, it works for us, but I guess it is far too inflexible to work elsewhere.
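The attached patch itself is not reproduced in this thread. As a rough sketch of the same idea (written in Python, not the Perl "Encode" code the patch uses), a fixed utf-8 to iso-8859-1 conversion could look like this; the function name is an assumption made for the example.

```python
def utf8_to_latin1(raw: bytes) -> bytes:
    """Re-encode utf-8 bytes as iso-8859-1.

    Raises UnicodeEncodeError if a character has no iso-8859-1 equivalent,
    which is one reason a hard-coded conversion is inflexible.
    """
    return raw.decode("utf-8").encode("iso-8859-1")


group_utf8 = "domänen-admins".encode("utf-8")
group_latin1 = utf8_to_latin1(group_utf8)
print(group_latin1)
```

Because the target charset is hard-coded, this only helps installations whose Webmin language uses iso-8859-1.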

With kind regards,

Robert Kehl

Discussion

  • Robert Kehl
    2012-03-31

    Patch to convert group names from utf-8 to iso-8859-1

     
  • Jamie Cameron
    2012-04-02

    Thanks for the patch .. however, I'm not sure if this is really the right solution. It might instead be better to have winbind convert group names to iso-8859-1 before they are made available to the getgr* functions, as Webmin is just one of the programs that has to deal with group names .. and I strongly suspect that they all assume an 8-bit character set.

     
  • Robert Kehl
    2012-04-04

    I'm fairly sure it isn't the right patch, as it's very specific to the German language. It would definitely not work for a Chinese installation, for example.

    Getting winbind to talk iso-8859-1 isn't right either, as it generally uses utf-8 for its output, which seems reasonable to me. The thing is that Webmin doesn't even try to convert the output to the charset it uses, so utf-8 encoded output will be displayed incorrectly unless the group names are pure us-ascii. That is what most people do, I suppose, and it sounds reasonable, too.

    But there are installations out there that use group names (and perhaps user names, too) that are not pure us-ascii. So, shouldn't Webmin try to convert the utf-8 output to the charset chosen?

    Btw, the problem would diminish if Webmin used utf-8 for output in most languages, if not in all. Using iso-8859-1 is soo 90's ;-)

    I tried switching the German translation to utf-8, but it fails badly, as the translation itself is pure iso-8859-1. Moving it to utf-8 would really help.

    With kind regards,

    Robert Kehl

     
  • Jamie Cameron
    2012-04-04

    I suppose one option would be for Webmin to have a language option for German with UTF-8 encoding.. but I worry that this would be just obscuring the real issue, which is that non us-ascii characters aren't really supported in user or group names.

    For example, try running the following on your system :

    groupadd domänen-admins

    for me, this fails with :

    groupadd: domänen-admins is not a valid group name

    I can see how characters like ä could be OK in a group name, as in iso-8859-1 each is just a single byte, but UTF-8 characters that take two or more bytes are likely to confuse any code that performs byte-based operations like truncation.
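To illustrate the truncation concern, here is a small Python sketch (not taken from Webmin or any system tool) that cuts a utf-8 encoded group name at a byte offset that happens to fall inside the two-byte sequence for "ä".

```python
name = "domänen-admins"
raw = name.encode("utf-8")  # "ä" occupies byte offsets 3 and 4

# Byte-based truncation to 4 bytes ends in the middle of "ä".
cut = raw[:4]
try:
    cut.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False

# Truncating by characters first and encoding afterwards stays valid.
safe = name[:4].encode("utf-8")

print(still_valid, safe)
```

Code that counts bytes rather than characters can therefore produce an invalid name, while the same operation on an iso-8859-1 string would not.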

     
  • Robert Kehl
    2012-04-05

    On a recent Debian 6 (Squeeze) I can create a group with umlauts, as I expected, because the system is fully utf-8 compliant.

    On the other hand, creating a group on the Linux box is irrelevant, because the groups in question come from winbind, which in turn pulls them from the Active Directory of a Windows Server 2003. There, umlauts are OK, because the group names are stored in utf-8.

    With kind regards,

    Robert Kehl

     
  • Jamie Cameron
    2012-04-05

    On Debian 6, can you create a group with a utf-8 umlaut in the name?

    Or only an iso-8859-1 umlaut?

     
  • Robert Kehl
    2012-04-06

    Yes, it is possible - as my system is set to utf-8, it isn't an iso-8859-1 encoding. The system doesn't actually know about iso-8859-1 at all:

    # locale
    LANG=de_DE.UTF-8
    LANGUAGE=
    LC_CTYPE="de_DE.UTF-8"
    LC_NUMERIC="de_DE.UTF-8"
    LC_TIME="de_DE.UTF-8"
    LC_COLLATE="de_DE.UTF-8"
    LC_MONETARY="de_DE.UTF-8"
    LC_MESSAGES="de_DE.UTF-8"
    LC_PAPER="de_DE.UTF-8"
    LC_NAME="de_DE.UTF-8"
    LC_ADDRESS="de_DE.UTF-8"
    LC_TELEPHONE="de_DE.UTF-8"
    LC_MEASUREMENT="de_DE.UTF-8"
    LC_IDENTIFICATION="de_DE.UTF-8"
    LC_ALL=

    With kind regards,

    Robert Kehl

     
  • Jamie Cameron
    2012-04-06

    So one quick fix for your system may be to force Webmin to use the UTF-8 character set in the UI. This can be done by editing /etc/webmin/config and adding the line :

    charset=UTF-8

    Is UTF-8 the official encoding for user and group names though? I was unable to find any authoritative documentation on this..

     
  • Robert Kehl
    2012-04-11

    The quick fix did not work properly, as the German translation is encoded in iso-8859-1, leading to ugly encoding errors in a variety of places. Setting utf-8 as the default encoding would help for most Western languages, though, so I'd encourage switching the language encodings to utf-8 wherever possible. If help is needed, just point at me ;)
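The reverse mismatch can be sketched the same way. The Python snippet below uses a made-up translation string (not an actual Webmin message) stored as iso-8859-1, shows what a utf-8 reader makes of it, and then re-encodes it to utf-8.

```python
# Hypothetical translation string, stored as iso-8859-1 bytes.
text = "Domänen-Gruppen auswählen"
latin1_bytes = text.encode("iso-8859-1")

# Read as utf-8, each lone umlaut byte is invalid and gets replaced
# by U+FFFD (the replacement character).
shown = latin1_bytes.decode("utf-8", errors="replace")

# Re-encoding the file content to utf-8 removes the mismatch.
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")

print(shown)
print(utf8_bytes.decode("utf-8"))
```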

    I found this on Microsoft's pages: "Active Directory sends all responses in UTF-8 encoded form."
    Source: http://technet.microsoft.com/en-us/library/cc961766.aspx

    So, converting from utf-8 to whatever encoding is in use (unless that is utf-8 already, of course) would be a correct step to take, though it would not ensure that things are always readable. For example, converting umlauts from utf-8 to iso-8859-1 works, as umlauts are contained in iso-8859-1. But a German Windows server can also use utf-8 encoded characters that are not representable in iso-8859-1. This adds to what I stated above.
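A minimal Python sketch of that suggestion, assuming the target charset is known at display time: names are converted from utf-8 to the target, and characters the target cannot represent are replaced with "?" instead of raising an error. The function name and the second group name are hypothetical.

```python
def to_charset(raw_utf8: bytes, charset: str) -> bytes:
    """Convert utf-8 bytes to `charset`, replacing unrepresentable characters."""
    text = raw_utf8.decode("utf-8")
    return text.encode(charset, errors="replace")


# Umlauts exist in iso-8859-1, so this converts cleanly.
ok = to_charset("domänen-admins".encode("utf-8"), "iso-8859-1")

# The euro sign is not part of iso-8859-1, so it degrades to "?".
lossy = to_charset("gruppe-€".encode("utf-8"), "iso-8859-1")

print(ok, lossy)
```

The lossy case shows the limit mentioned above: a replaced name is displayable but can no longer be mapped back to the original group.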

    With kind regards,

    Robert Kehl

     
  • Jamie Cameron
    2012-04-12

    • status: open --> closed-fixed
     
  • Jamie Cameron
    2012-04-12

    So after considering this some more, I decided to just add UTF-8 encodings for all the languages supported by Webmin, including German. You can download the 1.586 development version which includes this fix from http://www.webmin.com/devel.html .