
#69 Server does not check locale/LANG

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2004-11-19
Created: 2004-11-19
Creator: Mikhail T.
Private: No

The server will use whatever LANG is set in the
environment.

This is bad for both web pages (which try to get the
date and end up with bogus Unicode for month and
weekday names) and -- more importantly -- for the
Date: headers sent by the server to the clients.

Those, AFAIK, must be in ASCII.
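For reference, the rfc1123-date grammar in RFC 2616 (section 3.3.1) spells the weekday and month names out literally ("Mon" ... "Sun", "Jan" ... "Dec"), so a localized name does more than look odd: it makes the header invalid. Below is a minimal sketch, assuming C, of a checker that a test could run against a server's Date: value. `is_rfc1123_date` is a hypothetical name and is not part of tclhttpd.

```c
#include <stddef.h>
#include <string.h>

/* English names required by the rfc1123-date grammar (RFC 2616, 3.3.1). */
static const char *const WKDAYS[7] = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
static const char *const MONTHS[12] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                       "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};

/* Returns 1 if the first 3 bytes of s match an entry of table, else 0. */
static int in_table(const char *s, const char *const *table, int n) {
    for (int i = 0; i < n; i++)
        if (strncmp(s, table[i], 3) == 0)
            return 1;
    return 0;
}

static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Hypothetical helper: returns 1 if s has the exact shape
 * "Sun, 06 Nov 1994 08:49:37 GMT", else 0. Any localized or
 * non-ASCII weekday/month name makes the check fail. */
int is_rfc1123_date(const char *s) {
    /* 'w' = weekday slot, 'm' = month slot, 'd' = digit; rest is literal. */
    static const char tmpl[] = "www, dd mmm dddd dd:dd:dd GMT";
    if (strlen(s) != sizeof tmpl - 1)
        return 0;
    for (size_t i = 0; i < sizeof tmpl - 1; i++) {
        char t = tmpl[i];
        if (t == 'd') {
            if (!is_digit(s[i]))
                return 0;
        } else if (t != 'w' && t != 'm' && s[i] != t) {
            return 0;
        }
    }
    /* Weekday sits at offset 0, month at offset 8. */
    return in_table(s, WKDAYS, 7) && in_table(s + 8, MONTHS, 12);
}
```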

Discussion

  • Colin McCormack

    Colin McCormack - 2004-11-20

    Logged In: YES
    user_id=19214

    When you say it doesn't check locale/LANG and that it uses
    whatever LANG is set in the environment ... I'm not sure
    what you mean. I'm fairly sure the server doesn't
    intentionally use any LANG at all.

    What are you seeing that you think is wrong? Are you saying
    that the default character encoding of transmissions is
    wrong, or something? Do you think it needs to be selected
    per page, or negotiated? Where do you think the character
    encoding, and date format, etc etc should come from?

    Oh, and what version of tcl are you running? What's your
    setup?

     
  • Mikhail T.

    Mikhail T. - 2004-11-20

    Logged In: YES
    user_id=173641

    If the server is started with a non-C LANG in effect, it will happily use
    that.

    Try it:

    setenv LANG uk_UA.KOI8-U
    tclsh8.4 ....../httpd.tcl
    ....
    HEAD http://localhost:8015/

    and watch for the Date: header as printed out by HEAD. If you read
    Cyrillics, that is :-)

    I know the server is not doing anything with LANG. I'm saying it
    should. If LANG is not C, Tcl's clock command (clock format) will return
    a localized date string instead of an ASCII one.

    I'm using tclsh8.4 on FreeBSD-5. The default LANG on my system is
    uk_UA.KOI8-U (Ukrainian). Usually, an app needs to be fixed for
    localization. tclhttpd is the opposite :-)

    The workaround is to explicitly set LANG to C in whatever sh-script
    is used to start the server, which is what FreeBSD's port of tclhttpd
    has been doing since a few hours ago.

    http://freshports.org/www/tclhttpd/

    http://www.freebsd.org/cgi/cvsweb.cgi/ports/www/tclhttpd/files/tclhttpd.sh?rev=1.2&content-type=text/x-cvsweb-markup

     
  • Colin McCormack

    Colin McCormack - 2004-11-20

    Logged In: YES
    user_id=19214

    Isn't the 'C' locale the default locale on the serving machine?
    Shouldn't tclhttpd use that unless the system is configured
    to tell it otherwise?

    I don't think tclhttpd should be making those system-wide
    decisions, and I think your workaround is actually the best
    solution to the problem: we don't know what the person
    running the server wants to do about the server's locale,
    tclhttpd doesn't know about locale, and so it makes sense
    to use the locale specified by the system.

     
  • Mikhail T.

    Mikhail T. - 2004-11-20

    Logged In: YES
    user_id=173641

    C locale is the default on _many_ servers, but it is not a requirement
    or anything. On my servers, for example, the Ukrainian locale is the
    default one.

    tclhttpd should not (and can not, of course) make system-wide
    decisions. It just needs to decide for itself -- and use "encoding
    system" to either set the locale to C once and for all, or every time
    before generating the Date:-header (because pages can change locale,
    can't they?).

     
  • Colin McCormack

    Colin McCormack - 2004-11-20

    Logged In: YES
    user_id=19214

    Ok, I know nothing about localisation, but it occurs to me
    that if you can set the environment variables from the shell
    and get the desired effect, you could also set them within
    tclsh by simply setting ::env(LANG) or whatever to whatever
    value you want.

    Want to give it a try, see if it fixes your problem?

    As far as I can see, tclhttpd should be completely locale
    agnostic, and not change anything it inherits of locale
    unless the admin wants to change it.

     
  • Mikhail T.

    Mikhail T. - 2004-11-20

    Logged In: YES
    user_id=173641

    Colin, I'm sorry to say this, but you "are not even wrong".

    Could you, please, show the report to someone else on the dev-team
    as well? Thank you.

     
  • Colin McCormack

    Colin McCormack - 2004-11-20

    Logged In: YES
    user_id=19214

    Anyone on the dev team who's sufficiently motivated can take
    over any time, with my thanks.

    Meanwhile: have you tried doing as I suggested, setting the
    environment variable within the tclhttpd startup? If so,
    what was the outcome?

     
  • Mikhail T.

    Mikhail T. - 2004-11-20

    Logged In: YES
    user_id=173641

    It does not matter whether the encoding is set through `encoding
    system' or through `set ::env(LANG)'. I have a workaround -- setting
    the LANG to C in the rc-script. But this prevents the server from
    generating anything locale-specific.

    The generation of the Date: HTTP-header should be independent of the
    rest of code. Simply calling 'clock format' is not always correct.

     
  • Colin McCormack

    Colin McCormack - 2004-11-20

    Logged In: YES
    user_id=19214

    > The generation of the Date: HTTP-header should be independent
    > of the rest of code. Simply calling 'clock format' is not
    > always correct.

    Aha! That makes perfect sense to me, at last. Ok, I'll ask
    someone who knows the clock format intimately what should be
    done, and do it.

    Thanks. Any other protocol-level gotchas?

     
  • Colin McCormack

    Colin McCormack - 2004-11-23

    Logged In: YES
    user_id=19214

    The best information I have on this matter is that tcl8.4
    under unix always relies upon the environment for its clock
    format. For this reason (among others) tcl8.5 clock was
    extensively rewritten and includes a -locale switch.

    I am willing to put [set env(LC_TIME) C] in the server setup
    for versions of tcl < 8.5, but I am not willing to
    save/restore env() elements around each invocation of
    [clock], because: (1) setting env() is very expensive, (2)
    it doesn't play well with threads.

    This will mean that (for tclhttpd under tcl8.4) times under
    clock will never be localised, but other aspects should
    remain as they are.

    It's the best I can do, will it suffice?
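At the C level, the counterpart of setting env(LC_TIME) once at startup is a single setlocale(LC_TIME, "C") call. The following is a minimal sketch, assuming a C program rather than tclhttpd's Tcl code; `pin_time_locale` and `http_date_c_locale` are hypothetical names. Only the LC_TIME category is pinned, so the other locale categories keep whatever the environment selected. setlocale changes process-wide state and is not thread-safe, so it belongs in startup code before any threads are created, in line with the threading concern above.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Call once at startup, before creating threads. Pins only LC_TIME to
 * the "C" locale, so %a and %b in strftime yield English abbreviations.
 * Returns 1 on success, 0 if the locale could not be set. */
int pin_time_locale(void) {
    return setlocale(LC_TIME, "C") != NULL;
}

/* Formats t (UTC) as an HTTP date into buf. Relies on pin_time_locale()
 * having run. Returns the number of chars written, or 0 on failure. */
size_t http_date_c_locale(time_t t, char *buf, size_t len) {
    struct tm tm;
    if (gmtime_r(&t, &tm) == NULL)
        return 0;
    return strftime(buf, len, "%a, %d %b %Y %H:%M:%S GMT", &tm);
}
```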

     
  • Mikhail T.

    Mikhail T. - 2004-11-24

    Logged In: YES
    user_id=173641

    I don't really care one way or the other -- the FreeBSD port sets
    LANG explicitly prior to launching httpd.tcl.

    If you are not willing to alter the environment each time the Date:-header
    is generated (and your reasons are probably sound), then it hardly
    matters.

    However, you could add something like an nclock or httpdate command to the
    few functions currently written in C (setuid, limit, et al.). That would work
    faster than clock(n) and would ignore the locale setting.

    I'm attaching a simple implementation that is 3 times faster than calling
    `clock format [clock seconds]' (4 vs. 12 microseconds) and produces the
    correct result regardless of locale.
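The attachment itself is not reproduced here. The following is a minimal sketch of the same idea in C, assuming POSIX gmtime_r. It is not the attached code, and `http_date` is a hypothetical name. The weekday and month names come from fixed English tables instead of strftime, so LANG and LC_TIME never enter the picture.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Fixed English names, independent of any locale setting. */
static const char wkday[7][4] = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
static const char month[12][4] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};

/* Writes t as "Sun, 06 Nov 1994 08:49:37 GMT" into buf (needs >= 30 bytes).
 * Returns the number of chars written, or -1 on error or truncation. */
int http_date(time_t t, char *buf, size_t len) {
    struct tm tm;
    if (gmtime_r(&t, &tm) == NULL) /* UTC breakdown; thread-safe variant */
        return -1;
    int n = snprintf(buf, len, "%s, %02d %s %04d %02d:%02d:%02d GMT",
                     wkday[tm.tm_wday], tm.tm_mday, month[tm.tm_mon],
                     tm.tm_year + 1900, tm.tm_hour, tm.tm_min, tm.tm_sec);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

gmtime_r and the integer conversions of snprintf do not consult LC_TIME. The output is therefore plain ASCII under any locale.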

     
  • Mikhail T.

    Mikhail T. - 2004-11-24

    Locale independent implementation of HTTP Date header.

     
  • Mikhail T.

    Mikhail T. - 2004-11-24

    Logged In: YES
    user_id=173641

    Come to think of it, the httpdate command should probably prepend the
    string "Date: " to the result as well, so that the server can just puts it
    out immediately, saving another couple of microseconds:

    -puts $sock "Date: [HttpdDate [clock clicks]]"
    +puts $sock [httpdate]
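Following that suggestion, a C helper could return the whole header line, with the "Date: " prefix and the CRLF terminator included, so that the caller writes it out unchanged. The following is a minimal sketch, assuming POSIX gmtime_r and a four-digit year; `http_date_line` is a hypothetical name. It patches a fixed-width template in place rather than calling a printf-style formatter.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <time.h>

/* Length of "Date: Sun, 06 Nov 1994 08:49:37 GMT\r\n" without the NUL. */
#define DATE_LINE_LEN 37

/* Writes v (0..99) as two ASCII digits at p. */
static void put2(char *p, int v) {
    p[0] = (char)('0' + v / 10);
    p[1] = (char)('0' + v % 10);
}

/* Fills buf (at least DATE_LINE_LEN + 1 bytes) with a complete Date header
 * line, ready to send as-is. Returns DATE_LINE_LEN, or -1 on failure. */
int http_date_line(time_t t, char *buf) {
    static const char wk[] = "SunMonTueWedThuFriSat";
    static const char mo[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    struct tm tm;
    if (gmtime_r(&t, &tm) == NULL)
        return -1;
    int y = tm.tm_year + 1900;
    if (y < 1000 || y > 9999) /* template assumes a four-digit year */
        return -1;
    /* Copy the template including its NUL, then overwrite the fields. */
    memcpy(buf, "Date: www, dd mmm yyyy hh:mm:ss GMT\r\n", DATE_LINE_LEN + 1);
    memcpy(buf + 6, wk + 3 * tm.tm_wday, 3);
    put2(buf + 11, tm.tm_mday);
    memcpy(buf + 14, mo + 3 * tm.tm_mon, 3);
    put2(buf + 18, y / 100);
    put2(buf + 20, y % 100);
    put2(buf + 23, tm.tm_hour);
    put2(buf + 26, tm.tm_min);
    put2(buf + 29, tm.tm_sec);
    return DATE_LINE_LEN;
}
```

Whether patching a template is measurably faster than a printf-style formatter would have to be benchmarked on the target system.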

     
