#434 Unicode support for Win32 (NT) console channels

Status: closed-out-of-date
Priority: 8
Updated: 2010-08-05
Created: 2005-08-11
Private: No

This patch, prepared against the recent Tcl 8.5 CVS,
makes the Win32 console channel driver use ReadConsoleW
and WriteConsoleW where these functions are available
(NT/2k/XP/2003), without (hopefully) breaking anything
on other systems. TclWinOpenConsoleChannel will set the
channel encoding to unicode when appropriate; thus all
the applications that do gets/puts on the console,
without resetting its options, will not notice any
difference.

Please let me know if such a change requires a TIP to
be included.

Discussion

  • Anton Kovalenko

    Anton Kovalenko - 2005-08-11

    Unicode support for Win32 console

     
  • Don Porter

    Don Porter - 2005-08-11

    Logged In: YES
    user_id=80530

    How is this patch related to
    Tcl RFE 491789?

     
  • Anton Kovalenko

    Anton Kovalenko - 2005-08-11

    Logged In: YES
    user_id=241496

    Tcl RFE 491789 is unrelated to this patch (though I was
    thinking, while submitting this patch, about GetCommandLineW
    as a next step to better unicode support).

    This patch is not about command-line parameters; it's about
    console I/O (i.e. stdin, stdout, and stderr of tclsh.exe). As
    Tcl already has a separate channel driver for the Win32 console,
    the required changes are minimal and they affect
    neither the Tcl API nor the signature of Tcl_Main.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2005-08-11

    Logged In: YES
    user_id=72656

    I think this is a candidate for 8.4 and 8.5 (no compat issues).

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2005-08-11
    • assigned_to: andreas_kupries --> davygrvy
    • priority: 5 --> 7
     
  • David Gravereaux

    Logged In: YES
    user_id=7549

    I haven't seen the guts of the patch yet, but this gets my
    vote of approval. I can't do the work of the test/commit
    due to my lack of dev tools on this new computer of mine..
    passing to JH

     
  • David Gravereaux

    • assigned_to: davygrvy --> hobbs
     
  • Anton Kovalenko

    Anton Kovalenko - 2005-08-23

    Logged In: YES
    user_id=241496

    Sorry to all,
    There was a typo in the first variant of this patch
    (ReadConsoleW and WriteConsoleW were mistakenly used on
    non-unicode systems).
    Fixed (new variant attached).

     
  • Pat Thoyts

    Pat Thoyts - 2005-11-03
    • status: open --> closed-accepted
     
  • Pat Thoyts

    Pat Thoyts - 2005-11-03

    Logged In: YES
    user_id=202636

    Works fine for me. The test suite passes and the console can
    now output Cyrillic characters and so on.
    Seeing the positive comments from David - applied.

     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    Backport to 8.4 needed

     
  • Donal K. Fellows

    • assigned_to: hobbs --> patthoyts
    • priority: 7 --> 8
    • status: closed-accepted --> open-accepted
     
  • Anton Kovalenko

    Anton Kovalenko - 2005-11-03

    Logged In: YES
    user_id=241496

    Backport done (tcl-winunicon2-8-4.patch).

     
  • Pat Thoyts

    Pat Thoyts - 2005-11-03
    • status: open-accepted --> closed-accepted
     
  • Pat Thoyts

    Pat Thoyts - 2005-11-03

    Logged In: YES
    user_id=202636

    Oh - well I just committed a backport already :) Thank you
    anyway.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2006-01-24
    • status: closed-accepted --> open-accepted
     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2006-01-24

    Logged In: YES
    user_id=72656

    Reopening - this caused Expect for Windows to break. We
    need to revisit whether this is a core issue that may affect
    other extensions, or whether Expect for Windows must adapt.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2006-02-28

    Logged In: YES
    user_id=72656

    This was addressed in Expect, updated to track channel
    encodings.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2006-02-28
    • status: open-accepted --> closed-fixed
     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2006-03-28

    Logged In: YES
    user_id=72656

    This is being reopened because it was not a complete
    implementation after all. You will see the problem by
    running tclsh in XP and doing 'fconfigure stdin -encoding
    utf-8'. This should output correctly (it used to), but
    hangs with this patch.

    I have reverted for 8.4.13, but it should be reverted or
    corrected for 8.5. I left in the read|writeConsoleProc bits
    in case a corrected solution is presented that handles the
    internal channel encoding changes.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2006-03-28
    • status: closed-fixed --> open-rejected
     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2006-03-28

    Logged In: YES
    user_id=72656

    See also comments in bug 1442305.

     
  • Anton Kovalenko

    Anton Kovalenko - 2010-03-01

    As this artifact is still open, let me defend this implementation. My purpose is to ensure that it won't be rolled back for 8.6 as well.

    First, [fconfigure stdin -encoding utf-8] didn't work before this patch; it just pretended to, because utf-8 and a typical console codepage (whatever it may be) have the ASCII subset in common. Both with this patch and without it, the console channel after [fconfigure ... utf-8] is misconfigured, i.e. it cannot be used to input non-ASCII characters correctly. The only difference is the amount of trouble caused by this misconfiguration.
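
    The "pretended to work" point can be illustrated outside Tcl. The following is only a sketch in Python, not part of the patch: cp866 is an assumed stand-in for "a typical console codepage", and the helper name is hypothetical.

```python
# Illustrative only: cp866 stands in for "a typical console codepage".
CONSOLE_CODEPAGE = "cp866"  # assumption, not taken from the ticket


def misread_as_utf8(text: str) -> str:
    """Encode text the way a codepage console would, then decode it as UTF-8."""
    raw = text.encode(CONSOLE_CODEPAGE)
    # errors="replace" keeps the demo from raising on invalid UTF-8 bytes.
    return raw.decode("utf-8", errors="replace")


ascii_line = "puts hello"
cyrillic_line = "привет"

# ASCII bytes are identical in both encodings, so the mismatch is invisible.
print(misread_as_utf8(ascii_line) == ascii_line)        # True
# Non-ASCII bytes are not valid UTF-8 for the same text, so the result differs.
print(misread_as_utf8(cyrillic_line) == cyrillic_line)  # False
```

    As long as only ASCII passes through, the wrong encoding goes unnoticed; the first non-ASCII character exposes it.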

    Mr. Hobbs seems to expect a solution that notices the upper-level -encoding reconfiguration of the console channel and somehow takes it into account when the low-level I/O is done. If that is indeed the expected property of a "complete implementation", I would object: no channel type has ever worked this way, be it console channels or any other. At the lowest level a channel is a stream of bytes, and encoding translation is layered on top of it; the translation never affects the channel's I/O behavior. Tcl has respected this principle from the very start of its Unicode subsystem, and it is what extension and application developers expect and will always expect.
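
    The layering argument can be sketched with Python's io module as an analogy; this is not Tcl's channel code. io.BytesIO plays the role of the low-level byte stream, io.TextIOWrapper plays the role of the -encoding layer, and the helper name is hypothetical.

```python
import io


def read_through_layer(device_bytes: bytes, encoding: str) -> str:
    """Read fixed device bytes through a text layer with the given encoding."""
    raw = io.BytesIO(device_bytes)  # the byte stream: knows nothing of encodings
    layer = io.TextIOWrapper(raw, encoding=encoding, errors="replace", newline="")
    return layer.read()


# What a UTF-16 console would hand to the driver for the line "да".
device_bytes = "да\n".encode("utf-16-le")

# The matching layer recovers the text.
print(read_through_layer(device_bytes, "utf-16-le") == "да\n")  # True
# A mismatched layer reads the very same bytes and produces different text.
print(read_through_layer(device_bytes, "utf-8") == "да\n")      # False
```

    Reconfiguring the upper layer changes only how the bytes are interpreted; the bytes delivered by the device stay the same.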

    However, there is sometimes a reason for the channel type to influence the _initial_ settings of the upper-level translation procedures: for example, TCP sockets require CR-LF line endings for most standard text-based network protocols. The same is true of the "unpatched" console channels: they preset the encoding to an autodetected codepage (the result of GetConsoleCP), but they don't try to synchronize further channel encoding changes with the console codepage. Once the channel is created, the application is free to fconfigure it to any encoding (breaking the correctness of translation, of course); if the console codepage is changed after the channel is created (exec cmd /c chcp...), the "real" low-level encoding and the Tcl translation-level encoding are again out of sync.

    The "patched" implementation simply detects and uses "unicode" as the initial setting, in exactly the same way that both implementations detect non-Unicode codepages on non-Unicode systems. It doesn't prevent an application from altering the setting, even if that makes the channel unusable.

    In short, [fconfigure stdin -encoding unicode] will hang the "unpatched" implementation and work with the "patched" one, just as the reverse is true for -encoding utf-8 (which was the reason for rolling back this patch).

    There is, however, one possible compromise that could be added to my implementation to restore backward compatibility and make the channel less obviously broken after an encoding misconfiguration: I can use utf-8 as the "presented low-level" encoding on Unicode systems and do the utf-8<->unicode translation during the I/O. This way, utf-8 will be set up as the channel encoding by default; if an application reconfigures the channel to utf-8 again, everything will work (not just pretend to); and if the channel is reconfigured to some wrong encoding that is a superset of ASCII, it at least won't _hang_.
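
    A rough sketch of the proposed compromise, in Python rather than in the C of the actual driver. The class name and the console_units buffer are hypothetical; the buffer stands in for what would be passed to WriteConsoleW, and read() assumes the console hands back whole UTF-16 code units without splitting a surrogate pair.

```python
import codecs


class Utf8ConsoleBridge:
    """Hypothetical shim: UTF-8 on the channel side, UTF-16LE on the console side."""

    def __init__(self) -> None:
        # Incremental decoder: buffers a multi-byte sequence split across writes.
        self._out = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self.console_units = bytearray()  # stand-in for the WriteConsoleW sink

    def write(self, utf8_chunk: bytes) -> int:
        """Accept UTF-8 bytes from the channel and emit UTF-16LE to the console."""
        text = self._out.decode(utf8_chunk)
        self.console_units += text.encode("utf-16-le")
        return len(utf8_chunk)  # report every byte as accepted

    def read(self, utf16_from_console: bytes) -> bytes:
        """Turn UTF-16LE console input into UTF-8 bytes for the channel."""
        return utf16_from_console.decode("utf-16-le").encode("utf-8")


bridge = Utf8ConsoleBridge()
payload = "привет\n".encode("utf-8")
bridge.write(payload[:3])  # ends in the middle of a two-byte character
bridge.write(payload[3:])
print(bytes(bridge.console_units) == "привет\n".encode("utf-16-le"))  # True
```

    With such a shim the bytes visible to the encoding layer are always UTF-8, so a channel left at, or reset to, utf-8 stays correct, and an ASCII-superset misconfiguration still sees ordinary newline bytes.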

    Hereby I request comments from Mr. Hobbs on the current state of the patch, on the proposed improvement, on the possibility of backporting it again into 8.5 and 8.4 once the improvement is made, and on the plans for 8.6 (is it acceptable as is, or has it escaped being rolled back only by accident? Will it be acceptable if the change described above is made?).

     