#1659 special characters not correctly sent

Milestone: obsolete: 8.3.3
Status: closed-fixed
Priority: 5
Updated: 2012-05-10
Created: 2001-10-23
Private: No

I'm using Tcl 8.3.3 as a DDE server under Windows and
I encounter problems with special characters. If a
text variable is created by Tcl, requesting the
variable content doesn't give the correct string for
special characters.
For example, you have a text file that contains a text
with the copyright sign:
{test string: ©}
You load that file in Tcl (which converts it to
Unicode, I assume) and get back its content with a
DDERequest. Then you get (and I hope I'm not the only
one): {test string: Â©}, and it's the same for
every "exotic" character.
I tried with different DDE clients (Excel VB, Matlab
and Aphelion, an image processing software); all give
the same result. I also tried to change the character
encoding, with no success (with 'encoding system'
and 'encoding convertto'). Characters seem to be
always sent as Unicode.
Note that when the DDE client creates a string in Tcl
(with something like 'DDEExecute DDE_Id, {set Texte
blabla}'), it gets exactly the same text back when
requesting the variable (with 'DDERequest DDE_Id,
Texte'), whatever the string is. Things go wrong only
when the string is created by Tcl.

Discussion

  • JM. Philippe

    JM. Philippe - 2002-10-09
    • status: open --> closed-fixed
     
  • JM. Philippe

    JM. Philippe - 2002-10-09

    Logged In: YES
    user_id=134307

    Download the tclWinDde.c attached to bug #620541, and
    recompile the dde package.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-05-06

    Having a look

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-05-06
    • assigned_to: kennykb --> nijtmans
    • status: closed-fixed --> open
     
  • Donal K. Fellows

    It's bizarre to look at it now; the somewhat uncertain handling of Unicode by SF over the years (i.e., it changed at some point) has mangled the bug report itself...

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-05-07
    • assigned_to: nijtmans --> dkf
     
  • Jan Nijtmans

    Jan Nijtmans - 2012-05-07

    jmphilippe's proposal from #620541 committed
    to bug-473946 branch. Donal, what do you
    think about it?

     
  • Donal K. Fellows

    Seems to be on the right sort of lines (can't test; wrong platform).

    Key questions to my thinking: When communicating with other Tcl processes, should we transfer as Unicode, UTF-8 or [encoding system]? What about when communicating with other processes?

    With those sorted out, the implementation strategy should just follow.

     
  • Donal K. Fellows

    • assigned_to: dkf --> nijtmans
     
  • Jan Nijtmans

    Jan Nijtmans - 2012-05-07

    OK, I'll give a shot at a complete implementation, starting
    with jmphilippe's patch. For the actual data, the CF_TEXT
    clipboard format is used. On WIN2000 this should be
    changed to CF_UNICODETEXT. Windows can do an
    automatic conversion between those two.

    Yes, I consider it a bug. The CF_TEXT format assumes
    the system encoding, but Tcl is putting UTF-8 in it.
    When sending a message from Tcl to Tcl, this
    doesn't matter. But when other applications try
    to interpret this UTF-8 as system encoding, different
    characters arise.

    jmphilippe's patch only handled one way: from
    Tcl to external applications. But the reverse
    should be handled as well.
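
The mismatch described in this comment can be reproduced outside of DDE. The following is a minimal sketch, not the code from tclWinDde.c: it assumes cp1252 as the Windows system encoding, and the function name is hypothetical. It shows what a client sees when UTF-8 bytes are read as if they were in the system encoding.

```python
# Illustration only. Assumption: the system encoding is cp1252.
# The function name is hypothetical, not part of Tcl or the dde package.

def as_seen_by_cf_text_client(text: str, system_encoding: str = "cp1252") -> str:
    """Return what a non-Tcl client reads from a CF_TEXT buffer holding UTF-8."""
    wire_bytes = text.encode("utf-8")  # what Tcl placed in the CF_TEXT buffer
    # The client decodes the same bytes using the system encoding instead.
    # errors="replace" covers bytes that cp1252 leaves undefined.
    return wire_bytes.decode(system_encoding, errors="replace")


if __name__ == "__main__":
    print(as_seen_by_cf_text_client("test string: ©"))
```

Plain ASCII text survives unchanged, which is consistent with the problem only showing up for "exotic" characters.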

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-05-09

    Better fix, derived from jmphilippe's, now
    committed to:
    ******* branch "bug-473946" *******
    Tested with Word 2007, setting
    a Tcl variable "test" to "€" and then trying to
    retrieve it from Word, using a field like
    {DDEAUTO TclEval test test}. Works!

    The advantage of using CF_UNICODETEXT
    is that it works for all characters. Tcl itself
    continues to use CF_TEXT for its
    communication, sending UTF-8 over it,
    so all current tests continue to function
    as normal. CF_UNICODETEXT is only
    used when an external application
    requests it. So, this is 100% upwards
    compatible.

    Please test/evaluate anyone interested. I
    plan to merge this to core-8-4-branch,
    core-8-5-branch and trunk in a few days.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-05-09

    Better fix, using the Tcl_GetUnicode* functions
    instead of the Tcl_Win functions, so we no longer have
    to detect whether we are on WinNT+. It should
    now work equally on Win95 too.

    Version updated to "1.2.5"

    I think it's ready to be merged to all other branches

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-05-09

    > Key questions to my thinking: When communicating with other Tcl processes,
    > should we transfer as Unicode, UTF-8 or [encoding system]? What about when
    > communicating with other processes?
    OK, let's sort that out.

    Currently, Tcl always communicates in dde using the CF_TEXT clipboard
    format, which is meant for [encoding system]. But Tcl sends UTF-8
    over this. That's OK when Tcl is at both ends. Most other applications
    (e.g. Word) communicate in CF_UNICODETEXT, which assumes Unicode.
    When the two dde ends use different clipboard formats, Windows
    translates automatically, and that's where the problem arises:
    this translation assumes [encoding system], but in reality
    the data is UTF-8.

    Bug #473946 is solved (in the branch) by letting Tcl do
    the translation. Tcl now understands both CF_TEXT and
    CF_UNICODETEXT, and when it receives a request
    for CF_UNICODETEXT it will send the answer in
    the requested format directly. This fixes the situation
    where Tcl handles the dde server side while any other
    application handles the client side.
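
The server-side behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the committed C code: `encode_reply` is a hypothetical name, the NUL terminator is assumed, and the UTF-16 payload is taken to be little-endian as on Windows. The numeric values are the standard Windows clipboard format identifiers.

```python
# Standard Windows clipboard format identifiers.
CF_TEXT = 1
CF_UNICODETEXT = 13


def encode_reply(value: str, requested_format: int) -> bytes:
    """Encode a reply in the format the client asked for (hypothetical helper)."""
    if requested_format == CF_UNICODETEXT:
        # External applications such as Word: answer in UTF-16 directly,
        # so Windows has no reason to translate the data itself.
        return (value + "\0").encode("utf-16-le")
    if requested_format == CF_TEXT:
        # Tcl-to-Tcl traffic keeps sending UTF-8 over CF_TEXT, as before.
        return (value + "\0").encode("utf-8")
    raise ValueError(f"unsupported clipboard format: {requested_format}")


if __name__ == "__main__":
    print(encode_reply("€", CF_UNICODETEXT))
    print(encode_reply("€", CF_TEXT))
```

Because the reply is produced per request, existing Tcl clients that ask for CF_TEXT are unaffected.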

    So, what to do when Tcl is the dde client,
    communicating with another dde server?
    Then Tcl should communicate in
    CF_UNICODETEXT. But how
    can Tcl know whether the other side
    is Tcl or not? I think that starting with
    Tcl 8.6, all dde communications should
    be in CF_UNICODETEXT, unless
    an external client (e.g. Tcl 8.5) requests
    otherwise. Then Tcl doesn't have
    to know what's on the other side, and
    doesn't need to provide an encoding.
    If an external client requests CF_TEXT,
    then Tcl can safely assume that
    it's communicating with an older Tcl
    dde server, because no-one else
    still does that ;-).

    This way, TIP #106 is simply
    not necessary any more....
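
The rule proposed in this comment (treat CF_UNICODETEXT as Unicode, and treat CF_TEXT as UTF-8 coming from an older Tcl) could look like this on the receiving side. It is only a sketch of the proposal: `decode_payload` is a hypothetical name, and the NUL-terminated, little-endian layout is an assumption.

```python
# Standard Windows clipboard format identifiers.
CF_TEXT = 1
CF_UNICODETEXT = 13


def decode_payload(data: bytes, fmt: int) -> str:
    """Decode incoming dde data according to its clipboard format (hypothetical helper)."""
    if fmt == CF_UNICODETEXT:
        text = data.decode("utf-16-le")
    elif fmt == CF_TEXT:
        # Assumption taken from the proposal: a CF_TEXT peer is an older Tcl,
        # so the bytes are UTF-8 rather than the system encoding.
        text = data.decode("utf-8")
    else:
        raise ValueError(f"unsupported clipboard format: {fmt}")
    # Drop the terminating NUL and anything after it.
    return text.split("\0", 1)[0]


if __name__ == "__main__":
    print(decode_payload("set Texte €\0".encode("utf-16-le"), CF_UNICODETEXT))
```

With this rule no encoding option has to be supplied by the caller, which is the point made about TIP #106.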

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-05-10

    Now fixed in core-8-4-branch, core-8-5-branch and trunk.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-05-10
    • status: open --> closed-fixed
     
  • Jan Nijtmans

    Jan Nijtmans - 2012-05-11

    Just one remark. The original patch suggestion was
    from Miguel Bañón, not from JM. Philippe.

     
  • Jan Nijtmans

    Jan Nijtmans - 2012-05-25

    Another addition: The original committed
    fix was only for XTYP_REQUEST, but the
    same can happen for XTYP_EXECUTE
    as well.

    Improved fix committed to all branches