#2126 PRIMARY sel handlers and non-Latin-1 encodings

obsolete: 8.4.9
open
5
2006-10-05
2006-10-05
No

Installing a custom PRIMARY selection handler for the
STRING target on Text and Entry widgets does not work as
expected: the characters in the pasted string are
replaced by "?" (question marks) in both 8-bit and
UTF-8 applications.

The system is Linux/X11R6 with KOI8-R locale.
Tcl 8.4.9/Tk 8.4.9

The string that the selection handlers try to return
is the Russian word "тест"
(0xD4, 0xC5, 0xD3, 0xD4 in KOI8-R).

All the test cases are typed into tkcon (freshly
started for each test).

I. Just setting a handler for PRIMARY/STRING:

% text .t
.t
% selection handle .t sh
% proc sh {args} {return тест}
% selection own .t
% encoding system
koi8-r
% selection get -type TARGETS
MULTIPLE TARGETS TIMESTAMP TK_APPLICATION TK_WINDOW
UTF8_STRING STRING

Pasting into other applications then yields
????
(four question marks).

Pasting into the same tkcon yields "тест", as
expected.

II. Just setting a handler for UTF8_STRING:

% text .t
.t
% selection handle -type UTF8_STRING .t sh
% proc sh {args} {return тест}
% selection own .t
% selection get -type TARGETS
MULTIPLE TARGETS TIMESTAMP TK_APPLICATION TK_WINDOW
UTF8_STRING STRING

Then we get the same results as for the first case.

III. Setting a handler for STRING that converts the
text to KOI8-R:

...as before, but the handler is:
% proc sh {args} {return [encoding convertto тест]}

This pastes the correct string into any 8-bit application,
but UTF-8-enabled programs receive four characters of
gibberish:
ÔÅÓÔ
that is: \u00d4\u00c5\u00d3\u00d4
These are just the KOI8-R byte values of the source
word "тест".
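
The gibberish in case III can be reproduced in plain Tcl, without X11. This is a minimal sketch, assuming the receiving program interprets the STRING bytes as ISO Latin-1; the variable names are illustrative only and nothing here is Tk's own selection code.

```tcl
# "тест" written with \u escapes so the script does not
# depend on the encoding of the source file.
set word "\u0442\u0435\u0441\u0442"

# What the handler in case III returns: the KOI8-R bytes.
set bytes [encoding convertto koi8-r $word]
binary scan $bytes H* hex
puts $hex                ;# d4c5d3d4

# Assumption: the receiver decodes those bytes as ISO Latin-1.
set misread [encoding convertfrom iso8859-1 $bytes]
puts [expr {$misread eq "\u00d4\u00c5\u00d3\u00d4"}]   ;# 1
```

Each KOI8-R byte is taken as the Latin-1 character with the same code, which matches the four characters seen in UTF-8-enabled programs.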

IV. Clearing the STRING handler first, then installing
a UTF8_STRING handler:

% text .t
.t
% selection handle .t {}
% selection handle -type UTF8_STRING .t sh
% proc sh {args} {return тест}
% selection own .t
% selection get -type TARGETS
MULTIPLE TARGETS TIMESTAMP TK_APPLICATION TK_WINDOW
UTF8_STRING

Then we get "????" again in both KOI8-R and
UTF-8 programs.

V. Clearing the UTF8_STRING handler first, then installing
the handler for STRING:

% text .t
.t
% selection handle -type UTF8_STRING .t {}
% selection own .t
% selection get -type TARGETS
MULTIPLE TARGETS TIMESTAMP TK_APPLICATION TK_WINDOW STRING
% selection handle .t sh
% proc sh {args} {return тест}
% selection get -type TARGETS
MULTIPLE TARGETS TIMESTAMP TK_APPLICATION TK_WINDOW
UTF8_STRING STRING

In this case the correct string ("тест") is pasted
into any application, so this works.

Discussion

  • Donal K. Fellows

    • labels: --> 53. [selection]
    • milestone: --> obsolete: 8.4.9
    • assigned_to: nobody --> hobbs
     
  • Konstantin Khomoutov

    I confirm that this bug reproduces exactly as described
    with Tcl/Tk 8.5a (CVS, 2006-10-05).

     
  • Donal K. Fellows

    The ICCCM (http://tronche.com/gui/x/icccm/sec-2.html) states clearly that STRING is ISO Latin-1 and so can't contain any Cyrillic characters. The alternatives are to use COMPOUND_TEXT (though I think we don't support that, on the grounds that it is crazy hard) or UTF8_STRING. With the latter, the script level must return the characters unencoded, and the other side must also request UTF8_STRING in the first place.

    The best I can suggest is to use apps that request UTF8_STRING selections.
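
The Latin-1 limitation described above can be illustrated in plain Tcl, without Tk. This is only a sketch: it assumes that a STRING transfer amounts to an ISO 8859-1 conversion and a UTF8_STRING transfer to a UTF-8 conversion, and it does not show Tk's actual selection code. The variable names are illustrative.

```tcl
# "тест" written with \u escapes so the script does not
# depend on the encoding of the source file.
set word "\u0442\u0435\u0441\u0442"

# ISO Latin-1 has no Cyrillic letters. On Tcl 8.x the
# unrepresentable characters are expected to come back as "?";
# newer Tcl releases with strict encoding profiles may raise
# an error instead, hence the catch.
if {[catch {encoding convertto iso8859-1 $word} latin1]} {
    set latin1 "(conversion rejected: $latin1)"
}
puts $latin1

# UTF-8 can represent the word: two bytes per Cyrillic letter.
set utf8 [encoding convertto utf-8 $word]
binary scan $utf8 H* hex
puts $hex                ;# d182d0b5d181d182
```

Under these assumptions, only a requestor that asks for UTF8_STRING can receive the word intact, which is consistent with the suggestion above.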