Installing a custom PRIMARY selection handler for the
STRING target on Text and Entry widgets doesn't work as
expected: every character of the pasted string is
replaced by "?" (a question mark) in both 8-bit and
UTF-8 applications.
The system is Linux/X11R6 with KOI8-R locale.
Tcl 8.4.9/Tk 8.4.9
The string that the selection handlers try to return
is the Russian word "тест"
(0xD4, 0xC5, 0xD3, 0xD4 in KOI8-R).
All the test cases are typed into tkcon (freshly
started for each test).
I. Just setting a handler for PRIMARY/STRING:
% text .t
.t
% selection handle .t sh
% proc sh {args} {return тест}
% selection own .t
% encoding system
koi8-r
% selection get -type TARGETS
MULTIPLE TARGETS TIMESTAMP TK_APPLICATION TK_WINDOW
UTF8_STRING STRING
Pasting into other apps then gives
????
(four question marks),
while pasting in the same tkcon gives "тест" (which is
what's expected).
II. Just setting a handler for UTF8_STRING:
% text .t
.t
% selection handle -type UTF8_STRING .t sh
% proc sh {args} {return тест}
% selection own .t
% selection get -type TARGETS
MULTIPLE TARGETS TIMESTAMP TK_APPLICATION TK_WINDOW
UTF8_STRING STRING
Then we get the same results as for the first case.
III. Setting a handler for STRING, but converting the text
to KOI8-R:
...as before, but the handler is:
% proc sh {args} {return [encoding convertto тест]}
This pastes the correct string into any 8-bit app,
but UTF-8-enabled programs receive 4 characters of
gibberish:
ÔÅÓÔ
which are: \u00d4\u00c5\u00d3\u00d4
These are just the KOI8-R byte values of the source
word "тест", taken as Unicode code points.
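The gibberish in case III can be reproduced without any selection traffic. The following is a minimal sketch in plain Tcl (no Tk needed); it assumes that the receiving side simply treats the STRING bytes as ISO Latin-1, which is an assumption for illustration and not something traced in the Tk sources. The variable names are hypothetical.

```tcl
# The word "тест", written with \u escapes so the script does not
# depend on the encoding of the source file.
set word "\u0442\u0435\u0441\u0442"

# What the handler in case III returns: the KOI8-R bytes of the word.
set bytes [encoding convertto koi8-r $word]
binary scan $bytes H* hex   ;# hex is d4c5d3d4

# Assumption: the receiver reads those bytes as ISO Latin-1, so each
# byte becomes the code point with the same value.
set misread [encoding convertfrom iso8859-1 $bytes]

# Compare against the code points reported above.
puts [string equal $misread "\u00d4\u00c5\u00d3\u00d4"]
```

If the assumption holds, this explains why 8-bit KOI8-R apps show the right word (they reinterpret the same bytes as KOI8-R) while UTF-8 apps show the Latin-1 characters.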
IV. Clearing the STRING handler first, then installing
a UTF8_STRING handler:
% text .t
.t
% selection handle .t {}
% selection handle -type UTF8_STRING .t sh
% proc sh {args} {return тест}
% selection own .t
% selection get -type TARGETS
MULTIPLE TARGETS TIMESTAMP TK_APPLICATION TK_WINDOW
UTF8_STRING
Then we get "????" again in both koi8-r and
utf-8 programs.
V. Clearing the UTF8_STRING handler first, then installing
the handler for STRING:
% text .t
.t
% selection handle -type UTF8_STRING .t {}
% selection own .t
% selection get -type TARGETS
MULTIPLE TARGETS TIMESTAMP TK_APPLICATION TK_WINDOW STRING
% selection handle .t sh
% proc sh {args} {return тест}
% selection get -type TARGETS
MULTIPLE TARGETS TIMESTAMP TK_APPLICATION TK_WINDOW
UTF8_STRING STRING
In this case the correct string ("тест") is pasted
into any application
(so this works).
Logged In: YES
user_id=1350198
I confirm that this bug reproduces exactly as described
with Tcl/Tk 8.5a (CVS, 2006-10-05).
The ICCCM (http://tronche.com/gui/x/icccm/sec-2.html) states clearly that STRING is ISO Latin-1 and so can't contain any Cyrillic characters. The alternatives are to use COMPOUND_TEXT (though I think we don't support that on the grounds that it is crazy hard) or UTF8_STRING. With the latter, the script level needs to return the characters unencoded, and the other side also needs to ask for UTF8_STRING in the first place.
The best I can suggest is to use apps that request UTF8_STRING selections.
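
The Latin-1 restriction on STRING can be checked at the script level before deciding what a handler should return. The sketch below is plain Tcl; `fitsLatin1` is a hypothetical helper name, not part of Tcl or Tk, and it only tests whether every character has a code point of 255 or lower.

```tcl
# Hypothetical helper: returns 1 if every character of $s is in the
# ISO Latin-1 range (code points 0..255), 0 otherwise.
proc fitsLatin1 {s} {
    foreach ch [split $s ""] {
        # Inline [scan ... %c] returns the character's code point.
        if {[scan $ch %c] > 255} {
            return 0
        }
    }
    return 1
}

set ascii "test"
set cyrillic "\u0442\u0435\u0441\u0442"   ;# the word from the report

puts [fitsLatin1 $ascii]
puts [fitsLatin1 $cyrillic]
```

A string that fails this check cannot be carried by a conforming STRING target, which is why the advice above points at UTF8_STRING.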