Drag and drop between different encodings does not work in many cases.
For example, open two instances of SciTE and in one open a file in Japanese that uses Shift-JIS (code.page=932;character.set=128) and in the other open a file in Korean (code.page=949;character.set=129). Drag Korean text into the Japanese document or vice-versa and the text will probably be garbled. This is because the text is being copied as CF_TEXT in the original DBCS encoding. Even characters that are available in both languages may not transfer.
Drag and drop from one of these into a UTF-8 document will also likely fail, showing hex blobs like [xBB][xF5]. However, the opposite direction, dragging from UTF-8 to DBCS, is more likely to work, at least for many characters, as this uses CF_UNICODETEXT.
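As an illustration of the failure described above, here is a minimal Python sketch (not Scintilla code) that models a CF_TEXT-style transfer as handing unchanged bytes to a receiver that interprets them in its own encoding. The function name `transfer_raw_bytes` is hypothetical, and the Python codec names `cp949`, `cp932` and `utf-8` are assumed stand-ins for the document code pages.

```python
def transfer_raw_bytes(text, source_encoding, target_encoding):
    """Model a CF_TEXT-style transfer: the bytes are copied unchanged and
    the receiving document interprets them in its own encoding."""
    raw = text.encode(source_encoding)
    # errors="replace" substitutes U+FFFD for undecodable bytes; this only
    # approximates the hex blobs an editor shows for invalid bytes.
    return raw.decode(target_encoding, errors="replace")


korean = "한국어"
as_japanese = transfer_raw_bytes(korean, "cp949", "cp932")
as_utf8 = transfer_raw_bytes(korean, "cp949", "utf-8")

print(as_japanese == korean)  # False: Korean bytes read as Shift-JIS
print(as_utf8 == korean)      # False: Korean bytes are not valid UTF-8 here
```

Plain ASCII survives this kind of transfer because all three encodings agree on those bytes, which is why the problem only shows up with DBCS text.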
While it would be possible to retain CF_TEXT and convert to CF_UNICODETEXT only when necessary, this would add complexity. It's simpler to support only CF_UNICODETEXT and always convert to / from Unicode for DBCS documents.
The attached patch implements conversion to CF_UNICODETEXT. It's simpler and also unifies the drag and copy code.
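The idea of always going through Unicode can be sketched as follows. This is a hedged Python model, not the patch itself: `transfer_via_unicode` is a hypothetical name, and the codec names are assumed stand-ins for the source and target document code pages.

```python
def transfer_via_unicode(raw, source_encoding, target_encoding):
    """Model a CF_UNICODETEXT-style transfer: the source decodes its bytes
    to Unicode and the target encodes that text into its own encoding."""
    text = raw.decode(source_encoding)  # source side: DBCS -> Unicode
    # Target side: Unicode -> DBCS. Characters that the target encoding
    # lacks are replaced instead of producing garbled bytes.
    return text.encode(target_encoding, errors="replace")


# Hiragana exists in both the Japanese and Korean code pages, but with
# different byte values, so only a Unicode round trip carries it across.
hiragana_sjis = "あ".encode("cp932")
hiragana_uhc = transfer_via_unicode(hiragana_sjis, "cp932", "cp949")

print(hiragana_sjis != hiragana_uhc)        # True: the byte values differ
print(hiragana_uhc.decode("cp949") == "あ")  # True: the character survives
```

Characters that exist in only one of the two code pages still cannot be represented in the other, so they are substituted rather than transferred.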
Committed as [c38e1a] .
There seems to be a breaking change: dropping with CF_TEXT is removed (from DragEnter and Drop). It's possible to convert the dropped data (assuming it's text encoded with CP_ACP) to Unicode and then to the document code page.
Dropping text onto an application that requests only CF_TEXT may also be broken.
Last edit: Zufu Liu 2020-02-08
This is deliberate and was mentioned above. Is there any reason to continue supporting CF_TEXT?
This incompatible change may cause problems for downstream projects built with Scintilla before this change, and for others built with this change.
I think CF_TEXT can be retained (and enhanced) with less code than expected:
The addition for converting between CP_ACP and CP_UTF8 (decode then encode) is there because, on Windows, ANSI (CP_ACP) is more common than other non-Unicode encodings;
many up-to-date tools/IDEs still use ANSI as the default text encoding.
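The "decode then encode" step proposed here can be sketched in Python as below. This is only an illustration of the suggestion, not code from the patch: `convert_ansi_drop` is a hypothetical name, and `cp1252` is an assumed stand-in for whatever code page CP_ACP maps to on a given system.

```python
def convert_ansi_drop(raw, ansi_encoding, document_encoding):
    """Decode dropped CF_TEXT-style bytes as the assumed ANSI code page,
    then encode the resulting text into the document's encoding."""
    # Step 1: treat the dropped bytes as ANSI text and decode to Unicode.
    text = raw.decode(ansi_encoding, errors="replace")
    # Step 2: encode the Unicode text into the document code page.
    return text.encode(document_encoding, errors="replace")


# Stand-in for data dropped by an application that only offers ANSI text.
dropped = "café".encode("cp1252")
in_utf8_doc = convert_ansi_drop(dropped, "cp1252", "utf-8")

print(in_utf8_doc.decode("utf-8"))  # café
```

The sketch depends on the guess that the dropped bytes really are in the ANSI code page. If the source used a different encoding, the result is garbled in the same way as before.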
I don't think this use-case is important enough to justify the additional code and maintenance burden.
The changes for CF_TEXT:
ScintillaWin.cxx needs another fix: IMEMessage() is not called.
Maybe WM_INPUTLANGCHANGE and WM_INPUTLANGCHANGEREQUEST can be added to IMEMessage().
Last edit: Zufu Liu 2020-02-09
IMEMessage appears off-topic for this issue.
Moved IMEMessage to [bugs:#2156].
Removed CF_TEXT from the patch; it now only contains changes for FormatEnumerator and ReleaseStgMedium.
Committed with minor changes as [5d4c40].