#2151 On Win32, drag and drop between different encodings fails

Bug
closed-fixed
5
2020-03-03
2020-01-15
No

Drag and drop between different encodings does not work in many cases.

For example, open two instances of SciTE and in one open a file in Japanese that uses Shift-JIS (code.page=932;character.set=128) and in the other open a file in Korean (code.page=949;character.set=129). Drag Korean text into the Japanese document or vice-versa and the text will probably be garbled. This is because the text is being copied as CF_TEXT in the original DBCS encoding. Even characters that are available in both languages may not transfer.
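The failure mode described above can be reproduced outside Scintilla. The following is a minimal sketch, assuming Python's `cp949` and `cp932` codecs are close enough stand-ins for the Windows code pages; the helper name `drop_raw_bytes` is hypothetical and is not part of Scintilla.

```python
# Illustrative only: Python codecs stand in for the Windows code pages.
KOREAN_CODEC = "cp949"    # corresponds to code.page=949
JAPANESE_CODEC = "cp932"  # corresponds to code.page=932 (Shift-JIS)


def drop_raw_bytes(text: str, source_codec: str, target_codec: str) -> str:
    """Simulate a CF_TEXT-style transfer (hypothetical helper).

    The bytes are copied untouched and then reinterpreted in the
    target document's encoding, with no conversion in between.
    """
    raw = text.encode(source_codec)
    return raw.decode(target_codec, errors="replace")


korean = "한글"
garbled = drop_raw_bytes(korean, KOREAN_CODEC, JAPANESE_CODEC)
print(repr(garbled))  # no longer the original Korean text

# A character that exists in both code pages still fails, because its
# byte sequence differs between the two encodings.
shared = "あ"
print(shared.encode(JAPANESE_CODEC) != shared.encode(KOREAN_CODEC))
print(repr(drop_raw_bytes(shared, JAPANESE_CODEC, KOREAN_CODEC)))
```

The last two lines illustrate the point about characters available in both languages: the character is representable on both sides, but the raw bytes only make sense in the encoding that produced them.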

Drag and drop from one of these into a UTF-8 document will also likely fail, producing hex blobs like [xBB][xF5]. However, the opposite direction, dragging from UTF-8 to DBCS, is more likely to work, at least for many characters, as it uses CF_UNICODETEXT.
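As a rough illustration of where such hex blobs come from, the sketch below treats DBCS bytes as UTF-8 and renders each invalid byte in a bracketed hex form. The function `show_as_utf8` is hypothetical and only imitates the display format quoted in this report; it is not Scintilla's rendering code.

```python
def show_as_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, showing invalid bytes as [xNN] (imitation)."""
    # surrogateescape maps each undecodable byte b to U+DC00 + b,
    # so invalid bytes can be recovered and formatted afterwards.
    decoded = raw.decode("utf-8", errors="surrogateescape")
    parts = []
    for ch in decoded:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            parts.append("[x%02X]" % (code - 0xDC00))
        else:
            parts.append(ch)
    return "".join(parts)


# The two bytes quoted in the report are not valid UTF-8 on their own.
print(show_as_utf8(b"\xbb\xf5"))

# The opposite direction starts from Unicode, so the receiving side can
# encode into its own code page instead of reinterpreting raw bytes.
print("한글".encode("cp949").decode("cp949"))
```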

While it would be possible to retain CF_TEXT and convert to CF_UNICODETEXT only when necessary, this would add complexity. It's simpler to support only CF_UNICODETEXT and always convert to and from Unicode for DBCS documents.

The attached patch implements conversion to CF_UNICODETEXT. It's simpler and also unifies the drag and copy code.
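The idea of always going through Unicode can be sketched as follows. This is not the attached patch (which is C++ against the Win32 API); it is a minimal Python model, assuming the `cp932` and `cp949` codecs approximate the Windows code pages. All function names are hypothetical.

```python
def copy_as_unicode(doc_bytes: bytes, doc_codec: str) -> str:
    """Source side: decode the selection from the document's code page."""
    return doc_bytes.decode(doc_codec)


def paste_from_unicode(text: str, doc_codec: str) -> bytes:
    """Target side: encode into the target document's code page.

    Characters the target code page cannot represent become '?'.
    """
    return text.encode(doc_codec, errors="replace")


def transfer(doc_bytes: bytes, source_codec: str, target_codec: str) -> bytes:
    """Model a drag and drop that always passes through Unicode."""
    return paste_from_unicode(copy_as_unicode(doc_bytes, source_codec), target_codec)


# A character present in both code pages now survives the transfer.
moved = transfer("あ".encode("cp932"), "cp932", "cp949")
print(moved.decode("cp949"))

# DBCS to UTF-8 also works, since the bytes are re-encoded, not copied.
print(transfer("한글".encode("cp949"), "cp949", "utf-8").decode("utf-8"))
```

Characters missing from the target code page are still lost in this model, but they degrade to a replacement character rather than to unrelated text.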

1 Attachments

Related

Bugs: #2034

Discussion

  • Neil Hodgson

    Neil Hodgson - 2020-01-18
    • status: open --> open-fixed
     
  • Neil Hodgson

    Neil Hodgson - 2020-01-18

    Committed as [c38e1a].

     

    Related

    Commit: [c38e1a]

  • Zufu Liu

    Zufu Liu - 2020-02-08

    There seems to be a breaking change: dropping with CF_TEXT has been removed (from DragEnter and Drop). It is possible to convert the dropped data (assuming it is text encoded with CP_ACP) to Unicode and then to the document code page.

    Dragging text to an application that requests only CF_TEXT may also be broken.

     

    Last edit: Zufu Liu 2020-02-08
    • Neil Hodgson

      Neil Hodgson - 2020-02-08

      This is deliberate and was mentioned above. Is there any reason to continue supporting CF_TEXT?

       
  • Zufu Liu

    Zufu Liu - 2020-02-09

    This incompatible change may cause problems for downstream projects built with Scintilla before this change, and for others built with it.

    I think CF_TEXT can be retained (and enhanced) with less code than expected:

    1. Assume CF_TEXT is encoded with CP_ACP.
    2. On dropping, convert to UTF-8 when the document code page is UTF-8 (new change, a few lines); otherwise keep the data as is (old behavior).
    3. On dragging, convert to CP_ACP when the document code page is UTF-8 (new change, a few lines); otherwise keep the data as is (old behavior).

    The conversion between CP_ACP and CP_UTF8 (decode then encode) is added because, on Windows, ANSI (CP_ACP) is more common than other non-Unicode encodings;
    many up-to-date tools and IDEs still use ANSI as their default text encoding.
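The three steps proposed above can be modeled roughly as below. This is only a sketch of the proposal as worded in this comment, not code from any patch; the function names are hypothetical, and the `acp` parameter is an assumption standing in for whatever CP_ACP resolves to on a given system (`cp1252` in the usage lines is just an example value).

```python
UTF8 = "utf-8"


def on_drop_cf_text(data: bytes, doc_codec: str, acp: str) -> bytes:
    """Step 2: on drop, convert ACP bytes to UTF-8 for UTF-8 documents."""
    if doc_codec == UTF8:
        return data.decode(acp, errors="replace").encode(UTF8)
    return data  # old behavior: keep the bytes as is


def on_drag_cf_text(selection: bytes, doc_codec: str, acp: str) -> bytes:
    """Step 3: on drag, convert UTF-8 selections to ACP bytes."""
    if doc_codec == UTF8:
        text = selection.decode(UTF8, errors="replace")
        return text.encode(acp, errors="replace")
    return selection  # old behavior: keep the bytes as is


# Example with a hypothetical ACP of cp1252.
dropped = on_drop_cf_text("café".encode("cp1252"), UTF8, "cp1252")
print(dropped.decode(UTF8))

dragged = on_drag_cf_text("café".encode(UTF8), UTF8, "cp1252")
print(dragged == "café".encode("cp1252"))
```

For non-UTF-8 documents the bytes pass through unchanged in this model, which is the case the original report identifies as garbling text between different DBCS code pages.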

     
    • Neil Hodgson

      Neil Hodgson - 2020-02-10

      > This incompatible change may cause problems for downstream projects built with Scintilla before this change, and for others built with it.

      I don't think this use-case is important enough to justify the additional code and maintenance burden.

       
  • Zufu Liu

    Zufu Liu - 2020-02-09

    The changes for CF_TEXT:

    1. Change fields in FormatEnumerator to ULONG.
    2. Drag and drop with CF_TEXT as described above.

    ScintillaWin.cxx needs another fix: IMEMessage() is not called;
    maybe WM_INPUTLANGCHANGE and WM_INPUTLANGCHANGEREQUEST can be added to IMEMessage().

     

    Last edit: Zufu Liu 2020-02-09
    • Neil Hodgson

      Neil Hodgson - 2020-02-10

      IMEMessage appears off-topic for this issue.

       
  • Zufu Liu

    Zufu Liu - 2020-02-10

    Moved IMEMessage to [bugs:#2156].

    Removed CF_TEXT from the patch; it now contains only the changes for FormatEnumerator and ReleaseStgMedium.

     

    Related

    Bugs: #2156

    • Neil Hodgson

      Neil Hodgson - 2020-02-11

      Committed with minor changes as [5d4c40].

       

      Related

      Commit: [5d4c40]

  • Neil Hodgson

    Neil Hodgson - 2020-03-03
    • status: open-fixed --> closed-fixed
     
  • Neil Hodgson

    Neil Hodgson - 2020-03-03

    Committed as [c38e1a].

     

    Related

    Commit: [c38e1a]

