
#1256 Two bugs in the text editor with regard to multibyte characters

None
fixed
nobody
None
1
2015-03-04
2015-02-28
Anonymous
No

I switched to TeXstudio two years ago and I'm quite happy with it. I used to write only English articles, and everything worked perfectly. Recently I started writing my thesis in Chinese and ran into two annoying bugs in the editor when dealing with Chinese characters. I suspect the same happens with other multibyte characters, such as Japanese and Korean.

Bug 1 (Mac OS X only): "soft wrap at window edge" swallows the last one or two characters of each line. See the attached figure for an example: the content is the sentence "这是一个测试。" ("This is a test.") repeated, and the automatic wrap cuts off the last character (and half of the second-to-last character) of each line.
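The cut-off is what one would expect if the wrap routine budgets every character as one fixed-pitch cell while CJK glyphs are drawn wider. The following is a simplified, hypothetical sketch (not TeXstudio's wrap code; `cell_width`, `wrap_by_count` and `wrap_by_width` are invented names). It assumes that wide and fullwidth characters, i.e. Unicode East Asian Width classes "W" and "F", occupy exactly two cells, whereas real CJK glyphs need not be an exact multiple of the Latin cell width.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Assumed display width in fixed-pitch cells: 2 for wide/fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(line: str) -> int:
    """Total width of a line in cells under the same assumption."""
    return sum(cell_width(ch) for ch in line)


def wrap_by_count(text: str, columns: int) -> list[str]:
    """Naive wrap: treats every character as exactly one cell wide."""
    return [text[i:i + columns] for i in range(0, len(text), columns)]


def wrap_by_width(text: str, columns: int) -> list[str]:
    """Greedy wrap: break before a character would cross the window edge."""
    lines, current, used = [], "", 0
    for ch in text:
        w = cell_width(ch)
        if current and used + w > columns:
            lines.append(current)
            current, used = "", 0
        current += ch
        used += w
    if current:
        lines.append(current)
    return lines


text = "这是一个测试。" * 3
for line in wrap_by_count(text, 10):
    print(display_width(line), line)  # the first lines need 20 cells, not 10
for line in wrap_by_width(text, 10):
    print(display_width(line), line)  # every line fits into 10 cells
```

With a 10-cell window, `wrap_by_count` puts 10 CJK characters (20 cells) on a line, so half of it falls past the edge; `wrap_by_width` breaks after 5 characters instead.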

Bug 2 (both Windows and OS X): Chinese characters are entered with input methods. For example, I type "ceshi" and the pinyin input method converts it to "测试" ("test"). The problem is that when some text is selected in the editor and I type Chinese text to replace it, both the raw input (e.g. "ceshi") and the converted text (e.g. "测试") are kept. The attached image shows the output after I selected the last "测试" and typed "shiyan" to replace it with "试验" ("experiment"). The expected result is "这是一个测试" -> "这是一个试验", but the editor yields "这是一个shi'yan试验". This bug is independent of the platform and the input method in use.

1 Attachment

Discussion

  • Fang Zhang

    Fang Zhang - 2015-02-28

    Registered an account to keep track of the issue.

    Sorry for not mentioning the environment: Windows 7 SP1 x64, Mac OS X Yosemite 10.10.2, TeXstudio 2.8.8.

    Bug 2 persists in the latest preview build, TeXstudio 2.8.9 (hg 4901:a3bf8ff6a55d).

     
  • Tim Hoffmann

    Tim Hoffmann - 2015-03-01

    bug 1: fixed. hg 4949 (57c7686b3dd8)

    Internal note: we blacklist certain character categories that we know may have non-standard text widths. All other characters are assumed to have the standard fixed-pitch width. Maybe we should whitelist standard characters instead. See the TODO comment in the above commit.
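The difference between the two strategies in this note can be sketched as follows. This is a hypothetical illustration, not the code from the commit: it uses Unicode East Asian Width classes as a stand-in for whatever character categories TeXstudio actually checks, and the set contents and function names are assumptions.

```python
import unicodedata

# Hypothetical stand-in for TeXstudio's character categories: Unicode East
# Asian Width classes ("W" wide, "F" fullwidth, "Na" narrow, "H" halfwidth,
# "A" ambiguous, "N" neutral).
BLACKLIST = {"W", "F"}   # classes known to be wider than one fixed-pitch cell
WHITELIST = {"Na", "H"}  # classes known to be exactly one fixed-pitch cell


def fixed_pitch_blacklist(ch: str) -> bool:
    """Blacklist: assume the standard width unless the class is known to differ."""
    return unicodedata.east_asian_width(ch) not in BLACKLIST


def fixed_pitch_whitelist(ch: str) -> bool:
    """Whitelist: assume the standard width only for classes known to match it."""
    return unicodedata.east_asian_width(ch) in WHITELIST


for ch in ["a", "这", "é"]:
    print(ch, unicodedata.east_asian_width(ch),
          fixed_pitch_blacklist(ch), fixed_pitch_whitelist(ch))
```

For "a" and "这" both strategies agree. For "é" (ambiguous width, class "A") they differ: the blacklist assumes the standard width, while the whitelist sends the character to real font measurement. The whitelist is therefore the conservative choice: it may measure some characters unnecessarily, but it never assumes a fixed width for a class nobody has checked.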

     
  • Tim Hoffmann

    Tim Hoffmann - 2015-03-01

    Even though your bug report is very clear (thanks for that!), I cannot reproduce bug 2.

    This is what I did on Win 7 x64 (Microsoft Pinyin):

    1. Select the last "测试"

    2. Input "shiyan"

    3. Press "1"

    See attachment for the results of each step.

    Please try to describe what you do in even more detail.

     
  • Fang Zhang

    Fang Zhang - 2015-03-02

    Many thanks for your time and effort on this issue! I know it can be quite annoying to investigate a problem that one may never encounter in daily use.

    I performed more tests and found an interesting fact: Bug 2 is triggered only when the text is selected from left to right (either by <shift>+<direction> or by mouse drag), not when it is selected from right to left! The direction of the selection turns out to be the key point.

    My tests are as follows. The original text is "这是一个试图重现问题的测试。" ("This is a test trying to reproduce the issue."), and I want to select "问题" in the text and replace it with "错误" ("error") by typing its pinyin "cuowu" directly, without first deleting the selected text with <backspace> or <delete>.

    Steps to reproduce the bug:
    (1) Put the cursor between "现" and "问";
    (2) Press <shift>+<right><right> to select "问题";
    (3) Input "cuowu" with the IME on;
    (4) Press <space> or <1> to select the first candidate "错误".
    Dragging the mouse cursor from left to right to select "问题" also reproduces the bug!

    Steps to avoid the bug:
    (1) Put the cursor between "题" and "的";
    (2) Press <shift>+<left><left> to select "问题";
    (3) Input "cuowu" with the IME on;
    (4) Press <space> or <1> to select the first candidate "错误".
    Dragging the mouse cursor from right to left to select "问题" also avoids the bug.

    Please see the attachment for the output. For each test, the first line is the final result and the second line is the intermediate state before Step 4.

    There is a by-product of this bug: if the text is selected from left to right and replaced directly with the IME on, the wrong text ("的测") is highlighted, and the highlight persists after a candidate has been chosen (after Step 4). This can be clearly observed in the attachment: after triggering the bug many times, the article becomes a patchwork.

     

    Last edit: Fang Zhang 2015-03-02
  • Tim Hoffmann

    Tim Hoffmann - 2015-03-02

    Thanks for the exact description.

    Even with these steps, the Latin text does not persist here. But I'm now using Qt 5.4.1, and there have been some changes concerning InputMethod in Qt 5.4. Since your description is very clear, I assume that it has really been fixed inside Qt.

    The bug with the incorrect highlighting was in TXS and is now fixed (hg 4953 (6dcb9ead62cd)).

    Independent of that (and of the direction of the text selection), the highlight remains on the text after a candidate has been chosen, i.e. after pressing <1> or <space>, "错误" is still highlighted. While I'm not an InputMethod expert, I assume this is correct, because we still receive it as a preedit string in the InputMethodEvent. When <space> is hit again, we receive an InputMethodEvent with the same string as the commit string. Then we remove the highlighting.
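The preedit/commit sequence described in this comment can be pictured with a small model. This is a hypothetical sketch, not TeXstudio's or Qt's code: the class `ImeBuffer` and its methods are invented for illustration. It only mirrors the two strings that Qt's `QInputMethodEvent` carries (`preeditString()` and `commitString()`) and ignores attributes and replacement ranges. The rule it encodes is that every event first removes the preedit text currently shown, so a commit replaces the highlighted text instead of being appended to it. The replay starts from the text with the selected word already removed.

```python
from dataclasses import dataclass


@dataclass
class ImeBuffer:
    """Hypothetical model of one editor line that receives input-method events."""
    text: str = ""
    cursor: int = 0
    preedit_start: int = -1  # -1: no uncommitted (preedit) text is displayed
    preedit_len: int = 0

    def input_method_event(self, preedit: str = "", commit: str = "") -> None:
        # 1. Remove the preedit string shown by the previous event, if any.
        if self.preedit_start >= 0:
            start = self.preedit_start
            self.text = self.text[:start] + self.text[start + self.preedit_len:]
            self.cursor = start
        # 2. Insert the committed string permanently and move the cursor past it.
        self.text = self.text[:self.cursor] + commit + self.text[self.cursor:]
        self.cursor += len(commit)
        # 3. Display the new preedit string (highlighted) at the cursor, or none.
        if preedit:
            self.text = self.text[:self.cursor] + preedit + self.text[self.cursor:]
            self.preedit_start, self.preedit_len = self.cursor, len(preedit)
        else:
            self.preedit_start, self.preedit_len = -1, 0

    def highlighted(self) -> str:
        """The currently highlighted (uncommitted) text, or "" if there is none."""
        if self.preedit_start < 0:
            return ""
        return self.text[self.preedit_start:self.preedit_start + self.preedit_len]


# Replay: type "cuowu", pick candidate <1>, then press <space> to commit.
buf = ImeBuffer(text="这是一个试图重现的测试。", cursor=8)
buf.input_method_event(preedit="cuowu")  # raw pinyin shown as preedit
buf.input_method_event(preedit="错误")    # candidate chosen, still preedit
after_candidate = (buf.text, buf.highlighted())
buf.input_method_event(commit="错误")     # commit replaces the preedit
print(after_candidate, buf.text, repr(buf.highlighted()))
```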

     
  • Fang Zhang

    Fang Zhang - 2015-03-03

    I tried the latest nightly build with Qt 5.4.1 (hg 4949:57c7686b3dd8), but Bug 2 still persists. I guess it will still be there after the 4953 fix. :(

    After more tests with 5 different IMEs (including Chinese and Japanese IMEs), I learned more about the bug: it is not the raw keyboard input that is wrongly recorded into the editor, but the "uncommitted string" from the IME.

    Modern IMEs allow the user to input "by sentence". The user types some Latin characters and the IME "translates" them into the native language. Because the mapping from typed syllables to characters is one-to-many, some translations may be wrong, so at this stage the IME does not commit the entire sentence to the editor, which leaves room for corrections. The uncommitted sentence is underlined (or, in TXS's case, highlighted), and the user can navigate through it and correct the translations. When the sentence looks right, the user presses <space> or <enter> to commit it to the editor, and the underline (or highlight) is removed. So your assumption is entirely correct! Please see the attached example from the Google Japanese IME: the entire sentence is underlined and not yet committed to the editor, so I can still choose other candidates within it. However, no matter which IME I use, after the commit the preedit string should be replaced by the committed string, but in my TXS editor the committed string is appended to it instead.

    I think I should restate Bug 2 as follows:

    If some text is selected in the editor from left to right (either by <shift>+<direction> or by mouse drag) and I input new text with an IME to replace it, the uncommitted (preedit) string from the IME is unexpectedly preserved and the committed string is appended after it.

    Also, this issue occurs only in the code editor (including the macro editor); all text boxes in TXS dialogs work without problems.

    Steps to reproduce the bug:
    (1) Paste "这是一个试图重现错误的测试。" into the TXS editor;
    (2) Put the cursor between "现" and "错" ("这是一个试图重现|错误的测试。");
    (3) Press <shift>+<right><right> (or drag the mouse) to select "错误";
    (4) Input "wenti" (or "mondai" for Japanese IMEs) with the IME;
    (5) Press <space> or the matching number in the candidate window to select the intended "问题";
    (6) For many IMEs (the only exception I found is the Sogou Pinyin IME), step (5) merely fixes one word within the uncommitted sentence (even though the sentence consists of only one word, "问题", in this case), and the user has to press <space> or <enter> again to commit the sentence.

    Expected result:
    这是一个试图重现问题的测试。

    Actual results:
    这是一个试图重现wenti问题的测试。(Sogou IME, preedit string = "wenti")
    这是一个试图重现问题问题的测试。(Microsoft Pinyin IME, preedit string = "问题")
    这是一个试图重现问题问题的测试。(Simplified Chinese IME, preedit string = "问题")
    这是一个试图重现問題問題的测试。(Microsoft Japanese IME, preedit string = "問題")
    这是一个试图重现問題問題的测试。(Google Japanese IME, preedit string = "問題")
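A regression check for the restated bug could assert that the result does not depend on the selection direction. The sketch below is a simplified, hypothetical model: the functions `splice` and `type_with_ime` are invented for illustration and do not reflect how TeXstudio stores selections. A selection is an (anchor, cursor) pair, each preedit update replaces the selection or the previous preedit, and the commit replaces the last preedit. The `append_commit=True` switch reproduces the reported symptom (committed string appended after the preedit) purely as an illustration; it makes no claim about the actual cause inside TXS.

```python
def splice(text: str, start: int, end: int, new: str) -> str:
    """Return text with the range [start, end) replaced by new."""
    return text[:start] + new + text[end:]


def type_with_ime(text: str, anchor: int, cursor: int,
                  preedits: list[str], commit: str,
                  append_commit: bool = False) -> str:
    # Normalize: a left-to-right selection has anchor < cursor, a right-to-left
    # one has anchor > cursor; both must cover the same range.
    start, end = min(anchor, cursor), max(anchor, cursor)
    for preedit in preedits:
        # Each preedit update replaces the selection, later the previous preedit.
        text = splice(text, start, end, preedit)
        end = start + len(preedit)
    if append_commit:
        # Faulty behaviour: the commit string lands after the preedit string.
        return splice(text, end, end, commit)
    # Correct behaviour: the commit string replaces the preedit string.
    return splice(text, start, end, commit)


original = "这是一个试图重现错误的测试。"
# "错误" occupies indices 8..10; dragging the other way swaps anchor and cursor.
for anchor, cursor in [(8, 10), (10, 8)]:
    print(type_with_ime(original, anchor, cursor, ["wenti", "问题"], "问题"))
```

Both directions print the expected "这是一个试图重现问题的测试。". With `append_commit=True`, the preedit lists `["wenti"]` and `["wenti", "问题"]` yield "这是一个试图重现wenti问题的测试。" and "这是一个试图重现问题问题的测试。", the same strings as the Sogou and Microsoft Pinyin rows above.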

     

    Last edit: Fang Zhang 2015-03-03
  • Fang Zhang

    Fang Zhang - 2015-03-04

    I confirm that Bug 2 no longer occurs in the latest nightly build (hg 4955+:4603fd77edff+). Now I'm a happy TXS user again! Many thanks for your time and effort on the bug fixes!

     
  • Jan Sundermeyer

    Jan Sundermeyer - 2015-03-04

    So, can this bug report be closed?

     
  • Tim Hoffmann

    Tim Hoffmann - 2015-03-04
    • status: open --> fixed
    • Group: -->
     
