Menu

#2465 Output of QString.toLocal8Bit() fed to QString::fromUtf8()

Bug
closed-fixed
nobody
5
2025-04-02
2025-02-28
8day
No

scintilla/qt/ScintillaEditBase.cpp contains these lines:

case Notification::AutoCSelection:
    emit autoCompleteSelection(scn.lParam, QString::fromUtf8(scn.text));
    break;

Notification::AutoCSelection is created inside ScintillaBase::AutoCompleteCompleted() and ScintillaBase::AutoCompleteSelection(), both of which set NotificationData.text to output of AutoComplete::GetValue(), which in its own stead gets it from ListBox::GetValue(), and that one from ListBoxImpl::GetValue() (implemented in scintilla/qt/PlatQt.cpp):

std::string ListBoxImpl::GetValue(int n)
{
    ListWidget *list = GetWidget();
    QListWidgetItem *item = list->item(n);
    QString str = item->data(Qt::DisplayRole).toString();
    QByteArray bytes = unicodeMode ? str.toUtf8() : str.toLocal8Bit();
    return std::string(bytes.constData());
}

As you can see, the value returned by that method is not always a UTF-8 encoded string, but depends on whether Scintilla is in Unicode mode. It looks as though first block of code should be something like this:

case Notification::AutoCSelection:
    emit autoCompleteSelection(scn.lParam, IsUnicodeMode() ? QString::fromUtf8(scn.text) : QString::fromLocal8Bit(scn.text));
    break;

Note that to/from local 8bit behave differently on different platforms, as well as on Qt 5 and 6 (https://doc.qt.io/qt-6/qstring.html#fromLocal8Bit-2):

On Unix systems this is equivalent to fromUtf8(). Note that on Apple systems this function does not take NSString.defaultCStringEncoding or CFStringGetSystemEncoding() into account, as these functions typically return the legacy "Western (Mac OS Roman)" encoding, which should not be used on modern Apple operating systems. On Windows the system's current code page is used.

Discussion

  • Neil Hodgson

    Neil Hodgson - 2025-02-28

    This is a less trivial change than the previous one. Significant code changes are subject to copyright and authorship should be documented. This only says 'Anon' so is effectively anonymous.

     
    • 8day

      8day - 2025-03-01

      I don't mind if it will be anonymous, but if you really have to use something, I've changed my nickname ("displayed name") to the one I use on GitHub.

       
  • 8day

    8day - 2025-03-01

    BTW, seeing as const char* in all messagesdealing with text are either UTF-8 encoded strings, or strings in document's encoding, unicodeMode ? str.toUtf8() : str.toLocal8Bit() looks extremely unusual... But I guess this is limitation of ListBox::Create() that accepts only unicodeMode_ boolean, without any knowledge about code page and character set.

     
  • Neil Hodgson

    Neil Hodgson - 2025-03-09

    ... ListBox::Create() that accepts only unicodeMode_ boolean, without any knowledge about code page and character set.

    ListBox::SetFont sets the font of the listbox and fonts may carry a character set from FontParameters::characterSet which allows encoding-responsive display of autocompletion lists by some platforms including Win32.

     
  • Neil Hodgson

    Neil Hodgson - 2025-03-11

    that accepts only unicodeMode_ boolean

    If necessary, code page and character set could be communicated to the list box in a compatible way by appending more fields to ListOptions. Since options can be ignored, no changes would be needed to other platform layers.

     
  • 8day

    8day - 2025-03-12

    I've looked into this issue a bit more, and I think you won't like the possible solution, so it's probably best to use that fix with QString::fromLocal8bit() and forget about this issue. I guess I just misunderstood how autocompletion works.

    I think instead it would've been better if encoding of text for autocompletion was decoupled from document's encoding, and instead platform's native encoding was used, esp. seeing as the source of autocompletion text is unlikely to be updated frequently. But seeing that ports for all supported platforms behave in a similar way:
    - GTK uses native encoding at all times, i.e. UTF-8.
    - Windows uses UTF-16 (?) in Unicode mode, or what appears to be local encoding (if I got it right based on use of DrawTextA and DrawTextW as well as LPCSTR and LPCWSTR).
    - Qt uses UTF-8 in Unicode mode, or local encoding otherwise.
    - Cocoa uses NSString UTF-8 (?) in Unicode mode or NSWindowsCP1252StringEncoding a.k.a. Latin-1 otherwise.
    I think a note under ScintillaDoc.html#Autocompletion about this behavior would've been sufficient.

     
  • Neil Hodgson

    Neil Hodgson - 2025-03-12

    All platform layers notify (AutoCSelection, ...) with the byte strings they were called with.

    It's just the Qt platform in ScintillaEditBase that produces an autoCompleteSelection event that translates the byte string to Unicode. QString::fromLocal8bit() would often produce an incorrect value.

    If Unicode isn't required then QByteArray::fromRawData as used by Modified is likely a better choice than QString but if Unicode is required then the code page and character set must be used to produce a useful value. It may break downstream projects if autoCompleteSelection changed to a QByteArray.

     
  • Neil Hodgson

    Neil Hodgson - 2025-03-20

    Committed as [580ad4].

     

    Related

    Commit: [580ad4]

  • Neil Hodgson

    Neil Hodgson - 2025-03-20
    • status: open --> open-fixed
     
  • 8day

    8day - 2025-03-21

    Sorry it took so long, got extremely distracted...

    As for the option you mentioned. Indeed it would've solved the issue completely, but considering that I intend to use Scintilla under Python/PySide in Unicode mode, it's useless in my case, so I don't want to be the only one pushing for it.

     
    • Neil Hodgson

      Neil Hodgson - 2025-03-21

      Projects that only handle Unicode are becoming more common so encoding support is seeing decreased interest. Its OK for autocompletion on the Qt platform to not handle explicit encoding settings for now and this could be implemented if any projects need it.

       
  • Neil Hodgson

    Neil Hodgson - 2025-04-02
    • status: open-fixed --> closed-fixed
     

Log in to post a comment.