scintilla/qt/ScintillaEditBase.cpp contains these lines:
case Notification::AutoCSelection:
emit autoCompleteSelection(scn.lParam, QString::fromUtf8(scn.text));
break;
Notification::AutoCSelection is created inside ScintillaBase::AutoCompleteCompleted() and ScintillaBase::AutoCompleteSelection(), both of which set NotificationData.text to output of AutoComplete::GetValue(), which in its own stead gets it from ListBox::GetValue(), and that one from ListBoxImpl::GetValue() (implemented in scintilla/qt/PlatQt.cpp):
std::string ListBoxImpl::GetValue(int n)
{
ListWidget *list = GetWidget();
QListWidgetItem *item = list->item(n);
QString str = item->data(Qt::DisplayRole).toString();
QByteArray bytes = unicodeMode ? str.toUtf8() : str.toLocal8Bit();
return std::string(bytes.constData());
}
As you can see, the value returned by that method is not always a UTF-8 encoded string, but depends on whether Scintilla is in Unicode mode. It looks as though first block of code should be something like this:
case Notification::AutoCSelection:
emit autoCompleteSelection(scn.lParam, IsUnicodeMode() ? QString::fromUtf8(scn.text) : QString::fromLocal8Bit(scn.text));
break;
Note that to/from local 8bit behave differently on different platforms, as well as on Qt 5 and 6 (https://doc.qt.io/qt-6/qstring.html#fromLocal8Bit-2):
On Unix systems this is equivalent to fromUtf8(). Note that on Apple systems this function does not take NSString.defaultCStringEncoding or CFStringGetSystemEncoding() into account, as these functions typically return the legacy "Western (Mac OS Roman)" encoding, which should not be used on modern Apple operating systems. On Windows the system's current code page is used.
This is a less trivial change than the previous one. Significant code changes are subject to copyright and authorship should be documented. This only says 'Anon' so is effectively anonymous.
I don't mind if it will be anonymous, but if you really have to use something, I've changed my nickname ("displayed name") to the one I use on GitHub.
BTW, seeing as
const char*in all messagesdealing with text are either UTF-8 encoded strings, or strings in document's encoding,unicodeMode ? str.toUtf8() : str.toLocal8Bit()looks extremely unusual... But I guess this is limitation ofListBox::Create()that accepts onlyunicodeMode_boolean, without any knowledge about code page and character set.ListBox::SetFontsets the font of the listbox and fonts may carry a character set fromFontParameters::characterSetwhich allows encoding-responsive display of autocompletion lists by some platforms including Win32.If necessary, code page and character set could be communicated to the list box in a compatible way by appending more fields to
ListOptions. Since options can be ignored, no changes would be needed to other platform layers.I've looked into this issue a bit more, and I think you won't like the possible solution, so it's probably best to use that fix with
QString::fromLocal8bit()and forget about this issue. I guess I just misunderstood how autocompletion works.I think instead it would've been better if encoding of text for autocompletion was decoupled from document's encoding, and instead platform's native encoding was used, esp. seeing as the source of autocompletion text is unlikely to be updated frequently. But seeing that ports for all supported platforms behave in a similar way:
- GTK uses native encoding at all times, i.e. UTF-8.
- Windows uses UTF-16 (?) in Unicode mode, or what appears to be local encoding (if I got it right based on use of
DrawTextAandDrawTextWas well asLPCSTRandLPCWSTR).- Qt uses UTF-8 in Unicode mode, or local encoding otherwise.
- Cocoa uses
NSStringUTF-8 (?) in Unicode mode orNSWindowsCP1252StringEncodinga.k.a. Latin-1 otherwise.I think a note under ScintillaDoc.html#Autocompletion about this behavior would've been sufficient.
All platform layers notify (
AutoCSelection, ...) with the byte strings they were called with.It's just the Qt platform in
ScintillaEditBasethat produces anautoCompleteSelectionevent that translates the byte string to Unicode.QString::fromLocal8bit()would often produce an incorrect value.If Unicode isn't required then
QByteArray::fromRawDataas used byModifiedis likely a better choice thanQStringbut if Unicode is required then the code page and character set must be used to produce a useful value. It may break downstream projects ifautoCompleteSelectionchanged to aQByteArray.Committed as [580ad4].
Related
Commit: [580ad4]
Sorry it took so long, got extremely distracted...
As for the option you mentioned. Indeed it would've solved the issue completely, but considering that I intend to use Scintilla under Python/PySide in Unicode mode, it's useless in my case, so I don't want to be the only one pushing for it.
Projects that only handle Unicode are becoming more common so encoding support is seeing decreased interest. Its OK for autocompletion on the Qt platform to not handle explicit encoding settings for now and this could be implemented if any projects need it.