There are several problems with the handling of WM_CHAR and WM_UNICHAR messages on Windows:
These bugs are fixed in the following merge request: https://sourceforge.net/p/scintilla/code/merge-requests/13/
Which application are you using that sends WM_UNICHAR?
I am not aware of such an application; I tested it directly from within Notepad++ with the following code:
This should output character U+1F64C (🙌), but right now it outputs U+F64C, which is a private-use code point rather than a real character. With the above patch, it works fine.
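To illustrate the symptom: U+F64C is exactly what remains of U+1F64C when the value is narrowed to 16 bits, whereas a correct UTF-16 encoding needs a surrogate pair. The following is only a minimal sketch of that arithmetic; `EncodeUTF16` and `TruncateTo16Bits` are hypothetical helper names, not functions from Scintilla, and the sketch does not claim to show where the narrowing actually happens in the code.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16.
// Code points above U+FFFF become a lead/trail surrogate pair.
std::u16string EncodeUTF16(char32_t codePoint) {
    std::u16string out;
    if (codePoint < 0x10000) {
        out.push_back(static_cast<char16_t>(codePoint));
    } else {
        const char32_t v = codePoint - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // lead surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // trail surrogate
    }
    return out;
}

// Hypothetical helper: what is left if the code point is stored in 16 bits.
char16_t TruncateTo16Bits(char32_t codePoint) {
    return static_cast<char16_t>(codePoint & 0xFFFF);
}
```

For U+1F64C, `EncodeUTF16` yields the two code units 0xD83D 0xDE4C, while `TruncateTo16Bits` yields 0xF64C, matching the wrong character described above.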
My main reason for this bug report is that I wrote WinCompose (https://github.com/samhocevar/wincompose), which is affected by the WM_CHAR issue (it uses SendInput(), which Windows translates to WM_CHAR messages).
By the way, some changes in WM_UNICHAR handling date back to http://sourceforge.net/p/scintilla/bugs/604/, which was a questionable bug report IMHO: it should be possible to input Unicode characters in non-Unicode mode, because some of these could actually be valid in the current codepage (if it's non-ASCII).
AddCharUTF16 has a comment that implies it is for multi-character strings, but it does not loop over each character. Too few SCN_CHARADDED notifications will be sent if it is called with a multi-character string. If it should handle multi-character strings then that should be implemented.
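As a rough illustration of what looping over each character could look like, here is a sketch that walks a UTF-16 string one code point at a time and issues one notification per character. `AddEachCharacterUTF16` and its `notify` callback are hypothetical names standing in for the real insertion and SCN_CHARADDED logic; this is not the code from the merge request.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical sketch: visit every character (code point) in a UTF-16
// string, combining surrogate pairs, and call notify once per character.
// Returns the number of notifications sent.
int AddEachCharacterUTF16(const std::u16string &text,
                          const std::function<void(char32_t)> &notify) {
    int notifications = 0;
    for (std::size_t i = 0; i < text.size(); ) {
        char32_t ch = text[i++];
        const bool isLead = ch >= 0xD800 && ch <= 0xDBFF;
        if (isLead && i < text.size() && text[i] >= 0xDC00 && text[i] <= 0xDFFF) {
            // Combine lead and trail surrogates into one code point.
            ch = 0x10000 + ((ch - 0xD800) << 10) + (text[i++] - 0xDC00);
        }
        notify(ch);
        ++notifications;
    }
    return notifications;
}
```

With this shape, a string holding "a", U+1F64C and "b" (four UTF-16 code units) produces three notifications rather than one or four.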
The line "utfval[len] = '\0';" in AddCharUTF16 is after the last use of utfval so should be removed or moved earlier.
Thanks for the review; I fixed the comment rather than the code because there is no scenario yet where AddCharUTF16 would be called on multi-character strings. I did not understand how to combine merge requests, so I created a new one (https://sourceforge.net/p/scintilla/code/merge-requests/14/). I apologize for the inconvenience.
Committed as [ce4680] with some minor changes to formatting and documentation.
The code copied from HandleCompositionWindowed has problems (HandleCompositionInline is implemented better) which are part of current discussions so may be replaced. This code reports each byte in a DBCS character with a separate SCN_CHARADDED notification which may lead to the application reading sliced characters.
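To make the sliced-character concern concrete, the sketch below groups a DBCS byte string into whole characters so that each character, not each byte, could be reported once. It assumes Shift-JIS (code page 932) and uses a hand-written lead-byte test, `IsLeadByte932`, as a portable stand-in; Windows code would more likely ask the system via `IsDBCSLeadByteEx`. `SplitDBCSCharacters` is likewise a hypothetical name, not part of Scintilla.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in lead-byte test for Shift-JIS (code page 932); an assumption
// made so the sketch is portable.
bool IsLeadByte932(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical sketch: split a DBCS byte string into whole characters,
// keeping each lead byte together with its trail byte.
std::vector<std::string> SplitDBCSCharacters(const std::string &bytes) {
    std::vector<std::string> characters;
    for (std::size_t i = 0; i < bytes.size(); ) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        // A lead byte at the very end has no trail byte; treat it as one byte.
        const std::size_t len = (IsLeadByte932(b) && i + 1 < bytes.size()) ? 2 : 1;
        characters.push_back(bytes.substr(i, len));
        i += len;
    }
    return characters;
}
```

Sending one notification per element of the returned vector would give the application one event per character, so it never observes half of a double-byte character.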