For UTF-8 and UTF-16 encoded files, Notepad++ (tested with version 6.3.3) sometimes shows wrong figures for the number of characters and words in the “Summary” dialogue (View – Summary ...).
Furthermore, it is not clear what the status bar displays: the number of characters or the number of bytes? For ANSI and UTF-8, it appears to be the number of bytes, including CR and LF. For UTF-16, however, it seems to be the number of characters (again including CR and LF). I would expect the status bar to show the number of selected characters, regardless of encoding.
Finally, the “Summary” dialogue displays different numbers than the status bar, especially for text containing multi-byte characters such as French “éèê” or German “äöü”. For those, the number of characters is reported incorrectly.
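To illustrate why byte counts and character counts diverge for such text, here is a minimal sketch. It is not taken from Notepad++ or from the patch; the function names are hypothetical. It counts code points in a UTF-8 string by skipping continuation bytes, and in a UTF-16 string by skipping low surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 byte string.
// Continuation bytes have the bit pattern 10xxxxxx and are not counted.
std::size_t countUtf8CodePoints(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Hypothetical helper: counts Unicode code points in a UTF-16 string.
// Low surrogates (0xDC00..0xDFFF) are the second half of a pair and are not counted.
std::size_t countUtf16CodePoints(const std::u16string& utf16) {
    std::size_t n = 0;
    for (char16_t u : utf16) {
        if (u < 0xDC00 || u > 0xDFFF) ++n;
    }
    return n;
}
```

For “äöü”, the UTF-8 form ("\xC3\xA4\xC3\xB6\xC3\xBC") occupies 6 bytes and the UTF-16 form occupies 3 code units, yet both contain 3 characters. A counter that reports bytes in one encoding and code units in another would therefore show different numbers for the same selection.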
So far, the status bar does not handle multi-area selections; it just displays “N/A”.
To demonstrate the above, I have provided below some prepared text files with instructions inside (see TestFiles_UnicodeCharacterWordCount.7z).
Looking into the cause, I found that the status bar and the Summary dialogue use different subroutines in different places (ScintillaEditView.cpp and Notepad_plus.cpp) that do nearly the same thing, but only nearly, and with bugs.
To correct those bugs, I have provided a patch. I merged the scattered functions and moved them to ScintillaEditView.cpp.
Furthermore, Scintilla version 3.3.x provides a function to retrieve the number of selected characters independently of encoding (apparently Scintilla converts everything to UTF-8 and stores it that way internally). The patch uses this function, but also provides a replacement function for the currently used Scintilla 2.2.7.
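A replacement of this kind could look roughly like the sketch below. It is not the code from the patch: it assumes the document buffer is UTF-8 (as suspected above), it works on a plain std::string instead of calling Scintilla, and all names are hypothetical. It also sums over several selection ranges, which is one way multi-area selections could be counted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: counts characters (code points) in the byte range
// [start, end) of a UTF-8 buffer. CR and LF each count as one character.
std::size_t countCharsInRange(const std::string& buf,
                              std::size_t start, std::size_t end) {
    assert(start <= end && end <= buf.size());
    std::size_t n = 0;
    for (std::size_t i = start; i < end; ++i) {
        unsigned char c = static_cast<unsigned char>(buf[i]);
        // Only lead bytes and ASCII bytes start a character;
        // continuation bytes (10xxxxxx) are skipped.
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// A selection expressed as a half-open byte range [first, second).
using ByteRange = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: total number of characters over all selection ranges.
std::size_t countSelectedChars(const std::string& buf,
                               const std::vector<ByteRange>& selections) {
    std::size_t total = 0;
    for (const ByteRange& r : selections) {
        total += countCharsInRange(buf, r.first, r.second);
    }
    return total;
}
```

The sketch assumes that each range starts and ends on a character boundary, which should hold for positions that come from an editor selection.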
There are two patches provided (based on SVN 1046):
The first patch contains the old code, commented out, together with many explanations and some comments for debugging and verifying the new functions.
The second patch is the final, cleaned-up one.
You are, of course, free to change how the results are presented in the status bar and the Summary dialogue.