This comes from https://github.com/orbitalquark/textadept/issues/263 .
Using the current stable version of Textadept on Windows (I’m afraid I don’t know which GTK+3 or Scintilla versions it contains), when I change these lowercase characters:
ᾳῃῳ
ᾀᾁᾂᾃᾄᾅᾆᾇ
ᾀᾁᾂᾃᾄᾅᾆᾇ
ᾐᾑᾒᾓᾔᾕᾖᾗ
ᾠᾡᾢᾣᾤᾥᾦᾧ
to uppercase characters, the precomposed subscript iota is expanded to an extra uppercase iota character:
ΑΙΗΙΩΙ
ἈΙἉΙἊΙἋΙἌΙἍΙἎΙἏΙ
ἈΙἉΙἊΙἋΙἌΙἍΙἎΙἏΙ
ἨΙἩΙἪΙἫΙἬΙἭΙἮΙἯΙ
ὨΙὩΙὪΙὫΙὬΙὭΙὮΙὯΙ
instead of replacing them with the precomposed adscript iota characters:
ᾼῌῼ
ᾈᾉᾊᾋᾌᾍᾎᾏ
ᾘᾙᾚᾛᾜᾝᾞᾟ
ᾨᾩᾪᾫᾬᾭᾮᾯ
SCI_UPPERCASE seems to be the command causing the wrong conversion.
Would it be possible for SCI_UPPERCASE not to expand single characters into several characters (which SCI_LOWERCASE then converts back into a different string)?
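To make the request concrete, here is a minimal Python sketch of a length-preserving uppercase conversion. It is not Scintilla code: `upper_same_length` is a hypothetical helper, and it assumes that Python's `str.upper()` and `str.title()` follow the Unicode full case mappings.

```python
def upper_same_length(text: str) -> str:
    """Uppercase text without ever expanding one character into several.

    Hypothetical helper for illustration only, not part of Scintilla.
    """
    out = []
    for ch in text:
        up = ch.upper()
        if len(up) == 1:
            # Ordinary case: a one-to-one uppercase mapping.
            out.append(up)
            continue
        # The uppercase form expands to several characters; try the
        # titlecase form instead, and leave the character unchanged
        # if that form is not a single character either.
        title = ch.title()
        out.append(title if len(title) == 1 else ch)
    return "".join(out)


lowercase = "\u1fb3\u1fc3\u1ff3"   # the three characters 'ᾳῃῳ'
converted = upper_same_length(lowercase)
print(converted)
print(converted.lower() == lowercase)
```

With this approach the result has the same number of characters as the input, so converting it back to lowercase returns the original string for the characters listed above.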
Many thanks for your help.
Sorry, this was a mistake (and there seems to be no editing option of the original message):
Scintilla is using the Unicode uppercase form, not titlecase. Scintilla bases its case table on Python, which bases its own on the Unicode SpecialCasing.txt file, with this line for the first example character 'ᾳ':
That is, the title case for 'ᾳ' is the single character 'ᾼ' but the upper case is the two character sequence 'ΑΙ'.
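The difference can be checked directly in Python. This is only a small sketch, assuming a Python 3 interpreter whose `str.upper()` and `str.title()` apply the full Unicode case mappings; the variable names are illustrative.

```python
small = "\u1fb3"  # 'ᾳ' GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI

upper = small.upper()   # full uppercase mapping
title = small.title()   # full titlecase mapping


def code_points(s: str) -> list[str]:
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


print(code_points(upper))   # ['U+0391', 'U+0399']
print(code_points(title))   # ['U+1FBC']

# Lowercasing the uppercase form does not give the original back,
# while lowercasing the titlecase form does.
print(upper.lower() == small)   # False
print(title.lower() == small)   # True
```

The uppercase form is two characters long, which is why uppercasing and then lowercasing yields a different string than the one you started with.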
Many thanks for your reply, Neil.
I didn't know that Unicode makes a distinction between titlecase and uppercase.
For reference, the file can be found at https://www.unicode.org/Public/UNIDATA/SpecialCasing.txt
What surprised me (just as a comment) is the first special case:
I think that ß should be uppercased as ẞ (explanation and another explanation in German).
Titlecase makes no sense here, because no German word starts with ss or ß.
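For comparison, this is how the sharp s behaves in Python, again assuming that its string methods follow the Unicode default case mappings (the variable names are illustrative only):

```python
sharp_s = "\u00df"          # 'ß' LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # 'ẞ' LATIN CAPITAL LETTER SHARP S

upper = sharp_s.upper()
title = sharp_s.title()

print(upper)   # SS
print(title)   # Ss

# The capital sharp s lowercases to 'ß', but the default uppercase
# mapping of 'ß' does not produce it.
print(capital_sharp_s.lower() == sharp_s)   # True
print(upper == capital_sharp_s)             # False
```

So the default mapping goes from ẞ to ß in one direction only; uppercasing ß gives "SS" instead.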
So the report should be closed, but I’m afraid that I cannot close it myself.
Many thanks for your help and your explanation.