Scintilla / Bugs / #2351 `SCI_UPPERCASE` turns precomposed adscript iota into extra uppercase iota character

Pablo Rodriguez - 2022-09-13

Sorry, this was a mistake (and there seems to be no option to edit the original message):

Would it be possible for SCI_UPPERCASE not to expand a single character into multiple characters?

With the current behavior, applying SCI_LOWERCASE after SCI_UPPERCASE returns a different string from the original for these characters.


Neil Hodgson - 2022-09-13

Scintilla uses the Unicode uppercase form, not titlecase. Scintilla bases its case table on Python's, which in turn is based on the Unicode SpecialCasing.txt file. That file contains this line for the first example character 'ᾳ':

    # <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
    1FB3; 1FB3; 1FBC; 0391 0399; # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI

That is, the title case for 'ᾳ' is the single character 'ᾼ' but the upper case is the two character sequence 'ΑΙ'.
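
The mapping can be checked with a short Python sketch. It uses Python's built-in `str` methods rather than Scintilla's own code, on the assumption (taken from the comment above) that Scintilla's table follows Python's. The helper `codepoints` is a hypothetical name introduced only for this illustration.

```python
# Illustration only: Python's built-in case methods, not Scintilla's code.
lower = "\u1fb3"  # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI


def codepoints(s):
    """Hypothetical helper: render a string as space-separated hex code points."""
    return " ".join(f"{ord(c):04X}" for c in s)


upper = lower.upper()       # full uppercase mapping (two characters)
title = lower.title()       # titlecase mapping (one character)
round_trip = upper.lower()  # lowercasing the uppercase form

print(codepoints(upper))       # 0391 0399
print(codepoints(title))       # 1FBC
print(codepoints(round_trip))  # 03B1 03B9
print(round_trip == lower)     # False
```

The last line shows why lowercasing after uppercasing does not restore the original string: the two-character uppercase form lowercases to a plain alpha followed by a plain iota.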

Description has changed:

Diff:

--- old
+++ new
@@ -1,4 +1,4 @@
-This comes from https://github.com/orbitalquark/textadept/issues/263.
+This comes from https://github.com/orbitalquark/textadept/issues/263 .

 Using current stable version of _Textadept_ on _Windows_ (I’m afraid I don’t know which GTK+3 or _Scintilla_ versions it contains), changing from these lowercase characters:

Pablo Rodriguez - 2022-09-14

Many thanks for your reply, Neil.

I didn't know that Unicode makes a distinction between titlecase and uppercase.

For reference, the file can be found at https://www.unicode.org/Public/UNIDATA/SpecialCasing.txt.

What surprised me (just as a comment) is the first special case:

    # The German es-zed is special--the normal mapping is to SS.
    # Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>))
    00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S

I think that ß should be uppercased as ẞ (explanation and another explanation in German).

Titlecase makes no sense here, because no German word starts with ss or ß.
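
A minimal Python sketch of the ß case, again relying on Python's built-in `str` methods as a stand-in for the Unicode tables rather than on anything Scintilla-specific:

```python
# Illustration only: Python's built-in case methods, not Scintilla's code.
sharp_s = "\u00df"          # LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

print(sharp_s.upper())                     # SS
print(sharp_s.title())                     # Ss
print(sharp_s.upper().lower())             # ss
print(capital_sharp_s.lower() == sharp_s)  # True
```

So the capital ẞ does lowercase to ß, but the uppercase mapping of ß goes to SS, not to ẞ.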

So the report should be closed, but I’m afraid that I cannot close it myself.

Many thanks for your help and your explanation.

Neil Hodgson - 2022-09-14

labels: --> scintilla, unicode

status: open --> closed-rejected
