#2351 `SCI_UPPERCASE` turns precomposed adscript iota into extra uppercase iota character

Bug
closed-rejected
nobody
5
2022-09-14
2022-09-13
No

This comes from https://github.com/orbitalquark/textadept/issues/263 .

Using the current stable version of Textadept on Windows (I’m afraid I don’t know which GTK+3 or Scintilla versions it contains), changing from these lowercase characters:

ᾳῃῳ
ᾀᾁᾂᾃᾄᾅᾆᾇ
ᾀᾁᾂᾃᾄᾅᾆᾇ
ᾐᾑᾒᾓᾔᾕᾖᾗ
ᾠᾡᾢᾣᾤᾥᾦᾧ

to uppercase characters, the precomposed subscript iota is expanded to an extra uppercase iota character:

ΑΙΗΙΩΙ
ἈΙἉΙἊΙἋΙἌΙἍΙἎΙἏΙ
ἈΙἉΙἊΙἋΙἌΙἍΙἎΙἏΙ
ἨΙἩΙἪΙἫΙἬΙἭΙἮΙἯΙ
ὨΙὩΙὪΙὫΙὬΙὭΙὮΙὯΙ

instead of replacing them with the precomposed adscript iota characters:

ᾼῌῼ
ᾈᾉᾊᾋᾌᾍᾎᾏ
ᾘᾙᾚᾛᾜᾝᾞᾟ
ᾨᾩᾪᾫᾬᾭᾮᾯ

SCI_UPPERCASE seems to be the command causing the wrong conversion.

Would it be possible that SCI_UPPERCASE would not expand single characters into extra characters (that SCI_LOWERCASE would get back as a different string)?

Many thanks for your help.

Discussion

  • Pablo Rodriguez

    Pablo Rodriguez - 2022-09-13

    Sorry, this was a mistake (and there seems to be no editing option of the original message):

    Would it be possible that SCI_UPPERCASE would not expand single characters into extra characters?

    With the current behavior, SCI_LOWERCASE returns a different string for these characters when SCI_UPPERCASE was applied first.

     
  • Neil Hodgson

    Neil Hodgson - 2022-09-13

    Scintilla is using the Unicode uppercase form, not titlecase. Scintilla bases its case table on Python's, which in turn is based on the Unicode SpecialCasing.txt file, which has this line for the first example character 'ᾳ':

    <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
    1FB3; 1FB3; 1FBC; 0391 0399; # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI
    

    That is, the title case for 'ᾳ' is the single character 'ᾼ' but the upper case is the two character sequence 'ΑΙ'.
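
The distinction can be checked from Python, whose `str.upper()` and `str.title()` apply the Unicode full case mappings. This is only a sketch run against Python's own tables, on the assumption (taken from the comment above) that Scintilla's case table matches them; it does not call Scintilla itself, and the helper `codepoints` is a hypothetical name used for illustration.

```python
# Sketch: compare Unicode uppercase and titlecase for U+1FB3 using Python's
# built-in case mappings (assumed here to match Scintilla's case table).
small = "\u1fb3"            # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI

upper = small.upper()       # full uppercase mapping
title = small.title()       # titlecase mapping
round_trip = upper.lower()  # what lowercasing the uppercased text gives back


def codepoints(text):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


print(codepoints(upper))       # 0391 0399
print(codepoints(title))       # 1FBC
print(codepoints(round_trip))  # 03B1 03B9
```

The last line shows why lowercasing afterwards does not restore the original text: the two-character uppercase form lowercases to plain alpha followed by iota, not back to U+1FB3.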

     
  • Neil Hodgson

    Neil Hodgson - 2022-09-14
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,4 @@
    -This comes from https://github.com/orbitalquark/textadept/issues/263.
    +This comes from https://github.com/orbitalquark/textadept/issues/263 .
    
     Using current stable version of _Textadept_ on _Windows_ (I’m afraid I don’t know which GTK+3 or _Scintilla_ versions it contains), changing from these lowercase characters:
    
     
  • Pablo Rodriguez

    Pablo Rodriguez - 2022-09-14

    Many thanks for your reply, Neil.

    I didn't know that Unicode makes a distinction between titlecase and uppercase.

    Given that, the file can be found at https://www.unicode.org/Public/UNIDATA/SpecialCasing.txt (just for reference).

    What surprised me (just as a comment) is the first special case:

    # The German es-zed is special--the normal mapping is to SS.
    # Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>))
    00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
    

    I think that ß should be uppercased as ẞ (explanation and another explanation in German).

    Titlecase makes no sense here, because no German word starts with ss or ß.
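
The sharp s entry can be checked in Python as well. This is a minimal sketch, assuming Python's built-in mappings follow the SpecialCasing.txt line quoted above. The capital sharp s (U+1E9E) is included only for comparison; the default uppercase mapping does not produce it.

```python
# Sketch: default Unicode case mappings for U+00DF as implemented by Python.
sharp_s = "\u00df"          # LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

upper = sharp_s.upper()     # default uppercase mapping
title = sharp_s.title()     # default titlecase mapping

print(upper)                               # SS
print(title)                               # Ss
print(capital_sharp_s.lower() == sharp_s)  # True
```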

    So the report should be closed, but I’m afraid that I cannot close it myself.

    Many thanks for your help and your explanation.

     
  • Neil Hodgson

    Neil Hodgson - 2022-09-14
    • labels: --> scintilla, unicode
    • status: open --> closed-rejected
     
