#1569 Unicode 16.0

Milestone: Initial
Status: accepted
Priority: 5
Updated: 2025-11-04
Created: 2025-10-19
Creator: Zufu Liu
Private: No

Unicode data can be updated to Unicode 16.0 (Python 3.14) or 17.0 (Python 3.15 alpha 1).

Download the Windows embeddable package (64-bit) from https://www.python.org/downloads/windows/, add scintilla\scripts to python314._pth (or python315._pth), then run the generating scripts with the new python.exe.

The size of symmetricCaseConversionRanges (in CaseConvert.cxx) can be halved by merging the range length and range pitch (always less than 255) into the lower and upper values (the largest Unicode code point needs only 3 bytes):
(lower << 8) | range length, (upper << 8) | range pitch, e.g. 0x0061'1A, 0x0041'01.
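
The packing described above can be sketched in Python as follows. This is only an illustration, not the code from the attachment or from the Scintilla scripts: the function names `pack_range`, `unpack_range` and `expand_range` are hypothetical, and the expansion rule (entry i maps lower + i * pitch to upper + i * pitch) is an assumption about how the table is consumed.

```python
def pack_range(lower, upper, length, pitch):
    """Pack a 4-int symmetric range into 2 ints (hypothetical helper)."""
    # Length and pitch must each fit in the low 8 bits.
    assert 0 < length <= 0xFF and 0 < pitch <= 0xFF
    # Code points need at most 21 bits, so shifting by 8 stays within 32 bits.
    assert 0 <= lower <= 0x10FFFF and 0 <= upper <= 0x10FFFF
    return (lower << 8) | length, (upper << 8) | pitch


def unpack_range(packed_lower, packed_upper):
    """Recover (lower, upper, length, pitch) from the packed pair."""
    return (packed_lower >> 8, packed_upper >> 8,
            packed_lower & 0xFF, packed_upper & 0xFF)


def expand_range(packed_lower, packed_upper):
    """List the (lower, upper) pairs, assuming entry i is offset by i * pitch."""
    lower, upper, length, pitch = unpack_range(packed_lower, packed_upper)
    return [(lower + i * pitch, upper + i * pitch) for i in range(length)]


# The example from the text: a-z <-> A-Z, 26 entries, pitch 1.
packed = pack_range(0x61, 0x41, 26, 1)
pairs = expand_range(*packed)
```

With these definitions, `packed` is the pair `(0x611A, 0x4101)`, matching the `0x0061'1A, 0x0041'01` example, and `pairs` runs from `(0x61, 0x41)` to `(0x7A, 0x5A)`.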

1 Attachments

Discussion

  • Zufu Liu

    Zufu Liu - 2025-10-19
    • labels: unicode --> unicode, Scintilla, lexilla
     
  • Neil Hodgson

    Neil Hodgson - 2025-11-04
    • status: open --> accepted
     
  • Neil Hodgson

    Neil Hodgson - 2025-11-04

    Committed data changes with [5ce570] and also in Lexilla with https://github.com/ScintillaOrg/lexilla/commit/2362ea7cb2066608b59c1a16e46a0864639b4307

    The main part of the script updates is replacing 4-int tuple symmetric cases with 2-int tuples by joining bits together. To me, this is making the code more difficult to understand and change with only a small payback in size.

    There are some other minor changes that may be beneficial.

    The first is hoisting and simplifying the surrogate test. However, it only succeeds for low surrogates and lets high surrogates through. This actually doesn't matter as Python is OK with lone surrogates and no surrogates have upper, lower, or fold counterparts. The surrogate test isn't needed - I expect I thought it was incorrect to try lone surrogates.

    There are some uses of more explicit formatting and more descriptive variable names that are positive.

     

    Related

    Commit: [5ce570]
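
The surrogate points in the comment above can be checked with a short Python sketch. It is not taken from the Scintilla scripts: `is_low_surrogate` is a hypothetical stand-in for a test that only catches low surrogates, `is_surrogate` covers the full surrogate block U+D800..U+DFFF, and `has_case_counterpart` is an illustrative helper.

```python
def is_low_surrogate(ch):
    """Hypothetical stand-in for a test that only matches low surrogates."""
    return 0xDC00 <= ch <= 0xDFFF


def is_surrogate(ch):
    """Match both high (U+D800..U+DBFF) and low (U+DC00..U+DFFF) surrogates."""
    return 0xD800 <= ch <= 0xDFFF


def has_case_counterpart(ch):
    """True if upper, lower or casefold changes the one-character string."""
    s = chr(ch)  # Python accepts lone surrogates in str objects
    return s.upper() != s or s.lower() != s or s.casefold() != s


# Collect any surrogate code points that have a case counterpart.
surrogates_with_case = [ch for ch in range(0xD800, 0xE000)
                        if has_case_counterpart(ch)]
```

If `surrogates_with_case` comes out empty, skipping the surrogate test altogether produces the same generated data, which is the point made in the comment.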

  • Zufu Liu

    Zufu Liu - 2025-11-04

    > it only succeeds for low surrogates and lets high surrogates through.

    That was a copy-and-paste error; the correct test would be 0xD800 <= ch <= 0xDFFF (merge the two tests, or take the one from UniConversion.h).

    > The surrogate test isn't needed

    Indeed, the result is the same.

    > There are some uses of more explicit formatting

    These (string concatenation or percent formatting) can be found with pylint, e.g.

    C0209: Formatting a regular string which could be an f-string (consider-using-f-string)
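
As a small illustration of what that message is about, the sketch below formats the same text three ways. The variable names and strings are made up for the example and are not from the Scintilla scripts. As far as I know, pylint reports C0209 for the percent and `str.format` forms.

```python
name = "symmetricCaseConversionRanges"
count = 3

# Percent formatting: pylint suggests an f-string here.
percent_style = "%s has %d entries" % (name, count)

# str.format: also a candidate for an f-string.
format_style = "{} has {} entries".format(name, count)

# Equivalent f-string.
fstring_style = f"{name} has {count} entries"

# Format specifiers carry over, e.g. zero-padded upper-case hex.
percent_hex = "0x%04X" % 0x61
fstring_hex = f"0x{0x61:04X}"
```

All three text forms produce the same string, and both hex forms produce `0x0061`.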
    
     
