I've been building up a library of Unicode-handling functions over the
last three weeks as part of my Google Summer of Code project. My work is
on my unicode-algorithms branch, which is at
https://github.com/krzysz00/sbcl/tree/unicode-algorithms . All of the
new functions are in an SB-UNICODE package, and most of them have
extensive tests and some documentation.
2) What, if anything, should we do about confusables? For example, :peak
and :реак are not visually distinct, though the second keyword is made
wholly of Cyrillic letters while the first is made of Latin ones. One
possibility is that symbols containing codepoints that are confusable
with Latin be printed with vertical bars to make it clear that something
might be up.
this does not sound like a good approach:
- vertical bars are intended to carry information about syntax only, in that they affect the constituent nature of intervening characters.
- the practice would also not achieve its intention for situations where there is no escaping.
? (format *trace-output* "...~a...~s." '|split symbol| '|split too|)
...split symbol...|split too|.
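the same distinction can be seen without format, as a minimal sketch: the bars are emitted only when *print-escape* is true, which is what prin1 and ~s arrange, while princ and ~a bind it to false. this assumes the standard printer settings and that the symbol is interned in the current package.

```lisp
;; vertical bars are printed only when *print-escape* is true
(let ((sym '|split symbol|))
  (list (prin1-to-string sym)     ; escaped, as with ~s
        (princ-to-string sym)))   ; unescaped, as with ~a
;; => ("|split symbol|" "split symbol")
```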
best regards, from berlin,