From: Bruno H. <br...@cl...> - 2018-05-21 19:43:48
Hi Sam,

> CLISP supports the #\U<hex> read syntax for Unicode characters but does
> not advertise it (instead, the official syntax is #\Code<decimal>).
>
> Also, the <hex> _must_ be 5 or 9 characters long (padded with 0s if
> necessary), and the syntax is implemented as if it were a character name
> lookup.
>
> I wonder if you think it might be a good idea to
> 1. Relax the 5/9 length requirement
> 2. Advertise the syntax in
> https://clisp.sourceforge.io/impnotes/sharpsign.html#sharpsign-backslash

Considering that
  * The widespread practice (starting in unicode.org) is to write
      - characters with code points < #x10000 with 4 digits
      - characters with code points >= #x10000 with the minimum possible
        digits (no leading zeroes),
  * For interoperability of data files with Sexprs between CL
    implementations it is necessary that the preferred name printed by one
    implementation is understood by the other implementations,
  * sbcl allows leading zeroes on input

I find that it would be useful if:

1) When printing a Unicode character that has no explicit name
   (e.g. #\U061D) clisp prints 4 or more digits (with no leading zero
   digits for codes >= #x10000).
   [This is unlike sbcl, which prints #\U061D as #\U61D.]
   There's no basis for the current behaviour of clisp:
     (code-char 84321) => #\U00014961
   because 32-bit integers are not a primordial type in Lisp.

2) When parsing a Unicode character, leading zeroes don't matter.
   (This achieves interoperability with SBCL, except for very few specific
   characters such as #\Bell.)

3) We document this. (This is necessary because this syntax can occur as
   output of PRINT and WRITE.)

Bruno
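
The printing and parsing rules proposed in points 1) and 2) can be illustrated with a small sketch. It is written in Python purely for illustration and is not CLISP's or SBCL's implementation. The function names `char_name` and `parse_char_name` are hypothetical, and the sketch assumes the character has no explicit name, so cases such as #\Bell are not handled.

```python
import string

MAX_CODE = 0x10FFFF  # highest Unicode code point


def char_name(code: int) -> str:
    """Proposed printed form: at least 4 hex digits, no extra padding.

    Code points below #x10000 are zero-padded to 4 digits; larger ones
    are printed with the minimum number of digits.
    """
    if not 0 <= code <= MAX_CODE:
        raise ValueError(f"not a Unicode code point: {code}")
    return "#\\U" + format(code, "04X")


def parse_char_name(token: str) -> int:
    """Proposed reading rule: any number of leading zeroes is accepted."""
    if token[:3].upper() != "#\\U":
        raise ValueError(f"not a #\\U token: {token!r}")
    digits = token[3:]
    # Check the digits explicitly, because int() would also accept
    # forms such as "0x1F" or "1_0" that are not wanted here.
    if not digits or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"bad hex digits in {token!r}")
    code = int(digits, 16)
    if code > MAX_CODE:
        raise ValueError(f"code point out of range in {token!r}")
    return code
```

Under this rule, the name that clisp currently prints (#\U00014961), the name sbcl prints (#\U61D) and the proposed 4-digit form (#\U061D) are all accepted on input, while output always uses the 4-or-more-digit form.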