Re: #\U<hex> syntax

Hi Sam,

> CLISP supports the #\U<hex> read syntax for Unicode characters but does
> not advertise it (instead, the official syntax is #\Code<decimal>).
> 
> Also, the <hex> _must_ be 5 or 9 characters long (padded with 0s if
> necessary), and the syntax is implemented as if it were a character name
> lookup.
> 
> I wonder if you think it might be a good idea to
> 1. Relax the 5/9 length requirement
> 2. Advertise the syntax in
> https://clisp.sourceforge.io/impnotes/sharpsign.html#sharpsign-backslash

Considering that
  * The widespread practice (starting with unicode.org) is to write
      - characters with code points < #x10000 with 4 digits
      - characters with code points >= #x10000 with the minimum possible
        digits (no leading zeroes),
  * For data files containing S-expressions to be interoperable between CL
    implementations, the preferred name printed by one implementation must be
    understood by the other implementations,
  * sbcl allows leading zeroes on input
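The first convention is the same thing a "minimum width 4" hex format produces: shorter codes are zero-padded to 4 digits, and longer codes are printed with no padding at all. Below is a minimal Python sketch contrasting it with a fixed 4-or-8-digit form. The fixed form is inferred only from the CLISP output #\U00014961 quoted below. Both function names are hypothetical, and this is not CLISP code.

```python
def unicode_notation(code_point: int) -> str:
    """Hex digits as conventionally written by unicode.org: at least
    4 digits, no leading zeroes for code points >= 0x10000."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point}")
    # "04X" pads to a minimum width of 4; wider values are left as-is.
    return format(code_point, "04X")


def fixed_width_notation(code_point: int) -> str:
    """Hypothetical model of a fixed 4- or 8-digit form (assumed from
    the single CLISP example in this thread)."""
    return format(code_point, "04X" if code_point < 0x10000 else "08X")


for cp in (0x61D, 0x14961, 0x10FFFF):
    print(f"#\\U{unicode_notation(cp)}  vs  #\\U{fixed_width_notation(cp)}")
```

The two forms agree below #x10000 and differ only in the leading zeroes above it.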

I find that it would be useful if:

  1) When printing a Unicode character that has no explicit name (e.g. #\U061D),
     clisp prints at least 4 hex digits, with no leading zeroes for codes
     >= #x10000. (This is unlike sbcl, which prints #\U061D as #\U61D.)
     There's no basis for the current behaviour of clisp, which pads such
     codes to 8 digits:
       (code-char 84321) => #\U00014961
     The 8-digit width reflects 32-bit integers, which are not a primordial
     type in Lisp. Under the rule above, this character would print as #\U14961.

  2) When parsing a Unicode character, leading zeroes don't matter. (This
     achieves interoperability with SBCL, except for very few specific characters
     such as #\Bell.)

  3) We document this. (This is necessary because this syntax can occur as
     output of PRINT and WRITE.)
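Points 1) and 2) together give a round trip: the reader accepts any number of leading zeroes, and the printer emits the canonical form. Below is a minimal Python sketch of that behaviour, covering only the U<hex> names. It assumes a case-insensitive "U" prefix, in line with CL's case-insensitive character-name lookup, and an upper bound of #x10FFFF. The names parse_u_name and print_u_name are hypothetical and do not come from CLISP.

```python
import re
from typing import Optional

# Hypothetical pattern: "U" (either case) followed by one or more hex digits.
# The regex rejects forms that int(..., 16) alone would accept ("0x", "_").
_U_HEX_NAME = re.compile(r"[Uu]([0-9A-Fa-f]+)")


def parse_u_name(name: str) -> Optional[int]:
    """Return the code point for a U<hex> name, ignoring leading zeroes,
    or None if NAME is not of that form or is out of Unicode range."""
    match = _U_HEX_NAME.fullmatch(name)
    if match is None:
        return None  # e.g. "Bell": left to the ordinary name table
    code_point = int(match.group(1), 16)
    return code_point if code_point <= 0x10FFFF else None


def print_u_name(code_point: int) -> str:
    """Canonical form: at least 4 digits, no leading zeroes beyond that."""
    return "U" + format(code_point, "04X")


for name in ("U61D", "U061D", "U0000061D", "u14961", "U00014961"):
    code_point = parse_u_name(name)
    print(name, "->", code_point, "->", print_u_name(code_point))
```

Because every spelling maps to the same code point, data printed by either convention can be read back, while the printer emits only one form.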

Bruno