From: <lut...@fr...> - 2004-10-04 19:30:39
Dear SBCL developers,

firstly, if a little belated, I would like to say that I appreciate
very much that SBCL is getting Unicode support. You were asking to
test it, so I experimented a bit with case conversion and found:

1. The treatment of U+00DF (223 decimal, LATIN SMALL LETTER SHARP S)
   is inconsistent with the Hyperspec. The version I used for the
   test is "0.8.13.77.character.37", running on x86.

   (lower-case-p (code-char 223))                      -> T
   (char-upcase (code-char 223))                       -> #\Nul
   (char-downcase (char-upcase (code-char 223)))       -> #\Nul

   The Hyperspec requires (in 13.1.4.3) that a character with case
   be in a one-to-one correspondence with a different character of
   the opposite case.

   I believe the best solution is the one taken by CLISP, namely to
   make only those characters "characters with case" that have such
   a one-to-one correspondence in the Unicode tables. Then
   char-upcase would leave U+00DF unchanged.

   To cite from impnotes.html of CLISP 2.33:

     13.6. Case of Implementation-Defined Characters
           [CLHS-13.1.4.3.4]

     The characters with case are those UNICODE characters c, for
     which the upper case mapping uc and the lower case mapping lc
     have the following properties:

       * uc and lc are different
       * c is one of uc and lc
       * the upper case mapping of uc and of lc is uc
       * the lower case mapping of uc and of lc is lc

2. In the course of testing this I ran the example code from the
   Hyperspec entry for char-upcase and char-downcase (reformatted
   for shorter lines):

   (dotimes (code char-code-limit)
     (let ((char (code-char code)))
       (when char
         (unless (cond ((upper-case-p char)
                        (char= (char-upcase (char-downcase char))
                               char))
                       ((lower-case-p char)
                        (char= (char-downcase (char-upcase char))
                               char))
                       (t
                        (and (char= (char-upcase (char-downcase char))
                                    char)
                             (char= (char-downcase (char-upcase char))
                                    char))))
           (return char)))))

   which to my surprise returned NIL where I expected it to stumble
   across the above mentioned inconsistency. What is going on here?

Greetings,

Lutz Euler
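
For illustration, the four CLISP conditions quoted under point 1 can be
written as a small predicate. This is only a sketch: HAS-CASE-P and its
:UC / :LC keyword arguments are hypothetical names, not functions from
SBCL or CLISP, and the two mapping functions are assumed to be total
one-character case mappings (they default to the implementation's own
char-upcase and char-downcase).

```lisp
;; Sketch only: HAS-CASE-P, :UC and :LC are hypothetical names.
;; UC and LC stand for the upper and lower case mappings of the
;; CLISP rule; each must map a character to a single character.
(defun has-case-p (c &key (uc #'char-upcase) (lc #'char-downcase))
  "True if C meets the four CLISP conditions under mappings UC and LC."
  (let ((u (funcall uc c))
        (l (funcall lc c)))
    (and (char/= u l)                   ; uc and lc are different
         (or (char= c u) (char= c l))   ; c is one of uc and lc
         (char= (funcall uc u) u)       ; upper case mapping of uc is uc
         (char= (funcall uc l) u)       ; upper case mapping of lc is uc
         (char= (funcall lc u) l)       ; lower case mapping of uc is lc
         (char= (funcall lc l) l))))    ; lower case mapping of lc is lc
```

With mappings that leave U+00DF unchanged (passed here as #'identity to
stand in for such a table), the first condition already fails, so the
character would not count as a character with case:

    (has-case-p #\a)                                        ; true
    (has-case-p (code-char 223) :uc #'identity :lc #'identity)  ; NIL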