From: Adam W. <li...@co...> - 2004-12-07 21:57:42
Hi Christophe Rhodes,

> Adam Warner <li...@co...> writes:
>
>> Thanks for the clarifications. It's now clear to me that SBCL must be
>> built with Unicode support to continue to be as useful:
>> <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=284602>
>
> "useful" seems to be in the eye of the beholder, here. I prefer to be
> able to type an e-acute at my lisp prompt and have it be recognized as
> an e-acute rather than as some binary junk. In addition, in the bug
> report you have ignored, deliberately or not, what I wrote in my
> previous mail about running under a given locale; if you run SBCL in
> an ISO-8859-1 or ASCII locale, then the octets are directly
> interpreted as characters. Try
>   $ LANG=C sbcl
>   * "whatever binary stuff you want"

Thank you for your persistence. In your original reply I thought you were
encouraging me to run my terminal in ISO-8859-1 which would have solved
the arbitrary octet problem at the expense of not being able to print
characters with code points above 255.

Now I appreciate that I can simply lie to SBCL about the current character
encoding so I can read and build UTF-8 octet sequences using CHAR-CODE and
CODE-CHAR respectively.

This has tremendous implications for being able to mix strings, code and
binary data over a character stream from one Lisp implementation to
another. Binary data can be sent over the character stream with virtually
no overhead (such as first converting the binary data to an ASCII subset
and then decoding that ASCII subset at the other end).

Thanks again.

>> I'd drop the support unless you later intend to revert to naive CMUCL-like
>> string handling. In the naive situation one could use CHAR-CODE to
>> manually decode the Unicode code points. Without the former naive string
>> handling one can't even read the string in the first place, so there's
>> very little point in keeping it as a build option (i.e. to be useful one
>> would also need to disable external format support).
>
> This seems a little broad -- are you really asserting that naive
> CMUCL-like string handling is the only possible use of a
> non-wide-character build?

You've demonstrated that even Unicode SBCL can be used for precisely this!

The benefit of a non-wide-character build over Unicode SBCL is: strings
take significantly less storage space with European character sets.
The cost is: no Unicode code point character handling.

Unicode SBCL sounds like the build most people will want to use even when
reading arbitrary octets over character streams (since they also have the
option to turn them into strings of Unicode code points).

Regards,
Adam
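
The octet round trip discussed above (reading UTF-8 octets through a one-octet-per-character locale, then rebuilding them) can be illustrated outside Lisp. What follows is only a minimal sketch in Python, not SBCL code: it assumes an ISO-8859-1-style reading in which octet n becomes the character with code n, uses ord() and chr() as stand-ins for CHAR-CODE and CODE-CHAR, and the helper names octets_to_chars and chars_to_octets are hypothetical.

```python
# Stand-ins (assumptions, not SBCL API): ord() ~ CHAR-CODE, chr() ~ CODE-CHAR.

def octets_to_chars(octets: bytes) -> str:
    """Read octets the way a one-octet-per-character locale would:
    octet n becomes the character whose code is n."""
    return "".join(chr(b) for b in octets)

def chars_to_octets(text: str) -> bytes:
    """Recover the octets from such a string.
    Raises ValueError if any character code is 256 or above."""
    return bytes(ord(c) for c in text)

# e-acute (U+00E9) encoded as UTF-8 is the two octets C3 A9.
wire = "\u00e9".encode("utf-8")
as_read = octets_to_chars(wire)        # two characters, not one e-acute
codes = [ord(c) for c in as_read]      # the octet values, via the CHAR-CODE stand-in
round_trip = chars_to_octets(as_read)  # the original octets again
decoded = round_trip.decode("utf-8")   # explicit UTF-8 decoding step

# Arbitrary binary data survives the same trip, one character per octet.
blob = bytes(range(256))
blob_back = chars_to_octets(octets_to_chars(blob))
```

The point of the sketch is that the character stream carries one character per octet, so no ASCII-armouring step is needed; interpreting the octets as UTF-8 becomes an explicit, separate step on the receiving side.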