From: Krzysztof D. <krz...@gm...> - 2014-03-02 19:51:24
On 03/01/2014 08:05 AM, Christophe Rhodes wrote:
> Elias Mårtenson <lo...@gm...> writes:
>
>> On 1 March 2014 00:21, Christophe Rhodes <cs...@ca...> wrote:
>>
>>> Oh, right, I didn't say explicitly: I think the current behaviour of
>>> digit-char-p (which is unchanged since the Unicode merge, or at least
>>> the great whitespace explosion) is wrong in that it shouldn't actually
>>> consider non-ascii to be digits even for radixes larger than 10.
>>
>> I'm sorry for breaking into this discussion, and not even knowing what the
>> whitespace explosion actually refers to. But, this led me to note that the
>> Unicode space characters are not actually space characters in SBCL. For
>> example, the sequence U+0031 DIGIT ONE, U+2003 EM SPACE, U+0032 DIGIT TWO
>> is interpreted as a single symbol name comprised of three characters, as
>> opposed to a sequence of the two digits 1 and 2.
>>
>> Is this correct behaviour? If you guys are discussing supporting all
>> Unicode digits, wouldn't it make sense to support all Unicode spacing as
>> well?
>
> I think that doing exotic things to support Unicode in *program text*
> should be a much lower priority than supporting Unicode on string and
> stream *data*.
>

Whatever the ultimate priorities of the project are, the limited support
for Unicode in program text (specifically, digit-char-p) has a bug in it,
which should be fixed.

> If after we support a set of Unicode operations on string and
> stream data, it then looks like there's a natural way for them to apply
> to program code, then we can certainly think about it -- or if there's a
> compelling use case for being able to use multiple different space
> characters in program text.
>

I think that Unicode-in-source and Unicode-in-data are two relatively
independent issues, and can be worked on separately.

kad