Q 7.0 character escape syntax (was: Re: [q-lang-users] ANN: Q 7.0 release candidate)
From: Albert G. <Dr....@t-...> - 2006-02-16 01:26:46
Hi John,

I'm taking this discussion back to the mailing list, as I feel that this issue should be discussed by everybody on the list who is interested in the upcoming Q 7.0 release.

(Just a quick update for everybody: Q 7.0 RC2 is almost done and I also have the native Windows port working. Moreover, thanks to John Cowan's tireless testing and bug reporting, Cygwin is now supported, too. However, there is still an issue related to numeric character escapes, as detailed below. Note that we now need an escape syntax which is able to support the entire Unicode range, not just ASCII.)

John Cowan wrote:
> I found a problem: when you type
>
> "\300" ++ "4"
>
> to the interpreter, it replies
>
> "\3004"
>
> There doesn't seem to be any way to defeat
> the greediness of the \N construct, and there needs to be. It seems
> to me that the most Q-ish approach is to allow parentheses around N,
> and output "\(300)4".

Thanks for reporting this. I certainly want to fix this before releasing RC2. Your proposal makes sense to me, and would be fairly easy to implement, too.

(NB: The problem here is that an escape like "\1234" will always denote character #1234, and there's currently no way to write, say, character #123 followed by a literal "4" character (other than escaping the "4", too, which is silly). In fact, I think that this misfeature is present in *all* recent Q versions.)

> In addition, Unicode folks really really really detest decimal numbers
> for Unicode characters. While you're fixing the above, please allow a
> leading x (as in "\x0100") for hexadecimal character escapes.

Recent Q versions already allow either decimal, octal or hexadecimal notation in an escape, using the same syntax as in integer literals. Thus, e.g., \27, \033 and \0x1b all denote the ASCII escape character. With Q 7.0 I still use the same notation, only the range of character codes is bigger, allowing for all 0x110000 Unicode code points.

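To make the greediness problem concrete, here is a minimal Python sketch of such an escape decoder. It is only a model of the behaviour described above, not the Q lexer itself: the names `decode` and `parse_int` and the regular expression are assumptions made up for this illustration, and the `\(N)` branch implements the proposed parenthesized form, which does not exist in current releases.

```python
import re

# Integer literal as used in an escape: hex (0x1b), octal (033) or decimal (27).
# This grammar is an assumption modelled on the examples in the text.
INT = r"(?:0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)"

# Either the proposed parenthesized form \(N) or the existing bare form \N.
# The bare form is greedy: it swallows every digit that follows.
ESCAPE = re.compile(r"\\(?:\((" + INT + r")\)|(" + INT + r"))")

MAX_CODE_POINT = 0x10FFFF  # 0x110000 code points in total


def parse_int(literal: str) -> int:
    """Interpret a literal as hex (0x...), octal (leading 0) or decimal."""
    if literal[:2].lower() == "0x":
        return int(literal[2:], 16)
    if len(literal) > 1 and literal[0] == "0":
        return int(literal, 8)
    return int(literal, 10)


def decode(source: str) -> str:
    """Replace numeric escapes in `source` by the characters they denote."""

    def replace(match: re.Match) -> str:
        literal = match.group(1) or match.group(2)
        code = parse_int(literal)
        if code > MAX_CODE_POINT:
            raise ValueError(f"escape \\{literal} is outside the Unicode range")
        return chr(code)

    return ESCAPE.sub(replace, source)


# The three notations for the ASCII escape character agree.
assert decode(r"\27") == decode(r"\033") == decode(r"\0x1b") == "\x1b"

# Greedy bare form: "\3004" is character #3004, not #300 followed by "4".
assert decode(r"\3004") == chr(3004)

# Proposed parenthesized form: the closing parenthesis ends the number.
assert decode(r"\(300)4") == chr(300) + "4"
```

In this model the printer would have to emit `\(300)4` whenever the character after an escape could continue the number (a digit, or a hex digit or `x` after a hexadecimal escape), and could keep the bare form everywhere else.
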
I think that this notation is cleaner and simpler than having to remember all kinds of funny escape notations, like the \ooo, \xhh, \uhhhh and \Uhhhhhhhh escapes of C, Python et al. But the advantage of the latter is that apparently many other languages already use them, so they are a kind of de facto standard. I'm not sure what The Right Thing is in this case.

So what should we do: Keep the existing \ddd, \0ooo, \0xhhh notation and extend that with the \(<int>) notation? Or rather jump on the C/Python/... bandwagon and employ the widely used \ooo, \xhh, \uhhhh, \Uhhhhhhhh notation? (Note that then I'd also have to slash the decimal escape syntax of pre-7.0 releases, potentially breaking existing scripts in places where it might not be easily noticed.) Any other proposals? Opinions?

> Another minor point: currently stray \ characters are basically ignored:
> "\z" is equivalent to "z". IMHO this should be changed, making them
> syntax errors; that allows you to add a meaning for \z at some future
> date without worries that existing poorly-written scripts will break.

This convention was adopted from C (I think that the standard doesn't actually specify this, but IIRC all C compilers I've used did it that way). At least the newer versions of gcc generate a warning message in the case of an unrecognized escape, though. I could easily do this in the Q compiler as well, but unless you run the interpreter with the -w option you won't notice the difference. ;-) OTOH, spitting out a syntax error in this case seems a bit too harsh for my taste. What do others think?

Cheers,
Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag