Thread: [q-lang-users] New stuff in cvs: multichar ops, views
Brought to you by:
agraef
From: Albert G. <Dr....@t-...> - 2007-05-30 11:59:30
|
Hi all,

I had some time to work on the new Q release over the Whitsun holidays. As a result, you can find some interesting new stuff in cvs today:

Multichar operator symbols: These were requested on the mailing list a while ago. I've given in to popular demand now and implemented them. So you can write, e.g.:

    public (--) Xs Ys @(-);
    Xs:List--Ys:List = foldl (flip (filter . neq)) Xs Ys;

(Q's lexical syntax had to be revised to support this, but most old scripts should be unaffected.)

Wadler/Okasaki-style views (good stuff!): These are now implemented, too. Virtual constructors can be declared with the new 'virtual' keyword and can then be used in pattern-matching definitions like real constructors (if the appropriate views a.k.a. unparsings are defined). Note that in order to go with the customary terminology, I renamed the builtin 'unparse' routine to 'view', so you will have to change your scripts accordingly.

For example, here is how the 'Rational' type and its view are now defined in the standard library:

    // from rational.q
    public type Rational : Real = virtual (%) P Q @ (/) | private const rat N D;

    // from prelude.q
    view Q:Rational = '(N % D) where (N:Int,D:Int) = num_den Q;

Having (%) as a virtual constructor of 'Rational' lets you use the operator in your definitions just like a real constructor, e.g.:

    def X%Y = 3%14 + 7%6;
    foo (X%Y) = (X,Y);

This yields:

    ==> (X,Y); foo (4%6)
    (29,21)
    (2,3)

The same applies to the container ADTs in the standard library. E.g.:

    def set Xs = set [1..10]+set [5..12];
    mymembers (set Xs) = Xs;

Which yields:

    ==> Xs; mymembers (set ["a".."c"]+set ["A".."C"])
    [1,2,3,4,5,6,7,8,9,10,11,12]
    ["A","B","C","a","b","c"]

It goes without saying that this makes the handling of abstract data types *much* more convenient and elegant.

(Note that 'def's in the interactive command loop of the interpreter don't quite work like that yet, as they don't handle virtual constructors right now; but I'm working on that.)

Enjoy! :)
Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|
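For readers more at home in Python than in Q, the idea behind a virtual constructor can be approximated roughly as follows. This is only an analogy under stated assumptions, not how Q implements views: the class `Rat`, its `view` method and the helper `foo` are hypothetical names, and `fractions.Fraction` merely stands in for the private `rat N D` representation.

```python
from fractions import Fraction


class Rat:
    """Abstract rational: the representation is hidden, only the view is public."""

    def __init__(self, num, den):
        # Private, normalized representation (stands in for Q's 'rat N D').
        self._value = Fraction(num, den)

    def __add__(self, other):
        result = self._value + other._value
        return Rat(result.numerator, result.denominator)

    def view(self):
        # Rough analogue of the Rational view: expose the value as a
        # (numerator, denominator) pair without revealing the representation.
        return (self._value.numerator, self._value.denominator)


def foo(q):
    # Rough analogue of 'foo (X%Y) = (X,Y)': destructure the view, not the fields.
    x, y = q.view()
    return (x, y)


print(foo(Rat(3, 14) + Rat(7, 6)))  # (29, 21)
print(foo(Rat(4, 6)))               # (2, 3)
```

The two printed pairs agree with the (29,21) and (2,3) results in the post, since 3/14 + 7/6 = 58/42 = 29/21 and 4/6 = 2/3.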
From: John C. <co...@cc...> - 2007-05-30 22:29:08
|
Albert Graef scripsit:

> Multichar operator symbols: These have been asked for on the mailing
> list a while ago. I've given in to the popular demand now and
> implemented them. So you can write, e.g.:
>
> public (--) Xs Ys @(-);
> Xs:List--Ys:List = foldl (flip (filter . neq)) Xs Ys;

I'm happy with that -- I assume that "--" is distinct from "- -" where the latter is unary minus? (I always get bitten by unary minus in Q, still.)

> Wadler/Okasaki-style views (good stuff!): These are now implemented,

Hurrah!

--
We are lost, lost. No name, no business, no Precious, nothing. Only empty. Only hungry: yes, we are hungry. A few little fishes, nassty bony little fishes, for a poor creature, and they say death. So wise they are; so just, so very just. --Gollum
co...@cc... http://ccil.org/~cowan
|
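As a reading aid for the quoted (--) definition: the fold walks over Ys and, at each step, filters out of the accumulated list every element equal to the current Y. A minimal Python sketch of the same computation follows; `list_difference` is a hypothetical name chosen for illustration and is not part of Q or its library.

```python
from functools import reduce


def list_difference(xs, ys):
    # Fold over ys; at each step keep only the accumulator elements
    # that are not equal to the current y (all occurrences are dropped).
    return reduce(lambda acc, y: [x for x in acc if x != y], ys, xs)


print(list_difference([1, 2, 3, 2, 4], [2, 4]))  # [1, 3]
```

Note that, read this way, every occurrence of a matching element is removed, not just the first one.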
From: Albert G. <Dr....@t-...> - 2007-05-31 03:45:10
|
Hi John,

John Cowan wrote:
> I'm happy with that -- I assume that "--" is distinct from "- -"
> where the latter is unary minus?

Yes, sure. The unparser now also makes sure that spaces are inserted between adjacent binary and unary symbols.

But note that this was just an example, (--) is not in the standard library. So unless you define (--) yourself you can still write 5--3 and get the expected result, 8. :)

> (I always get bitten by unary minus in Q, still.)

Care to explain why? I'd say that its usage is rather straightforward. The precedence is the same as in Pascal (IIRC), all you have to remember is that '-' in sections always denotes *binary* minus, so to denote the unary minus function you have to write 'minus'.

>> Wadler/Okasaki-style views (good stuff!): These are now implemented,
>
> Hurrah!

Thought you might like that. ;-) IMHO, this is in fact the most useful addition to Q since 'unparse' (which is now called 'view', btw). I almost dropped the ball there, as this feature was, ahem, somewhat tricky to implement efficiently in the context of Q. But I persevered, and in the end I figured out how to do it.

As a first exercise, I just committed some changes which turn Complex into an ADT with the virtual constructor (:+). So the constant i now prints as 0:+1.

(NB: I chose :+ for the sake of Haskell compatibility, but I actually think that it looks a bit weird. In Haskell they can't do any better because the ':' has to be on the left side to denote a constructor, IIRC. But I'd actually prefer +: for Q. Other suggestions, anyone?)

(NB2: Rob, I'm sure you will like this, because you're an ADT fan and because this makes the treatment of Complex very similar to Rational, as you suggested before. We might have to fiddle with ratutils.q though, I'm not sure whether it uses the old 'complex' constructor in some places, which is not public any more. But it should be easy to fix that since the rest of the interface of Complex is unchanged.)

I also have plans to turn 'lambda' into a virtual constructor for the builtin external <<Function>> type used to represent compiled lambdas, and, instead of having a builtin pretty-printing of <<Function>> objects, define an appropriate view for them. This will make it possible to write stuff like:

    foo (lambda X Y) = ...;

even if 'foo' is not a special form and thus gets the lambda delivered in its compiled form. Right now it's necessary to play some dirty tricks with 'valq . str' to dissect a compiled lambda.

Cheers,
Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|
From: John C. <co...@cc...> - 2007-05-31 04:15:10
|
Albert Graef scripsit:

> > (I always get bitten by unary minus in Q, still.)
>
> Care to explain why? I'd say that its usage is rather straightforward.
> The precedence is the same as in Pascal (IIRC), all you have to remember
> is that '-' in sections always denotes *binary* minus so to denote the
> unary minus function you have to write 'minus'.

I always forget that -2 is the application of minus to 2 rather than a constant, so I write things like "foo -2", which then turns out to be "(foo -) 2".

> As a first exercise, I just committed some changes which turn Complex
> into an ADT with the virtual constructor (:+). So the constant i now
> prints as 0:+1.

I don't like that much. I suppose there is no way to make 0+1*i the constructor/view? I guess not, since Q is eager.

--
John Cowan http://www.ccil.org/~cowan <co...@cc...>
"Any legal document draws most of its meaning from context. A telegram that says 'SELL HUNDRED THOUSAND SHARES IBM SHORT' (only 190 bits in 5-bit Baudot code plus appropriate headers) is as good a legal document as any, even sans digital signature." --me
|
From: Rob H. <hub...@gm...> - 2007-05-31 08:41:22
|
> > As a first exercise, I just committed some changes which turn Complex
> > into an ADT with the virtual constructor (:+). So the constant i now
> > prints as 0:+1.
>
> I don't like that much. I suppose there is no way to make 0+1*i the
> constructor/view? I guess not, since Q is eager.

What about 0+i*1 if the constructor could be (+i*)? I don't know if this kind of mixed operator is allowed, nor whether it would be a good idea to do so.

If mixed operators are not allowed, what about the thing that looks a little like 'i': use something like (+:*) or (+|*) or (+!*)?

Alternatively, what about allowing mixed multi-character operator and function names delimited by something, such as [brackets]: 0[+i*]1. Yes, that's still pretty ugly.

Rob.
|
From: Albert G. <Dr....@t-...> - 2007-05-31 11:46:07
|
Rob Hubbard wrote:
> If mixed operators are not allowed, [...]

No, they aren't. You can either have a sequence of punctuation symbols or an identifier, but not both in the same symbol. (Actually, I could change the lexical syntax to make that possible, but I don't think that this would be a good idea.)

> [...] what about the thing that looks a
> little like 'i': use something like (+:*) or (+|*) or (+!*)?

That was my idea with '+:'. I can easily read that as "plus i times". '+!' looks nice to me, too. ('+:' might confuse Haskell programmers as they'll easily mistype it as ':+'.)

> Alternatively, what about allowing mixed multi-character operator and
> function names delimited by something, such as [brackets]: 0[+i*]1.

That's not possible. The parens/brackets/braces are reserved delimiters which cannot occur in an operator symbol.

Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|
From: Albert G. <Dr....@t-...> - 2007-05-31 11:18:58
|
John Cowan wrote:
> I always forget that -2 is the application of minus to 2 rather than a
> constant, so I write things like "foo -2", which then turns out to be
> "(foo -) 2".

Well, syntactically the '-' in '-2' is a unary minus, although semantically, it's still a number and not an explicit application of minus. IMHO, that's the only reasonable way to implement it, since I want '-2' and '-X' to be parsed in the same manner.

>> As a first exercise, I just committed some changes which turn Complex
>> into an ADT with the virtual constructor (:+). So the constant i now
>> prints as 0:+1.
>
> I don't like that much. I suppose there is no way to make 0+1*i the
> constructor/view? I guess not, since Q is eager.

Well, X+Y*i doesn't work here since '+' is not a virtual constructor of Complex, so while you can use this for pretty-printing (as we did in Q 7.6), you can't write stuff like 'foo (X+Y*i) = ...'. So, to make matching against the view work, the head symbol of the view must be a virtual constructor of the type, and that's the purpose that ':+' serves.

We could have something like 'X:+Y*i' instead of just 'X:+Y' but that seems redundant. What's so bad about 'X:+Y'? I can easily read ':+' aloud as "+i times".

Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|
From: Eddie R. <ed...@bm...> - 2007-05-31 13:47:57
|
Albert Graef,

> John Cowan wrote:
>> I always forget that -2 is the application of minus to 2 rather than a
>> constant, so I write things like "foo -2", which then turns out to be
>> "(foo -) 2".
>
> Well, syntactically the '-' in '-2' is a unary minus, although
> semantically, it's still a number and not an explicit application of
> minus. IMHO, that's the only reasonable way to implement it, since I
> want '-2' and '-X' to be parsed in the same manner.

Put this on a college Algebra or Calculus test:

    -2^2 =
    (a) -4 (b) 4

Almost all of the students will put (b). This is the same gotcha that I have to point out to students all the time. Strange though, they have no problem with X=2, -X^2=-4.

Sorry, I just had to toss in my 2 cents.

Eddie
|
From: Albert G. <Dr....@t-...> - 2007-05-31 14:21:57
|
Eddie Rucker wrote:
> Put this on a college Algebra or Calculus test:
> -2^2 =
> (a) -4 (b) 4
> Almost all of the students will put (b). This is the same gotcha that I
> have to point out to students all the time. Strange though, they have no
> problem with X=2, -X^2=-4.

Well, at least Q does the right thing there. :)

    ==> -2^2
    -4.0

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|
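For comparison only: Python happens to follow the same convention, with exponentiation binding tighter than unary minus. The small sketch below is an illustration in another language, not Q code; the variable names are made up for this example, and Python returns an integer here where the Q session above prints -4.0.

```python
x = 2
literal_case = -2 ** 2      # parsed as -(2 ** 2)
explicit_case = (-2) ** 2   # parentheses force the other reading
variable_case = -x ** 2     # parsed as -(x ** 2), same as the literal case

print(literal_case, explicit_case, variable_case)  # -4 4 -4
```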
From: Rob H. <hub...@gm...> - 2007-05-31 08:40:11
|
Hello Albert,

On 31/05/07, Albert Graef <Dr....@t-...> wrote:
> John Cowan wrote:
> > I'm happy with that -- I assume that "--" is distinct from "- -"
> > where the latter is unary minus?
>
> Yes, sure. The unparser now also makes sure that spaces are inserted
> between adjacent binary and unary symbols.
>
> But note that this was just an example, (--) is not in the standard
> library. So unless you define (--) yourself you can still write 5--3 and
> get the expected result, 8. :)

I'm very happy to see multi-character operators introduced. (Does Q also allow Unicode operators?)

I presume that tokens will be delimited according to which set their constituent characters belong to: 'alphanumeric' or 'other' (although an 'alphanumeric' token must begin with an alphabetical character, of course). Is white-space a special case, i.e. a third class of character?

Which characters are in the 'other' set for operators - are any (such as quote and parentheses) excluded?

Does this also mean that multi-other-character function names are also supported? That is, can I now define a non-operator function called '--'? (I suppose that includes the secondary question: would 'other' characters count as lower case?)

It seems that now, introducing a new symbol will affect the way that code is parsed. This is something I find a little worrying.

Would it be better to have, e.g. '--' always as an atomic token, producing a normal form unless '--' is defined? That is, would it be better to break backwards compatibility? Or would that be too painful?

Is the protection offered by the module system thought to be enough?

[Can of worms! Sorry.]

Thanks,
Rob.
|
From: Albert G. <Dr....@t-...> - 2007-05-31 12:37:39
Attachments:
opsyms.txt
|
Hi Rob,

Rob Hubbard wrote:
> I'm very happy to see multi-character operators introduced. (Does Q
> also allow Unicode operators?)

Yes, Unicode all the way through. :) Just like you can have arbitrary Unicode letters in identifiers, you can have arbitrary Unicode punctuation in operator symbols.

(BTW, I'd appreciate it very much if our non-Western-locale users could check that those Russian/Japanese/whatever identifiers and operators still work. For me, the unicode.q example works ok, but I don't have many scripts using non-ASCII characters to test, so please let me know if you find any bugs there. Alexander? Keith? Anyone else?)

> I presume that tokens will be delimited according to which set their
> constituent characters belong to: 'alphanumeric' or 'other' (although
> an 'alphanumeric' token must begin with an alphabetical character, of
> course). Is white-space a special case, i.e. a third class of
> character?
>
> Which characters are in the 'other' set for operators - are any (such
> as quote and parentheses) excluded?

Ok, I've attached a little description of the lexical operator symbol syntax I wrote while working on these things, to be included in the manual later.

> Does this also mean that multi-other-character function names are also
> supported? That is, can I now define a non-operator function called
> '--'? (I suppose that includes the secondary question: would 'other'
> characters count as lower case?)

No, that would make the syntax too confusing IMHO. Function symbols must now be legal identifiers, punctuation is only allowed in operator symbols.

> It seems that now, introducing a new symbol will affect the way that
> code is parsed. This is something I find a little worrying.

You're right. Right now the lexer inspects the symbol table to partition punctuation symbols. I agree that this is a bad idea since it makes the syntax depend on the declared operator symbols. I will fix that right away.

Of course this means that 5--3 won't be legal any more (unless you've declared a (--) operator). But I think that this is a minor issue, and anyway the compiler will catch it if you've written anything like that in your scripts.

> Would it be better to have, e.g. '--' always as an atomic token,
> producing a normal form unless '--' is defined? That is, would is be
> better to break backwards compatibility? Or would that be too painful?

I think that, as pointed out above, '--' should actually be an error if you haven't declared it as an operator. Implicit declaration of operators is a bad idea, IMHO. It's much too easy to mistype them. The compiler would then just silently munge almost all arbitrary line noise; it might even happily parse many Perl scripts. ;-)

> Is the protection offered by the module system thought to be enough?

Hmm, I'm not sure what you're thinking of here?

> [Can of worms! Sorry.]

No need to feel sorry, I'm glad you opened it! I want to fix all those quirks before release. ;-)

Thanks,
Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|
From: Rob H. <hub...@gm...> - 2007-06-01 11:39:46
|
On 31/05/07, Albert Graef <Dr....@t-...> wrote:
> > Rob Hubbard wrote:
> >> It seems that now, introducing a new symbol will affect the way that
> >> code is parsed. This is something I find a little worrying.
> >
> > You're right. Right now the lexer inspects the symbol table to partition
> > punctuation symbols. I agree that this is a bad idea since it makes the
> > syntax depend on the declared operator symbols. I will fix that right
> > away.
>
> Well, it sounded like a good idea, but actually it isn't. ...

Shame, but I agree with your decision, given the problems you described, that the breakage would after all be too severe and the resulting behaviour too inconvenient.

Thanks too for all the (attached) detail about symbol parsing.

I wonder whether there's any scope for Q itself to issue warnings about some or all punctuational operator declarations. This might not be too 'noisy' if only one such warning was given. Then again, perhaps this isn't such a good suggestion.

Alternatively, is there any way of issuing a warning if the lexer's action is affected by the presence of an operator definition when parsing a sequence of punctuation characters to form a symbol? Again, probably not, as I can't see a good rule or heuristic to distinguish likely intended parses from unintended ones...

Rob.
|
From: Albert G. <Dr....@t-...> - 2007-06-01 12:06:47
|
Rob Hubbard wrote:
> Alternatively, is there any way of issuing a warning if the lexer's
> action is affected by the presence of an operator definition when
> parsing a sequence of punctuation characters to form a symbol? Again,
> probably not, as I can't see a good rule or heuristic to distinguish
> likely intended parses from unintended ones...

I thought about this myself, but to do this right the lexer would have to look ahead in the input and perform a considerable amount of backtracking, which would be a major performance killer. So I decided against it.

Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|
From: Albert G. <Dr....@t-...> - 2007-06-01 11:51:12
|
Albert Graef wrote:
> I also have plans to turn 'lambda' into a virtual constructor for the
> builtin external <<Function>> type used to represent compiled lambdas,
> and, instead of having a builtin pretty-printing of <<Function>>
> objects, define an appropriate view for them.

Ok, this is now implemented as well. So you can now dissect a 'Function' object (a.k.a. compiled lambda) simply as follows:

    ==> var fact = \N.if N>0 then N*fact(N-1) else 1; fact
    \X1 . if X1>0 then X1*fact (X1-1) else 1

    ==> def \PAT.BODY = fact; PAT; BODY
    X1
    if X1>0 then X1*fact (X1-1) else 1

I think that this is quite neat. I also overhauled the definition of equality on 'Function' objects in prelude.q so that it uses the corresponding view instead of comparing string representations of the objects.

There's one (last?) issue I want to work on for the current release, namely the notion of syntactic equality for external types such as 'Function' which have an associated view. Right now external objects are considered syntactically equal only if they are the same object (i.e., pointer equality). AFAICS, this is the only thing that makes sense if there is no printable representation -- given that syntactic equality must always be defined, for any kind of object.

But now that external types may have views, it makes sense to test syntactic equality on such types by comparing the corresponding views. This is consistent with the "two expressions are syntactically equal if they print out the same in the interpreter" rule for normal objects.

I will also remove the current definition of (=) for Function objects, as it's just syntactic equality, and there's no real notion of semantic equality on functions which is also decidable, so it makes sense to leave (=) undefined on these objects.

Is everyone fine with that? Will it break any of your existing code? Only programs directly dealing with Function objects (comparing them with (=) and (==)) might be affected.

Cheers,
Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|
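The difference between the two notions of syntactic equality discussed in this message (same object versus same view) can be sketched in Python. This is a loose illustration only: the class `Opaque`, its `view` method and the function `syntactically_equal` are hypothetical names, and the tuple returned by `view` is just a stand-in for whatever Q's view of a compiled lambda actually yields.

```python
class Opaque:
    """Stand-in for an external object that has a printable view."""

    def __init__(self, params, body):
        self._params = params
        self._body = body

    def view(self):
        # Hypothetical view: a plain tuple describing the object.
        return ("lambda", self._params, self._body)


def syntactically_equal(a, b):
    # Proposed rule: compare the views instead of object identity.
    return a.view() == b.view()


f = Opaque(("X1",), "if X1>0 then X1*fact (X1-1) else 1")
g = Opaque(("X1",), "if X1>0 then X1*fact (X1-1) else 1")

print(f is g)                     # identity, i.e. pointer equality: False
print(syntactically_equal(f, g))  # view comparison: True
```

Under the identity rule two separately built but identically printing objects differ; under the view rule they are equal, which matches the "prints out the same" criterion mentioned above.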
From: Alexander N. <AN...@sp...> - 2007-05-31 04:30:23
|
Hello Albert,

Wednesday, May 30, 2007, 4:06:47 PM, you wrote:

AG> Hi all,
AG> I had some time to work on the new Q release over the Whitsun holidays.
AG> As a result, you can find some interesting new stuff in cvs today:

Great news. BTW, how can I build Q for Windows without installing cygwin?

--
Best regards,
Alexander
mailto:AN...@sp...
|
From: Albert G. <Dr....@t-...> - 2007-05-31 11:27:50
|
Alexander Nickolsky wrote:
> Great news. BTW, how can I build Q for windows without installing
> cygwin ?

Not yet. :( I still have to work on the native Windows port. I'll do that as soon as RC1 is out. You could try to grab the big zip file with the Windows sources from Q 7.6, and replace the Q sources in there with the latest cvs, but I don't know how well that works after the bundled regex and glob stuff is gone...

Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|
From: Albert G. <Dr....@t-...> - 2007-05-31 14:17:27
|
> Rob Hubbard wrote:
>> It seems that now, introducing a new symbol will affect the way that
>> code is parsed. This is something I find a little worrying.
>
> You're right. Right now the lexer inspects the symbol table to partition
> punctuation symbols. I agree that this is a bad idea since it makes the
> syntax depend on the declared operator symbols. I will fix that right
> away.

Well, it sounded like a good idea, but actually it isn't. Applying a naive "maximal munch" rule breaks quite a lot of existing code, since code like '[0..#B-1]' then becomes a syntax error ('..#' is flagged as undefined, instead of parsing it as two lexemes '..' and '#'). Just excluding special lexemes like '..' from the maximal munch rule doesn't work either, since then you couldn't define an operator like '.*' or ':+'.

So I guess that we'll just have to live with the fact that if you declare an operator symbol then you're actually changing the lexical syntax of the language (which is already the case with operators like (xor) anyway, it's just not so blatantly obvious). I'll add a warning about this to the manual.

Note, however, that the module system does help with stuff like this, since just adding an operator to your own script doesn't change the way that, say, the standard library modules are parsed, since your definition is not in scope there.

It's just that you have to be careful with your own operator declarations. If you declare an operator like 'public (..#) X Y;' then you can't write something like '[0..#B-1]' in the scope of that definition and expect it to mean '[0 .. #B-1]'. If you do silly things like that (i.e., introduce an operator symbol which ends in something which can also be interpreted as a unary operator) then you get what you called for. ;-) Sharp knife and all that...

Ok, here's the "maximal munch" rule as it is implemented right now. I actually think that it works pretty well; at least it doesn't disrupt any existing code that I've tried.

MAXIMAL MUNCH RULE. Operator symbols consisting of punctuation are generally parsed using the "longest possible lexeme" a.k.a. "maximal munch" rule. More precisely, this means that in a _declaration_ like 'public (+-&%) X Y;' the symbol being declared always extends up to the closing ')' delimiter. Outside of declarations, however, the "longest possible lexeme" refers to the longest prefix of the input such that the sequence of punctuation characters actually forms a _valid_, i.e., declared or reserved, symbol. Thus, e.g., '..#' will actually be parsed as '.. #' (reserved '..' symbol followed by a '#' operator).

Cheers,
Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|
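The rule for punctuation outside of declarations can be sketched in a few lines of Python. This is a minimal illustration written from the description above, not the Q lexer itself: the function name `split_punctuation` and the small `reserved` set are assumptions made for the example, and the sketch only handles a run of punctuation characters, not identifiers or whitespace.

```python
def split_punctuation(run, symbols):
    """Split a run of punctuation into the longest declared/reserved symbols."""
    tokens = []
    pos = 0
    while pos < len(run):
        # Try the longest candidate first, then shrink until one is valid.
        for end in range(len(run), pos, -1):
            if run[pos:end] in symbols:
                tokens.append(run[pos:end])
                pos = end
                break
        else:
            # No prefix of the remaining input is a known symbol.
            raise SyntaxError(f"no valid symbol at {run[pos:]!r}")
    return tokens


# Hypothetical set of reserved/declared symbols, for illustration only.
reserved = {"..", "#", "-", "+"}

print(split_punctuation("..#", reserved))            # ['..', '#']
print(split_punctuation("..#", reserved | {"..#"}))  # ['..#']
print(split_punctuation("--", reserved))             # ['-', '-']
```

The second call shows the caveat from the message: once a symbol such as `..#` is declared, the same input is split differently. The sketch takes the longest valid prefix greedily and never backtracks.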
From: Albert G. <Dr....@t-...> - 2007-05-31 19:39:29
|
Albert Graef wrote:
> (Note that 'def's in the interactive command loop of the interpreter
> don't quite work like that yet, as they don't handle virtual
> constructors right now; but I'm working on that.)

Ok, that should be fixed now, too, and call-by-pattern-matching should work as well. So the following now also works from the command line, just as in scripts:

    ==> def (A%B,C%D) = (6%4-1,2%3+2)

    ==> def {X,Y|_} = {1,3..}

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|