[q-lang-users] Algebraic types, constructors, and all that (was Re: New stuff in cvs: multichar ops
Brought to you by:
agraef
From: Albert G. <Dr....@t-...> - 2007-06-29 22:05:22
|
ed...@ri... wrote: > I think I'm confused about what a "constructor" > is. I've been trying to look up constructor but I've only found > "object oriented" stuff so far. Would anyone send back a definition? Hi Eddie, you should also take a look at "algebraic types" at Wikipedia, but I'll try to give an informal explanation below. (Maybe this should go into the FAQ?) A constructor is just a function which doesn't evaluate to something else, but simply stands for itself. (When I'm talking about "functions" here, you might as well substitute "operators"; I'm sure you already know that operators are just functions with some extra chocolate coating.) So, for instance, a constant (nullary) symbol 'nil' may be used to denote, say, an empty binary tree, while the 'bin' constructor might be used to construct objects of the form 'bin X L R' where X is the value at the root node of the binary tree and L and R are the left and right subtrees. Q (and term rewriting in general) doesn't actually distinguish between "defined" functions (which evaluate to "something else") and constructors (which evaluate to themselves). Any function symbol may act as a constructor if it occurs in a normal form (completely evaluated expression). A function symbol without any defining equations will *always* act as a constructor; we can also call such a symbol a *pure* constructor. Q allows you to explicitly declare a function symbol as a pure constructor (that's what the 'const' modifier in a declaration is for), so that you do not accidentally define it in an equation. Nobody forces you to do this in Q (you can just happily use 'nil' and 'bin' as constructors without ever declaring them), but if you declare them as 'const' then the compiler can verify that there are no equations for the constructor. This is a bit different in languages like ML and Haskell, where constructors and defined functions are disjoint. You can apply both to a given number of arguments, just as in Q, but they are treated differently; defined functions always evaluate to something else, while constructurs are always used to construct literal data objects. In Q there is no such segregation between the two kinds of functions. Ok, let's get to data types now. In modern FP languages like Haskell, ML and Q there often is a single construct to create new aggregate data types (besides the built-in list and tuple types), namely *algebraic types*. An algebraic data type is essentially specified by a type symbol associated with a number of constructor symbols with given arities (numbers of arguments). The set of data objects of the type consists of all terms of the form 'f X1 ... Xn', where f is one of the constructor symbols of the type, n its arity, and X1 ... Xn are argument terms of the appropriate types (as a dynamically typed language, Q places no restrictions whatsoever on the types of the arguments, but Haskell and ML of course do, although they allow generic types where the component types may be variable). Mathematically, this corresponds to the "free term algebra" of a given "signature" of function symbols, hence the name "algebraic type". From the programmer's POV, a data element 'f X1 ... Xn' just works like a "record" or "structure" with component values X1 ... Xn, except that you usually extract the component values by pattern matching rather than with "field names". A data type may have more than one constructor, of course, which roughly corresponds to "records with variants", where you have the constructor as a kind of "variant selector". NB: As already mentioned, in Q nobody forces you to declare a data type to be able to work with constructors, whereas in Haskell and ML a data type declaration is the only way to get your hands on them. In Q you *can* declare a data type, however, which allows you to use the type symbol as a "guard" on left-hand side variables of equations if you want to make sure that only objects of the given type (i.e., created with the appropriate constructors) match the variable. Ok, I'm slowly getting back to the Complex type now. Just like in conventional programming languages, where you can make a record data type "opaque", i.e., an abstract data type (ADT), by just hiding the record definition, you can also turn a (concrete) algebraic data type into an ADT by hiding its constructors. In Q you can do this by declaring the constructors 'private' while the type itself is 'public'. This is what Rob and I have done for the Rational and Complex types which are both algebraic types with hidden constructors, so that people do not mess around with the internal representation. This is all good and fine, but wouldn't it be convenient if Rational and Complex actually exposed their constructors, so that you can do at least pattern-matching on objects of the type? Still, for safety reasons you want to keep the internal representation hidden. What to do? Wadler proposed an answer to this problem in 1987 already, the so-called "views". Q supports these since version 7.7. They allow you to have your cake and eat it, too. That is, you can hide the internal representation (i.e., the "real" constructors of the type), but at the same time expose some "virtual" constructors which can be used for pattern-matching just like the real ones, so that you don't have to fiddle around with special access operations to get at the component values. For the Complex type, the virtual constructor is '+:'. In your program it works just like a real constructor, so that you can write something like: swap (X+:Y) = Y+:X; just like you would write, e.g., swap (bin X T1 T2) = bin X T2 T1; In reality, however, (+:) is not a constructor at all, but an ordinary function which internally uses the real (and hidden) constructor 'complex' of the Complex type to create a Complex object. Complex's view definition (which is implemented by a corresponding equation for the built-in 'view' function; for the standard library you can find all the view definitions in the prelude.q script) just makes it look and feel like a real constructor. Rob's Rational type is defined analogously (there, the virtual constructor is the "exact division" (%) function, which produces normalized representations of rational numbers constructed with the hidden 'rat' constructor). Normally, you won't ever see the real constructors, such as 'complex' and 'rat', unless you're working in the debugger. Or you can say 'unparse off' in the interpreter which removes the rose-colored glasses, and makes the pretty-printer an "ugly-printer" which prints the real objects. ;-) Ok, that was quite a long-winded explanation. :-) I could have given a much conciser mathematical definition, but I realize that these concepts are probably not that well-known outside of computer algebra and modern FP circles, so I hope that I could shed some light on how this stuff actually works. Note that I left out the bit about inheritance which is a speciality of Q's algebraic types, because it wasn't really relevant in this context. But you can find more about this in the "Nutshell" and in the manual, of course. Cheers, Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |