[q-lang-users] Algebraic types, constructors, and all that (was Re: New stuff in cvs: multichar ops

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

ed...@ri... wrote:
>     I think I'm confused about what a "constructor"
>     is. I've been trying to look up constructor but I've only found
>     "object oriented" stuff so far. Would anyone send back a definition?

Hi Eddie,

you should also take a look at "algebraic types" at Wikipedia, but I'll
try to give an informal explanation below. (Maybe this should go into
the FAQ?)

A constructor is just a function which doesn't evaluate to something
else, but simply stands for itself. (When I'm talking about "functions"
here, you might as well substitute "operators"; I'm sure you already
know that operators are just functions with some extra chocolate
coating.) So, for instance, a constant (nullary) symbol 'nil' may be
used to denote, say, an empty binary tree, while the 'bin' constructor
might be used to construct objects of the form 'bin X L R' where X is
the value at the root node of the binary tree and L and R are the left
and right subtrees.

Q (and term rewriting in general) doesn't actually distinguish between
"defined" functions (which evaluate to "something else") and
constructors (which evaluate to themselves). Any function symbol may act
as a constructor if it occurs in a normal form (completely evaluated
expression). A function symbol without any defining equations will
*always* act as a constructor; we can also call such a symbol a *pure*
constructor. Q allows you to explicitly declare a function symbol as a
pure constructor (that's what the 'const' modifier in a declaration is
for), so that you do not accidentally define it in an equation. Nobody
forces you to do this in Q (you can just happily use 'nil' and 'bin' as
constructors without ever declaring them), but if you declare them as
'const' then the compiler can verify that there are no equations for the
constructor.

This is a bit different in languages like ML and Haskell, where
constructors and defined functions are disjoint. You can apply both to a
given number of arguments, just as in Q, but they are treated
differently; defined functions always evaluate to something else, while
constructurs are always used to construct literal data objects. In Q
there is no such segregation between the two kinds of functions.

Ok, let's get to data types now. In modern FP languages like Haskell, ML
and Q there often is a single construct to create new aggregate data
types (besides the built-in list and tuple types), namely *algebraic
types*. An algebraic data type is essentially specified by a type symbol
associated with a number of constructor symbols with given arities
(numbers of arguments). The set of data objects of the type consists of
all terms of the form 'f X1 ... Xn', where f is one of the constructor
symbols of the type, n its arity, and X1 ... Xn are argument terms of
the appropriate types (as a dynamically typed language, Q places no
restrictions whatsoever on the types of the arguments, but Haskell and
ML of course do, although they allow generic types where the component
types may be variable).

Mathematically, this corresponds to the "free term algebra" of a given
"signature" of function symbols, hence the name "algebraic type". From
the programmer's POV, a data element 'f X1 ... Xn' just works like a
"record" or "structure" with component values X1 ... Xn, except that you
usually extract the component values by pattern matching rather than
with "field names". A data type may have more than one constructor, of
course, which roughly corresponds to "records with variants", where you
have the constructor as a kind of "variant selector".

NB: As already mentioned, in Q nobody forces you to declare a data type
to be able to work with constructors, whereas in Haskell and ML a data
type declaration is the only way to get your hands on them. In Q you
*can* declare a data type, however, which allows you to use the type
symbol as a "guard" on left-hand side variables of equations if you want
to make sure that only objects of the given type (i.e., created with the
appropriate constructors) match the variable.

Ok, I'm slowly getting back to the Complex type now. Just like in
conventional programming languages, where you can make a record data
type "opaque", i.e., an abstract data type (ADT), by just hiding the
record definition, you can also turn a (concrete) algebraic data type
into an ADT by hiding its constructors. In Q you can do this by
declaring the constructors 'private' while the type itself is 'public'.
This is what Rob and I have done for the Rational and Complex types
which are both algebraic types with hidden constructors, so that people
do not mess around with the internal representation.

This is all good and fine, but wouldn't it be convenient if Rational and
Complex actually exposed their constructors, so that you can do at least
pattern-matching on objects of the type? Still, for safety reasons you
want to keep the internal representation hidden. What to do?

Wadler proposed an answer to this problem in 1987 already, the so-called
"views". Q supports these since version 7.7. They allow you to have your
cake and eat it, too. That is, you can hide the internal representation
(i.e., the "real" constructors of the type), but at the same time expose
some "virtual" constructors which can be used for pattern-matching just
like the real ones, so that you don't have to fiddle around with special
access operations to get at the component values.

For the Complex type, the virtual constructor is '+:'. In your program
it works just like a real constructor, so that you can write something like:

swap (X+:Y) = Y+:X;

just like you would write, e.g.,

swap (bin X T1 T2) = bin X T2 T1;

In reality, however, (+:) is not a constructor at all, but an ordinary
function which internally uses the real (and hidden) constructor
'complex' of the Complex type to create a Complex object. Complex's view
definition (which is implemented by a corresponding equation for the
built-in 'view' function; for the standard library you can find all the
view definitions in the prelude.q script) just makes it look and feel
like a real constructor. Rob's Rational type is defined analogously
(there, the virtual constructor is the "exact division" (%) function,
which produces normalized representations of rational numbers
constructed with the hidden 'rat' constructor).

Normally, you won't ever see the real constructors, such as 'complex'
and 'rat', unless you're working in the debugger. Or you can say
'unparse off' in the interpreter which removes the rose-colored glasses,
and makes the pretty-printer an "ugly-printer" which prints the real
objects. ;-)

Ok, that was quite a long-winded explanation. :-) I could have given a
much conciser mathematical definition, but I realize that these concepts
are probably not that well-known outside of computer algebra and modern
FP circles, so I hope that I could shed some light on how this stuff
actually works. Note that I left out the bit about inheritance which is
a speciality of Q's algebraic types, because it wasn't really relevant
in this context. But you can find more about this in the "Nutshell" and
in the manual, of course.

Cheers,
Albert

-- 
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email:  Dr....@t-..., ag...@mu...
WWW:    http://www.musikinformatik.uni-mainz.de/ag

[q-lang-users] Algebraic types, constructors, and all that (was Re: New stuff in cvs: multichar ops

[q-lang-users] Algebraic types, constructors, and all that (was Re: New stuff in cvs: multichar ops, views)