From: Robert H. <Rob...@cs...> - 2001-09-14 19:01:15
Some high-level comments:

1. I completely agree on the need for a well-defined (ie, standard, agreed-upon) separate compilation mechanism. A very fundamental question is whether to insist on explicit import lists, with specified signatures for imported modules, or to instead rely on a CM-like dependency inference mechanism (which works in 99% of the cases, but not all). It's easiest, as a matter of specifying the language, to do the former, but it is much simpler for the programmer to have the latter.

2. How much should the language be tied to the existence of an interactive system? CM as it is currently implemented assumes an interactive system, eg to set parameters. But not all compilers are interactive, which creates a serious problem for compatibility. Remember that LCF-like theorem provers use ML as their "command language". Should it be a specific design goal of the language to support interactive development, or can that safely be left to each implementation?

3. Backward compatibility is an important issue, but should it be allowed to ossify the language? There are several significant changes that have been proposed over the years that would not be backward compatible, but would (it can be argued) improve the language in important respects. Should there be a break with the past to clean up and move forward, or should we limit attention to only compatible changes?

4. In making extensions or revisions, what problems are we trying to solve? We all have our favorite things to hate about Standard ML (both syntactic and semantic), but how many of these things really matter, in the sense of inhibiting code development or inhibiting evolution of the language? If we evolve the language, in what direction?

5. How important is ML-style type inference? At what point would it be the case that it's not ML any more? Suppose the types of all lambda-abstracted variables had to be explicitly specified. Would that be just too awful to contemplate?
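[For readers who haven't used it: CM, the SML/NJ Compilation Manager, exemplifies the second style. A description file merely lists the sources, and the dependency graph is inferred by scanning them. A rough sketch of a group description file of the period (file names invented for illustration):]

```
(* sources.cm: no import lists anywhere; CM scans the listed
   files and infers, e.g., that main.sml depends on util.sml
   from its references to structure Util. *)
Group is
  util.sml
  main.sml
```

[The explicit-import alternative would instead require each source file to open with a list of the modules it uses, together with the signatures it assumes for them.]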
Here the issues with interactive systems come up again, since the need for "full" type inference is much stronger in the interactive case than (IMO, at least) for larger programs written out "by hand". If implementations are allowed to differ in the extent to which they support type inference (plausible, especially if we admit both interactive and non-interactive implementations), then what is the official "interchange" format for programs by which we can be assured that code will be transferred among implementations?

Some more specific comments on your suggestions:

1. Regarding withtype, should this signature

      sig datatype t = A type u = t * t end

   match the signature

      sig datatype t = A withtype u = t * t end

   ? I would say "yes". Vice versa? There are other examples of a similar nature. I ask this as a conceptual matter; answering in terms of the crude hack by which withtype is currently treated in The Definition doesn't help.

2. Datatypes cannot be made transparent without seriously changing the language. In particular, contrary to popular opinion, making datatypes abstract would *preclude* programs that are currently *admitted*. The interactions with type sharing are also very problematic. In fact, we implemented datatypes transparently in TILT, but realized that this was a bad idea, and have since backed out of it. This required introducing some new internal mechanisms to make the implementation bearably efficient (and to avoid the difficulties associated with Shao's equation). I am opposed to making datatypes transparent.

3. We now have a type theory that supports both applicative and generative higher-order functors in a single, uniform framework; see my web page for a draft paper on the topic. I would therefore expect future versions of the language to support "true" higher-order modules with both applicative and generative functors.
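[For reference, the derived-form reading of withtype at the declaration level, which is what the signature-level question above would generalize; this is ordinary SML inside a structure:]

```sml
(* This declaration: *)
datatype t = A
withtype u = t * t

(* ...is, by the derived form in the Definition, shorthand for: *)
datatype t = A
type u = t * t
```

[If the signature-level withtype is given the same straightforward expansion, the two signatures in point 1 should presumably match in both directions; the question is whether anything in the matching rules gets in the way.]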
It would also be useful to have both applicative and generative abstraction mechanisms, with datatypes being applicative rather than generative. Having said that, my belief is that higher-order modules remain an esoteric feature of little practical importance. Indeed, in the presence of a good separate compilation mechanism, the need for even first-order functors is significantly reduced, as has often been observed.

4. I have no objection to dropping abstype; I don't see any particular reason to drop local (or, for that matter, to retain it).

5. First-class modules seriously disrupt the type theory of modularity, and buy you almost nothing but trouble. You can, however, have modules as first-class values, with no difficulties. (These are different concepts; see the aforementioned paper.)

6. Equality types are stupid and should have been dropped ages ago. Ditto all other half-assed forms of overloading. Either we should have a fully worked-out, extensible overloading system (I'm very skeptical) or drop it entirely (a better idea, IMO).

7. First-class polymorphism raises problems for type inference. There are some limited uses for it, in particular for existentials, but the cost seems very high. I've often wished that records could have polymorphic fields (eg, we ran into this with the foxnet signatures), but providing them seems tantamount to admitting first-class polymorphism. AFAIK this is a significant breaking point in the complexity of the language, at least as far as type inference is concerned.

8. Support for laziness is well understood. IMO it should be completely segregated from datatypes, and provided as a separate language mechanism. This was Okasaki's original proposal; IMO Wadler's variant does not work well in practice, and only confuses issues that are entirely separable. Not very ML-like, it seems to me.
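[To make point 8 concrete, the kind of mechanism being argued for can be sketched as a library in plain SML; the names susp/delay/force are my own invention here, and the actual proposal would add dedicated syntax (including forcing in patterns) on top of something like this:]

```sml
(* A memoizing suspension, entirely independent of datatypes. *)
datatype 'a state = Thunk of unit -> 'a | Value of 'a
type 'a susp = 'a state ref

fun delay (f : unit -> 'a) : 'a susp = ref (Thunk f)

(* force runs the thunk at most once, caching its value. *)
fun force (s : 'a susp) : 'a =
    case !s of
        Value v => v
      | Thunk f => let val v = f () in s := Value v; v end

(* Recursive suspensions (e.g. infinite streams) are where the
   "possibly recursive" binding form is needed; with only the
   library, the recursion goes through delay explicitly: *)
datatype 'a stream = Cons of 'a * 'a stream susp
fun ones () : int stream = Cons (1, delay ones)
```

[Dedicated syntax would then let one write the recursive suspension directly and pattern-match through a suspension without calling force by hand.]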
Specifically, we need syntax for defining a *possibly recursive* suspended computation, and syntax that supports "forcing" suspensions as part of pattern matching.

9. At the level of syntax, I would support fixing the case ambiguity, eliminating clausal function definitions entirely, fixing the treatment of "and" and "rec" in val bindings, changing the syntax of handlers, and probably making a few other surface syntax changes.

10. I would like to add a clean treatment of hierarchical extensible tagging as a generalization of the current exn type. This would improve our exception-handling mechanism, and support dynamic dispatch for those limited cases where it is useful. I would not support a class mechanism of the kind previously considered as an extension to ML (but I may be in the minority on this).

11. I would like to revamp datatypes to better harmonize them with modules and to avoid the annoying problem of repeating datatype declarations in signatures and structures. I know how to do this, and have a preliminary proposal for it, but it is entirely incompatible with the current mechanism (though perhaps the old one could continue to be supported for a transition period). The rough idea is to follow the Harper-Stone semantics, by which a datatype is simply a compiler-implemented structure with a special form of signature that provides hooks into the pattern compiler. This proposal would also admit user-implemented datatypes (aka views, or abstract value constructors), but I am not certain that this is a good idea.

Having said all this, let me mention the most difficult, over-arching problem: what is the means by which these (or other) modifications would be specified and implemented in a practical compiler? How would these ideas (or others that we may agree upon) be made "real"?
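[The "case ambiguity" in point 9 is the classic dangling-match problem; a small illustrative example (my own, not from the text):]

```sml
(* Without the parentheses around the inner case, the final rule
   "| NONE => ..." would be parsed as a third rule of the INNER
   case, not the outer one, and the program would be rejected
   (NONE is not an int pattern). The grammar forces the
   programmer to parenthesize defensively: *)
fun describe x =
    case x of
        SOME y => (case y of
                       0 => "zero"
                     | _ => "nonzero")
      | NONE => "none"
```

[The same ambiguity afflicts fn, handle, and clausal function definitions whose last clause ends in a match, which is why these all appear together in point 9.]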
Finally, let us not forget that the mere existence of a rich collection of libraries is at least as important as fixing the last 10% of oddities in the language, or eking out another 2% performance on certain specified benchmarks.

Bob

-----Original Message-----
From: Andreas Rossberg [mailto:ros...@ps...]
Sent: Friday, September 14, 2001 1:21 PM
To: Robert Harper
Cc: sml...@li...
Subject: Re: [Sml-implementers] Structure sharing and type abbreviations

Robert Harper wrote:
>
> Adding "withtype" to signatures seems entirely reasonable as
> well, provided that we can agree on the semantics of signature matching in
> their presence.

Now I'm puzzled, could you point out what exactly the dark corners are? AFAICS, since it is merely a derived form with obvious rewriting rules (as for the core), adding it should have no effect on matching semantics (the way they are implemented in the Definition, at least).

> More generally, I would like to open discussion about more extensive changes
> and extensions to Standard ML. What do the readers on this list think
> should be done next in the evolution of the language?

You asked for it :-)

IMHO, the most important issues are fixing the well-known quirks and incorporating the various well-understood extensions to the module system. Second come extensions to the core type system, ie. more powerful notions of polymorphism. Third I would put all the anticipated future extensions of modules. But above all and probably most pressing:

- agree on a model for separate compilation

Here is the somewhat exhaustive version of my personal wish list. Items marked with * are incompatible changes, I assume; the others are more or less extensions.

Obviously worth adopting and understood well enough to try finalizing a design:

- fix some syntactic annoyances (case, handle, constructor status, etc.)
  and ambiguities *
- drop abstype and local *
- transparent datatypes (*)
- "where" for structures, and definitional structure specs (dropping sharing *)
- withtype in signatures
- higher-order modules, including some form of applicative functors
- parameterized signatures (having modules as arguments - important IMO
  if you take higher-order functors seriously)

Not so clear yet how, but also worth having at some point:

- dropping eqtype *
- polymorphic recursion
- first-class polymorphism (probably also existential types; most promising
  seems incorporating both kinds of quantification into datatypes)
- polymorphic records
- recursive modules

Interesting but probably not worth the trouble:

- local modules (ie. giving up the core/module stratification)
- first-class modules
- polymorphic variants (a la OCaml, but with less ambitious typing and
  better integration with ordinary datatypes)
- alternative patterns and pattern guards
- promoting signatures to first-class citizens of the module language,
  ie. nested and abstract signatures (I know this leads to an undecidable
  type system and a no longer strongly normalizing module language, but
  it may still be worthwhile)

Regarding standard library support, I believe it is important to have:

- some support for laziness
- concurrency
- obligatory support for infinite precision integers
- the most common collection types (sets, maps, hashtables)

What I personally prefer not to have at all (looking at the ML2000 paper):

- objects and classes
- any form of subsumption in the core language
- requiring explicit type annotations to achieve basic polymorphism
- abolishing infix decs (although I could live with OCaml's approach)

Best regards,

- Andreas

--
Andreas Rossberg, ros...@ps...

"Computer games don't affect kids; I mean if Pac Man affected us as kids,
 we would all be running around in darkened rooms, munching magic pills,
 and listening to repetitive electronic music." - Kristian Wilson,
 Nintendo Inc.