From: Andreas R. <ros...@ps...> - 2001-09-17 12:54:01
Objectives and procedures of language evolution seem to be a very common problem. Looking at how other language camps cope with it might give some guidance (in a positive or negative way ;-).

IMHO, valid goals for changing/extending the language include:

- Fixing obvious quirks (e.g. syntactic or semantic ambiguities)
- Reducing the complexity of the language by removing obsolete/unused features (e.g. abstype)
- Making the language safer, i.e. removing common sources of programming errors or encouraging the use of features that help catch errors early on (e.g. convenient syntax for annotating function types)
- Increasing the convenience of common programming techniques (e.g. laziness)
- Increasing expressiveness where it has proved insufficient (e.g. polymorphic recursion)
- Extending the range of possible application domains (e.g. concurrency)
- Removing hurdles to portability (e.g. separate compilation)
- Enabling simpler or more efficient implementation (e.g. wraparound arithmetic)

Backward compatibility is probably the most fundamental problem. In my opinion, incompatible changes cannot always be avoided if you want to have a reasonably clean language in the end. If at all possible, however, there should be transition paths for users. The standard way of implementing this seems to be keeping old features around as deprecated (raising warnings) for some reasonable amount of time. If that is not enough, implementations can still support an SML'97 switch, preferably in a way that allows interoperability with new code.

The steps for a language modification to `become real' could be that some group of people

1. comes up with a design for a particular extension,
2. makes a proof-of-concept implementation in one of the existing compilers,
3. writes a proposal in the form of necessary modifications to the Definition,
4. lets the Definition's authors promote the proposal to the status of a `blessed addendum'.

Something similar has recently been discussed for Haskell. Ideally, every step should be followed by discussion in a wider forum. Arguably, many of the more obvious extensions (e.g. withtype, where structure) have already passed stage 2.

The main problem that I understand you are hinting at is that some of the more ambitious extensions no longer fit into the current framework of the Definition, and it is unavoidable that at some point the whole Definition and the accumulated addenda will have to be replaced by a more decent, type-theoretic specification. Since this may be a lot of work that takes a considerable amount of time, I think it would be reasonable to adopt simpler and obvious changes within the current framework first. A redesign of the language specification may also be the right moment to make most incompatible changes.

Some specific points:

> A very fundamental question is
> whether to insist on explicit import lists, with specified signatures for
> imported modules, or to instead rely on a CM-like dependency inference
> mechanism (which works in 99% of the cases, but not all).

There is also room for a middle course, namely requiring the programmer to specify what is imported, but without explicit signatures. This has the advantage of making units more readable (explicit binders for all identifiers) and enabling unambiguous dependency analysis, while not being overly inconvenient for the programmer.

> Should it be a specific design goal of
> the language to support interactive development, or can that safely be left
> to each implementation?

The language semantics should be designed in a way that does not preclude interactive environments, but IMO it is not necessary to specify the details of what such an environment actually looks like. If we assume that all `persistent' - and thus potentially interchanged - source code is written in the separate compilation model, then an interactive environment just needs to be able to import such sources in some way.

> How important is ML-style type inference?

IMHO, very important. Personally, I would not like to see any major compromises in this respect - I already strongly dislike the inconvenience of the annotations forced by overloading and records. But if there are features that require annotations, then the rules for providing them should at least be very intuitive and straightforward (i.e. local type inference or something similar is not an option).
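For concreteness, here are the two standard examples of what I mean (the code is mine, but the behaviour is plain SML'97):

  (* Overloading: without an annotation, + defaults to int, so this
     function gets type int -> int even if real -> real was intended. *)
  fun double x = x + x
  fun doubleReal (x : real) = x + x

  (* Records: the selector #name alone leaves the record type
     "flexible", which SML rejects ("unresolved flex record"); the
     programmer has to spell out every field of the record. *)
  fun name (r : {name : string, age : int}) = #name r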
> If implementations are allowed to differ on the extent to which they support
> type inference (plausible, especially if we admit both interactive and
> non-interactive implementations), then what is the official "interchange"
> format for programs by which we can be assured that code will be transferred
> among implementations?

Having corners of `implementation-defined behaviour' - as the C world calls it - is IMHO a very bad idea (SML'97 already contains some, wrt the context used to resolve overloading or record typing). And it would seriously complicate matters for all sides - users and implementers - if the `interchange format' were not ordinary source code.

> Datatype's cannot be made transparent without seriously changing the
> language. In particular, contrary to popular opinion, making datatype's
> abstract would *preclude* programs that are currently *admitted*.

I am aware of that (and marked that point as incompatible). However, I believe changing it would only break a rather small number of programs. Moreover, the problems related to sharing seem likely to disappear if we move to using "where" exclusively. One advantage of transparent datatypes is that they make typed programming in a distributed environment easier - processes do not need to share their type declarations if those are not generative.

> Either we should have a fully
> worked-out, extensible overloading system (I'm very skeptical) or drop it
> entirely (a better idea, IMO).

Working with OCaml from time to time, where the latter is the case, I have to say that it sometimes is a nuisance. Taking into account the rich set of numeric types the Standard Basis provides, I do not really see how they could be handled without some form of overloading. Removing generic equality would make the lack of overloading even more problematic (note that OCaml not only has polymorphic equality but also polymorphic ordering to escape this).

> First-class polymorphism raises problems for type inference.

A simple solution might be not to introduce arbitrary rank-2 types, but to take the same approach as for recursive types and tie first-class polymorphism to datatypes (as suggested by Mark Jones and others and implemented in Haskell systems). This way, any use of first-class polymorphism is marked by the occurrence of a corresponding constructor, which serves as an implicit type annotation. The only language constructs requiring modifications to their typing rules are constructor application and matches; type inference still works as expected.

> At the level of syntax, I would support fixing the case ambiguity,
> eliminating clausal function definitions entirely,

Wow, please, no! In my average ML code, clausal function definitions are probably the single most frequently used construct besides application! OTOH, I would strongly plead for a more accurate specification of their syntax...
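To make both points concrete, a small example (mine):

  (* Bread-and-butter clausal definitions: *)
  datatype tree = Leaf | Node of tree * tree

  fun depth Leaf          = 0
    | depth (Node (l, r)) = 1 + Int.max (depth l, depth r)

  (* ...and the case ambiguity mentioned above: the parentheses are
     essential, since without them the second clause would be swallowed
     by the inner case as a (malformed) match rule. *)
  fun strip (SOME n) = (case n of 0 => NONE | _ => SOME n)
    | strip NONE     = NONE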
> fixing the treatment of
> "and" and "rec" in val bindings,

...or perhaps removing all non-recursive uses of "and" altogether.

> I would like to add a clean treatment of hierarchical extensible tagging
> as a generalization of the current exn type.

That would be great. Please also consider enabling programmers to introduce their own extensible types. They could possibly subsume some of the expressiveness of objects.

> I would like to revamp datatype's to better harmonize them with modules
> and to avoid the annoying problem of repetition of datatype declarations in
> signatures and structures. I know how to do this, and have a preliminary
> proposal for it, but it is entirely incompatible with the current mechanism
> (but perhaps the old one could continue to be supported for a transition
> period). The rough idea is to follow the Harper-Stone semantics, by which a
> datatype is simply a compiler-implemented structure with a special form of
> signature that provides hooks into the pattern compiler. This proposal
> would also admit user-implemented datatypes (aka views, or abstract value
> constructors), but I am not certain that this is a good idea.

That sounds very interesting. Actually, IMO abstract views are one of the features that ML modules are seriously lacking (one thing I forgot on my little list :-). To me they seem absolutely essential for avoiding the fundamental abstraction vs. convenience conflicts in designing interfaces.

Finally, let me ask how the ML2000 effort relates to all of this. May we conclude that you consider it more or less dead by now?

Best regards,

  - Andreas

-- 
Andreas Rossberg, ros...@ps...

"Computer games don't affect kids; I mean if Pac Man affected us as
 kids, we would all be running around in darkened rooms, munching magic
 pills, and listening to repetitive electronic music."
 - Kristian Wilson, Nintendo Inc.