RE: [Sml-implementers] extensions to SML

It's good to see such an animated discussion about possible revision and
extension of Standard ML.  What quickly emerges from the discussion is the
need to separate concerns.  Before considering these, let me say that I
don't like to see various efforts described as "valid" or "invalid", or for
them to be divided into "camps".  These descriptions are needlessly
antagonistic to our common goals.

Broadly speaking, we can consider two distinct enterprises:

1. Revision and extension of Standard ML to better support current and
readily foreseen needs.  The emphasis here is on maintaining continuity and
enhancing utility, rather than experimentation and exploration.

2. Drawing on experience with ML to explore new territory in language design
and implementation.  The emphasis here is on developing the "next great
language" in the ML lineage, with diminished continuity and compatibility
constraints.

Personally, I've put effort into both enterprises, which led to the 1997
revision of Standard ML and also to research on new languages and
implementations.

What is clear to me is that it is high time to renew emphasis on
revision and extension of Standard ML, for several reasons.  First, there is
a crying need, as is clearly evidenced by this discussion.  Second, many of
the proposed revisions are entirely do-able within a reasonable time frame.
Third, it is essential for Standard ML to remain a viable language.

I propose, therefore, that we focus our efforts on (1), leaving aside
discussion of more ambitious projects (which I, among many, wish to
undertake) for another day.  I take it from the discussion that this will
not be a controversial point.

Reviewing the discussion, I can see a few major topics for immediate
consideration.

1. Standardization of a separate compilation mechanism.  This would entail
defining what compilation units are, including how imports and exports are
described, and what the semantics of linking is.  At a high level this
should not be exceptionally hard to work out, but the devil is in the
details.  For example, since interfaces for compilation units cannot be
accurately expressed in the language, there is a fundamental distinction
between "true" separate compilation (with specified interfaces for imports)
and incremental recompilation (with inferred signatures obtained by
scheduling the elaboration of units in dependency order).

2. Standardization of a substantial set of libraries.  Quite obviously this
goes far beyond the meager standard basis library that we currently share.
There are lots of hard problems to be solved here, but given the substantial
code base already in place, I'm sure we can formulate a reasonable plan.  We
would need to consider how libraries interact with separate compilation (the
work on CM is highly relevant here), formulate some interface standards
(we all probably have our own to contribute), and choose which libraries
to include (the more the merrier, but some harmonization would be required).

3. Standardization of a foreign function interface.  I have not thought very
much about this issue, but I can certainly see difficulties with
compatibility not only across implementations (the various compilers out
there), but also across platforms (primarily, Unix vs Windows).  Fundamental
issues such as the semantics of "int" will arise, since implementations
differ, even on the same hardware and software platform, and since external
code will impose its own requirements.  The need for an FFI is largely
pragmatic --- we cannot re-invent the world ourselves in our own way.
However, as someone pointed out, buying into foreign code will certainly
limit portability across platforms and quite possibly across
implementations.

4. Extensions and modifications to the language itself.  These include
relatively trivial things like deprecating obsolete mechanisms (such as
abstype), re-considering the semantics of structure sharing (which started
this discussion), and adding support for new features (e.g., updatable
records, lazy evaluation, vector expressions, richer patterns, hierarchical
extensible sums).  I think there are strong arguments for all of these
changes.  We might also consider ways to improve the syntax while providing
a path for porting old code.
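
To make the abstype point in item 4 concrete, here is a minimal sketch (an
illustration only, not part of any proposal) of how an abstract type written
with abstype can be re-expressed with a signature and opaque ascription, both
of which are already in the 1997 language.  The names COUNTER, Counter,
counter, zero, next, and value are hypothetical.

```sml
(* Old style: abstype hides the constructor C outside the "with" block. *)
abstype counter = C of int
with
  val zero = C 0
  fun next (C n) = C (n + 1)
  fun value (C n) = n
end

(* Same abstraction using a signature and opaque ascription (:>). *)
signature COUNTER =
sig
  type counter
  val zero  : counter
  val next  : counter -> counter
  val value : counter -> int
end

structure Counter :> COUNTER =
struct
  type counter = int          (* representation is hidden by :> *)
  val zero = 0
  fun next n = n + 1
  fun value n = n
end

(* Clients see only the operations named in COUNTER. *)
val two = Counter.value (Counter.next (Counter.next Counter.zero))
```

In both versions clients cannot inspect the representation or compare
counters for equality, which is one reason abstype is often regarded as
redundant.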

I propose that we confine ourselves to these four categories for immediate
discussion.  (If I've overlooked something, I hope we can quickly agree on
what that is and whether to consider it now.)  It might make sense to form
subgroups that take charge of specific topics and report back to the full
group with their proposals.  Once a solid, but informal, proposal is in
place, we can evaluate it by examining its semantics and its implications
for implementation.  Presumably this will lead to revisions of the proposal,
but it should also lead rather quickly to a solid revision or extension of
the language.

My experience has been that even very modest revisions are very hard to
make.  One reason is that we all have a very substantial commitment to the
language (in the abstract), its semantics, and its implementation.  It's a
tribute to the language that we all have such passionate views about it, and
have contributed so much of our time and energy to it.  It can also be an
obstacle to consensus.  Perhaps it is worthwhile to state a few principles
that I hope can guide us.

1. Standard ML exists independently of its implementations.  The language
should continue to have a formal definition to which implementations agree
to conform.

2. Revisions must be guided as much by the experience of users and
implementors as by the demands of a clean formal definition.  IMO the 1997
revision was hobbled by an excessive emphasis on the needs of The Definition
without due consideration of implementation or application.

3. It is important to achieve a rough consensus, but complete agreement on
all issues may be impossible to achieve.  We will need to have a mechanism
for reaching a decision in the face of disagreement.

Let the discussion begin!

Bob Harper

PS: I, among many, have ideas about new language designs that would take us
beyond the charter outlined above.  It might make sense, if there is
interest, to fork off a separate discussion of these issues.  For example, I
would consider the discussion about automatic generation of equality
functions to fall within this category, as would the proposal I mentioned
for re-working datatypes.  (In fact these fit together nicely.)