Re: [MLton] ML hack evening ideas


On Mon, Jul 10, 2017 at 4:09 AM, Matthew Fluet <mat...@gm...> wrote:
> On Mon, Jul 10, 2017 at 1:23 AM, Matt Rice <ra...@gm...> wrote:
>> On Sun, Jul 9, 2017 at 9:41 PM, Jake Zimmerman <ja...@zi...> wrote:
>>> While we're on the topic of editor/IDE support for SML:
>>>
>>>> Additional IDE (esp. Emacs) support.  For example, MLton has a
>>>> 'def-use' mode for emacs (http://mlton.org/EmacsDefUseMode), which I
>>>> use quite a bit, but I wonder if it wouldn't be better to integrate
>>>> with one of the general-purpose emacs packages for that purpose (e.g.,
>>>> ctags, xref).  Similarly, I'd love to get some kind of completion
>>>> support, presumably using the emacs company-mode package.
>>>
>>> I actually recently ported the Emacs def-use mode to a Vim plugin
>>> (https://github.com/jez/vim-better-sml). One thing that's nice about the way
>>> MLton works now is that the def-use information is editor agnostic: it just
>>> dumps the information in an easy-to-consume format. This means integrating
>>> with other editors is straightforward.
>>>
>>>> One issue with IDE support for SML is that it is often difficult
>>>> (if not impossible) to know the context in which a .sml file is meant
>>>> to be used; it is implicit in the containing .mlb or .cm file.  So,
>>>> unlike most languages, where upon encountering an identifier that is
>>>> not bound in the file, one jumps to the top of the file to look at the
>>>> #import or include directives, one needs to do more work with SML
>>>> files.
>>>
>>> I'd argue that at least the output of the def-use information makes this
>>> point not so important. For those unfamiliar, here's a sampling from a
>>> def-use file:
>>>
>>>     variable def1 /filename/foo.sml 1.1
>>>         /filename/foo.sml 2.1
>>>         /filename/foo.sml 3.1
>>>         /filename/foo.sml 4.1
>>>     variable def2 /filename/foo.sml 5.1
>>>         /filename/foo.sml 6.1
>>>         /filename/foo.sml 7.1
>>>     variable def3 /filename/foo.sml 8.1
>>>     variable def4 /filename/foo.sml 9.1
>>>         /filename/foo.sml 10.1
>>>     variable def5 /filename/foo.sml 11.1
>>>
>>> So while it's hard to know things on the granularity of a file, it's easy on
>>> the granularity of an identifier.
>>>
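
For anyone who wants to consume that format from an editor plugin, here is a minimal sketch in Python of a parser for the sample above. It assumes only what the sample shows: a definition line has four whitespace-separated fields (kind, name, file, line.col), a use line has two (file, line.col), and paths contain no spaces. The names Pos, Definition, parse_def_use, and definition_of are made up for illustration; this is not code from MLton or from vim-better-sml.

```python
from typing import List, NamedTuple, Optional


class Pos(NamedTuple):
    path: str
    line: int
    col: int


class Definition(NamedTuple):
    kind: str
    name: str
    pos: Pos
    uses: List[Pos]


def parse_pos(path: str, linecol: str) -> Pos:
    # "3.1" means line 3, column 1.
    line, col = linecol.split(".")
    return Pos(path, int(line), int(col))


def parse_def_use(text: str) -> List[Definition]:
    """Parse def-use text shaped like the sample quoted above."""
    defs: List[Definition] = []
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) == 4:
            # Definition line: kind, name, file, line.col
            kind, name, path, linecol = fields
            defs.append(Definition(kind, name, parse_pos(path, linecol), []))
        elif len(fields) == 2 and defs:
            # Use line: file, line.col; belongs to the latest definition.
            defs[-1].uses.append(parse_pos(fields[0], fields[1]))
    return defs


def definition_of(defs: List[Definition], pos: Pos) -> Optional[Definition]:
    """Return the definition whose def site or use sites include pos."""
    for d in defs:
        if pos == d.pos or pos in d.uses:
            return d
    return None


SAMPLE = """\
variable def1 /filename/foo.sml 1.1
    /filename/foo.sml 2.1
    /filename/foo.sml 3.1
variable def3 /filename/foo.sml 8.1
"""
```

Calling definition_of(parse_def_use(SAMPLE), Pos("/filename/foo.sml", 3, 1)) finds def1 without knowing anything about the enclosing .mlb file, which is the per-identifier granularity described above. A real plugin would want an index keyed by position instead of a linear scan.
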
>>> My biggest concern with IDE tooling built around MLton right now is just how
>>> long it takes. Since every compile re-loads the entire basis, even a simple
>>> hello-world program takes 6+ seconds on my low-spec laptop, and it gets
>>> worse for longer programs.
>>>
>>> I know that MLton has a non-goal of separate compilation, but some form of
>>> staged compilation or a server daemon that re-compiled specific files might
>>> make it easier to work with. For example, a daemon which watched for file
>>> changes, selectively rebuilt those files, then re-output the def-use files.
>>> Even if the granularity of the caching was on the order of "Basis/non-Basis",
>>> I imagine this would significantly increase responsiveness.
>>>
>>> Just a few of my thoughts; I'm in favor of any better editor tooling for SML!
>>> Jake Z.
>>
>> As someone who likes capability-based operating systems (which tend
>> to lack global access to a filesystem), I find the inherent filesystem
>> access required for #import and #include directives a pain, since it
>> tends to limit compilers to running under, e.g., POSIX emulation.
>>
>> The Python language, or at least its predecessor ABC, was initially
>> written for use in the OS Amoeba, which can somewhat be seen in its
>> module system: e.g., the "import" statement doesn't exactly work with
>> filesystem paths, but with module names (though perhaps module names
>> are interpreted as filesystem paths)...
>>
>> My thoughts on the subject of enabling import/use without resorting to
>> encoding filesystem paths in the use/import directive is that the
>> source file contents themselves should specify its module name.
>> (Which could then be obtained from a lazy/partial parse).
>>
>> use/import/include could then import from this dictionary of exported
>> module names.
>>
>> In short: it would be nice if this ".smlb" declared/found module
>> identities using in-band signaling, rather than any out-of-band
>> signaling like filesystem paths.
>
> The ML Basis System certainly commits to using file names to reference
> SML and MLB files, with the usual quirks (awkwardness for file names
> with odd characters, difficulties with Windows-style paths).
>
> However,
> it was a deliberate choice to not have the SML code implicitly define
> the name of its containing MLB file.

I'm not sure I was terribly clear. The specific thing wasn't SML code
implicitly defining the name of its containing MLB file, but its own
name (the name referenced in the MLB file containing it). E.g., if a
file "foo.sml" is referenced as foo.sml in an MLB file, I'll generally
have a record at the top containing:
val moduleInfo = {filename="foo.sml", author="matt", license=GPLv2OrLater};

By only providing the use/include/import portion of the information
and relying on the filesystem to interpret filenames, it is not
possible to rebuild the dependency graph from the file contents
alone.
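
As a rough illustration of what in-band naming would buy, here is a small Python sketch that builds a name-to-source dictionary purely from file contents, using the moduleInfo record convention shown above. The regular expression and the functions declared_name, build_index, and resolve are hypothetical; nothing here is part of MLton or the MLB format, and a real tool would want a proper partial parse instead of a regex.

```python
import re
from typing import Dict, Iterable, List, Optional

# Matches the filename field of a top-level record such as
#   val moduleInfo = {filename="foo.sml", author="matt", ...};
MODULE_INFO = re.compile(
    r'val\s+moduleInfo\s*=\s*\{[^}]*?filename\s*=\s*"([^"]+)"'
)


def declared_name(source: str) -> Optional[str]:
    """Return the name a source declares for itself, if any."""
    match = MODULE_INFO.search(source)
    return match.group(1) if match else None


def build_index(sources: Iterable[str]) -> Dict[str, str]:
    """Map self-declared names to source text, with no filesystem access."""
    index: Dict[str, str] = {}
    for src in sources:
        name = declared_name(src)
        if name is None:
            continue  # source carries no in-band identity
        if name in index:
            raise ValueError(f"duplicate module name: {name}")
        index[name] = src
    return index


def resolve(references: List[str], index: Dict[str, str]) -> List[str]:
    """Look up the names an MLB-like file refers to in the index."""
    return [index[name] for name in references]
```

With something like this, the sources could arrive as anonymous blobs (say, from a capability handed to the compiler) and a reference to foo.sml would still resolve, because the identity lives in the contents and not in the path.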

If/when integrating sml and mlb into smlb it'd be nice to have
dependency graph information parity in-band in some parsable form, for
what currently only exists embedded in the filesystem, and in ad-hoc
comments.

Perhaps it is possible via "bas" and "basis"; I haven't delved
into the MLB format very much, because cm2mlb has sufficed for all my
needs and I've never needed to look under the hood.

>  We certainly valued the ability
> in SML to define multiple module-level entities (e.g., multiple
> signatures, a signature and a functor, etc.) in the same .sml file.
> We also built into the ML Basis System some conveniences for renaming
> module-level entities without dropping into SML, which is necessary to
> handle distinct packages/libraries wanting to use the same name for a
> module-level entity.

> That's always been my concern with some kind of
> global namespace: it encourages a "squatting" mentality where the
> first library to grab the name wins.

Right, I agree it is a concern. My main thought on the above has been
to hash the public-facing ABI of the sources and treat the
package-specified name as a default proposed name in a petname system
(https://en.wikipedia.org/wiki/Petname), wherein a name is just an
alias for a cryptographic hash.

The 'default' name proposed by the module itself implicitly refers to
the hash, and to rename it, or create a new alias, you would need to
calculate the hash and assign it to a different name.

By doing so, the global hash kind of takes the place of relative path
interpretation. But it's a bit of added pain to use a non-default
name in the face of changing ABIs.
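
To make the idea a little more concrete, here is a minimal Python sketch of such a petname table. It is only an assumption about how this might look: interface_hash hashes whitespace-normalized interface text as a stand-in for "the public-facing ABI", and PetnameTable with its propose, alias, and lookup methods is a hypothetical name, not an existing tool.

```python
import hashlib
from typing import Dict


def interface_hash(interface_text: str) -> str:
    """Hash a module's public interface (whitespace-normalized text here)."""
    normalized = " ".join(interface_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class PetnameTable:
    """Local names that are nothing more than aliases for hashes."""

    def __init__(self) -> None:
        self._by_name: Dict[str, str] = {}

    def propose(self, default_name: str, interface_text: str) -> str:
        """Register a module's own proposed name unless it is already taken.

        Returns the interface hash either way, so the caller can alias it.
        """
        digest = interface_hash(interface_text)
        self._by_name.setdefault(default_name, digest)
        return digest

    def alias(self, name: str, digest: str) -> None:
        """Bind a locally chosen name to a hash."""
        self._by_name[name] = digest

    def lookup(self, name: str) -> str:
        return self._by_name[name]
```

If two libraries both propose "Queue", the second one just gets a different local alias for its hash, so nobody wins by squatting on the name. The cost shows up when an interface changes: the hash changes with it, and any non-default alias has to be recomputed and rebound.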

But it also does seem very much overkill for figuring out a dependency graph...
I really like how SML sources exist separate from the external state
of the filesystem (with the exception of the use function), and was
somewhat worried that the .smlb proposal would intermingle filesystem
state with source file contents.

So some way to get good cross-source navigation without directly
entangling the filesystem is somewhere on my list of things that
would be nice to have.  Sorry for the kind of long tangent :D
