From: john s. <sk...@us...> - 2010-11-22 00:52:53
I am hacking around so it is possible to do incremental binding. Erick already had this working, but he was feeding the incremental binder with assembly instructions. That is easy, but there are two problems: there is little advantage to using an assembly instruction stream over just using the parse tree as the thing cached on disk, and there is a serious problem with "prebound" symbols.

A prebound symbol is one where a lambda is lifted and given a useless name but a fixed integer index; the reference to it uses that index, and the symbol table entry must use that index too. This means assembly streams are not independent: you can't have two of them with clashing pre-bound indices. Pre-binding saves some time doing lookup, but the real reason for it is that the desugaring process doesn't have to worry about getting the scopes quite right.

To back track a bit: the parent of a symbol is important for TWO related reasons: first, so the contents OF that entity can be bound in the right context, and second, so the entity itself can be found by others, particularly its siblings. When a symbol is prebound due to lambda lifting, there is only one client, i.e. one use of it, so there is no need for anyone other than that client to find it. But the symbol still needs the right parent anyhow, so its contents (usually the executable code of a function) are properly bound. We could therefore probably get rid of pre-bound symbols at some small performance cost: the only case where global uniqueness of the name matters is when the lift occurs in global scope (i.e. in top level initialisation code), and there we can just use the filename to make the name unique.
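To make the clash concrete, here is a minimal Python sketch (all names here are hypothetical, not the actual Felix compiler code, which is OCaml): two independently produced assembly streams that both pre-assign the same fixed index to a lifted lambda cannot be merged, whereas filename-qualified names stay unique.

```python
# Sketch of the pre-bound index clash (hypothetical names, not Felix code).

def lift_lambda_prebound(stream, fixed_index, body):
    """Lift a lambda, pre-binding it to a fixed integer index.
    The reference and the symbol table entry share that index."""
    stream.append((fixed_index, body))
    return fixed_index

# Two streams produced independently both start counting at index 1:
a, b = [], []
lift_lambda_prebound(a, 1, "lifted fun from file_a")
lift_lambda_prebound(b, 1, "lifted fun from file_b")

def merge_streams(streams):
    """Merging fails: the streams are not independent."""
    table = {}
    for s in streams:
        for idx, body in s:
            if idx in table:
                raise ValueError(f"clashing pre-bound index {idx}")
            table[idx] = body
    return table

try:
    merge_streams([a, b])
except ValueError as e:
    print(e)  # clashing pre-bound index 1

# Alternative: qualify the lifted name with the filename, so the
# only case that needs global uniqueness (a lift in global scope)
# is handled and streams can always be merged.
def lift_lambda_named(table, filename, counter, body):
    name = f"{filename}::lambda_{counter}"
    table[name] = body
    return name

named = {}
lift_lambda_named(named, "file_a.flx", 0, "lifted fun from file_a")
lift_lambda_named(named, "file_b.flx", 0, "lifted fun from file_b")
print(sorted(named))  # two distinct filename-qualified names, no clash
```

The point of the sketch is only the data-shape argument: fixed integer keys make separately built streams collide, while keys derived from the file name do not.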
ANYHOW: using pre-built *symbol* tables instead saves a lot more time. In particular, if we save the symbol tables per file, and then actually bind each library in sequence, then when we bind the complete program we're only binding the user program code, re-using the bound and unbound symbol tables and caches for the libraries. That way we may get some real performance improvements in the compilation process.

Most of the time is spent parsing, then binding, then optimising. Optimising begins by discarding unused stuff, so the optimiser would only really do the heavy duty work on the user program and the parts of the library that are actually used (this is the case now). In other words, compiler performance should depend on the user program length, meaning "hello world" will be fast to compile, and the webserver will take a bit longer. Note also the resource manager "garbage collection" concept (well, actually it is a "grab it if and only if you need it" concept) should also minimise the generated C++ text, speeding up C++ compilation times too: in particular not emitting #includes that aren't needed, but also minimising the Felix stuff. All in all the idea is simple: I am bored waiting for the test suite to run. Hopefully it will run a bit faster by the time I'm finished.

CHANGES: I'm adding a new special kind of module, DCL_root / SYMDEF_root, which has bid index 0. This is basically a module namespace with a special property: when symbol tables are combined, the root entries are merged. Both root and module now have arguments: their initialisation code. This saves tracking that code independently. At the moment we make an initproc called _init_ and search for it; the problem is that these init procs would clobber each other when roots are merged. So I will just aggregate the init code in the symbol table itself, and after all the merging is done, generate the init proc.
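Here is a Python sketch of the merge-then-generate idea (a hypothetical model for illustration only; the real constructs are DCL_root / SYMDEF_root in the compiler's OCaml, and the class and function names below are my own): root entries are merged, init code is aggregated rather than clobbered, and the single _init_ proc is generated only after all merging is done.

```python
# Sketch of root symbol-table merging with aggregated init code
# (hypothetical Python model, not the actual OCaml compiler code).

class Root:
    """Module namespace with bid index 0; carries its own init code
    as an argument instead of a separately tracked _init_ proc."""
    def __init__(self, entries=None, init_code=None):
        self.entries = dict(entries or {})
        self.init_code = list(init_code or [])

def merge_roots(roots):
    """Combine symbol tables: root entries are merged, and the init
    code is concatenated in binding order so nothing is clobbered."""
    merged = Root()
    for r in roots:
        merged.entries.update(r.entries)
        merged.init_code += r.init_code
    return merged

def make_init_proc(root):
    """Only after all merging is done do we generate the one init proc."""
    return {"name": "_init_", "body": root.init_code}

# A library root and a user-program root, bound in sequence:
lib = Root({"printf": "<lib symbol>"}, ["run lib setup"])
app = Root({"main": "<user symbol>"}, ["run app setup"])

root = merge_roots([lib, app])
proc = make_init_proc(root)
print(proc["body"])  # ['run lib setup', 'run app setup']
```

The design point is the ordering: if each root carried its own pre-generated _init_ proc, merging would have to pick one and discard the other; aggregating the raw init code in the table and generating the proc last sidesteps that.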
It's kind of a hack but cute: I may actually make a BBDCL_root and put all the caches into it :)

--
john skaller
sk...@us...