From: john s. <sk...@us...> - 2010-11-22 00:52:53
I am hacking around so it is possible to do incremental binding. Erick already had this working, but he was feeding the incremental binder with assembly instructions. That is easy, but there are two problems: there is little advantage to using an assembly instruction stream over just using the parse tree as the thing cached on disk, and there is a serious problem with "prebound" symbols.

A prebound symbol is one where a lambda is lifted and given a useless name but a fixed integer index; the reference to it uses that index, and the symbol table entry must use that index too. This means assembly streams are not independent: you can't have two of them with clashing pre-bound indices. Pre-binding saves some time doing lookup, but the real reason for it is that the desugaring process doesn't have to worry about getting the scopes quite right.

To back track a bit: the parent of a symbol is important for TWO related reasons: first, so the contents OF that entity can be bound in the right context, and second, so the entity itself can be found by others, particularly its siblings. When a symbol is prebound due to lambda lifting, there is only one client, i.e. one use of it, so there is no need for anyone other than that client to find it. But the symbol still needs the right parent anyhow, so its contents (usually the executable code of a function) are properly bound. We could therefore probably get rid of pre-bound symbols at some small performance cost: the only case where global uniqueness of the name matters is when the lift occurs in global scope (i.e. in top level initialisation code), and there we can just use the filename to make the name unique.
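To make the clash concrete, here is a minimal Python sketch (all names here are hypothetical, not the actual Felix compiler code, which is OCaml): two independently produced assembly streams that both pre-assign the same fixed index to a lifted lambda cannot be merged, whereas filename-qualified names stay unique.

```python
# Sketch of the pre-bound index clash (hypothetical names, not Felix code).

def lift_lambda_prebound(stream, fixed_index, body):
    """Lift a lambda, pre-binding it to a fixed integer index.
    The reference and the symbol table entry share that index."""
    stream.append((fixed_index, body))
    return fixed_index

# Two streams produced independently both start counting at index 1:
a, b = [], []
lift_lambda_prebound(a, 1, "lifted fun from file_a")
lift_lambda_prebound(b, 1, "lifted fun from file_b")

def merge_streams(streams):
    """Merging fails: the streams are not independent."""
    table = {}
    for s in streams:
        for idx, body in s:
            if idx in table:
                raise ValueError(f"clashing pre-bound index {idx}")
            table[idx] = body
    return table

try:
    merge_streams([a, b])
except ValueError as e:
    print(e)  # clashing pre-bound index 1

# Alternative: qualify the lifted name with the filename, so the
# only case that needs global uniqueness (a lift in global scope)
# is handled and streams can always be merged.
def lift_lambda_named(table, filename, counter, body):
    name = f"{filename}::lambda_{counter}"
    table[name] = body
    return name

named = {}
lift_lambda_named(named, "file_a.flx", 0, "lifted fun from file_a")
lift_lambda_named(named, "file_b.flx", 0, "lifted fun from file_b")
print(sorted(named))  # two distinct filename-qualified names, no clash
```

The point of the sketch is only the data-shape argument: fixed integer keys make separately built streams collide, while keys derived from the file name do not.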
ANYHOW: using pre-built *symbol* tables instead saves a lot more time. In particular, if we save the symbol tables per file, and then actually bind each library in sequence, then when we bind the complete program we're only binding the user program code, re-using the bound and unbound symbol tables and caches for the libraries. That way we may get some real performance improvements in the compilation process.

Most of the time is spent parsing, then binding, then optimising. Optimising begins by discarding unused stuff, so the optimiser would only really do the heavy duty work on the user program and the parts of the library that are actually used (this is the case now). In other words, compiler performance should depend on the user program length, meaning "hello world" will be fast to compile, and the webserver will take a bit longer. Note also the resource manager "garbage collection" concept (well, actually it is a "grab it if and only if you need it" concept) should also minimise the generated C++ text, speeding up C++ compilation times too: in particular not emitting #includes that aren't needed, but also minimising the Felix stuff. All in all the idea is simple: I am bored waiting for the test suite to run. Hopefully it will run a bit faster by the time I'm finished.

CHANGES: I'm adding a new special kind of module, DCL_root / SYMDEF_root, which has bid index 0. This is basically a module namespace with a special property: when symbol tables are combined, the root entries are merged. Both root and module now have arguments: their initialisation code. This saves tracking that code independently. At the moment we make an initproc called _init_ and search for it; the problem is that these init procs would clobber each other when roots are merged. So I will just aggregate the init code in the symbol table itself, and after all the merging is done, generate the init proc.
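Here is a Python sketch of the merge-then-generate idea (a hypothetical model for illustration only; the real constructs are DCL_root / SYMDEF_root in the compiler's OCaml, and the class and function names below are my own): root entries are merged, init code is aggregated rather than clobbered, and the single _init_ proc is generated only after all merging is done.

```python
# Sketch of root symbol-table merging with aggregated init code
# (hypothetical Python model, not the actual OCaml compiler code).

class Root:
    """Module namespace with bid index 0; carries its own init code
    as an argument instead of a separately tracked _init_ proc."""
    def __init__(self, entries=None, init_code=None):
        self.entries = dict(entries or {})
        self.init_code = list(init_code or [])

def merge_roots(roots):
    """Combine symbol tables: root entries are merged, and the init
    code is concatenated in binding order so nothing is clobbered."""
    merged = Root()
    for r in roots:
        merged.entries.update(r.entries)
        merged.init_code += r.init_code
    return merged

def make_init_proc(root):
    """Only after all merging is done do we generate the one init proc."""
    return {"name": "_init_", "body": root.init_code}

# A library root and a user-program root, bound in sequence:
lib = Root({"printf": "<lib symbol>"}, ["run lib setup"])
app = Root({"main": "<user symbol>"}, ["run app setup"])

root = merge_roots([lib, app])
proc = make_init_proc(root)
print(proc["body"])  # ['run lib setup', 'run app setup']
```

The design point is the ordering: if each root carried its own pre-generated _init_ proc, merging would have to pick one and discard the other; aggregating the raw init code in the table and generating the proc last sidesteps that.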
It's kind of a hack but cute: I may actually make a BBDCL_root and put all the caches into it :)

--
john skaller
sk...@us...