Diff of /doc/cmucl/internals/interpreter.tex [000000] .. [a530bb]  Maximize  Restore

Switch to side-by-side view

--- a
+++ b/doc/cmucl/internals/interpreter.tex
@@ -0,0 +1,191 @@
+%  					-*- Dictionary: design; Package: C -*-
+May be worth having a byte-code representation for interpreted code.  This way,
+an entire system could be compiled into byte-code for debugging (the
+"check-out" compiler?).
+Given our current inclination for using a stack machine to interpret IR1, it
+would be straightforward to layer a byte-code interpreter on top of this.
+Instead of having no interpreter, or a more-or-less conventional interpreter,
+or byte-code interpreter, how about directly executing IR1?
+We run through the IR1 passes, possibly skipping optional ones, until we get
+through environment analysis.  Then we run a post-pass that annotates IR1 with
+information about where values are kept, i.e. the stack slot.
+We can lazily convert functions by having FUNCTION make an interpreted function
+object that holds the code (really a closure over the interpreter).  The first
+time that we try to call the function, we do the conversion and processing.
+Also, we can easily keep track of which interpreted functions we have expanded
+macros in, so that macro redefinition automatically invalidates the old
+expansion, causing lazy reconversion.
+Probably the interpreter will want to represent MVs by a recognizable structure
+that is always heap-allocated.  This way, we can punt the stack issues involved
+in trying to spread MVs.  So a continuation value can always be kept in a
+single cell.
+The compiler can have some special frobs for making the interpreter efficient,
+such as a call operation that extracts arguments from the stack
+slots designated by a continuation list.  Perhaps 
+    (values-mapcar fun . lists)
+    (values-list (mapcar fun . lists))
+This would be used with MV-CALL.
+This scheme seems to provide nearly all of the advantages of both the compiler
+and conventional interpretation.  The only significant disadvantage with
+respect to a conventional interpreter is that there is the one-time overhead of
+conversion, but doing this lazily should make this quite acceptable.
+With respect to a conventional interpreter, we have major advantages:
+ + Full syntax checking: safety comparable to compiled code.
+ + Semantics similar to compiled code due to code sharing.  Similar diagnostic
+   messages, etc.  Reduction of error-prone code duplication.
+ + Potential for full type checking according to declarations (would require
+   running IR1 optimize?)
+ + Simplifies debugger interface, since interpreted code can look more like
+   compiled code: source paths, edit definition, etc.
+For all non-run-time symbol annotations (anything other than SYMBOL-FUNCTION
+and SYMBOL-VALUE), we use the compiler's global database.  MACRO-FUNCTION will
+use INFO, rather than vice-versa.
+When doing the IR1 phases for the interpreter, we probably want to suppress
+optimizations that change user-visible function calls:
+ -- Don't do local call conversion of any named functions (even lexical ones).
+    This is so that a call will appear on the stack that looks like the call in
+    the original source.  The keyword and optional argument transformations
+    done by local call mangle things quite a bit.  Also, note local-call
+    converting prevents unreferenced arguments from being deleted, which is
+    another non-obvious transformation.
+ -- Don't run source-transforms, IR1 transforms and IR1 optimizers.  This way,
+    TRACE and BACKTRACE will show calls with the original arguments, rather
+    than the "optimized" form, etc.  Also, for the interpreter it will
+    actually be faster to call the original function (which is compiled) than
+    to "inline expand" it.  Also, this allows implementation-dependent
+    transforms to expand into %PRIMITIVE uses.
+There are some problems with stepping, due to our non-syntactic IR1
+representation.  The source path information is the key that makes this
+conceivable.  We can skip over the stepping of a subform by quietly evaluating
+nodes whose source path lies within the form being skipped.
+One problem with determining what value has been returned by a form.  With a
+function call, it is theoretically possible to precisely determine this, since
+if we complete evaluation of the arguments, then we arrive at the Combination
+node whose value is synonymous with the value of the form.  We can even detect
+this case, since the Node-Source will be EQ to the form.  And we can also
+detect when we unwind out of the evaluation, since we will leave the form
+without having ever reached this node.
+But with macros and special-forms, there is no node whose value is the value of
+the form, and no node whose source is the macro call or special form.  We can
+still detect when we leave the form, but we can't be sure whether this was a
+normal evaluation result or an explicit RETURN-FROM.  
+But does this really matter?  It seems that we can print the value returned (if
+any), then just print the next form to step.  In the rare case where we did
+unwind, the user should be able to figure it out.  
+[We can look at this as a side-effect of CPS: there isn't any difference
+between a "normal" return and a non-local one.]
+[Note that in any control transfer (normal or otherwise), the stepper may need
+to unwind out of an arbitrary number of levels of stepping.  This is because a
+form in a TR position may yield its to a node arbitrarily far our.]
+Another problem is with deciding what form is being stepped.  When we start
+evaluating a node, we dive into code that is nested somewhere down inside that
+form.  So we actually have to do a loop of asking questions before we do any
+evaluation.  But what do we ask about?
+If we ask about the outermost enclosing form that is a subform of the the last
+form that the user said to execute, then we might offer a form that isn't
+really evaluated, such as a LET binding list.  
+But once again, is this really a problem?  It is certainly different from a
+conventional stepper, but a pretty good argument could be made that it is
+superior.  Haven't you ever wanted to skip the evaluation of all the
+LET bindings, but not the body?  Wouldn't it be useful to be able to skip the
+DO step forms?
+All of this assumes that nobody ever wants to step through the guts of a
+macroexpansion.  This seems reasonable, since steppers are for weenies, and
+weenies don't define macros (hence don't debug them).  But there are probably
+some weenies who don't know that they shouldn't be writing macros.
+We could handle this by finding the "source paths" in the expansion of each
+macro by sticking some special frob in the source path marking the place where
+the expansion happened.  When we hit code again that is in the source, then we
+revert to the normal source path.  Something along these lines might be a good
+idea anyway (for compiler error messages, for example).  
+The source path hack isn't guaranteed to work quite so well in generated code,
+though, since macros return stuff that isn't freshly consed.  But we could
+probably arrange to win as long as any given expansion doesn't return two EQ
+It might be nice to have a command that skipped stepping of the form, but
+printed the results of each outermost enclosed evaluated subform, i.e. if you
+used this on the DO step-list, it would print the result of each new-value
+form.  I think this is implementable.  I guess what you would do is print each
+value delivered to a DEST whose source form is the current or an enclosing
+form.  Along with the value, you would print the source form for the node that
+is computing the value.
+The stepper can also have a "back" command that "unskips" or "unsteps".  This
+would allow the evaluation of forms that are pure (modulo lexical variable
+setting) to be undone.  This is useful, since in stepping it is common that you
+skip a form that you shouldn't have, or get confused and want to restart at
+some earlier point.
+What we would do is remember the current node and the values of all local
+variables.  heap before doing each step or skip action.  We can then back up
+the state of all lexical variables and the "program counter".  To make this
+work right with set closure variables, we would copy the cell's value, rather
+than the value cell itself.
+[To be fair, note that this could easily be done with our current interpreter:
+the stepper could copy the environment alists.]
+We can't back up the "program counter" when a control transfer leaves the
+current function, since this state is implicitly represented in the
+interpreter's state, and is discarded when we exit.  We probably want to ask
+for confirmation before leaving the function to give users a chance to "unskip"
+the forms in a TR position.
+Another question is whether the conventional stepper is really a good thing to
+imitate...  How about an editor-based mouse-driven interface?  Instead of
+"skipping" and "stepping", you would just designate the next form that you
+wanted to stop at.  Instead of displaying return values, you replace the source
+text with the printed representation of the value.
+It would show the "program counter" by highlighting the *innermost* form that
+we are about to evaluate, i.e. the source form for the node that we are stopped
+at.  It would probably also be useful to display the start of the form that was
+used to designate the next stopping point, although I guess this could be
+implied by the mouse position.
+Such an interface would be a little harder to implement than a dumb stepper,
+but it would be much easier to use.  [It would be impossible for an evalhook
+stepper to do this.]
+%PRIMITIVE usage:
+Note: %PRIMITIVE can only be used in compiled code.  It is a trapdoor into the
+compiler, not a general syntax for accessing "sub-primitives".  It's main use
+is in implementation-dependent compiler transforms.  It saves us the effort of
+defining a "phony function" (that is not really defined), and also allows
+direct communication with the code generator through codegen-info arguments.
+Some primitives may be exported from the VM so that %PRIMITIVE can be used to
+make it explicit that an escape routine or interpreter stub is assuming an
+operation is implemented by the compiler.