Work at SourceForge, help us to make it a better place! We have an immediate need for a Support Technician in our San Francisco or Denver office.

Close

[a530bb]: doc / cmucl / internals / interpreter.tex Maximize Restore History

Download this file

interpreter.tex    192 lines (152 with data), 10.1 kB

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
% -*- Dictionary: design; Package: C -*-
May be worth having a byte-code representation for interpreted code. This way,
an entire system could be compiled into byte-code for debugging (the
"check-out" compiler?).
Given our current inclination for using a stack machine to interpret IR1, it
would be straightforward to layer a byte-code interpreter on top of this.
Interpreter:
Instead of having no interpreter, or a more-or-less conventional interpreter,
or byte-code interpreter, how about directly executing IR1?
We run through the IR1 passes, possibly skipping optional ones, until we get
through environment analysis. Then we run a post-pass that annotates IR1 with
information about where values are kept, i.e. the stack slot.
We can lazily convert functions by having FUNCTION make an interpreted function
object that holds the code (really a closure over the interpreter). The first
time that we try to call the function, we do the conversion and processing.
Also, we can easily keep track of which interpreted functions we have expanded
macros in, so that macro redefinition automatically invalidates the old
expansion, causing lazy reconversion.
Probably the interpreter will want to represent MVs by a recognizable structure
that is always heap-allocated. This way, we can punt the stack issues involved
in trying to spread MVs. So a continuation value can always be kept in a
single cell.
The compiler can have some special frobs for making the interpreter efficient,
such as a call operation that extracts arguments from the stack
slots designated by a continuation list. Perhaps
(values-mapcar fun . lists)
<==>
(values-list (mapcar fun . lists))
This would be used with MV-CALL.
This scheme seems to provide nearly all of the advantages of both the compiler
and conventional interpretation. The only significant disadvantage with
respect to a conventional interpreter is that there is the one-time overhead of
conversion, but doing this lazily should make this quite acceptable.
With respect to a conventional interpreter, we have major advantages:
+ Full syntax checking: safety comparable to compiled code.
+ Semantics similar to compiled code due to code sharing. Similar diagnostic
messages, etc. Reduction of error-prone code duplication.
+ Potential for full type checking according to declarations (would require
running IR1 optimize?)
+ Simplifies debugger interface, since interpreted code can look more like
compiled code: source paths, edit definition, etc.
For all non-run-time symbol annotations (anything other than SYMBOL-FUNCTION
and SYMBOL-VALUE), we use the compiler's global database. MACRO-FUNCTION will
use INFO, rather than vice-versa.
When doing the IR1 phases for the interpreter, we probably want to suppress
optimizations that change user-visible function calls:
-- Don't do local call conversion of any named functions (even lexical ones).
This is so that a call will appear on the stack that looks like the call in
the original source. The keyword and optional argument transformations
done by local call mangle things quite a bit. Also, note local-call
converting prevents unreferenced arguments from being deleted, which is
another non-obvious transformation.
-- Don't run source-transforms, IR1 transforms and IR1 optimizers. This way,
TRACE and BACKTRACE will show calls with the original arguments, rather
than the "optimized" form, etc. Also, for the interpreter it will
actually be faster to call the original function (which is compiled) than
to "inline expand" it. Also, this allows implementation-dependent
transforms to expand into %PRIMITIVE uses.
There are some problems with stepping, due to our non-syntactic IR1
representation. The source path information is the key that makes this
conceivable. We can skip over the stepping of a subform by quietly evaluating
nodes whose source path lies within the form being skipped.
One problem with determining what value has been returned by a form. With a
function call, it is theoretically possible to precisely determine this, since
if we complete evaluation of the arguments, then we arrive at the Combination
node whose value is synonymous with the value of the form. We can even detect
this case, since the Node-Source will be EQ to the form. And we can also
detect when we unwind out of the evaluation, since we will leave the form
without having ever reached this node.
But with macros and special-forms, there is no node whose value is the value of
the form, and no node whose source is the macro call or special form. We can
still detect when we leave the form, but we can't be sure whether this was a
normal evaluation result or an explicit RETURN-FROM.
But does this really matter? It seems that we can print the value returned (if
any), then just print the next form to step. In the rare case where we did
unwind, the user should be able to figure it out.
[We can look at this as a side-effect of CPS: there isn't any difference
between a "normal" return and a non-local one.]
[Note that in any control transfer (normal or otherwise), the stepper may need
to unwind out of an arbitrary number of levels of stepping. This is because a
form in a TR position may yield its to a node arbitrarily far our.]
Another problem is with deciding what form is being stepped. When we start
evaluating a node, we dive into code that is nested somewhere down inside that
form. So we actually have to do a loop of asking questions before we do any
evaluation. But what do we ask about?
If we ask about the outermost enclosing form that is a subform of the the last
form that the user said to execute, then we might offer a form that isn't
really evaluated, such as a LET binding list.
But once again, is this really a problem? It is certainly different from a
conventional stepper, but a pretty good argument could be made that it is
superior. Haven't you ever wanted to skip the evaluation of all the
LET bindings, but not the body? Wouldn't it be useful to be able to skip the
DO step forms?
All of this assumes that nobody ever wants to step through the guts of a
macroexpansion. This seems reasonable, since steppers are for weenies, and
weenies don't define macros (hence don't debug them). But there are probably
some weenies who don't know that they shouldn't be writing macros.
We could handle this by finding the "source paths" in the expansion of each
macro by sticking some special frob in the source path marking the place where
the expansion happened. When we hit code again that is in the source, then we
revert to the normal source path. Something along these lines might be a good
idea anyway (for compiler error messages, for example).
The source path hack isn't guaranteed to work quite so well in generated code,
though, since macros return stuff that isn't freshly consed. But we could
probably arrange to win as long as any given expansion doesn't return two EQ
forms.
It might be nice to have a command that skipped stepping of the form, but
printed the results of each outermost enclosed evaluated subform, i.e. if you
used this on the DO step-list, it would print the result of each new-value
form. I think this is implementable. I guess what you would do is print each
value delivered to a DEST whose source form is the current or an enclosing
form. Along with the value, you would print the source form for the node that
is computing the value.
The stepper can also have a "back" command that "unskips" or "unsteps". This
would allow the evaluation of forms that are pure (modulo lexical variable
setting) to be undone. This is useful, since in stepping it is common that you
skip a form that you shouldn't have, or get confused and want to restart at
some earlier point.
What we would do is remember the current node and the values of all local
variables. heap before doing each step or skip action. We can then back up
the state of all lexical variables and the "program counter". To make this
work right with set closure variables, we would copy the cell's value, rather
than the value cell itself.
[To be fair, note that this could easily be done with our current interpreter:
the stepper could copy the environment alists.]
We can't back up the "program counter" when a control transfer leaves the
current function, since this state is implicitly represented in the
interpreter's state, and is discarded when we exit. We probably want to ask
for confirmation before leaving the function to give users a chance to "unskip"
the forms in a TR position.
Another question is whether the conventional stepper is really a good thing to
imitate... How about an editor-based mouse-driven interface? Instead of
"skipping" and "stepping", you would just designate the next form that you
wanted to stop at. Instead of displaying return values, you replace the source
text with the printed representation of the value.
It would show the "program counter" by highlighting the *innermost* form that
we are about to evaluate, i.e. the source form for the node that we are stopped
at. It would probably also be useful to display the start of the form that was
used to designate the next stopping point, although I guess this could be
implied by the mouse position.
Such an interface would be a little harder to implement than a dumb stepper,
but it would be much easier to use. [It would be impossible for an evalhook
stepper to do this.]
%PRIMITIVE usage:
Note: %PRIMITIVE can only be used in compiled code. It is a trapdoor into the
compiler, not a general syntax for accessing "sub-primitives". It's main use
is in implementation-dependent compiler transforms. It saves us the effort of
defining a "phony function" (that is not really defined), and also allows
direct communication with the code generator through codegen-info arguments.
Some primitives may be exported from the VM so that %PRIMITIVE can be used to
make it explicit that an escape routine or interpreter stub is assuming an
operation is implemented by the compiler.