Diff of /doc/cmucl/internals/front.tex [000000] .. [a530bb] Maximize Restore

  Switch to side-by-side view

--- a
+++ b/doc/cmucl/internals/front.tex
@@ -0,0 +1,943 @@
+\chapter{ICR conversion} % -*- Dictionary: design -*-
+\section{Canonical forms}
+Would be useful to have a Freeze-Type proclamation.  Its primary use would to
+be say that the indicated type won't acquire any new subtypes in the future.
+This allows better open-coding of structure type predicates, since the possible
+types that would satisfy the predicate will be constant at compile time, and
+thus can be compiled as a skip-chain of EQ tests.  
+Of course, this is only a big win when the subtypes are few: the most important
+case is when there are none.  If the closure of the subtypes is much larger
+than the average number of supertypes of an inferior, then it is better to grab
+the list of superiors out of the object's type, and test for membership in that
+Should type-specific numeric equality be done by EQL rather than =?  i.e.
+should = on two fixnums become EQL and then convert to EQL/FIXNUM?
+Currently we transform EQL into =, which is complicated, since we have to prove
+the operands are the class of numeric type before we do it.  Also, when EQL
+sees one operand is a FIXNUM, it transforms to EQ, but the generator for EQ
+isn't expecting numbers, so it doesn't use an immediate compare.
+Array hackery:
+Array type tests are transformed to %array-typep, separation of the
+implementation-dependent array-type handling.  This way we can transform
+     (or (simple-string-p x)
+	 (and (complex-array-p x)
+	      (= (array-rank x) 1)
+	      (simple-string-p (%array-data x))))
+In addition to the similar bit-vector-p, we also handle vectorp and any type
+tests on which the dimension isn't wild.
+[Note that we will want to expand into frobs compatible with those that
+array references expand into so that the same optimizations will work on both.]
+These changes combine to convert hairy type checks into hairy typep's, and then
+convert hairyp typeps into simple typeps.
+Do we really need non-VOP templates?  It seems that we could get the desired
+effect through implementation-dependent ICR transforms.  The main risk would be
+of obscuring the type semantics of the code.  We could fairly easily retain all
+the type information present at the time the tranform is run, but if we
+discover new type information, then it won't be propagated unless the VM also
+supplies type inference methods for its internal frobs (precluding the use of
+%PRIMITIVE, since primitives don't have derive-type methods.)  
+I guess one possibility would be to have the call still considered "known" even
+though it has been transformed.  But this doesn't work, since we start doing
+LET optimizations that trash the arglist once the call has been transformed
+(and indeed we want to.)
+Actually, I guess the overhead for providing type inference methods for the
+internal frobs isn't that great, since we can usually borrow the inference
+method for a Common Lisp function.  For example, in our AREF case:
+    (aref x y)
+    (let ((\#:len (array-dimension x 0)))
+      (%unchecked-aref x (%check-in-bounds y \#:len)))
+Now in this case, if we made %UNCHECKED-AREF have the same derive-type method
+as AREF, then if we discovered something new about X's element type, we could
+derive a new type for the entire expression.
+Actually, it seems that baring this detail at the ICR level is beneficial,
+since it admits the possibly of optimizing away the bounds check using type
+information.  If we discover X's dimensions, then \#:LEN becomes a constant that
+can be substituted.  Then %CHECK-IN-BOUNDS can notice that the bound is
+constant and check it against the type for Y.  If Y is known to be in range,
+then we can optimize away the bounds check.
+Actually in this particular case, the best thing to do would be if we
+discovered the bound is constant, then replace the bounds check with an
+implicit type check.  This way all the type check optimization mechanisms would
+be brought into the act.
+So we actually want to do the bounds-check expansion as soon as possible,
+rather than later than possible: it should be a source-transform, enabled by
+the fast-safe policy.
+With multi-dimensional arrays we probably want to explicitly do the index
+computation: this way portions of the index computation can become loop
+invariants.  In a scan in row-major order, the inner loop wouldn't have to do
+any multiplication: it would only do an addition.  We would use normal
+fixnum arithmetic, counting on * to cleverly handle multiplication by a
+constant, and appropriate inline expansion.
+Note that in a source transform, we can't make any assumptions the type of the
+array.  If it turns out to be a complex array without declared dimensions, then
+the calls to ARRAY-DIMENSION will have to turn into a VOP that can be affected.
+But if it is simple, then the VOP is unaffected, and if we know the bounds, it
+is constant.  Similarly, we would have %ARRAY-DATA and %ARRAY-DISPLACEMENT
+operations.  %ARRAY-DISPLACEMENT would optimize to 0 if we discover the array
+is simple.  [This is somewhat inefficient when the array isn't eventually
+discovered to be simple, since finding the data and finding the displacement
+duplicate each other.  We could make %ARRAY-DATA return both as MVs, and then
+optimize to (VALUES (%SIMPLE-ARRAY-DATA x) 0), but this would require
+optimization of trivial VALUES uses.]
+Also need (THE (ARRAY * * * ...) x) to assert correct rank.
+A bunch of functions have source transforms that convert them into the
+canonical form that later parts of the compiler want to see.  It is not legal
+to rely on the canonical form since source transforms can be inhibited by a
+Notinline declaration.  This shouldn't be a problem, since everyone should keep
+their hands off of Notinline calls.
+Some transformations:
+Endp  ==>  (NULL (THE LIST ...))
+(NOT xxx) or (NULL xxx) => (IF xxx NIL T)
+(typep x '<simple type>) => (<simple predicate> x)
+(typep x '<complex type>) => ...composition of simpler operations...
+TYPEP of AND, OR and NOT types turned into conditionals over multiple TYPEP
+calls.  This makes hairy TYPEP calls more digestible to type constraint
+propagation, and also means that the TYPEP code generators don't have to deal
+with these cases.  [\#\#\# In the case of union types we may want to do something
+to preserve information for type constraint propagation.]
+    (apply \#'foo a b c)
+    (multiple-value-call \#'foo (values a) (values b) (values-list c))
+This way only MV-CALL needs to know how to do calls with unknown numbers of
+arguments.  It should be nearly as efficient as a special-case VMR-Convert
+method could be.
+Make-String => Make-Array
+N-arg predicates associated into two-arg versions.
+Associate N-arg arithmetic ops.
+Expand CxxxR and FIRST...nTH
+Zerop, Plusp, Minusp, 1+, 1-, Min, Max, Rem, Mod
+(Values x), (Identity x) => (Prog1 x)
+All specialized aref functions => (aref (the xxx) ...)
+Convert (ldb (byte ...) ...) into internal frob that takes size and position as
+separate args.  Other byte functions also...
+Change for-value primitive predicates into (if <pred> t nil).  This isn't
+particularly useful during ICR phases, but makes life easy for VMR conversion.
+This last can't be a source transformation, since a source transform can't tell
+where the form appears.  Instead, ICR conversion special-cases calls to known
+functions with the Predicate attribute by doing the conversion when the
+destination of the result isn't an IF.  It isn't critical that this never be
+done for predicates that we ultimately discover to deliver their value to an
+IF, since IF optimizations will flush unnecessary IFs in a predicate.
+\section{Inline functions}
+[\#\#\# Inline expansion is especially powerful in the presence of good lisp-level
+optimization ("partial evaluation").  Many "optimizations" usually done in Lisp
+compilers by special-case source-to-source transforms can be had simply by
+making the source of the general case function available for inline expansion.
+This is especially helpful in Common Lisp, which has many commonly used
+functions with simple special cases but bad general cases (list and sequence
+functions, for example.)
+Inline expansion of recursive functions is allowed, and is not as silly as it
+sounds.  When expanded in a specific context, much of the overhead of the
+recursive calls may be eliminated (especially if there are many keyword
+arguments, etc.)
+[Also have MAYBE-INLINE]
+We only record a function's inline expansion in the global environment when the
+function is in the null lexical environment, since it the expansion must be
+represented as source.
+We do inline expansion of functions locally defined by FLET or LABELS even when
+the environment is not null.  Since the appearances of the local function must
+be nested within the desired environment, it is possible to expand local
+functions inline even when they use the environment.  We just stash the source
+form and environments in the Functional for the local function.  When we
+convert a call to it, we just reconvert the source in the saved environment.
+An interesting alternative to the inline/full-call dichotomy is "semi-inline"
+coding.  Whenever we have an inline expansion for a function, we can expand it
+only once per block compilation, and then use local call to call this copied
+version.  This should get most of the speed advantage of real inline coding
+with much less code bloat.  This is especially attractive for simple system
+functions such as Read-Char.
+The main place where true inline expansion would still be worth doing is where
+large amounts of the function could be optimized away by constant folding or
+other optimizations that depend on the exact arguments to the call.
+\section{Compilation policy}
+We want more sophisticated control of compilation safety than is offered in CL,
+so that we can emit only those type checks that are likely to discover
+something (i.e. external interfaces.)
+Generalized back-end notion provides dynamic retargeting?  (for byte code)
+The current node type annotations seem to be somewhat unsatisfactory, since we
+lose information when we do a THE on a continuation that already has uses, or
+when we convert a let where the actual result continuation has other uses.  
+But the case with THE isn't really all that bad, since the test of whether
+there are any uses happens before conversion of the argument, thus THE loses
+information only when there are uses outside of the declared form.  The LET
+case may not be a big deal either.
+Note also that losing user assertions isn't really all that bad, since it won't
+damage system integrity.  At worst, it will cause a bug to go undetected.  More
+likely, it will just cause the error to be signaled in a different place (and
+possibly in a less informative way).  Of course, there is an efficiency hit for
+losing type information, but if it only happens in strange cases, then this
+isn't a big deal.
+\chapter{Local call analysis}
+All calls to local functions (known named functions and LETs) are resolved to
+the exact LAMBDA node which is to be called.  If the call is syntactically
+illegal, then we emit a warning and mark the reference as :notinline, forcing
+the call to be a full call.  We don't even think about converting APPLY calls;
+APPLY is not special-cased at all in ICR.  We also take care not to convert
+calls in the top-level component, which would join it to normal code.  Calls to
+functions with rest args and calls with non-constant keywords are also not
+We also convert MV-Calls that look like MULTIPLE-VALUE-BIND to local calls,
+since we know that they can be open-coded.  We replace the optional dispatch
+with a call to the last optional entry point, letting MV-Call magically default
+the unsupplied values to NIL.
+When ICR optimizations discover a possible new local call, they explicitly
+invoke local call analysis on the code that needs to be reanalyzed. 
+[\#\#\# Let conversion.  What is means to be a let.  Argument type checking done
+by caller.  Significance of local call is that all callers are known, so
+special call conventions may be used.]
+A lambda called in only one place is called a "let" call, since a Let would
+turn into one.
+In addition to enabling various ICR optimizations, the let/non-let distinction
+has important environment significance.  We treat the code in function and all
+of the lets called by that function as being in the same environment.  This
+allows exits from lets to be treated as local exits, and makes life easy for
+environment analysis.  
+Since we will let-convert any function with only one call, we must be careful
+about cleanups.  It is possible that a lexical exit from the let function may
+have to clean up dynamic bindings not lexically apparent at the exit point.  We
+handle this by annotating lets with any cleanup in effect at the call site.
+The cleanup for continuations with no immediately enclosing cleanup is the
+lambda that the continuation is in.  In this case, we look at the lambda to see
+if any cleanups need to be done.
+Let conversion is disabled for entry-point functions, since otherwise we might
+convert the call from the XEP to the entry point into a let.  Then later on, we
+might want to convert a non-local reference into a local call, and not be able
+to, since once a function has been converted to a let, we can't convert it
+A function's return node may also be deleted if it is unreachable, which can
+happen if the function never returns normally.  Such functions are not lets.
+\chapter{Find components}
+This is a post-pass to ICR conversion that massages the flow graph into the
+shape subsequent phases expect.  Things done:
+  Compute the depth-first ordering for the flow graph.
+  Find the components (disconnected parts) of the flow graph.
+This pass need only be redone when newly converted code has been added to the
+flow graph.  The reanalyze flag in the component structure should be set by
+people who mess things up.
+We create the initial DFO using a variant of the basic algorithm.  The initial
+DFO computation breaks the ICR up into components, which are parts that can be
+compiled independently.  This is done to increase the efficiency of large block
+compilations.  In addition to improving locality of reference and reducing the
+size of flow analysis problems, this allows back-end data structures to be
+reclaimed after the compilation of each component.
+ICR optimization can change the connectivity of the flow graph by discovering
+new calls or eliminating dead code.  Initial DFO determination splits up the
+flow graph into separate components, but does so conservatively, ensuring that
+parts that might become joined (due to local call conversion) are joined from
+the start.  Initial DFO computation also guarantees that all code which shares
+a lexical environment is in the same component so that environment analysis
+needs to operate only on a single component at a time.
+[This can get a bit hairy, since code seemingly reachable from the
+environment entry may be reachable from a NLX into that environment.  Also,
+function references must be considered as links joining components even though
+the flow graph doesn't represent these.]
+After initial DFO determination, components are neither split nor joined.  The
+standard DFO computation doesn't attempt to split components that have been
+\chapter{ICR optimize}
+{\bf Somewhere describe basic ICR utilities: continuation-type,
+constant-continuation-p, etc.  Perhaps group by type in ICR description?}
+We are conservative about doing variable-for-variable substitution in ICR
+optimization, since if we substitute a variable with a less restrictive type,
+then we may prevent use of a "good" representation within the scope of the
+inner binding.
+Note that variable-variable substitutions aren't really crucial in ICR, since
+they don't create opportunities for new optimizations (unlike substitution of
+constants and functions).  A spurious variable-variable binding will show up as
+a Move operation in VMR.  This can be optimized away by reaching-definitions
+and also by targeting.  [\#\#\# But actually, some optimizers do see if operands
+are the same variable.]
+The IF-IF optimization can be modeled as a value driven optimization, since
+adding a use definitely is cause for marking the continuation for
+reoptimization.  [When do we add uses?  Let conversion is the only obvious
+time.]  I guess IF-IF conversion could also be triggered by a non-immediate use
+of the test continuation becoming immediate, but to allow this to happen would
+require Delete-Block (or somebody) to mark block-starts as needing to be
+reoptimized when a predecessor changes.  It's not clear how important it is
+that IF-IF conversion happen under all possible circumstances, as long as it
+happens to the obvious cases.
+[\#\#\# It isn't totally true that code flushing never enables other worthwhile
+optimizations.  Deleting a functional reference can cause a function to cease
+being an XEP, or even trigger let conversion.  It seems we still want to flush
+code during ICR optimize, but maybe we want to interleave it more intimately
+with the optimization pass.  
+Ref-flushing works just as well forward as backward, so it could be done in the
+forward pass.  Call flushing doesn't work so well, but we could scan the block
+backward looking for any new flushable stuff if we flushed a call on the
+forward pass.
+When we delete a variable due to lack of references, we leave the variable
+in the lambda-list so that positional references still work.  The initial value
+continuation is flushed, though (replaced with NIL) allowing the initial value
+for to be deleted (modulo side-effects.)
+Note that we can delete vars with no refs even when they have sets.  I guess
+when there are no refs, we should also flush all sets, allowing the value
+expressions to be flushed as well.
+Squeeze out single-reference unset let variables by changing the dest of the
+initial value continuation to be the node that receives the ref.  This can be
+done regardless of what the initial value form is, since we aren't actually
+moving the evaluation.  Instead, we are in effect using the continuation's
+locations in place of the temporary variable.  
+Doing this is of course, a wild violation of stack discipline, since the ref
+might be inside a loop, etc.  But with the VMR back-end, we only need to
+preserve stack discipline for unknown-value continuations; this ICR
+transformation must be already be inhibited when the DEST of the REF is a
+multiple-values receiver (EXIT, RETURN or MV-COMBINATION), since we must
+preserve the single-value semantics of the let-binding in this case.
+The REF and variable must be deleted as part of this operation, since the ICR
+would otherwise be left in an inconsistent state; we can't wait for the REF to
+be deleted due to bing unused, since we have grabbed the arg continuation and
+substituted it into the old DEST.
+The big reason for doing this transformation is that in macros such as INCF and
+PSETQ, temporaries are squeezed out, and the new value expression is evaluated
+directly to the setter, allowing any result type assertion to be applied to the
+expression evaluation.  Unlike in the case of substitution, there is no point
+in inhibiting this transformation when the initial value type is weaker than
+the variable type.  Instead, we intersect the asserted type for the old REF's
+CONT with the type assertion on the initial value continuation.  Note that the
+variable's type has already been asserted on the initial-value continuation.
+Of course, this transformation also simplifies the ICR even when it doesn't
+discover interesting type assertions, so it makes sense to do it whenever
+possible.  This reduces the demands placed on register allocation, etc.
+There are three dead-code flushing rules:
+ 1] Refs with no DEST may be flushed.
+ 2] Known calls with no dest that are flushable may be flushed.  We null the
+    DEST in all the args.
+ 3] If a lambda-var has no refs, then it may be deleted.  The flushed argument
+    continuations have their DEST nulled.
+These optimizations all enable one another.  We scan blocks backward, looking
+for nodes whose CONT has no DEST, then type-dispatching off of the node.  If we
+delete a ref, then we check to see if it is a lambda-var with no refs.  When we
+flush an argument, we mark the blocks for all uses of the CONT as needing to be
+\section{Goals for ICR optimizations}
+When an optimization is disabled, code should still be correct and not
+ridiculously inefficient.  Phases shouldn't be made mandatory when they have
+lots of non-required stuff jammed into them.
+This pass is optional, but is desirable if anything is more important than
+compilation speed.
+This phase is a grab-bag of optimizations that concern themselves with the flow
+of values through the code representation.  The main things done are type
+inference, constant folding and dead expression elimination.  This phase can be
+understood as a walk of the expression tree that propagates assertions down the
+tree and propagates derived information up the tree.  The main complication is
+that there isn't any expression tree, since ICR is flow-graph based.
+We repeat this pass until we don't discover anything new.  This is a bit of
+feat, since we dispatch to arbitrary functions which may do arbitrary things,
+making it hard to tell if anything really happened.  Even if we solve this
+problem by requiring people to flag when they changed or by checking to see if
+they changed something, there are serious efficiency problems due to massive
+redundant computation, since in many cases the only way to tell if anything
+changed is to recompute the value and see if it is different from the old one.
+We solve this problem by requiring that optimizations for a node only depend on
+the properties of the CONT and the continuations that have the node as their
+DEST.  If the continuations haven't changed since the last pass, then we don't
+attempt to re-optimize the node, since we know nothing interesting will happen.
+We keep track of which continuations have changed by a REOPTIMIZE flag that is
+set whenever something about the continuation's value changes.
+When doing the bottom up pass, we dispatch to type specific code that knows how
+to tell when a node needs to be reoptimized and does the optimization.  These
+node types are special-cased: COMBINATION, IF, RETURN, EXIT, SET.
+The REOPTIMIZE flag in the COMBINATION-FUN is used to detect when the function
+information might have changed, so that we know when where are new assertions
+that could be propagated from the function type to the arguments.
+When we discover something about a leaf, or substitute for leaf, we reoptimize
+the CONT for all the REF and SET nodes. 
+We have flags in each block that indicate when any nodes or continuations in
+the block need to be re-optimized, so we don't have to scan blocks where there
+is no chance of anything happening.
+It is important for efficiency purposes that optimizers never say that they did
+something when they didn't, but this by itself doesn't guarantee timely
+termination.  I believe that with the type system implemented, type inference
+will converge in finite time, but as a practical matter, it can take far too
+long to discover not much.  For this reason, ICR optimization is terminated
+after three consecutive passes that don't add or delete code.  This premature
+termination only happens 2% of the time.
+\section{Flow graph simplification}
+Things done:
+    Delete blocks with no predecessors.
+    Merge blocks that can be merged.
+    Convert local calls to Let calls.
+    Eliminate degenerate IFs.
+We take care not to merge blocks that are in different functions or have
+different cleanups.  This guarantees that non-local exits are always at block
+ends and that cleanup code never needs to be inserted within a block.
+We eliminate IFs with identical consequent and alternative.  This would most
+likely happen if both the consequent and alternative were optimized away.
+[Could also be done if the consequent and alternative were different blocks,
+but computed the same value.  This could be done by a sort of cross-jumping
+optimization that looked at the predecessors for a block and merged code shared
+between predecessors.  IFs with identical branches would eventually be left
+with nothing in their branches.]
+We eliminate IF-IF constructs:
+    (IF (IF A B C) D E) ==>
+    (IF A (IF B D E) (IF C D E))
+In reality, what we do is replicate blocks containing only an IF node where the
+predicate continuation is the block start.  We make one copy of the IF node for
+each use, leaving the consequent and alternative the same.  If you look at the
+flow graph representation, you will see that this is really the same thing as
+the above source to source transformation.
+\section{Forward ICR optimizations}
+In the forward pass, we scan the code in forward depth-first order.  We
+examine each call to a known function, and:
+\item Eliminate any bindings for unused variables.
+\item Do top-down type assertion propagation.  In local calls, we propagate
+asserted and derived types between the call and the called lambda.
+    Replace calls of foldable functions with constant arguments with the
+    result.  We don't have to actually delete the call node, since Top-Down
+    optimize will delete it now that its value is unused.
+   Run any Optimizer for the current function.  The optimizer does arbitrary
+    transformations by hacking directly on the IR.  This is useful primarily
+    for arithmetic simplification and similar things that may need to examine
+    and modify calls other than the current call.  The optimizer is responsible
+    for recording any changes that it makes.  An optimizer can inhibit further
+    optimization of the node during the current pass by returning true.  This
+    is useful when deleting the node.
+   Do ICR transformations, replacing a global function call with equivalent
+    inline lisp code.
+    Do bottom-up type propagation/inferencing.  For some functions such as
+    Coerce we will dispatch to a function to find the result type.  The
+    Derive-Type function just returns a type structure, and we check if it is
+    different from the old type in order to see if there was a change.
+    Eliminate IFs with predicates known to be true or false.
+    Substitute the value for unset let variables that are bound to constants,
+    unset lambda variables or functionals.
+    Propagate types from local call args to var refs.
+We use type info from the function continuation to find result types for
+functions that don't have a derive-type method.
+ICR transformation:
+ICR transformation does "source to source" transformations on known global
+functions, taking advantage of semantic information such as argument types and
+constant arguments.  Transformation is optional, but should be done if speed or
+space is more important than compilation speed.  Transformations which increase
+space should pass when space is more important than speed.
+A transform is actually an inline function call where the function is computed
+at compile time.  The transform gets to peek at the continuations for the
+arguments, and computes a function using the information gained.  Transforms
+should be cautious about directly using the values of constant continuations,
+since the compiler must preserve eqlness of named constants, and it will have a
+hard time if transforms go around randomly copying constants.
+The lambda that the transform computes replaces the original function variable
+reference as the function for the call.  This lets the compiler worry about
+evaluating each argument once in the right order.  We want to be careful to
+preserve type information when we do a transform, since it may be less than
+obvious what the transformed code does.
+There can be any number of transforms for a function.  Each transform is
+associated with a function type that the call must be compatible with.  A
+transform is only invoked if the call has the right type.  This provides a way
+to deal with the common case of a transform that only applies when the
+arguments are of certain types and some arguments are not specified.  We always
+use the derived type when determining whether a transform is applicable.  Type
+check is responsible for setting the derived type to the intersection of the
+asserted and derived types.
+If the code in the expansion has insufficient explicit or implicit argument
+type checking, then it should cause checks to be generated by making
+A transformation may decide to pass if it doesn't like what it sees when it
+looks at the args.  The Give-Up function unwinds out of the transform and deals
+with complaining about inefficiency if speed is more important than brevity.
+The format args for the message are arguments to Give-Up.  If a transform can't
+be done, we just record the message where ICR finalize can find it.  note.  We
+can't complain immediately, since it might get transformed later on.
+\section{Backward ICR optimizations}
+In the backward pass, we scan each block in reverse order, and
+eliminate any effectless nodes with unused values.  In ICR this is the
+only way that code is deleted other than the elimination of unreachable blocks.
+\chapter{Type checking}
+[\#\#\# Somehow split this section up into three parts:
+ -- Conceptual: how we know a check is necessary, and who is responsible for
+    doing checks.
+ -- Incremental: intersection of derived and asserted types, checking for
+    non-subtype relationship.
+ -- Check generation phase.
+We need to do a pretty good job of guessing when a type check will ultimately
+need to be done.  Generic arithmetic, for example: In the absence of
+declarations, we will use use the safe variant, but if we don't know this, we
+will generate a check for NUMBER anyway.  We need to look at the fast-safe
+templates and guess if any of them could apply.
+We compute a function type from the VOP arguments
+and assertions on those arguments.  This can be used with Valid-Function-Use
+to see which templates do or might apply to a particular call.  If we guess
+that a safe implementation will be used, then we mark the continuation so as to
+force a safe implementation to be chosen.  [This will happen if ICR optimize
+doesn't run to completion, so the icr optimization after type check generation
+can discover new type information.  Since we won't redo type check at that
+point, there could be a call that has applicable unsafe templates, but isn't
+type checkable.]
+[\#\#\# A better and more general optimization of structure type checks: in type
+check conversion, we look at the *original derived* type of the continuation:
+if the difference between the proven type and the asserted type is a simple
+type check, then check for the negation of the difference.  e.g. if we want a
+FOO and we know we've got (OR FOO NULL), then test for (NOT NULL).  This is a
+very important optimization for linked lists of structures, but can also apply
+in other situations.]
+If after ICR phases, we have a continuation with check-type set in a context
+where it seems likely a check will be emitted, and the type is too 
+hairy to be easily checked (i.e. no CHECK-xxx VOP), then we do a transformation
+on the ICR equivalent to:
+  (... (the hair <foo>) ...)
+  (... (funcall \#'(lambda (\#:val)
+		    (if (typep \#:val 'hair)
+			\#:val
+			(%type-check-error \#:val 'hair)))
+		<foo>)
+       ...)
+This way, we guarantee that VMR conversion never has to emit type checks for
+hairy types.
+[Actually, we need to do a MV-bind and several type checks when there is a MV
+continuation.  And some values types are just too hairy to check.  We really
+can't check any assertion for a non-fixed number of values, since there isn't
+any efficient way to bind arbitrary numbers of values.  (could be done with
+MV-call of a more-arg function, I guess...)
+[Perhaps only use CHECK-xxx VOPs for types equivalent to a ptype?  Exceptions
+for CONS and SYMBOL?  Anyway, no point in going to trouble to implement and
+emit rarely used CHECK-xxx vops.]
+One potential lose in converting a type check to explicit conditionals rather
+than to a CHECK-xxx VOP is that VMR code motion optimizations won't be able to
+do anything.  This shouldn't be much of an issue, though, since type constraint
+propagation has already done global optimization of type checks.
+This phase is optional, but should be done if anything is more important than
+compile speed.  
+Type check is responsible for reconciling the continuation asserted and derived
+types, emitting type checks if appropriate.  If the derived type is a subtype
+of the asserted type, then we don't need to do anything.
+If there is no intersection between the asserted and derived types, then there
+is a manifest type error.  We print a warning message, indicating that
+something is almost surely wrong.  This will inhibit any transforms or
+generators that care about their argument types, yet also inhibits further
+error messages, since NIL is a subtype of every type.
+If the intersection is not null, then we set the derived type to the
+intersection of the asserted and derived types and set the Type-Check flag in
+the continuation.  We always set the flag when we can't prove that the type
+assertion is satisfied, regardless of whether we will ultimately actually emit
+a type check or not.  This is so other phases such as type constraint
+propagation can use the Type-Check flag to detect an interesting type
+assertion, instead of having to duplicate much of the work in this phase.  
+[\#\#\# 7 extremely random values for CONTINUATION-TYPE-CHECK.]
+Type checks are generated on the fly during VMR conversion.  When VMR
+conversion generates the check, it prints an efficiency note if speed is
+important.  We don't flame now since type constraint progpagation may decide
+that the check is unnecessary.  [\#\#\# Not done now, maybe never.]
+In local function call, it is the caller that is in effect responsible for
+checking argument types.  This happens in the same way as any other type check,
+since ICR optimize propagates the declared argument types to the type
+assertions for the argument continuations in all the calls.
+Since the types of arguments to entry points are unknown at compile time, we
+want to do runtime checks to ensure that the incoming arguments are of the
+correct type.  This happens without any special effort on the part of type
+check, since the XEP is represented as a local call with unknown type
+arguments.  These arguments will be marked as needing to be checked.
+\chapter{Constraint propagation}
+New lambda-var-slot:
+constraints: a list of all the constraints on this var for either X or Y.
+How to maintain consistency?  Does it really matter if there are constraints
+with deleted vars lying around?  Note that whatever mechanism we use for
+getting the constraints in the first place should tend to keep them up to date.
+Probably we would define optimizers for the interesting relations that look at
+their CONT's dest and annotate it if it is an IF.
+But maybe it is more trouble then it is worth trying to build up the set of
+constraints during ICR optimize (maintaining consistency in the process).
+Since ICR optimize iterates a bunch of times before it converges, we would be
+wasting time recomputing the constraints, when nobody uses them till constraint
+propagation runs.  
+It seems that the only possible win is if we re-ran constraint propagation
+(which we might want to do.)  In that case, we wouldn't have to recompute all
+the constraints from scratch.  But it seems that we could do this just as well
+by having ICR optimize invalidate the affected parts of the constraint
+annotation, rather than trying to keep them up to date.  This also fits better
+with the optional nature of constraint propagation, since we don't want ICR
+optimize to commit to doing a lot of the work of constraint propagation.  
+For example, we might have a per-block flag indicating that something happened
+in that block since the last time constraint propagation ran.  We might have
+different flags to represent the distinction between discovering a new type
+assertion inside the block and discovering something new about an if
+predicate, since the latter would be cheaper to update and probably is more
+It's fairly easy to see how we can build these sets of restrictions and
+propagate them using flow analysis, but actually using this information seems
+a bit more ad-hoc.  
+Probably the biggest thing we do is look at all the refs.  If have proven that
+the value is EQ (EQL for a number) to some other leaf (constant or lambda-var),
+then we can substitute for that reference.  In some cases, we will want to do
+special stuff depending on the DEST.  If the dest is an IF and we proved (not
+null), then we can substitute T.  And if the dest is some relation on the same
+two lambda-vars, then we want to see if we can show that relation is definitely
+true or false.
+Otherwise, we can do our best to invert the set of restrictions into a type.
+Since types hold only constant info, we have to ignore any constraints between
+two vars.  We can make some use of negated type restrictions by using
+TYPE-DIFFERENCE to remove the type from the ref types.  If our inferred type is
+as good as the type assertion, then the continuation's type-check flag will be
+It really isn't much of a problem that we don't infer union types on joins,
+since union types are relatively easy to derive without using flow information.
+The normal bottom-up type inference done by ICR optimize does this for us: it
+annotates everything with the union of all of the things it might possibly be.
+Then constraint propagation subtracts out those types that can't be in effect
+because of predicates or checks.
+This phase is optional, but is desirable if anything is more important than
+compilation speed.  We use an algorithm similar to available expressions to
+propagate variable type information that has been discovered by implicit or
+explicit type tests, or by type inference.
+We must do a pre-pass which locates set closure variables, since we cannot do
+flow analysis on such variables.  We set a flag in each set closure variable so
+that we can quickly tell that it is losing when we see it again.  Although this
+may seem to be wastefully redundant with environment analysis, the overlap
+isn't really that great, and the cost should be small compared to that of the
+flow analysis that we are preparing to do.  [Or we could punt on set
+A type constraint is a structure that includes sset-element and has the type
+and variable.  
+[\#\#\# Also a not-p flag indicating whether the sense is negated.]
+  Each variable has a list of its type constraints.  We create a
+type constraint when we see a type test or check.  If there is already a
+constraint for the same variable and type, then we just re-use it.  If there is
+already a weaker constraint, then we generate both the weak constraints and the
+strong constraint so that the weak constraints won't be lost even if the strong
+one is unavailable.
+We find all the distinct type constraints for each variable during the pre-pass
+over the lambda nesting.  Each constraint has a list of the weaker constraints
+so that we can easily generate them.
+Every block generates all the type constraints in it, but a constraint is
+available in a successor only if it is available in all predecessors.  We
+determine the actual type constraint for a variable at a block by intersecting
+all the available type constraints for that variable.
+This isn't maximally tense when there are constraints that are not
+hierarchically related, e.g. (or a b) (or b c).  If these constraints were
+available from two predecessors, then we could infer that we have an (or a b c)
+constraint, but the above algorithm would come up with none.  This probably
+isn't a big problem.
+[\#\#\# Do we want to deal with (if (eq <var> '<foo>) ...) indicating singleton
+member type?]
+We detect explicit type tests by looking at type test annotation in the IF
+node.  If there is a type check, the OUT sets are stored in the node, with
+different sets for the consequent and alternative.  Implicit type checks are
+located by finding Ref nodes whose Cont has the Type-Check flag set.  We don't
+actually represent the GEN sets, we just initialize OUT to it, and then form
+the union in place.
+When we do the post-pass, we clear the Type-Check flags in the continuations
+for Refs when we discover that the available constraints satisfy the asserted
+type.  Any explicit uses of typep should be cleaned up by the ICR optimizer for
+typep.  We can also set the derived type for Refs to the intersection of the
+available type assertions.  If we discover anything, we should consider redoing
+ICR optimization, since better type information might enable more
+\chapter{ICR finalize} % -*- Dictionary: design -*-
+This pass looks for interesting things in the ICR so that we can forget about
+them.  Used and not defined things are flamed about.
+We postpone these checks until now because the ICR optimizations may discover
+errors that are not initially obvious.  We also emit efficiency notes about
+optimizations that we were unable to do.  We can't emit the notes immediately,
+since we don't know for sure whether a repeated attempt at optimization will
+We examine all references to unknown global function variables and update the
+approximate type accordingly.  We also record the names of the unknown
+functions so that they can be flamed about if they are never defined.  Unknown
+normal variables are flamed about on the fly during ICR conversion, so we
+ignore them here.
+We check each newly defined global function for compatibility with previously
+recorded type information.  If there is no :defined or :declared type, then we
+check for compatibility with any approximate function type inferred from
+previous uses.
+\chapter{Environment analysis}
+A related change would be to annotate ICR with information about tail-recursion
+relations.  What we would do is add a slot to the node structure that points to
+the corresponding Tail-Info when a node is in a TR position.  This annotation
+would be made in a final ICR pass that runs after cleanup code is generated
+(part of environment analysis).  When true, the node is in a true TR position
+(modulo return-convention incompatibility).  When we determine return
+conventions, we null out the tail-p slots in XEP calls or known calls where we
+decided not to preserve tail-recursion. 
+In this phase, we also check for changes in the dynamic binding environment
+that require cleanup code to be generated.  We just check for changes in the
+Continuation-Cleanup on local control transfers.  If it changes from
+an inner dynamic context to an outer one that is in the same environment, then
+we emit code to clean up the dynamic bindings between the old and new
+continuation.  We represent the result of cleanup detection to the back end by
+interposing a new block containing a call to a funny function.  Local exits
+from CATCH or UNWIND-PROTECT are detected in the same way.
+The primary activity in environment analysis is the annotation of ICR with
+environment structures describing where variables are allocated and what values
+the environment closes over.
+Each lambda points to the environment where its variables are allocated, and
+the environments point back.  We always allocate the environment at the Bind
+node for the sole non-let lambda in the environment, so there is a close
+relationship between environments and functions.  Each "real function" (i.e.
+not a LET) has a corresponding environment.
+We attempt to share the same environment among as many lambdas as possible so
+that unnecessary environment manipulation is not done.  During environment
+analysis the only optimization of this sort is realizing that a Let (a lambda
+with no Return node) cannot need its own environment, since there is no way
+that it can return and discover that its old values have been clobbered.
+When the function is called, values from other environments may need to be made
+available in the function's environment.  These values are said to be "closed
+Even if a value is not referenced in a given environment, it may need to be
+closed over in that environment so that it can be passed to a called function
+that does reference the value.  When we discover that a value must be closed
+over by a function, we must close over the value in all the environments where
+that function is referenced.  This applies to all references, not just local
+calls, since at other references we must have the values on hand so that we can
+build a closure.  This propagation must be applied recursively, since the value
+must also be available in *those* functions' callers.
+If a closure reference is known to be "safe" (not an upward funarg), then the
+closure structure may be allocated on the stack.
+Closure analysis deals only with closures over values, while Common Lisp
+requires closures over variables.  The difference only becomes significant when
+variables are set.  If a variable is not set, then we can freely make copies of
+it without keeping track of where they are.  When a variable is set, we must
+maintain a single value cell, or at least the illusion thereof.  We achieve
+this by creating a heap-allocated "value cell" structure for each set variable
+that is closed over.  The pointer to this value cell is passed around as the
+"value" corresponding to that variable.  References to the variable must
+explicitly indirect through the value cell.
+When we are scanning over the lambdas in the component, we also check for bound
+but not referenced variables.
+Environment analysis emits cleanup code for local exits and markers for
+non-local exits.
+A non-local exit is a control transfer from one environment to another.  In a
+non-local exit, we must close over the continuation that we transfer to so that
+the exiting function can find its way back.  We indicate the need to close a
+continuation by placing the continuation structure in the closure and also
+pushing it on a list in the environment structure for the target of the exit.
+[\#\#\# To be safe, we would treat the continuation as a set closure variable so
+that we could invalidate it when we leave the dynamic extent of the exit point.
+Transferring control to a meaningless stack pointer would be apt to cause
+horrible death.]
+Each local control transfer may require dynamic state such as special bindings
+to be undone.  We represent cleanup actions by funny function calls in a new
+block linked in as an implicit MV-PROG1.