[3d19a6]: doc / manual / efficiency.texinfo Maximize Restore History

Download this file

efficiency.texinfo    241 lines (192 with data), 8.4 kB

@node Efficiency, Beyond The ANSI Standard, The Debugger, Top
@comment  node-name,  next,  previous,  up
@chapter Efficiency

FIXME: The material in the CMUCL manual about getting good
performance from the compiler should be reviewed, reformatted in
Texinfo, lightly edited for SBCL, and substituted into this
manual. In the meantime, the original CMUCL manual is still 95+%
correct for the SBCL version of the Python compiler. See the
sections

@itemize
@item Advanced Compiler Use and Efficiency Hints
@item Advanced Compiler Introduction
@item More About Types in Python
@item Type Inference
@item Source Optimization
@item Tail Recursion
@item Local Call
@item Block Compilation
@item Inline Expansion
@item Object Representation
@item Numbers
@item General Efficiency Hints
@item Efficiency Notes
@end itemize

Besides this information from the CMUCL manual, there are a few other
points to keep in mind.

@itemize
  
@item
The CMUCL manual doesn't seem to state it explicitly, but Python has a
mental block about type inference when assignment is involved. Python
is very aggressive and clever about inferring the types of values
bound with @code{let}, @code{let*}, inline function call, and so
forth. However, it's much more passive and dumb about inferring the
types of values assigned with @code{setq}, @code{setf}, and
friends. It would be nice to fix this, but in the meantime don't
expect that just because it's very smart about types in most respects
it will be smart about types involved in assignments.  (This doesn't
affect its ability to benefit from explicit type declarations
involving the assigned variables, only its ability to get by without
explicit type declarations.)

@c <!-- FIXME: Python dislikes assignments, but not in type
@c     inference. The real problems are loop induction, closed over
@c     variables and aliases. -->
  
@item
Since the time the CMUCL manual was written, CMUCL (and thus SBCL) has
gotten a generational garbage collector. This means that there are
some efficiency implications of various patterns of memory usage which
aren't discussed in the CMUCL manual. (Some new material should be
written about this.)
  
@item
SBCL has some important known efficiency problems.  Perhaps the most
important are
    
@itemize @minus
      
@item
There is only limited support for the ANSI @code{dynamic-extent}
declaration.  @xref{Dynamic-extent allocation}.
      
@item
The garbage collector is not particularly efficient, at least on
platforms without the generational collector (as of SBCL 0.8.9, all
except x86).
      
@item
Various aspects of the PCL implementation of CLOS are more inefficient
than necessary.
    
@end itemize

@end itemize

Finally, note that Common Lisp defines many constructs which, in
the infamous phrase, ``could be compiled efficiently by a
sufficiently smart compiler''. The phrase is infamous because
making a compiler which actually is sufficiently smart to find all
these optimizations systematically is well beyond the state of the art
of current compiler technology. Instead, they're optimized on a
case-by-case basis by hand-written code, or not optimized at all if
the appropriate case hasn't been hand-coded. Some cases where no such
hand-coding has been done as of SBCL version 0.6.3 include

@itemize
  
@item
@code{(reduce #'f x)} where the type of @code{x} is known at compile
time
  
@item
various bit vector operations, e.g.  @code{(position 0
some-bit-vector)}

@item
specialized sequence idioms, e.g.  @code{(remove item list :count 1)}

@item
cases where local compilation policy does not require excessive type
checking, e.g.  @code{(locally (declare (safety 1)) (assoc item
list))} (which currently performs safe @code{endp} checking internal
to assoc).

@end itemize

If your system's performance is suffering because of some construct
which could in principle be compiled efficiently, but which the SBCL
compiler can't in practice compile efficiently, consider writing a
patch to the compiler and submitting it for inclusion in the main
sources. Such code is often reasonably straightforward to write;
search the sources for the string ``@code{deftransform}'' to find many
examples (some straightforward, some less so).

@menu
* Dynamic-extent allocation::   
* Modular arithmetic::          
@end menu

@node  Dynamic-extent allocation, Modular arithmetic, Efficiency, Efficiency
@comment  node-name,  next,  previous,  up
@section Dynamic-extent allocation
@cindex Dynamic-extent declaration

SBCL has limited support for performing allocation on the stack when a
variable is declared @code{dynamic-extent}.  The @code{dynamic-extent}
declarations are not verified, but are simply trusted; if the
constraints in the Common Lisp standard are violated, the best that
can happen is for the program to have garbage in variables and return
values; more commonly, the system will crash.

As a consequence of this, the condition for performing stack
allocation is stringent: either of the @code{speed} or @code{space}
optimization qualities must be higher than the maximum of
@code{safety} and @code{debug} at the point of the allocation.  For
example:

@lisp
(locally
  (declare (optimize speed (safety 1) (debug 1)))
  (defun foo (&rest rest)
    (declare (dynamic-extent rest))
    (length rest)))
@end lisp

Here the @code{&rest} list will be allocated on the stack.  Note that
it would not be in the following situation:

@lisp
(defun foo (&rest rest)
  (declare (optimize speed (safety 1) (debug 1)))
  (declare (dynamic-extent rest))
  (length rest))
@end lisp

because both the allocation of the @code{&rest} list and the variable
binding are outside the scope of the @code{optimize} declaration.

There are many cases when dynamic-extent declarations could be useful.
At present, SBCL implements

@itemize 

@item
Stack allocation of @code{&rest} lists, where these are declared
@code{dynamic-extent}.

@end itemize

Future plans include

@itemize

@item
Stack allocation of closures, where these are declared
@code{dynamic-extent};

@item
Stack allocation of @code{list}, @code{list*} and @code{cons}
(including following chains during initialization, and also for
binding mutation), where the allocation is declared
@code{dynamic-extent};

@item
Automatic detection of the common idiom of applying a function to some
defaults and a @code{&rest} list, even when this is not declared
@code{dynamic-extent};

@item
Automatic detection of the common idiom of calling quantifiers with a
closure, even when the closure is not declared @code{dynamic-extent}.

@end itemize

@node  Modular arithmetic,  , Dynamic-extent allocation, Efficiency
@comment  node-name,  next,  previous,  up
@section Modular arithmetic
@cindex Modular arithmetic
@cindex Arithmetic, modular
@cindex Arithmetic, hardware

Some numeric functions have a property: @var{N} lower bits of the
result depend only on @var{N} lower bits of (all or some)
arguments. If the compiler sees an expression of form @code{(logand
@var{exp} @var{mask})}, where @var{exp} is a tree of such ``good''
functions and @var{mask} is known to be of type @code{(unsigned-byte
@var{w})}, where @var{w} is a ``good'' width, all intermediate results
will be cut to @var{w} bits (but it is not done for variables and
constants!). This often results in an ability to use simple machine
instructions for the functions.

Consider an example.

@lisp
(defun i (x y)
  (declare (type (unsigned-byte 32) x y))
  (ldb (byte 32 0) (logxor x (lognot y))))
@end lisp

The result of @code{(lognot y)} will be negative and of type
@code{(signed-byte 33)}, so a naive implementation on a 32-bit
platform is unable to use 32-bit arithmetic here. But modular
arithmetic optimizer is able to do it: because the result is cut down
to 32 bits, the compiler will replace @code{logxor} and @code{lognot}
with versions cutting results to 32 bits, and because terminals
(here---expressions @code{x} and @code{y}) are also of type
@code{(unsigned-byte 32)}, 32-bit machine arithmetic can be used.


As of SBCL 0.8.5 ``good'' functions are @code{+}, @code{-};
@code{logand}, @code{logior}, @code{logxor}, @code{lognot} and their
combinations; and @code{ash} with the positive second
argument. ``Good'' widths are 32 on HPPA, MIPS, PPC, Sparc and X86 and
64 on Alpha. While it is possible to support smaller widths as well,
currently it is not implemented.