[4ae836]: OPTIMIZATIONS Maximize Restore History

Download this file

OPTIMIZATIONS    206 lines (179 with data), 6.9 kB

#1
(defun mysl (s)
    (declare (simple-string s))
    (declare (optimize (speed 3) (safety 0) (debug 0)))
    (let ((c 0))
      (declare (fixnum c))
      (dotimes (i (length s))
        (when (eql (aref s i) #\1)
          (incf c)))
      c))

* On X86 I is represented as a tagged integer.

* Unnecessary move:
  3: SLOT S!11[EDX] {SB-C::VECTOR-LENGTH 1 7} => t23[EAX]
  4: MOVE t23[EAX] => t24[EBX]

--------------------------------------------------------------------------------
#2
(defun quux (v)
  (declare (optimize (speed 3) (safety 0) (space 2) (debug 0)))
  (declare (type (simple-array double-float 1) v))
  (let ((s 0d0))
    (declare (type double-float s))
    (dotimes (i (length v))
      (setq s (+ s (aref v i))))
    s))

* Python does not combine + with AREF, so generates extra move and
  allocates a register.

* On X86 Python thinks that all FP registers are directly accessible
  and emits costy MOVE ... => FR1.

--------------------------------------------------------------------------------
#3
(defun bar (n)
  (declare (optimize (speed 3) (safety 0) (space 2))
           (type fixnum n))
  (let ((v (make-list n)))
    (setq v (make-array n))
    (length v)))

* IR1 does not optimize away (MAKE-LIST N).
--------------------------------------------------------------------------------
#4
(defun bar (v1 v2)
  (declare (optimize (speed 3) (safety 0) (space 2))
           (type (simple-array base-char 1) v1 v2))
  (dotimes (i (length v1))
    (setf (aref v2 i) (aref v1 i))))

VOP DATA-VECTOR-SET/SIMPLE-STRING V2!14[EDI] t32[EAX] t30[S2]>t33[CL]
                                  => t34[S2]<t35[AL] 
        MOV     #<TN t33[CL]>, #<TN t30[S2]>
        MOV     BYTE PTR [EDI+EAX+1], #<TN t33[CL]>
        MOV     #<TN t35[AL]>, #<TN t33[CL]>
        MOV     #<TN t34[S2]>, #<TN t35[AL]>

* The value of DATA-VECTOR-SET is not used, so there is no need in the
  last two moves.

* And why two moves?
--------------------------------------------------------------------------------
#8
(defun foo (d)
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (declare (type (double-float 0d0 1d0) d))
  (loop for i fixnum from 1 to 5
        for x1 double-float = (sin d) ;;; !!!
        do (loop for j fixnum from 1 to 4
                 sum x1 double-float)))

Without the marked declaration Python will use boxed representation for X1.

This is equivalent to

(let ((x nil))
  (setq x 0d0)
  ;; use of X as DOUBLE-FLOAT
)

The initial binding is effectless, and without it X is of type
DOUBLE-FLOAT. Unhopefully, IR1 does not optimize away effectless
SETs/bindings, and IR2 does not perform type inference.
--------------------------------------------------------------------------------
#9 "Multi-path constant folding"
(defun foo (x)
  (if (= (cond ((irgh x) 0)
               ((buh x) 1)
               (t 2))
         0)
      :yes
      :no))

This code could be optimized to

(defun foo (x)
  (cond ((irgh x) :yes)
        ((buh x) :no)
        (t :no)))
--------------------------------------------------------------------------------
#11
(inverted variant of #9)

(lambda (x)
  (let ((y (sap-alien x c-string)))
    (list (alien-sap y)
          (alien-sap y))))

It could be optimized to

(lambda (x) (list x x))

(if Y were used only once, the current compiler would optimize it)
--------------------------------------------------------------------------------
#12
(typep (truly-the (simple-array * (*)) x) 'simple-vector)

tests lowtag.
--------------------------------------------------------------------------------
#13
FAST-+/FIXNUM and similar should accept unboxed arguments in interests
of representation selection. Problem: inter-TN dependencies.
--------------------------------------------------------------------------------
#14
The derived type of (/ (THE (DOUBLE-FLOAT (0D0)) X) (THE (DOUBLE-FLOAT
1D0) Y)) is (DOUBLE-FLOAT 0.0d0). While it might be reasonable, it is
better to derive (OR (MEMBER 0.0d0) (DOUBLE-FLOAT (0.0d0))).
--------------------------------------------------------------------------------
#15
On the alpha, the system is reluctant to refer directly to a constant bignum,
preferring to load a large constant through a slow sequence of instructions,
then cons up a bignum for it:

(LAMBDA (A)
  (DECLARE (OPTIMIZE (SAFETY 1) (SPEED 3) (DEBUG 1))
           (TYPE (INTEGER -10000 10000) A)
           (IGNORABLE A))
  (CASE A
    ((89 125 16) (ASH A (MIN 18 -706)))
    (T (DPB -3 (BYTE 30 30) -1))))
--------------------------------------------------------------------------------
#16
(do ((i 0 (1+ i)))
    ((= i (the (integer 0 100) n)))
  ...)

It is commonly expected for Python to derive (FIXNUMP I). (If ``='' is
replaced with ``>='', Python will do.)
--------------------------------------------------------------------------------
#17 
Type tests for (ARRAY BIT), (ARRAY T) and similar go through full
%TYPEP, even though it is relatively simple to establish the arrayness
of an object and also to obtain the element type of an array.  As of
sbcl-0.8.12.30, this affects at least DUMP-OBJECT through
COMPOUND-OBJECT-P, and (LABELS MAYBE-EMIT-MAKE-LOAD-FORMS GROVEL)
through TYPEP UNBOXED-ARRAY, within the compiler itself.
--------------------------------------------------------------------------------
#18
(lambda (x) (declare (null x)) (sxhash x)) goes through SYMBOL-HASH
rather than either constant-folding or manipulating NIL-VALUE or
NULL-TN directly.
--------------------------------------------------------------------------------
#19
  (let ((dx (if (foo)
                (list x)
                (list y z))))
    (declare (dynamic-extent dx))
    ...)

DX is not allocated on stack.
--------------------------------------------------------------------------------
#20
(defun-with-dx foo (x)
  (flet ((make (x)
           (let ((l (list nil nil)))
             (setf (first l) x)
             (setf (second l) (1- x))
             l)))
    (let ((l (make x)))
      (declare (dynamic-extent l))
      (mapc #'print l))))

Result of MAKE is not stack allocated, which means that
stack-allocation of structures is impossible.
--------------------------------------------------------------------------------
#21
(defun-with-dx foo ()
  (let ((dx (list (list 1 2) (list 3 4)
    (declare (dynamic-extent dx))
    ...)))))

External list in DX is allocated on stack, but internal are not.
--------------------------------------------------------------------------------
#22
IR2 does not perform unused code flushing.
--------------------------------------------------------------------------------
#23
Python does not know that &REST lists are LISTs (and cannot derive it).
--------------------------------------------------------------------------------
#24
a. Iterations on &REST lists, returning them as VALUES could be
   rewritten with &MORE vectors.
b. Implement local unknown-values mv-call (useful for fast type checking).