[Sbcl-help] DX vs TCE: who wins in a fight?

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi all,

I think I'd like a dx make-list for smallish list lengths (well,
actually dx copy-seq, subseq, reverse, append, mapcar... but I'll take a
stack allocated list full of nil; I can implemlement dx versions of all
of the rest with that).

By inspection, this seems to work:

(defvar *max-dx-list-length* 50) ;; reasonable? 

(defmacro with-list ((var length) &body body
                     &aux (recurse (gensym)) (len (gensym)))
  `(labels ((,recurse (n acc fun)
              (if (zerop n)
                  (funcall fun acc)
                  (let ((acc (cons nil acc)))
                    (declare (dynamic-extent acc))
                    (,recurse (1- n) acc fun)))))
     (declare (inline ,recurse)
              (dynamic-extent (function ,recurse)))
     (let ((,len ,length))
       (if (< -1 ,len *max-dx-list-length*)
           (,recurse ,len nil (lambda (,var) ,@body))
           (let ((,var (make-list ,len)))
             ,@body)))))

A couple of timing tests are below, using the default optimization
settings in a fresh SBCL process. Looks like this avoids consing under
those conditions, and may offer a speedup besides. It also seems that
the inline declaration is important to minimize both CPU usage and
consing. (Not demonstrated below.)

Questions:

0. Because this recursion is always bounded and smallish, I think I
don't care about tail call elimination as a way to avoid running out of
stack. Should I care about TCE for other reasons though?

1. IDK whether or how TCE & DX declarations might interact under
different optimization settings. I'm not sure I care too much, but what
I definitely don't want is an interaction that'll corrupt the Lisp image
(when I'm obeying DX discipline, of course). I can't figure any of this
out by looking around in the source, and the manual doesn't really
clarify. Anybody know if these can interact in a way that could be
dangerous?

2. If the answer is "you must use a higher-than-default debug
optimization that inhibits TCE in order for that sort of trickery to be
reliable", I think that's fine. Evaluating the timing measurements with
(debug 3) optimization seems to slow things down a bit, but I'm okay
with that (demonstration below).

3. Or if the answer is "don't do any of this; it might work but isn't
supported", that'd be good to know, too. 

Thanks in advance for any insights.

Regards,
Richard

;; Baseline, using make-list. The GC call is just to minimize noise.
* (progn
   (sb-ext:gc :full t)
   (time
    (let ((sum 0))
      (dotimes (i 1000 sum)
        (dotimes (j 50)
          (let ((list (make-list j)))
	    (setq sum (+ sum (length list)))))))))
Evaluation took:
  0.011 seconds of real time
  0.011280 seconds of total run time (0.006578 user, 0.004702 system)
  100.00% CPU
  29,231,880 processor cycles
  19,604,736 bytes consed

1225000

;; Using the WITH-LIST macro:
* (progn
   (sb-ext:gc :full t)
   (time
    (let ((sum 0))
      (dotimes (i 1000 sum)
        (dotimes (j 50)
          (with-list (list j)
	    (setq sum (+ sum (length list)))))))))
Evaluation took:
  0.002 seconds of real time
  0.002374 seconds of total run time (0.002374 user, 0.000000 system)
  100.00% CPU
  6,151,518 processor cycles
  0 bytes consed

1225000

;; Try it again with high DEBUG optimization
* (declaim (optimize (debug 3)))
NIL

;; Apparently identical to the measurements under the default
;; optimization settings.
* (progn
   (sb-ext:gc :full t)
   (time
    (let ((sum 0))
      (dotimes (i 1000 sum)
        (dotimes (j 50)
          (let ((list (make-list j)))
	    (setq sum (+ sum (length list)))))))))
Evaluation took:
  0.011 seconds of real time
  0.011380 seconds of total run time (0.006574 user, 0.004806 system)
  100.00% CPU
  29,493,228 processor cycles
  19,604,736 bytes consed

1225000

;; Apparently more CPU is consumed in this case, but still no
;; consing; I'll take it.
* (progn
   (sb-ext:gc :full t)
   (time
    (let ((sum 0))
      (dotimes (i 1000 sum)
        (dotimes (j 50)
          (with-list (list j)
	    (setq sum (+ sum (length list)))))))))
Evaluation took:
  0.003 seconds of real time
  0.003615 seconds of total run time (0.003615 user, 0.000000 system)
  133.33% CPU
  9,368,840 processor cycles
  0 bytes consed

1225000

[Sbcl-help] DX vs TCE: who wins in a fight?

Common Lisp compiler and runtime

[Sbcl-help] DX vs TCE: who wins in a fight?