From: Bill H. <goo...@go...> - 2011-01-21 01:43:52
|
Hi all, I've been fiddling with SBCL recently and timing various things to
see how fast it is. Obviously it is pretty impressive, but I've found some
unusual performance issues and wanted to ask on the list whether I was
doing something wrong, or if there's a specific reason these issues exist.
Please forgive my lousy Lisp code, I only started writing Common Lisp
recently.

Here are some simple definitions I start with:

(defmacro add-mod64 (x y)
  ;; simulate addition of 64 bit signed integers (the compiler seems to
  ;; know to optimise this, so long as it is a macro and not a function)
  `(ldb (byte 64 0) (+ ,x ,y)))

(defmacro sub-mod64 (x y)
  ;; simulate subtraction of 64 bit signed integers (the compiler seems to
  ;; know to optimise this, so long as it is a macro and not a function)
  `(ldb (byte 64 0) (- ,x ,y)))

(defmacro add-gen (x y)
  ;; define a generic addition which dispatches on type (the compiler
  ;; seems to know to optimise this away, so long as it is a macro and
  ;; not a function)
  `(if (typep ,x '(signed-byte 64))
       (add-mod64 ,x ,y)
       (+ ,x ,y)))

(defmacro sub-gen (x y)
  ;; define a generic subtraction which dispatches on type (the compiler
  ;; seems to know to optimise this away, so long as it is a macro and
  ;; not a function)
  `(if (typep ,x '(signed-byte 64))
       (sub-mod64 ,x ,y)
       (- ,x ,y)))

Now here is the function I'm timing:

(defun sumit (n)
  ;; a function which sets up a simple loop which does an addition
  ;; inside the loop
  (declare (optimize (speed 3) (safety 0)))
  (let ((j 1))
    (declare (type (signed-byte 64) n j))
    (dotimes (i n j)
      (setq j (add-gen j i)))))

Here's my "driver". I just iterate calling the above function a bunch of
times. This is the function I'm timing. It's only a made up example to
time SBCL; it's not meant to do anything interesting.

(let ((j 0))
  (declare (type (signed-byte 64) j))
  (dotimes (k 100000 j)
    (setq j (sumit 100000))))

Now that takes about 40s or thereabouts on my machine (x86_64 AMD).

OK, now here is the odd thing. Let's replace the definition of sumit with
the following:

(defun sumit (n)
  ;; a function which sets up a simple loop which does an addition
  ;; inside the loop
  (declare (optimize (speed 3) (safety 0)))
  (let ((j 1))
    (declare (type (signed-byte 64) n j))
    (dotimes (i n (sub-gen j 1)) ; <----- this is the line I've changed,
                                 ;        with j becoming (sub-gen j 1)
      (setq j (add-gen j i)))))

I time it using my driver function as before. However, this time it takes
8s, i.e. it's 5 times faster!!

I've tried replacing j with (the (signed-byte 64) j) and also with
(ldb (byte 64 0) j), but neither of these improves the time substantially.
But subtracting that 1 from the return value works. Oddly, subtracting 0
from the return value does NOT speed it up!

So what gives?

Also, I've noticed that a function based on "do" or "loop" takes twice as
long as one based on "dotimes".

If I unroll the loops manually, I can get some improvement in these cases,
which seems to imply a missed opportunity for optimisation in the
compiler. A "for loop" similar to the above would be unrolled and made
about 6 times faster in C. Unrolling manually here in SBCL doesn't give
that much of an improvement though, perhaps 2 times faster.

Here is an example of the sumit function defined in terms of "loop"
instead of "dotimes" (again I have to subtract 1 on return from the
function to make it "fast"):

(defun sumit (n)
  ;; a function which sets up a simple loop which does an addition
  ;; inside the loop
  (declare (optimize (speed 3) (safety 0)))
  (let ((j 1))
    (declare (type (signed-byte 64) n j))
    (let ((i 0))
      (declare (type (signed-byte 64) i))
      (loop
        (setq j (add-gen j i))
        (setq i (add-gen i 1))
        (when (not (< i n)) (return))))
    (sub-gen j 1)))

Am I doing something wrong? Or are these behaviours expected? Or are there
missed opportunities for optimisation in the SBCL compiler?

Bill. |
From: Stas B. <sta...@gm...> - 2011-01-21 02:31:29
|
Bill Hart <goo...@go...> writes:
> Hi all, I've been fiddling with SBCL recently and timing various things
> to see how fast it is. [...]
> Am I doing something wrong? Or are these behaviours expected? Or are
> there missed opportunities for optimisation in the SBCL compiler?

First, all your macros don't make any sense and should be just replaced
with + and -. Second, you declare the N variable in the wrong place. So,
we get the following definitions:

(defun sumit (n)
  (declare (optimize speed (safety 0))
           (type (signed-byte 64) n))
  (let ((j 1))
    (declare (type (signed-byte 64) j))
    (dotimes (i n (1- j))
      (setq j (+ j i)))))

(defun sumit-2 (n)
  (declare (optimize speed (safety 0))
           (type (signed-byte 64) n))
  (let ((j 1))
    (declare (type (signed-byte 64) j))
    (dotimes (i n j)
      (setq j (+ j i)))))

The latter checks for bignum allocation every iteration, while the former
checks only at the end. Why that is, I don't know.

-- 
With Best Regards, Stas. |
From: Paul K. <pv...@pv...> - 2011-01-21 08:33:18
|
On 2011-01-20, at 9:31 PM, Stas Boukarev wrote:
> Bill Hart <goo...@go...> writes:
>> [sumit definitions, driver, and timings elided]
[...]
> The latter checks for bignum allocation every iteration, while the
> former checks only at the end. Why that is, I don't know.

DOTIMES expands into DO; you can look at the macroexpansion to see what
SBCL does differently than you.

As for the main question, this is an artefact of the way we perform
analyses pretty much throughout compilation (in this case, representation
selection). SBCL works on variables, instead of values (or variables at
program points), which would be similar to what SSA- or CPS-based
analyses do. This means that the way the variable j is represented must
be constant throughout each function. In SUMIT, j can trivially take the
cheapest representation for arithmetic (an unboxed number), while the
result of (1- j) may end up being boxed. In SUMIT-2, on the other hand,
j is also returned directly (as a boxed value); the representation
assignment pass then gives a boxed representation to j.

You can look at the disassembly to see that SUMIT-2 has moved the
conditional boxing *inside* the loop.

To pinpoint the effect further, you can play with the following:

(in-package "SB-VM")

(defknown sb64-identity ((signed-byte 64)) (signed-byte 64)
    (flushable always-translatable))

(define-vop (sb64-identity)
  (:translate sb64-identity)
  (:policy :fast-safe)
  (:args (x :scs (signed-reg signed-stack) :target r))
  (:arg-types signed-num)
  (:results (r :scs (signed-reg)))
  (:result-types signed-num)
  (:generator 1
    (move r x)))

(defun sb64-identity (x)
  (declare (type (signed-byte 64) x))
  (sb64-identity x))

Then,

(defun sumit-2 (n)
  (declare (optimize speed (safety 0))
           (type (signed-byte 64) n))
  (let ((j 1))
    (declare (type (signed-byte 64) j))
    (dotimes (i n (sb-vm::sb64-identity j))
      (setq j (+ j i)))))

works as well as sumit: only the result of sb64-identity gets boxed,
outside the loop, and, since we inserted this dummy copy node, j itself
can take another representation. I'm starting to think we could do
something like that automatically (and slowly move toward SSA-style
analyses and even representations) in IR2.

Paul Khuong |
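[Editorial sketch, not part of the original thread: Paul's point about representation selection can be pictured in C. The names and the malloc-based "box" are the editor's toy model of SBCL's boxed (heap-allocated bignum-capable) values, not SBCL internals; the two functions mirror the code shapes he describes for SUMIT and SUMIT-2.]

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy model of a boxed value: heap-allocated, standing in for SBCL's
   fixnum/bignum-capable representation. */
typedef struct { int64_t v; } box;

static box *box_it(int64_t v)
{
    box *b = malloc(sizeof *b);
    b->v = v;
    return b;
}

/* Shape of SUMIT: j keeps the cheap unboxed representation (a raw
   machine word) through the loop; boxing happens once, on the way out,
   when the result of (1- j) is produced. */
static box *sumit_shape(int64_t n)
{
    int64_t j = 1;
    for (int64_t i = 0; i < n; i++)
        j += i;
    return box_it(j - 1);            /* one allocation, after the loop */
}

/* Shape of SUMIT-2: because j itself is returned, j's representation
   must be boxed throughout the function, so the (conditional) boxing
   moves inside the loop. */
static box *sumit2_shape(int64_t n)
{
    box *j = box_it(1);
    for (int64_t i = 0; i < n; i++) {
        box *next = box_it(j->v + i);   /* per-iteration allocation */
        free(j);
        j = next;
    }
    return j;
}
```

Paul's sb64-identity VOP works like inserting `box_it` at the very end: it gives the compiler a copy node whose result is the only thing that needs the boxed representation.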
From: Bill H. <goo...@go...> - 2011-01-21 12:07:26
|
On 21 January 2011 08:33, Paul Khuong <pv...@pv...> wrote:
> On 2011-01-20, at 9:31 PM, Stas Boukarev wrote:
>> [quoted definitions and timings elided]
[...]
> As for the main question, this is an artefact of the way we perform
> analyses pretty much throughout compilation (in this case,
> representation selection). [...] In SUMIT-2, on the other hand, j is
> also returned directly (as a boxed value); the representation
> assignment pass then gives a boxed representation to j.
[...]
> works as well as sumit: only the result of sb64-identity gets boxed,
> outside the loop, and, since we inserted this dummy copy node, j
> itself can take another representation. [...]
>
> Paul Khuong

Thank you for the answer. I see that solution provides a workaround for
having to subtract one. A dummy copy certainly seems more manageable.
It's certainly not a solution I was going to think of. (The function
sb64-identity looks like it will just call itself indefinitely.)

I didn't completely understand why returning (the (signed-byte 64) j)
didn't work. But I can accept that these sorts of things occur as the
result of the style of analysis that is currently used.

By the way, is the SB-VM package documented anywhere? Given that I'm
essentially using SBCL as a VM backend for an interpreter, is it worth
delving a bit deeper and using the VM directly? It sounds like it gets
modified a fair bit and may be up for a rewrite from what you were
saying. So I guess it is discouraged to build stuff against the VM?

Bill. |
From: Bill H. <goo...@go...> - 2011-01-21 11:34:27
|
On 21 January 2011 02:31, Stas Boukarev <sta...@gm...> wrote:
> Bill Hart <goo...@go...> writes:
>> [original post elided]
>
> First, all your macros don't make any sense and should be just
> replaced with + and -.

Unfortunately it will just crash if I do that, as the (signed-byte 64)
values will eventually overflow (not in this example, but in other
examples). Doing the arithmetic mod 2^64 is necessary to prevent
overflow!

> Second, you declare the N variable in the wrong place.

Ok, thank you.

> So, we get the following definitions:
> [sumit and sumit-2 elided]
>
> The latter checks for bignum allocation every iteration, while the
> former checks only at the end. Why that is, I don't know.
>
> --
> With Best Regards, Stas.

Bill. |
From: Stas B. <sta...@gm...> - 2011-01-21 12:05:44
|
Bill Hart <goo...@go...> writes:
> On 21 January 2011 02:31, Stas Boukarev <sta...@gm...> wrote:
>> [original post and first reply elided]
>>
>> First, all your macros don't make any sense and should be just
>> replaced with + and -.
>
> Unfortunately it will just crash if I do that, as the (signed-byte 64)
> values will eventually overflow (not in this example, but in other
> examples). Doing the arithmetic mod 2^64 is necessary to prevent
> overflow!

And that's why you shouldn't use SAFETY 0.

-- 
With Best Regards, Stas. |
From: Bill H. <goo...@go...> - 2011-01-21 12:17:36
|
When I remove (safety 0) or replace it with any other safety level, it is no longer fast. Presumably it spends all its time checking the types and for overflows. Besides, the functionality I actually want here is addition and subtraction modulo 2^64. I am trying to simulate machine words! I probably should have explained that in my original post. Perhaps I am missing something, but I haven't been able to see how to get the functionality/performance I am after without (safety 0). Bill. On 21 January 2011 12:05, Stas Boukarev <sta...@gm...> wrote: > Bill Hart <goo...@go...> writes: > >> On 21 January 2011 02:31, Stas Boukarev <sta...@gm...> wrote: >>> Bill Hart <goo...@go...> writes: >>> >>>> Hi all, >>>> >>>> I've been fiddling with SBCL recently and timing various things to see >>>> how fast it is. Obviously it is pretty impressive, but I've found some >>>> unusual performance issues and wanted to ask on the list whether I was >>>> doing something wrong, or if there's a specific reason these issues >>>> exist. Please forgive my lousy Lisp code, I only started writing >>>> Common Lisp recently. 
>>>> >>>> Here are some simple definitions I start with: >>>> >>>> (defmacro add-mod64 (x y) >>>> ;; simulate addition of 64 bit signed integers (the compiler seems to >>>> know to optimise this, so long as it is a macro and not a function) >>>> `(ldb (byte 64 0) (+ ,x ,y))) >>>> >>>> (defmacro sub-mod64 (x y) >>>> ;; simulate addition of 64 bit signed integers (the compiler seems to >>>> know to optimise this, so long as it is a macro and not a function) >>>> `(ldb (byte 64 0) (- ,x ,y))) >>>> >>>> (defmacro add-gen (x y) >>>> ;; define a generic addition which dispatches on type (the compiler >>>> seems to know to optimise this away, so long as it is a macro and not >>>> a function) >>>> `(if (typep ,x '(signed-byte 64)) >>>> (add-mod64 ,x ,y) >>>> (+ ,x ,y))) >>>> >>>> (defmacro sub-gen (x y) >>>> ;; define a generic subtraction which dispatches on type (the compiler >>>> seems to know to optimise this away, so long as it is a macro and not >>>> a function) >>>> `(if (typep ,x '(signed-byte 64)) >>>> (sub-mod64 ,x ,y) >>>> (- ,x ,y))) >>>> >>>> Now here is the function I'm timing: >>>> >>>> (defun sumit (n) >>>> ;; a function which sets up a simple loop which does an addition >>>> inside the loop >>>> (declare (optimize (speed 3) (safety 0))) >>>> (let ((j 1)) >>>> (declare (type (signed-byte 64) n j)) >>>> (dotimes (i n j) >>>> (setq j (add-gen j i))))) >>>> >>>> Here's my "driver". I just iterate calling the above function a bunch >>>> of times. This is the function I'm timing. It's only a made up example >>>> to time SBCL. It's not meant to do anything interesting. >>>> >>>> (let ((j 0)) >>>> (declare (type (signed-byte 64) j)) >>>> (dotimes (k 100000 j) >>>> (setq j (sumit 100000)))) >>>> >>>> Now that takes about 40s or thereabouts on my machine (x86_64 AMD). >>>> >>>> OK, now here is the odd thing. 
Let's replace the definition of sumit >>>> with the following: >>>> >>>> (defun sumit (n) >>>> ;; a function which sets up a simple loop which does an addition >>>> inside the loop >>>> (declare (optimize (speed 3) (safety 0))) >>>> (let ((j 1)) >>>> (declare (type (signed-byte 64) n j)) >>>> (dotimes (i n (sub-gen j 1)) ; <----- this is the line I've >>>> changed, with j becoming (sub-gen j 1) >>>> (setq j (add-gen j i))))) >>>> >>>> I time it using my driver function as before. However, this time it >>>> takes 8s, i.e. it's 5 times faster!! >>>> >>>> I've tried replacing j with (the (signed-byte 64) j) and also with >>>> (ldb (byte 64 0) j), but neither of these improves the time >>>> substantially. But subtracting that 1 from the return value works. >>>> Oddly, subtracting 0 from the return value does NOT speed it up! >>>> >>>> So what gives? >>>> >>>> Also, I've noticed that a function based on "do" or "loop" takes twice >>>> as long as one based on "dotimes". >>>> >>>> If I unroll the loops manually, I can get some improvement in these >>>> cases, which seems to imply a missing opportunity for optimisation in >>>> the compiler. A "for loop" similar to the above would be unrolled and >>>> made about 6 times faster in C. Unrolling manually here in SBCL >>>> doesn't give that much of an improvement though, perhaps 2 times >>>> faster. >>>> >>>> Here is an example of the sumit function defined in terms of "loop" >>>> instead of "dotimes" (again I have to subtract 1 on return from the >>>> function to make it "fast"): >>>> >>>> (defun sumit (n) >>>> ;; a function which sets up a simple loop which does an addition >>>> inside the loop >>>> (declare (optimize (speed 3) (safety 0))) >>>> (let ((j 1)) >>>> (declare (type (signed-byte 64) n j)) >>>> (let ((i 0)) >>>> (declare (type (signed-byte 64) i)) >>>> (loop >>>> (setq j (add-gen j i)) >>>> (setq i (add-gen i 1)) >>>> (when (not (< i n)) (return)))) >>>> (sub-gen j 1))) >>>> >>>> Am I doing something wrong? 
>>>> Or are these behaviours expected? Or are
>>>> there missed opportunities for optimisation in the SBCL compiler?
>>>
>>> First, all your macros don't make any sense and should be just
>>> replaced with + and -.
>>
>> Unfortunately it will just crash if I do that, as the (signed-byte 64)
>> values will eventually overflow (not in this example but in other
>> examples). Doing the arithmetic mod 2^64 is necessary to prevent
>> overflow!
>
> And that's why you shouldn't use SAFETY 0.
>
> --
> With Best Regards, Stas.
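[Editorial note: the hazard Stas alludes to can be sketched as follows. This is an illustrative example, not code from the thread; the function names are hypothetical.]

```lisp
;; With SAFETY 0 the FIXNUM declaration is trusted, not checked:
;; calling UNSAFE-INC with a bignum is undefined behaviour and may
;; silently return garbage.
(defun unsafe-inc (x)
  (declare (optimize (speed 3) (safety 0))
           (type fixnum x))
  (1+ x))

;; At default safety the same declaration is treated as an assertion:
;; a bignum argument signals a TYPE-ERROR instead of misbehaving.
(defun safe-inc (x)
  (declare (type fixnum x))
  (1+ x))
```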
From: Stas B. <sta...@gm...> - 2011-01-21 12:24:26
Bill Hart <goo...@go...> writes:

> When I remove (safety 0) or replace it with any other safety level, it
> is no longer fast. Presumably it spends all its time checking the
> types and for overflows.

Safety 0 may be fast, but it also may be silently incorrect.

> Besides, the functionality I actually want here is addition and
> subtraction modulo 2^64. I am trying to simulate machine words! I
> probably should have explained that in my original post.

Then why do you have LDB (byte 64 0) only if the first argument is 64 bit?

> Perhaps I am missing something, but I haven't been able to see how to
> get the functionality/performance I am after without (safety 0).

--
With Best Regards, Stas.
From: Bill H. <goo...@go...> - 2011-01-21 12:28:33
I don't understand the question.

Does this help:

http://jcsu.jesus.cam.ac.uk/~csr21/papers/modular/modular.pdf

Bill.

On 21 January 2011 12:24, Stas Boukarev <sta...@gm...> wrote:
> [...]
From: Stas B. <sta...@gm...> - 2011-01-21 12:33:41
Bill Hart <goo...@go...> writes:

> I don't understand the question.
>
> Does this help:

Your sub-gen and add-gen macros use modular arithmetic only if the first
argument is of type (signed-byte 64), why is that?

> http://jcsu.jesus.cam.ac.uk/~csr21/papers/modular/modular.pdf

Yes, this is a good paper.

> On 21 January 2011 12:24, Stas Boukarev <sta...@gm...> wrote:
> [...]

--
With Best Regards, Stas.
From: Bill H. <goo...@go...> - 2011-01-21 12:37:49
On 21 January 2011 12:33, Stas Boukarev <sta...@gm...> wrote:
> Bill Hart <goo...@go...> writes:
>
>> I don't understand the question.
>>
>> Does this help:
> Your sub-gen and add-gen macros use modular arithmetic only if the first
> argument is of type (signed-byte 64), why is that?

Oh, I understand the question now. In my application (which I didn't
explain well), addition can only be called if both types are the same.
So I only need to check one of the types here. Effectively, the fact
that the types are the same is checked elsewhere.

But you are right, it would have been more clear had I simply written
the example for you guys to check both types. Sorry about that!

> [...]
>
> --
> With Best Regards, Stas.
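[Editorial note: the both-arguments variant Bill describes might look like the sketch below. The name ADD-GEN2 is hypothetical; only ADD-GEN appears in the thread.]

```lisp
;; A sketch of ADD-GEN with both operands checked (an assumption, not
;; the code from the thread): modular addition is used only when both
;; arguments fit in 64 bits, otherwise ordinary bignum + is used.
(defmacro add-gen2 (x y)
  `(if (and (typep ,x '(signed-byte 64))
            (typep ,y '(signed-byte 64)))
       (ldb (byte 64 0) (+ ,x ,y))   ; two's-complement wraparound
       (+ ,x ,y)))                   ; exact bignum arithmetic
```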
From: Stas B. <sta...@gm...> - 2011-01-21 12:44:19
Bill Hart <goo...@go...> writes:

> On 21 January 2011 12:33, Stas Boukarev <sta...@gm...> wrote:
>> [...]
>
> Oh I understand the question now. In my application (which I didn't
> explain well), addition can only be called if both types are the same.
> So I only need to check one of the types here. Effectively the fact
> that the types are the same is checked elsewhere.
>
> But you are right, it would have been more clear had I simply written
> the example for you guys to check both types. Sorry about that!

Then, why do you have modular arithmetic only when both arguments are of
type (signed-byte 64), and do ordinary arithmetic for larger integers?

--
With Best Regards, Stas.
From: Bill H. <goo...@go...> - 2011-01-21 13:06:20
On 21 January 2011 12:44, Stas Boukarev <sta...@gm...> wrote:
> [...]
>
> Then, why do you have modular arithmetic only when both arguments are of
> type (signed-byte 64), and do ordinary arithmetic for larger integers?

In my interpreter (which is essentially statically typed) I have two
types which I call int64's and int's, the latter being bignums. The
interpreter rewrites the code in the source language into Lisp. In
order to deal with addition in a uniform way, I wrote this "generic"
macro. When both operands are ordinary integers the usual addition is
done, but when they are int64's, the two's complement addition is done.

Unfortunately I did a bad job of cutting the relevant bits of code
from my project. The generic macros are actually a red herring and not
relevant to my original question. Fortunately the SBCL compiler does
recognise that it can shortcut these macros and so there is no
performance loss. Obviously in the case where I actually want bignums
it will be slow. But there probably isn't any way to get faster
bignums (short of using GMP, which is LGPL'd -- fortunately I'm writing
a BSD licensed bignum library at the same time :-)).

Bill.
From: Bill H. <goo...@go...> - 2011-01-21 18:08:24
After fiddling with macroexpand I seem to have a reasonable handle on
how to write loops efficiently with do or even a tagbody instead of
just dotimes. So thanks for that advice!

I'm now fiddling with the following code:

(let ((k 0))
  (declare (type (signed-byte 64) k))
  (dotimes (i 100000)
    (dotimes (j 100000)
      (setq k (add-mod64 k 1)))))

I am surprised to find it runs about 16 times slower than if I put the
inner dotimes inside a function and call the function, as per my
earlier example.

Initially I assumed this was because only functions get the full
compiler treatment in SBCL. But now I'm not sure.

So I guess the question is why this would not be as fast as my earlier
example?

(By the way, I found a page on the compiler internals and I see IR2 is
not a rewrite of IR1 but is one of the intermediate representations
used by SBCL. Sorry for my earlier confusion about this.)

Bill.

On 21 January 2011 13:06, Bill Hart <goo...@go...> wrote:
> [...]
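[Editorial note: one way to probe Bill's hypothesis is to hoist the nested loop into a single compiled function. A minimal sketch under that assumption; the name SUM-NESTED is hypothetical, and ADD-MOD64 is expanded inline to keep the example self-contained.]

```lisp
;; The same nested loop as above, wrapped in a DEFUN so the whole
;; computation is compiled as one unit with the declarations visible.
(defun sum-nested ()
  (declare (optimize (speed 3) (safety 0)))
  (let ((k 0))
    (declare (type (signed-byte 64) k))
    (dotimes (i 100000 k)
      (dotimes (j 100000)
        ;; (add-mod64 k 1) expanded by hand:
        (setq k (ldb (byte 64 0) (+ k 1)))))))
```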
From: Bill H. <goo...@go...> - 2011-01-23 23:28:31
I have some more performance issues that I've noticed.

The first example is this function:

(defun mulit (n)
  (declare (optimize (speed 3) (safety 0)))
  (let ((d 1.23438979438d0))
    (declare (double-float d))
    (dotimes (i n d) (setf d (* d 1.0000000000023d0)))))

The equivalent C function seems to run about 5 times faster. Looking
at the disassembly in SBCL, it is certainly doing much more work.

A second issue is the following. If I run this program:

(defun sumit (n)
  (declare (optimize (speed 3) (safety 0)) (fixnum n))
  (let ((j 1))
    (declare (fixnum j))
    (dotimes (i n)
      (setq j (+ j i)))
    j))

then (sumit 1000000000) takes 1 second.

Now if I do the following:

(defvar j)

(defun sumit (n)
  (declare (optimize (speed 3) (safety 0)) (fixnum n))
  (let ((j 1))
    (declare (fixnum j))
    (dotimes (i n)
      (setq j (+ j i)))
    j))

Here sumit is precisely the same as the former definition.

Then (sumit 1000000000) takes 4.5 seconds.

The first j remains unbound, so I don't quite understand why defining
it should slow down subsequent code by so much.

Finally, here's a curiosity. A program which effectively computes
j * j + 23984729373291487298 * j runs about twice as slow in SBCL as C
(here the big constant is arbitrary). It seems that in this example C
would recognise that the answer is the same as
j * (j + 23984729373291487298), thus performing a single multiplication
and addition instead of two multiplications.

Bill.

On 21 January 2011 18:08, Bill Hart <goo...@go...> wrote:
> [...]
From: Paul K. <pv...@pv...> - 2011-01-24 00:34:14
On 2011-01-23, at 6:28 PM, Bill Hart wrote:

> I have some more performance issues that I've noticed.
>
> The first example is this function:
>
> (defun mulit (n)
>   (declare (optimize (speed 3) (safety 0)))
>   (let ((d 1.23438979438d0))
>     (declare (double-float d))
>     (dotimes (i n d) (setf d (* d 1.0000000000023d0)))))
>
> The equivalent C function seems to run about 5 times faster. Looking
> at the disassembly in SBCL it is certainly doing much more work.

It would certainly help to declare the type of N here (the optimization
notes should have helped you see that). Going through EXPT or pow would
probably be even faster.

> A second issue is the following. If I run this program:
> [...]
> then (sumit 1000000000) takes 1 second.
>
> Now if I do the following:
>
> (defvar j)
> [...]
> Then (sumit 1000000000) takes 4.5 seconds.
>
> The first j remains unbound, so I don't quite understand why defining
> it should slow down subsequent code by so much.

You might want to read the specification for defvar. There's a reason
for the *earmuffs*: after defvar, j is always bound as a dynamic
variable.

> Finally, here's a curiosity. A program which effectively computes
> j * j + 23984729373291487298 * j runs about twice as slow in SBCL as C
> [...]

I would expect most of the difference to come from the fact that CL
mandates exact integer arithmetic, while C only has to let things wrap
around (or, for signed types, can simply assume overflow doesn't
happen). As for the rest, SBCL doesn't perform much common
subexpression elimination, and not at all the sort of transformation
shown above.

I don't know what you're trying to do, but I can't believe that what
you've been doing the last couple of days is the most efficient course
of action to attain that goal. I suggest cutting to the chase and just
asking the questions you want answered.

Paul Khuong
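[Editorial note: Paul's first suggestion amounts to giving the compiler a machine-word type for N. A sketch under that assumption; the exact integer type chosen for N here is an illustration, not from the thread.]

```lisp
;; MULIT with the loop bound declared, so the DOTIMES counter can be
;; kept in a machine register instead of handled generically.
(defun mulit (n)
  (declare (optimize (speed 3) (safety 0))
           (type (unsigned-byte 62) n))   ; assumed bound for N
  (let ((d 1.23438979438d0))
    (declare (double-float d))
    (dotimes (i n d)
      (setf d (* d 1.0000000000023d0)))))
```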
From: Bill H. <goo...@go...> - 2011-01-24 02:31:02
On 24 January 2011 00:34, Paul Khuong <pv...@pv...> wrote:
> On 2011-01-23, at 6:28 PM, Bill Hart wrote:
>> [...]
> It would certainly help to declare the type of N here (the optimization
> notes should have helped you see that).

You are absolutely right. I don't know how I missed this. What is
puzzling is that the code I have been working on handles loops
containing integer arithmetic in precisely the same way as loops
containing floating point arithmetic. I only observed the slowdown in
the case of the doubles. I will have to pull apart my code to see why I
had different behaviour in the two cases.

Also, for some reason optimization notes seem to be off for me. Thus I
didn't get any warnings about this. I will investigate how to turn them
on.

> Going through EXPT or pow would probably be even faster.

Naturally. Here the idea was only to benchmark basic operations, in
this case loops containing floating point operations.

> You might want to read the specification for defvar. There's a reason
> for the *earmuffs*: after defvar, j is always bound as a dynamic
> parameter.

I need to read some better references. There was mention of using the
earmuffs. It was also mentioned that dynamic parameters might be
slower. But I didn't see any mention of interaction with (what I
thought were) lexically scoped variables of the same name.

> I would expect most of the difference to come from the fact that CL
> mandates integer arithmetic, while C only has to let things wrap around
> (or, for signed types, can simply assume overflow doesn't happen).

I was of course making use of the fast machine arithmetic that SBCL
gives access to, as mentioned in an earlier post. So the wraparound
behaviour is the same in both the Lisp and C code I was working with.

> As for the rest, SBCL doesn't perform much common subexpression
> elimination, and not at all the sort of transformation shown above.

OK. I don't know if it is considered undesirable for a Lisp to perform
such optimisations or not. I considered it a curiosity that C even
bothered with this sort of thing.

> I don't know what you're trying to do, but I can't believe that what
> you've been doing the last couple days is the most efficient course of
> action to attain that goal. I suggest cutting to the chase and just ask
> the questions you want answered.

It wasn't my intention to ask a pile of general Lisp questions here.
I'm certain the SBCL devs are very busy.

As I mentioned, I've been writing an interpreter on top of SBCL. Along
the way I've been benchmarking the results to see that I'm not missing
optimisation opportunities. As you can see, a few times I have been
very puzzled by the results. (This is not to say that I expect that
SBCL will be as fast as compiled C in every possible instance.)

Sadly, most of the instances of what I thought might be performance
inconsistencies in SBCL have turned out to be problems with my
understanding of SBCL or of Lisp itself. On the positive side, the
answers I've received have been extremely helpful, and my interpreter
is almost 500 lines of code and already doing not completely trivial
things.

For what it's worth, thanks for the helpful answers.

Bill.
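[Editorial note: the defvar interaction discussed above can be seen directly. Once a symbol is proclaimed globally special, every LET of it creates a dynamic rebinding rather than a lexical variable. A minimal sketch; the names *COUNTER* and BUMP are hypothetical.]

```lisp
;; After DEFVAR, *COUNTER* is globally special: any LET of it rebinds
;; it dynamically, and that binding is visible inside called functions.
(defvar *counter* 0)

(defun bump ()
  (incf *counter*))     ; reads/writes the current dynamic binding

(let ((*counter* 100))  ; dynamic rebinding, seen by BUMP
  (bump)
  *counter*)            ; => 101 inside the LET
;; After the LET exits, the global value of *counter* is 0 again.
```

This is why the undecorated (defvar j) silently turned the LET-bound j in sumit into a slower special-variable binding.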