Thread: [Sbcl-devel] SSE instructions and vectorization


Dear all,

I can report an initial success at implementing vectorization in SBCL
at the VOP level. The attached patch tells SBCL about the existence of
the SSE (Streaming SIMD Extensions) registers, which are 128 bits wide,
and about a couple of (special-cased) instructions; the lisp files can
then be compiled and loaded to exploit them. Note that I don't really
suggest that this patch go in at this stage. I'd like it to be looked
at by other people on this list, and preferably also by some people on
cmucl-imp: they know much more about the compiler than I do, and might
be able to tell me where I've done something in a wrong or inelegant
way, as well as (possibly) contribute other instructions (I hate
architecture manuals).
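For readers who haven't used SSE: a 128-bit XMM register holds four 32-bit integers, and an SSE2 instruction such as paddd adds all four lanes at once. The C sketch below shows this with the standard SSE2 intrinsics from <emmintrin.h>. It is not code from the patch, the function name add4_i32 is hypothetical, and it assumes an x86 target with SSE2 (always present on x86-64).

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics: __m128i, _mm_add_epi32, ... */
#include <stdint.h>

/* One XMM register is 128 bits, i.e. four 32-bit lanes. */
static_assert(sizeof(__m128i) == 4 * sizeof(int32_t),
              "an XMM register holds four 32-bit integers");

/* Hypothetical helper: out[k] = a[k] + b[k] for k = 0..3,
   done with a single lane-wise SIMD add. */
void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a); /* unaligned load (movdqu) */
    __m128i vb = _mm_loadu_si128((const __m128i *)b); /* unaligned load (movdqu) */
    __m128i vs = _mm_add_epi32(va, vb);               /* paddd: four adds at once */
    _mm_storeu_si128((__m128i *)out, vs);             /* unaligned store (movdqu) */
}
```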

The experimentation focuses on the addition of two fixnum vectors
into a result vector. Testing the sse2.lisp file shows that the
ordinary x86 assembly version (the VECTOR+/SIMPLE-ARRAY-SIGNED-BYTE-30
VOP) is four to five times faster than the compiled lisp code (from
the BAR function); pushing :sse2 onto sb-c::*backend-subfeatures* and
recompiling sse2.lisp yields a further 20% speedup.
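The general shape of the vector addition being benchmarked can be sketched in C with SSE2 intrinsics rather than as SBCL VOPs. This is an illustration under assumptions, not the patch itself: the names vec_add_scalar, vec_add_sse2_unaligned and check_vec_add are hypothetical, plain 32-bit integers stand in for fixnums, and overflow is not handled (the inputs are assumed to stay in range). The SIMD loop processes four elements per iteration with unaligned loads and stores (movdqu), as the current patch does.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: one element per iteration. */
void vec_add_scalar(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i]; /* assumes no overflow */
}

/* SSE2 version: four elements per iteration via unaligned loads/stores
   (movdqu), so the arrays may start at any address. */
void vec_add_sse2_unaligned(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
    for (; i < n; i++) /* scalar tail for the last n % 4 elements */
        out[i] = a[i] + b[i];
}

/* Usage check: the SIMD result must match the scalar reference,
   including the tail (7 = 4 + 3 elements). */
void check_vec_add(void)
{
    int32_t a[7] = {1, -2, 3, -4, 5, -6, 7};
    int32_t b[7] = {10, 20, 30, 40, 50, 60, 70};
    int32_t s[7], v[7];
    vec_add_scalar(a, b, s, 7);
    vec_add_sse2_unaligned(a, b, v, 7);
    for (int i = 0; i < 7; i++)
        assert(s[i] == v[i]);
}
```

The scalar tail is what lets the SIMD loop accept vectors whose length is not a multiple of four.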

However, I have reason to believe that a further dramatic speedup can
be obtained with SSE in this area: I am currently moving data into the
SSE registers with the movdqu instruction, which caters for unaligned
data. If vector data could be aligned on 16-byte boundaries, then the
faster movdqa instruction could be used, which should give a further
increase in speed...
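The alignment point can be sketched the same way. Below, a hypothetical vec_add_sse2_aligned uses _mm_load_si128 and _mm_store_si128, the aligned (movdqa) forms, which fault if an address is not on a 16-byte boundary; the helpers is_aligned16 and alloc_aligned16 are likewise illustrative, the latter assuming C11 aligned_alloc. How SBCL would guarantee such alignment for vector data is the open question raised above; this sketch only shows what the aligned path looks like once the guarantee exists.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if p sits on a 16-byte boundary, as movdqa requires. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Aligned variant: aligned loads/stores (movdqa) fault on a misaligned
   address, so the precondition is checked up front. Because i advances
   by 4 elements (16 bytes), every a + i, b + i, out + i stays aligned. */
void vec_add_sse2_aligned(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_load_si128((const __m128i *)(a + i));
        __m128i vb = _mm_load_si128((const __m128i *)(b + i));
        _mm_store_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
    for (; i < n; i++) /* scalar tail; assumes no overflow */
        out[i] = a[i] + b[i];
}

/* Allocate n int32_t elements starting on a 16-byte boundary.
   aligned_alloc (C11) wants the size to be a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 16. */
int32_t *alloc_aligned16(size_t n)
{
    size_t bytes = (n * sizeof(int32_t) + 15u) & ~(size_t)15u;
    return aligned_alloc(16, bytes ? bytes : 16);
}
```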

This all presupposes some way of convincing the compiler to emit these
nice vectorized VOPs for normal code. I've had several thoughts on the
issue, none of which are totally satisfactory, but I'm sure that we
can come up with something plausible to put in the sb-ext package...

I don't know if this is of any use to anyone, but I'm having fun.

Cheers,

Christophe
-- 
Jesus College, Cambridge, CB5 8BL                           +44 1223 510 299
http://www-jcsu.jesus.cam.ac.uk/~csr21/                  (defun pling-dollar 
(str schar arg) (first (last +))) (make-dispatch-macro-character #\! t)
(set-dispatch-macro-character #\! #\$ #'pling-dollar)
