From: Will M. F. <farr@MIT.EDU> - 2005-02-22 20:44:17
|
Hello, Just for fun, I was trying to speed up the SBCL implementations of some of the short benchmarks on the Computer Language Shootout page (http://shootout.alioth.debian.org ). The takfp benchmark, in particular, is one where the SBCL implementation seems needlessly slow. It seemed to me that code like (defun takn (n) (declare (fixnum n) (optimize (speed 3))) (labels ((tak (x y z) (declare (double-float x y z)) (if (<= x y) z (tak (tak (- x 1.0d0) y z) (tak (- y 1.0d0) z x) (tak (- z 1.0d0) x y))))) (tak (* n 3.0d0) (* n 2.0d0) (* n 1.0d0)))) would enable about all optimizations that could be done. However, I still get a compiler note about 'doing float to pointer coercion (cost 13) from Z to "<return value>"'. I read the CMUCL manual's efficiency notes (and also the short section in the SBCL manual), and it seems like the compiler should be able to use a non-descriptor representation for the return value of the local function tak here. I've tried inserting various ftype declarations into the code, too, but the compiler never stops doing the float -> pointer conversion! Is there a way to make it use the non-descriptor representation? (This is a *big deal* for efficiency; computing (takn 10) on my machine conses 1,214,448,728 bytes!) If this is an optimization that still needs to be done, if someone could point me toward the appropriate code in the compiler, I'd be happy to muck around and see what I can do (not that I'm very experienced in this area). Thanks! Will Farr |
From: Nicolas N. <Nic...@iw...> - 2005-02-23 09:25:14
|
"Will M. Farr" <farr@MIT.EDU> writes: > Hello, > > Just for fun, I was trying to speed up the SBCL implementations of some of > the short benchmarks on the Computer Language Shootout page > (http://shootout.alioth.debian.org ). The takfp benchmark, in particular, > is one where the SBCL implementation seems needlessly slow. Hmm. This shootout is getting worse and worse. Why does this guy include such completely useless test functions? Why not computing integer factorial(1000)? (Answer: because C and C++ cannot do this easily.) This would at least have a little sense behind it... Yours, Nicolas. P.S.1: Don't understand me wrong: having float values returned in registers would be nice, of course, but the TAKFP function does not at all convince me of the necessity for this feature. And number crunching applications avoid such functions anyway and operate destructively on arrays of floats instead. P.S.2: I have commented on this shootout some time ago in cll, see http://groups.google.com/groups?q=g:thl410376093d&dq=&hl=de&lr=&selm=87isdsj5ak.fsf%40ortler.iwr.uni-heidelberg.de It may be worthwile to read this message (and also the whole thread). |
From: Will M. F. <farr@MIT.EDU> - 2005-02-23 09:43:23
|
Nicolas, I agree that the shootout benchmarks are often so small as to be pointless, and also that the design of the benchmarking system is definitely not tuned to lisp/scheme style. Nevertheless, I do think it would be a useful optimization to have some sort of immediate double-float representation, at least for purely local functions. (Maybe I'm just coming from a scheme background and thinking of loops this way, but it seems to me that there are plenty of times when I would love to have a functional approach -- i.e. composition -- where intermediate doubles are not boxed yet the functions are not suitable for inlining.) However, I can understand wanting more than my desire before making a significant change to sbcl internals :). Will On 23 Feb 2005, at 4:24 AM, Nicolas Neuss wrote: > "Will M. Farr" <farr@MIT.EDU> writes: > >> Hello, >> >> Just for fun, I was trying to speed up the SBCL implementations of >> some of >> the short benchmarks on the Computer Language Shootout page >> (http://shootout.alioth.debian.org ). The takfp benchmark, in >> particular, >> is one where the SBCL implementation seems needlessly slow. > > Hmm. This shootout is getting worse and worse. Why does this guy > include > such completely useless test functions? Why not computing integer > factorial(1000)? (Answer: because C and C++ cannot do this easily.) > This > would at least have a little sense behind it... > > Yours, > Nicolas. > > P.S.1: Don't understand me wrong: having float values returned in > registers > would be nice, of course, but the TAKFP function does not at all > convince > me of the necessity for this feature. And number crunching > applications > avoid such functions anyway and operate destructively on arrays of > floats > instead. > > P.S.2: I have commented on this shootout some time ago in cll, see > > http://groups.google.com/groups?q=g: > thl410376093d&dq=&hl=de&lr=&selm=87isdsj5ak.fsf%40ortler.iwr.uni- > heidelberg.de > It may be worthwile to read this message (and also the whole thread). |
From: Alexey D. <ade...@co...> - 2005-02-24 03:47:36
|
Hello, "Will M. Farr" <farr@MIT.EDU> writes: > (defun takn (n) > (declare (fixnum n) > (optimize (speed 3))) > (labels ((tak (x y z) > (declare (double-float x y z)) > (if (<= x y) > z > (tak (tak (- x 1.0d0) y z) > (tak (- y 1.0d0) z x) > (tak (- z 1.0d0) x y))))) > (tak (* n 3.0d0) (* n 2.0d0) (* n 1.0d0)))) > > would enable about all optimizations that could be done. However, I > still get a compiler note about 'doing float to pointer coercion (cost > 13) from Z to "<return value>"'. I read the CMUCL manual's efficiency > notes (and also the short section in the SBCL manual), and it seems > like the compiler should be able to use a non-descriptor representation > for the return value of the local function tak here. It is able. It just does not understand that it is more efficient. For me replacing the last form with (+ (tak (* n 3.0d0) (* n 2.0d0) (* n 1.0d0)) 0d0) almost eliminates consing and halves the run time. -- Regards, Alexey Dejneka "Alas, the spheres of truth are less transparent than those of illusion." -- L.E.J. Brouwer |
From: Will M. F. <farr@MIT.EDU> - 2005-02-24 12:27:37
|
Thanks very much; interestingly, the compiler still emits its warning, but the consing is eliminated for me, too. Will On 23 Feb 2005, at 10:51 PM, Alexey Dejneka wrote: > Hello, > > "Will M. Farr" <farr@MIT.EDU> writes: > >> (defun takn (n) >> (declare (fixnum n) >> (optimize (speed 3))) >> (labels ((tak (x y z) >> (declare (double-float x y z)) >> (if (<= x y) >> z >> (tak (tak (- x 1.0d0) y z) >> (tak (- y 1.0d0) z x) >> (tak (- z 1.0d0) x y))))) >> (tak (* n 3.0d0) (* n 2.0d0) (* n 1.0d0)))) >> >> would enable about all optimizations that could be done. However, I >> still get a compiler note about 'doing float to pointer coercion (cost >> 13) from Z to "<return value>"'. I read the CMUCL manual's efficiency >> notes (and also the short section in the SBCL manual), and it seems >> like the compiler should be able to use a non-descriptor >> representation >> for the return value of the local function tak here. > > It is able. It just does not understand that it is more efficient. For > me replacing the last form with > > (+ (tak (* n 3.0d0) (* n 2.0d0) (* n 1.0d0)) 0d0) > > almost eliminates consing and halves the run time. > > -- > Regards, > Alexey Dejneka > > "Alas, the spheres of truth are less transparent than those of > illusion." -- L.E.J. Brouwer |
From: Alexey D. <ade...@co...> - 2005-02-26 04:23:12
|
"Will M. Farr" <farr@MIT.EDU> writes: > Thanks very much; interestingly, the compiler still emits its warning, > but the consing is eliminated for me, too. The consing is not eliminated completely, but it is moved out of loop (from TAK to TAKN). SBCL consing measure by TIME is imprecise: try to put TAKN call into loop. -- Regards, Alexey Dejneka "Alas, the spheres of truth are less transparent than those of illusion." -- L.E.J. Brouwer |