From: Dominique <domi@vi...> - 2002-06-26 10:56:15

Hi,

Is it possible - and if yes, how - to compile vxl's numerics taking advantage of MMX (Pentium/PC)? I thought of it while compiling lame (the mp3 encoder): if "nasm" (an x86 assembler) is present on the system, it is possible to also build an MMX version of the executable that is more than twice as fast as the normal one.

Any ideas to start with?

Thanks,
Domi

Dominik Szczerba, Dr.
Computer Vision Lab, ETH
Gloriastr. 35
CH-8092 Zurich
Phone: +41 1 632 66 68
Fax: +41 1 632 11 99
email: domi@...
http://www.vision.ee.ethz.ch/~domi
From: Ian Scott <ian.scott@st...> - 2002-06-26 11:18:51

Short answer - this is entirely non-trivial - using BLAS would be better.

Long answer - as I understand it:

MMX is a means of performing SIMD on narrow integers (either 8 bit with 4-fold parallelism, or 16 bit with 2-fold parallelism). Most of VNL is bound by computations on the type double. You would therefore need, at the very least, to ensure that the code you are interested in runs in terms of fixed-point real representations. This is difficult:

You have to ensure that your real numbers don't overflow or underflow. (You can only represent numbers from x down to x*2^(-n), where n is 8 or 16 and x is the largest number you want to represent.) Many interesting operations need vastly more dynamic range than that.

You would need to rewrite the bits of VXL that aren't templated so that they use your integer format.

Alternatively, there is another method (it's not MMX; it is called something different, I believe) that does SIMD on pairs of single-precision floats. Both of the above points still apply. The dynamic range of floats is considered too low for many interesting operations (SVD, etc.).

I would have thought that a faster, more useful thing to do would be to integrate BLAS into VNL so that we can use the properly optimised, processor-specific BLAS routines. This could give a speed-up of more than 2.

Ian.
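The dynamic-range limit described in this reply can be illustrated with a small sketch. It assumes a hypothetical 16-bit fixed-point format with 8 fractional bits (often written Q8.8). The names `to_q8_8`, `from_q8_8` and `mul_q8_8` are made up for illustration and are not part of VXL or of any MMX API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Q8.8 fixed-point format: 16-bit signed, 8 fractional bits.
// Representable range is roughly [-128, 128) with a resolution of 1/256.
constexpr int kFracBits = 8;

// Convert a double to Q8.8, truncating toward zero.
std::int16_t to_q8_8(double v) {
    assert(v >= -128.0 && v < 128.0);  // caller must stay inside the range
    return static_cast<std::int16_t>(v * (1 << kFracBits));
}

// Convert a Q8.8 value back to double.
double from_q8_8(std::int16_t q) {
    return static_cast<double>(q) / (1 << kFracBits);
}

// Multiply two Q8.8 values using a 32-bit intermediate.
// Returns false (leaving `out` untouched) if the product does not fit.
bool mul_q8_8(std::int16_t a, std::int16_t b, std::int16_t& out) {
    const std::int32_t wide = (static_cast<std::int32_t>(a) * b) >> kFracBits;
    if (wide > INT16_MAX || wide < INT16_MIN) return false;  // overflow
    out = static_cast<std::int16_t>(wide);
    return true;
}
```

With this format, `to_q8_8(0.001)` is 0 because the value is below the 1/256 resolution (underflow), and `mul_q8_8` reports failure for 20 * 20 because 400 lies outside the representable range (overflow), while 1.5 * 2 gives exactly 3.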
From: Dominique <domi@vi...> - 2002-06-26 12:00:03

> Short answer - this is entirely non-trivial - using BLAS would be better.

Thank you for the detailed answer. As to BLAS - I studied its docs a bit, and they say they provide only general, non-optimized code, as in LAPACK. They encourage using ATLAS instead (a BLAS successor, as I understood it), which should be better at generating optimized code for a given architecture. ATLAS is linked from the BLAS FAQ page on netlib.

Bye,
Domi

> I would have thought that a faster, more useful thing to do would be to
> integrate BLAS into VNL so that we can use the properly optimised,
> processor-specific BLAS routines. This could give a speed-up of more than 2.
>
> Ian.
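As a rough sketch of what the BLAS integration discussed in this thread might look like, the wrapper below calls `cblas_dgemm` from the CBLAS interface (which ATLAS provides) when an optimised library is available, and falls back to a plain loop otherwise. The function name `multiply` and the `HAVE_CBLAS` build flag are assumptions made for illustration. This is not how VNL is actually structured.

```cpp
#include <cstddef>
#include <vector>

#ifdef HAVE_CBLAS  // hypothetical build flag: define it when linking against a CBLAS
extern "C" {
#include <cblas.h>
}
#endif

// C = A * B for row-major matrices: A is m x k, B is k x n, C is m x n.
void multiply(const std::vector<double>& a, const std::vector<double>& b,
              std::vector<double>& c, int m, int n, int k) {
    c.assign(static_cast<std::size_t>(m) * n, 0.0);
#ifdef HAVE_CBLAS
    // Delegate to the processor-specific routine: C = 1.0 * A * B + 0.0 * C.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k,
                1.0, a.data(), k, b.data(), n, 0.0, c.data(), n);
#else
    // Portable reference implementation, used when no BLAS is present.
    for (int i = 0; i < m; ++i) {
        for (int p = 0; p < k; ++p) {
            const double aip = a[i * k + p];
            for (int j = 0; j < n; ++j) {
                c[i * n + j] += aip * b[p * n + j];
            }
        }
    }
#endif
}
```

Keeping the call behind a single function means the rest of the numerics code would not need to know which implementation was selected at build time.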