Short answer: this is entirely nontrivial; using BLAS would be better.
Long answer: as I understand it:
MMX is a means of performing SIMD on narrow integers packed into 64-bit
registers (8 bit with 8-fold parallelism, 16 bit with 4-fold parallelism,
or 32 bit with 2-fold parallelism).
Most of VNL is bound by computations on the type double. You would therefore
need, at the very least, to ensure that the code you are interested in runs in
terms of fixed-point real representations.
This is difficult:
You have to ensure that your real numbers don't overflow or underflow. (You
can only represent numbers from x down to x*2^(-n), where n is 8 or 16 and x
is the largest number you want to represent.) Many interesting operations take
vastly more dynamic range than that.
You would need to rewrite the bits of VXL that aren't templated so
that they use your integer format.
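As a rough illustration of the range problem, here is a minimal sketch of a
16-bit fixed-point type. The names Fixed16, to_fixed and to_double are made up
for this example and are not part of VXL; the format (one sign bit, 15
magnitude bits, scaled so that max_value is the largest representable
magnitude) is just one possible choice.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical 16-bit fixed-point value: real value = raw * (max_value / 2^15).
struct Fixed16 {
    std::int16_t raw;
};

// Quantise a double onto the fixed-point grid, saturating at both ends.
Fixed16 to_fixed(double v, double max_value) {
    assert(max_value > 0.0);
    const double step = max_value / 32768.0;  // smallest representable increment
    double q = std::round(v / step);
    if (q > 32767.0) q = 32767.0;             // overflow: clamp to largest value
    if (q < -32768.0) q = -32768.0;           // overflow: clamp to most negative
    return Fixed16{static_cast<std::int16_t>(q)};
}

// Convert back to double using the same scale.
double to_double(Fixed16 f, double max_value) {
    return f.raw * (max_value / 32768.0);
}
```

With max_value = 1.0, to_fixed(1e-6, 1.0) quantises to a raw value of 0 (the
number is lost entirely), and to_fixed(2.0, 1.0) saturates at 32767. Any
algorithm whose intermediate values span more than a few orders of magnitude
runs into one of these two cases.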
Alternatively there is another method (it's not MMX; it is called something
different, I believe) that does SIMD on pairs of single-precision floats. Both
the above points still apply. The dynamic range of floats is considered too low
for many interesting operations (SVD, etc.).
I would have thought that a faster, more useful thing to do would be to
integrate BLAS into VNL so that we can use the properly optimised,
processor-specific BLAS routines. This could give a speed-up of more than 2x.
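To give an idea of what such an integration might look like, here is a minimal
sketch of a matrix multiply that hands the work to BLAS when it is available
and falls back to a plain loop otherwise. The function name matmul and the
HAVE_CBLAS build flag are assumptions made for this example, not existing VNL
names; cblas_dgemm is the standard CBLAS double-precision matrix multiply.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef HAVE_CBLAS  // hypothetical build flag, set when a CBLAS library is linked
#include <cblas.h>
#endif

// C = A * B for row-major dense matrices: A is m x k, B is k x n, C is m x n.
std::vector<double> matmul(const std::vector<double>& A,
                           const std::vector<double>& B,
                           int m, int k, int n) {
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(B.size() == static_cast<std::size_t>(k) * n);
    std::vector<double> C(static_cast<std::size_t>(m) * n, 0.0);
#ifdef HAVE_CBLAS
    // Optimised path: C = 1.0 * A * B + 0.0 * C.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A.data(), k, B.data(), n, 0.0, C.data(), n);
#else
    // Portable fallback: straightforward triple loop.
    for (int i = 0; i < m; ++i)
        for (int p = 0; p < k; ++p)
            for (int j = 0; j < n; ++j)
                C[static_cast<std::size_t>(i) * n + j] +=
                    A[static_cast<std::size_t>(i) * k + p] *
                    B[static_cast<std::size_t>(p) * n + j];
#endif
    return C;
}
```

The point of the wrapper is that callers keep working in double throughout, so
none of the fixed-point range problems arise, and the processor-specific tuning
lives entirely inside the BLAS library.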
Ian.
> Hi,
>
> Is it possible - and if yes, how - to compile vxl's numerics taking
> advantage of MMX (pentium/pc)? I just thought about it while compiling lame
> (mp3 enc.): if "nasm" (x86 assembler) is present on the system it is
> possible to also build an MMX version of the executable that is more
> than twice as fast as the normal one.
> Any ideas to start with?
>
> thanks
> Domi
Dominik Szczerba, Dr.
