From: Ian Scott <ian.m.scott@st...>  2003-01-24 17:30:07

I have done some timing experiments comparing the speed of Level 1 and Level 2 BLAS operations in Intel's optimised Math Kernel Library (MKL). These are the vector-vector and matrix-vector algorithms.

Summary: it is definitely not worth using the Intel MKL instead of gcc on the platform described below.

The included files show timing results for a variety of matrix and vector sizes, in float and double, for:

  - dot product
  - matrix * vector

The sum of squared differences function is run using native VNL in both sets of results as a sanity check.

We are using MKL version 5.2 and gcc 3.2 with the flags -g -O3 -DNDEBUG -march=pentium4 -mfpmath=sse. The computer is a (dual) 1700 MHz Xeon (Pentium 4) with 2 GB of memory, running Linux.

You can see from the included results that Level 1 BLAS is always slower than native VNL. This isn't a surprise: there isn't much in the way of intelligent cache usage that can be done here. The matrix-vector operations, which are used a lot by people here, are sometimes a bit faster using MKL and sometimes a lot slower, with no immediately obvious pattern.

Congratulations to AWF, VND, and the other authors of vnl_vector and gcc. Your code beats a hand-tuned implementation by Intel.

Caveats:

  - If you are doing heavy matrix-matrix operations on a multiprocessor computer, then MKL will efficiently use all your processors, which has to be an advantage.
  - If you have a poor compiler (e.g. MSVC), then it may be worth using the MKL. We have evidence that MKL gives a speed-up of 5-10 times over the naive matrix-vector multiplication implemented in the VisSDK and compiled with MSVC 6.0.

If anyone wants to integrate BLAS into VNL, I have included the mods to do it. If an option were put into CMake to turn them off, then I guess it could be added permanently to VNL.

Regards,
Ian.
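For readers who want to try this kind of comparison themselves, the three operations timed above (dot product, matrix * vector, and the sum of squared differences sanity check) can be sketched as plain C++ loops, plus a simple timing helper. This is a hypothetical sketch in modern C++: the names dot, matvec, ssd and seconds_per_call are illustrative only, it is not the vnl_vector code, the MKL routines, or the harness behind the results above, and it assumes the matrix is stored row-major in a std::vector.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical reference kernels (not VNL's or MKL's code): straightforward
// loops that the compiler has to optimise on its own.

// Level 1: dot product of two equal-length vectors.
template <typename T>
T dot(const std::vector<T>& a, const std::vector<T>& b) {
  assert(a.size() == b.size());
  T sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}

// Level 2: y = M * x, with M stored row-major as rows x cols (an assumption).
template <typename T>
std::vector<T> matvec(const std::vector<T>& m, std::size_t rows,
                      std::size_t cols, const std::vector<T>& x) {
  assert(m.size() == rows * cols && x.size() == cols);
  std::vector<T> y(rows, T(0));
  for (std::size_t r = 0; r < rows; ++r) {
    T acc = 0;
    for (std::size_t c = 0; c < cols; ++c) acc += m[r * cols + c] * x[c];
    y[r] = acc;
  }
  return y;
}

// Sanity check: sum of squared differences between two vectors.
template <typename T>
T ssd(const std::vector<T>& a, const std::vector<T>& b) {
  assert(a.size() == b.size());
  T sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    T d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Average wall-clock seconds per call of f over `reps` repetitions.
template <typename F>
double seconds_per_call(F f, int reps) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < reps; ++i) f();
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count() / reps;
}
```

When timing at -O3, store each result somewhere the optimiser cannot discard, for example `volatile double sink; seconds_per_call([&] { sink = dot(a, b); }, 1000);`, otherwise the loop under test may be optimised away entirely.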