I've been using ATLAS for some time now to compute dot products and matrix multiplications, mainly because the library performs really well (ATLAS is a great piece of work, if you ask me! ;).
I believe it uses the SSE extensions to operate on the vectors, since ATLAS detects whether those instructions are available when it is installed. Am I right?
So I disassembled some programs to look for SSE instructions and see how the library operates on the vectors, but I couldn't find any function that actually used the SSE extensions. I'd like to know what I'm doing wrong. Am I searching in the wrong place?
I'm using an AMD Athlon XP 1700+ processor (~1.46 GHz), which supports the SSE1 extensions. The program I disassembled calls the cblas_sdot function on two float vectors; using cblas_ddot wouldn't make sense on this processor, since SSE1 only handles single-precision floats.
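To make it clearer what I was searching for, here is roughly the kind of code I expected to show up in the disassembly. It is only a sketch written with the SSE1 intrinsics from xmmintrin.h; sdot_sse is a hypothetical name of my own, and I'm not claiming ATLAS implements its kernel this way.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical helper, NOT part of ATLAS: single-precision dot product
   that processes four floats per iteration with packed SSE1 operations. */
float sdot_sse(const float *x, const float *y, size_t n)
{
    __m128 acc = _mm_setzero_ps();  /* four partial sums, all zero */
    float lanes[4];
    float sum;
    size_t i = 0;

    /* Vector part: unaligned loads, packed multiply, packed add. */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(vx, vy));
    }

    /* Combine the four partial sums into one scalar. */
    _mm_storeu_ps(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        sum += x[i] * y[i];

    return sum;
}
```

On a 32-bit x86 target this needs gcc's -msse flag. With it, I'd expect packed instructions such as movups, mulps and addps operating on the xmm registers, and those are the mnemonics I was looking for in the objdump output.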
Compiler: gcc 4.1.1
Disassembler: objdump -d [binary_name]
Linux distribution: Slackware 11.0, kernel 2.6.19