I am trying to use ATLAS 3.10 with gcc 4.7.2 on my laptop, which has an Intel Core i7-2630QM CPU, but the benchmarks in my own tests show that the performance is really bad.
Here is the output (times, lower is better; fast_prod = wrapper around gemv/gemm, prod/axpy_prod = naive product using boost::ublas):
Benchmarking matrix vector prod (768x768)
fast_prod Ax: 2.59983
fast_prod A^Tx: 2.93648
prod Ax: 0.719953
prod A^Tx: 1.08993
Benchmarking matrix matrix prod for medium sized matrices (512x512)
fast_prod AX: 0.306646
fast_prod A^TX: 0.313313
fast_prod AX^T: 0.313313
fast_prod A^TX^T: 0.306646
axpy_prod AX: 0.633292
axpy_prod A^TX: 0.626626
axpy_prod AX^T: 1.22659
axpy_prod A^TX^T: 1.20992
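The benchmark code behind these numbers is not shown, so the following is only a minimal sketch of one way to set up a fair naive-vs-BLAS comparison. It assumes a row-major `std::vector<double>` matrix; `naive_Ax`, `naive_ATx`, `best_seconds` and `gemv_gflops` are hypothetical names standing in for the ublas `prod` and the timing loop. Taking the fastest of several runs and converting to GFLOP/s makes results from different machines easier to compare than raw times.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical naive kernels; A is n x n, stored row-major.
// y = A * x: the inner loop walks one contiguous row of A.
std::vector<double> naive_Ax(const std::vector<double>& A,
                             const std::vector<double>& x, std::size_t n) {
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < n; ++j) sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
    return y;
}

// y = A^T * x, written as one axpy per row so A is still read row by row.
std::vector<double> naive_ATx(const std::vector<double>& A,
                              const std::vector<double>& x, std::size_t n) {
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) y[j] += A[i * n + j] * x[i];
    return y;
}

// Runs f() `reps` times and returns the fastest wall-clock time in seconds
// (-1 if reps <= 0). The first element of each result is added to `sink`
// so the computed values are actually used.
template <class F>
double best_seconds(F f, int reps, double& sink) {
    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        std::vector<double> y = f();
        auto t1 = std::chrono::steady_clock::now();
        if (!y.empty()) sink += y[0];
        double s = std::chrono::duration<double>(t1 - t0).count();
        if (best < 0.0 || s < best) best = s;
    }
    return best;
}

// A gemv on an n x n matrix performs about 2*n*n floating-point operations.
double gemv_gflops(std::size_t n, double seconds) {
    double d = static_cast<double>(n);
    return 2.0 * d * d / seconds * 1e-9;
}
```

For example, `best_seconds([&] { return naive_Ax(A, x, 768); }, 20, sink)` times the naive product for the 768x768 case; a lambda that calls the gemv wrapper instead could be timed the same way and passed to `gemv_gflops`.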
To make sure the error is not obviously in my code, I asked a friend to run the benchmark as well. His results were at least better:
Benchmarking matrix vector prod
fast_prod Ax: 0.863277
fast_prod A^Tx: 0.773283
prod Ax: 0.926606
prod A^Tx: 2.90981
Benchmarking matrix matrix prod for medium sized matrices
fast_prod AX: 0.179988
fast_prod A^TX: 0.179988
fast_prod AX^T: 0.176655
fast_prod A^TX^T: 0.176655
axpy_prod AX: 0.886609
axpy_prod A^TX: 0.963271
axpy_prod AX^T: 9.05941
axpy_prod A^TX^T: 9.23273
Since we use the same distro (Arch Linux), our setups should be roughly the same apart from the hardware. ATLAS is not a prebuilt package but is compiled during installation.
I checked the build file and found this build command:
# fix SSE1 only bug, see https://sourceforge.net/tracker/?func=detail&aid=3554109&group_id=23725&atid=379482
patch -Np0 -i "$srcdir/fix_sse1.patch"
if [ "$CARCH" = "x86_64" ]; then
  ARCHITECTURE_BUILD_OPTS="-b 64" # for x86_64
else
  ARCHITECTURE_BUILD_OPTS="-b 32" # for i686
fi
../configure --prefix=/usr/ $ARCHITECTURE_BUILD_OPTS -Fa alg -fPIC \
  --with-netlib-lapack-tarfile="$srcdir/lapack-$_lapackver.tgz"
This looks reasonable. I reinstalled ATLAS and got the same result. After that I tried adding -Si archdef 0, which compiled for hours, and I still get the same result. CPU throttling is of course disabled.
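One way to double-check that throttling really is off is to look at the Linux cpufreq governor of every core, since a governor other than `performance` is a common sign that clock scaling is active. This is a minimal sketch that assumes the standard sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`); `read_governors` and `all_performance` are hypothetical helper names. It is only a partial check: on a laptop, Turbo Boost can still change clock speeds under the `performance` governor.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads the cpufreq governor of each CPU from sysfs, e.g. "performance" or
// "powersave". Stops at the first CPU whose file cannot be read, so the
// result is empty if cpufreq is not available under `base`.
std::vector<std::string> read_governors(
        const std::string& base = "/sys/devices/system/cpu") {
    std::vector<std::string> governors;
    for (int cpu = 0;; ++cpu) {
        std::ifstream in(base + "/cpu" + std::to_string(cpu) +
                         "/cpufreq/scaling_governor");
        std::string gov;
        if (!in || !std::getline(in, gov)) break;
        governors.push_back(gov);
    }
    return governors;
}

// True only if at least one governor was found and every one is "performance".
bool all_performance(const std::vector<std::string>& governors) {
    if (governors.empty()) return false;
    for (const std::string& g : governors)
        if (g != "performance") return false;
    return true;
}
```

Calling `all_performance(read_governors())` right before the benchmark runs checks the state during the measurement rather than at ATLAS install time.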
I still suspect the error is on my side. Is there a way I can diagnose the problem? Being a factor of 3 slower than the naive implementation is a lot.
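To judge whether a matrix-matrix timing is plausible, it can help to express it as a fraction of the CPU's theoretical peak. This sketch assumes the measured time covers a single n x n gemm call (the post does not say how many repetitions each number includes), and the helper names are hypothetical. The example peak inputs (4 cores, 2.0 GHz base clock, 8 double-precision flops per cycle with AVX) are assumptions for the i7-2630QM that should be checked against Intel's spec sheet; use `cores = 1` when timing a single-threaded BLAS.

```cpp
#include <cstddef>

// Floating-point operations in C = A*B for n x n matrices (n^3 multiply-adds).
double gemm_flops(std::size_t n) {
    double d = static_cast<double>(n);
    return 2.0 * d * d * d;
}

// Theoretical peak in GFLOP/s: cores * clock (GHz) * flops per cycle per core.
double peak_gflops(int cores, double ghz, double flops_per_cycle) {
    return cores * ghz * flops_per_cycle;
}

// Achieved fraction of peak for one n x n gemm call that took `seconds`.
double fraction_of_peak(std::size_t n, double seconds, double peak) {
    return gemm_flops(n) / seconds * 1e-9 / peak;
}
```

For n = 512, `gemm_flops` gives 268,435,456 operations, so a single call taking 0.1 s against an assumed 64 GFLOP/s peak would be running at roughly 4% of peak.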