For all your architectural defaults, have you made sure you have good compiler flags? In particular, have you run the flagsearch described on page 10 of ATLAS/doc/atlas_install.pdf from the newest developer release?
I chose the optimization flags that we use for our compiler optimization work.
(We also do the gcc port for s390 in house; Andreas Krebbel is the gcc s390 port maintainer, and the whole gcc effort focuses on -O3 -funroll-loops.) I also tried the flagsearch run, which suggested that -Os is slightly faster, but this just uncovered a corner-case gcc optimization problem at -O2 and -O3 for the GEMM kernel used in flagsearch. We want to fix this in gcc instead. I have verified that most other things (e.g. AXPY) run a lot slower with -Os (and having done 1.5 years of gcc work, I can say that -Os doesn't get a lot of attention).
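For anyone who wants to repeat the AXPY comparison outside of ATLAS, here is a minimal sketch of a standalone timing harness. It is not the ATLAS AXPY kernel or the flagsearch code; the names axpy, time_axpy and the AXPY_BENCH_MAIN macro are made up for this example, and the vector length and repetition count are arbitrary.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Reference double-precision AXPY: y[i] += alpha * x[i].
 * Hypothetical stand-in, not the kernel ATLAS generates. */
void axpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Run axpy `reps` times on vectors of length n.
 * Returns elapsed CPU seconds, or -1.0 if allocation fails. */
double time_axpy(size_t n, int reps)
{
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    if (!x || !y) {
        free(x);
        free(y);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        x[i] = 1.0;
        y[i] = 0.0;
    }

    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        axpy(n, 0.5, x, y);
    clock_t t1 = clock();

    /* Print one result element so the compiler cannot drop the loop. */
    printf("check: y[n/2] = %g\n", y[n / 2]);

    free(x);
    free(y);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

#ifdef AXPY_BENCH_MAIN
/* Optional driver; sizes are arbitrary choices for illustration. */
int main(void)
{
    double secs = time_axpy((size_t)1 << 20, 200);
    if (secs < 0.0) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    printf("elapsed: %.3f s\n", secs);
    return 0;
}
#endif
```

Build the same file twice with -DAXPY_BENCH_MAIN, once with -O3 -funroll-loops and once with -Os, and compare the elapsed times. A single run with clock() is noisy, so repeat the measurement a few times before drawing conclusions.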
z10 64bit system defaults