With ATLAS 3.11.8 on a 3770K (3.50 GHz) system with GCC 4.7.2, I obtain the following GFLOP/s for parallel GEMM:
                        ATLAS        OpenBLAS (SB)
   N     M      K   sgemm  dgemm    sgemm  dgemm
1024  1024   1024   160.0   93.4    194.4  103.0
  96    64  25024    74.0   58.1    110.1   56.9
  96   192  25024    82.7   72.6    132.2   74.8
 150   125  25024    96.5   54.7    111.5   61.2
 150   375  25024   153.8   84.4    151.4   75.3
  60    35  25024    75.6   37.2     36.1   19.5
  60   105  25024   102.2   53.8    105.0   50.7
where ATLAS has been built to use only four cores (ignoring the IDs of the HT virtual cores). The single-precision performance is somewhat worse than OpenBLAS in around half of the cases.
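For reference, GFLOP/s figures like those in the table above are conventionally derived from the 2*M*N*K floating-point operations of a GEMM divided by the wall-clock time. The following is a minimal sketch assuming that convention; the function name and the sample timing are illustrative and are not taken from the benchmark that produced the table.

```python
def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return the GFLOP/s rate of one M x N x K GEMM that took `seconds`.

    Assumes the usual count of 2*M*N*K flops (one multiply and one add
    per inner-product term); the alpha/beta scaling work is ignored.
    """
    if seconds <= 0.0:
        raise ValueError("seconds must be positive")
    return 2.0 * m * n * k / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical timing: a 1024^3 SGEMM finishing in 13.4 ms.
    print(f"{gemm_gflops(1024, 1024, 1024, 13.4e-3):.1f} GFLOP/s")
```

The same function applies to the skinny shapes; only the timing changes.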
R. Clint Whaley
2013-03-25
First, thanks for posting the comparison timings!
Can you try a very large problem, like M=N=K=5000? I should get around 92% of peak there, so there should not be much room to beat me. If you don't see that, then I'll wonder if the new framework install went bad or something.
As for the really non-square cases, I think the new framework I'm working on should eventually catch us up there, but it is in the early days right now, and I'm mostly working on the square and rank-K cases.
When I get that done, I will then turn to better handling all the weird shapes. If the shapes you choose come from actual usage, let me know, so I can add them to my list of things to try to eventually support well!
Many thanks,
Clint
Freddie Witherden
2013-03-26
For M=N=K=5000 I get 202 GFLOP/s for SGEMM and 107 GFLOP/s for DGEMM.
While I do not have the full results on me at the moment, I've seen the git OpenBLAS pull 235 GFLOP/s SGEMM on the same system for M=N=K=1024. This puts ATLAS at least 15% off peak (assuming, unrealistically, that OpenBLAS is getting peak). Should I consider recompiling, and if so, are there any special flags/configure options I should investigate?
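To put the "percent of peak" figures in context, here is a rough sketch of the arithmetic. It assumes four physical cores at the 3.50 GHz base clock quoted earlier, each retiring one 256-bit AVX add and one 256-bit AVX multiply per cycle (16 single-precision or 8 double-precision flops per cycle). These per-cycle figures and the constant names are assumptions for illustration, not values stated in the thread.

```python
# Assumed hardware figures (only the clock speed is quoted in the thread).
CORES = 4                 # physical cores, HT siblings ignored
BASE_GHZ = 3.5            # base clock; any turbo frequency is ignored
SP_FLOPS_PER_CYCLE = 16   # 8-wide AVX add + 8-wide AVX multiply
DP_FLOPS_PER_CYCLE = 8    # 4-wide AVX add + 4-wide AVX multiply


def peak_gflops(flops_per_cycle: int, ghz: float = BASE_GHZ,
                cores: int = CORES) -> float:
    """Theoretical peak in GFLOP/s under the assumptions above."""
    return cores * ghz * flops_per_cycle


def fraction_of(measured: float, reference: float) -> float:
    """Measured rate as a fraction of a reference rate."""
    return measured / reference


if __name__ == "__main__":
    sp_peak = peak_gflops(SP_FLOPS_PER_CYCLE)
    dp_peak = peak_gflops(DP_FLOPS_PER_CYCLE)
    # ATLAS results reported for M=N=K=5000.
    print(f"SGEMM: {fraction_of(202.0, sp_peak):.1%} of {sp_peak:.0f} GFLOP/s")
    print(f"DGEMM: {fraction_of(107.0, dp_peak):.1%} of {dp_peak:.0f} GFLOP/s")
    # ATLAS SGEMM relative to the 235 GFLOP/s seen with OpenBLAS.
    print(f"ATLAS vs OpenBLAS SGEMM: {fraction_of(202.0, 235.0):.1%}")
```

Under these assumptions the base-clock single-precision peak is 224 GFLOP/s, so a 235 GFLOP/s measurement would suggest either that the cores were clocked above 3.50 GHz during that run or that the assumed per-cycle figures do not hold.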
The shapes are from my real-world fluid solver, which spends ~80% of its time performing such multiplications. Key M, N values are:
  M    N
 96   64
 96  192
150  125
150  375
 40   20
 40   60
 60   35
 60  105
with K going from 4000 to 40,000.
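For anyone wanting to reproduce these cases, the shapes can be enumerated as below. This is only a sketch: the (M, N) pairs are the ones listed above, while the particular K samples are hypothetical picks inside the stated 4000 to 40,000 range (25024 is the value used in the earlier table).

```python
from itertools import product

# (M, N) pairs as listed above.
MN_PAIRS = [(96, 64), (96, 192), (150, 125), (150, 375),
            (40, 20), (40, 60), (60, 35), (60, 105)]

# Hypothetical K samples within the stated range.
K_VALUES = [4000, 16000, 25024, 40000]


def gemm_cases(mn_pairs=MN_PAIRS, k_values=K_VALUES):
    """Yield (M, N, K, flops) for every shape/K combination."""
    for (m, n), k in product(mn_pairs, k_values):
        yield m, n, k, 2 * m * n * k


if __name__ == "__main__":
    for m, n, k, flops in gemm_cases():
        print(f"M={m:4d} N={n:4d} K={k:6d} flops={flops:.3e}")
```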
Freddie Witherden
2013-04-04
A slight, unfortunate correction: the table headings should be K, M, N as opposed to N, M, K. Sorry about this.
R. Clint Whaley
2014-07-09
Can you try this again with the newest developer release? It has a partial (still ongoing) rewrite of the threaded stuff for better performance.
R. Clint Whaley
2014-08-16
No response, closing.