Thread: [atlas-devel] 3.11.7
From: Clint W. <wh...@cs...> - 2013-01-28 01:07:30
Guys,

I've released 3.11.7, which improves ATLAS's AMD performance somewhat. I hadn't been bothering with Piledriver, because it seemed so crippled, what with the FPU being shared between the cores and all. However, when I looked at it a little closer, it was clear I had been too quick to dismiss AMD. Yes, the 8-core Piledriver is essentially a 4-core system as far as FPU-intensive code goes, but you can still buy one cheaper than an Intel 4-core system.

Intel can do 8 flops/cycle in double ((1 fadd + 1 fmul) * veclen = 2*4 = 8), which AMD can also do on one pair of integer cores (flops(fmac) * 2 FMAC units * veclen = 2*2*2 = 8). So, if ATLAS can improve its % of peak on this platform, it should be extremely competitive in the flops/$ category.

Unfortunately, it is not easy to achieve serial peak on these machines. So far, my best DGEMM kernel gets around 75% of peak (on Sandy/Ivy Bridge, my best kernel gets 92% of peak), and is around 15% slower than ACML for very large problems. I haven't tuned single precision at all yet, and the news is even worse there (around 66% of theoretical peak and way slower than ACML). However, unlike ACML, my SGEMM seems to produce the right answer for all tested sizes.

So, we still have a long way to go on this platform, but at least we're in the game with 3.11.7. Since there's a meaningful gap with ACML, I know there's even more performance to find when/if I have time to return to tuning on this system.

I also slightly improved ATLAS AMD K10h8 64-bit DGEMM performance, but you have to pull some install-fu to get that improvement in the library, so I wouldn't worry about 3.11.7 unless you are using an FMA3-equipped machine.
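The peak-rate arithmetic in the message can be checked with a few lines of Python. This is only an illustrative sketch: the function names `peak_flops_per_cycle` and `achieved_flops_per_cycle` are made up for this example and are not part of ATLAS, and the unit counts, vector lengths, and percent-of-peak figures are simply the ones quoted in the message.

```python
def peak_flops_per_cycle(flops_per_instr, units, veclen):
    """Theoretical peak = flops per instruction * issuing units * vector length.

    Hypothetical helper for illustration; not an ATLAS routine.
    """
    return flops_per_instr * units * veclen


def achieved_flops_per_cycle(peak, fraction_of_peak):
    """Flops per cycle actually sustained at a given fraction of peak."""
    return peak * fraction_of_peak


# Intel (Sandy/Ivy Bridge), per the message: (1 fadd + 1 fmul) * veclen,
# i.e. two units doing 1 flop each on vectors of 4 doubles.
intel_peak = peak_flops_per_cycle(flops_per_instr=1, units=2, veclen=4)

# AMD Piledriver, one pair of integer cores, per the message:
# 2 flops per fmac * 2 FMAC units * veclen of 2 doubles.
amd_peak = peak_flops_per_cycle(flops_per_instr=2, units=2, veclen=2)

print("Intel peak flops/cycle:", intel_peak)
print("AMD   peak flops/cycle:", amd_peak)

# Percent-of-peak figures quoted in the message for the best DGEMM kernels.
print("Intel DGEMM at 92% of peak:", achieved_flops_per_cycle(intel_peak, 0.92))
print("AMD   DGEMM at 75% of peak:", achieved_flops_per_cycle(amd_peak, 0.75))
```

Both platforms come out at the same per-cycle peak, so the difference in delivered performance comes from how close the kernel gets to that peak, along with clock rate and price, which this sketch does not model.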
Cheers,
Clint

ATLAS 3.11.7 released 01/27/13, highlights of changes from 3.11.6:
   * Added ATL_dammm24x1x256_sse2.S gets 92% of peak on AMDK10-64
   * Added ATL_dammm6x4x256_fma3.S gets around 72% of peak on AMD piledriver
   * Added ATL_sammm24x4x256_fma3.S gets around 66% of peak on AMD piledriver

**************************************************************************
** R. Clint Whaley, PhD ** Assoc Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************