From: R. Clint Whaley <cwhaley@cc...> - 2013-07-05 21:43:40
I have posted 3.11.11. The biggest news for most folks is that I've
disabled the block-major only parallel routines. For places where the
block-major GEMM is still faster, this may cause a slight slowdown.
However, on most systems, the new access-major GEMM is faster, and so
this should result in large speedups for these systems (in particular,
it should almost double Haswell parallel performance).
The other thing new is that I have gotten the basics of a second
access-major format supported by the autotuning framework, which should
allow access-major to do well on older systems that cannot cheaply do a
vector broadcast (particularly, AMD systems prior to the Hammer). I
don't yet have great kernels written for these machines, and the
block-major kernels can't go away until I get them up to snuff, but at
least the basics are working now.