From: R. Clint Whaley <rcwhaley@ls...> - 2013-11-02 04:17:35
I've released 3.11.19. The main work is in reducing the amount of
workspace the new framework allocates. I had first removed the
dependencies on the block-major stuff in the parallel BLAS, which left
them working in a simplified way. Then, I noticed that I had a parallel
performance regression in QR, which is probably related to not re-tuning
NB for the new framework. I started a big tuning job, and had to
hard-reset the machine because swapping made it impossible to type.
This was my big clue that I needed to reduce the workspace used by the
new GEMM. I have still not ensured that the parallel stuff is as fast as
it should be; I will return to that later.
ATLAS 3.11.19 released 11/01/13, highlights of changes from 3.11.18:
* Removed block-major GEMM dep from all threading code
* Performed recursion for K > 3000 in order to put a ceiling on
* Added ammm MNK loop order to save workspace for non-square GEMM
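
To illustrate the K-recursion item above, here is a minimal sketch in C. It is
not ATLAS code. The function names, the KMAX macro, and the max_leaf_k counter
are all hypothetical, and the leaf is a plain triple loop standing in for a
real kernel. The sketch assumes the point of the recursion is that any buffer
sized in proportion to K at the leaf can never be asked to cover more than
KMAX; the release note does not spell out what the ceiling applies to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff, taken from the "K > 3000" in the release note. */
#define KMAX 3000

/* Largest K any leaf call has seen; exists only to make the ceiling visible. */
static int max_leaf_k = 0;

/* Leaf: C += A * B, column-major, plain triple loop.
 * A real kernel would copy A and B into K-proportional workspace here. */
static void gemm_leaf(int M, int N, int K,
                      const double *A, int lda,
                      const double *B, int ldb,
                      double *C, int ldc)
{
    if (K > max_leaf_k)
        max_leaf_k = K;
    for (int j = 0; j < N; j++)
        for (int k = 0; k < K; k++) {
            const double b = B[k + (size_t)j * ldb];
            for (int i = 0; i < M; i++)
                C[i + (size_t)j * ldc] += A[i + (size_t)k * lda] * b;
        }
}

/* C += A * B, halving K until every piece has K <= KMAX. */
void gemm_ksplit(int M, int N, int K,
                 const double *A, int lda,
                 const double *B, int ldb,
                 double *C, int ldc)
{
    assert(M >= 0 && N >= 0 && K >= 0);
    if (K > KMAX) {
        const int k1 = K / 2;
        /* First k1 columns of A times first k1 rows of B. */
        gemm_ksplit(M, N, k1, A, lda, B, ldb, C, ldc);
        /* Remaining columns of A times remaining rows of B. */
        gemm_ksplit(M, N, K - k1, A + (size_t)k1 * lda, lda,
                    B + k1, ldb, C, ldc);
        return;
    }
    gemm_leaf(M, N, K, A, lda, B, ldb, C, ldc);
}
```

Because both halves accumulate into the same C, the split needs no extra
output buffer. With K = 7000, for example, the recursion bottoms out in four
leaf calls of K = 1750 each.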